P@k (Precision at k)
Manning, Raghavan, Schütze (2008) IIR §8.4
This audit records the v1.2.0 candidate database's standing on Criterion 1 (good search) restricted to query classes A (canonical lookup) and B (framework-root) per the design's §1.4 taxonomy. It is an absolute baseline; future ranking changes are measured against this single-system snapshot using the same harness in paired mode. The classes C-H from the taxonomy are out of scope per design §3 (NG6).
Each card opens its own page. The headline and charts above are all you need at a glance; the cards are for the why and how.
P@5 looks low next to MRR. The reason is structural: each of the 50 queries has exactly one canonical right answer in this design, so at most one of the top 5 results can be a match, and P@5 is capped at 1/5 = 0.2 per query.
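The ceiling can be checked with a small sketch. The helper names and the list-of-booleans representation of a result list are illustrative assumptions, not part of the audit harness.

```python
def precision_at_k(relevant_flags, k):
    """Fraction of the top-k results that are relevant."""
    return sum(relevant_flags[:k]) / k


def reciprocal_rank(relevant_flags):
    """1/rank of the first relevant result, or 0.0 if there is none."""
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical result list: the single canonical answer sits at rank 1 of 10.
top1_hit = [True] + [False] * 9
p5_best_case = precision_at_k(top1_hit, 5)   # one match out of five slots
rr_best_case = reciprocal_rank(top1_hit)     # first match at rank 1

# Hypothetical miss: no match anywhere in the top 10.
miss = [False] * 10
p5_miss = precision_at_k(miss, 5)
rr_miss = reciprocal_rank(miss)
```

Even a perfect top-1 hit scores 0.2 on P@5 while scoring 1.0 on reciprocal rank, so the two headline numbers sit on different scales for a single-answer corpus.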
The four queries that did not yield a top-1 match. Each is informative.
50 canonical-lookup queries, each paired with a right-answer URI regex. For each query, cupertino search "<query>" --limit 10 was invoked via the develop-tip binary, with cupertino.config.json set to baseDirectory: ~/.cupe…
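As a rough illustration of the scoring step, the sketch below finds the rank of the first result whose URI matches a query's right-answer regex. It assumes the result URIs have already been extracted into a list; the function name, the example URIs, and the choice of `re.search` over a full match are hypothetical and not taken from the harness.

```python
import re


def first_match_rank(result_uris, answer_pattern, limit=10):
    """Return the 1-based rank of the first URI matching the regex, or None."""
    pattern = re.compile(answer_pattern)
    for rank, uri in enumerate(result_uris[:limit], start=1):
        if pattern.search(uri):
            return rank
    return None


# Hypothetical ranked results for one query (placeholder URIs).
results = [
    "example://docs/alpha",
    "example://docs/beta",
    "example://docs/gamma",
]

rank_beta = first_match_rank(results, r"^example://docs/beta$")
rank_none = first_match_rank(results, r"^example://docs/delta$")
```

A rank of 1 counts as a top-1 match; `None` means the right answer did not appear within the limit.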
Per docs/design/search-quality-eval.md §1.5 (the two-criteria framing):
When evaluating a future ranking change (BM25F weight tweak, new tokenizer, schema change), re-run the same 50-query corpus on both the unchanged binary/DB and the changed binary/DB, and use the paired-comparison mode (/tmp/…
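The internals of the paired-comparison mode are not shown here. The audit's references include Wilcoxon (1945), so one plausible reading is a signed-rank test on per-query score differences. The sketch below works under that assumption only; the function name and the sample scores are hypothetical.

```python
import math


def wilcoxon_signed_rank(baseline, candidate):
    """Return (w_plus, w_minus, n, z) for paired per-query scores.

    Zero differences are dropped; tied absolute differences share an
    average rank. z uses the normal approximation without tie correction,
    which is only a rough guide when n is small.
    """
    diffs = [c - b for b, c in zip(baseline, candidate) if c != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0, 0.0

    # Sort indices by absolute difference, then assign average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, n, (w_plus - mean) / sd


# Hypothetical per-query reciprocal ranks for the same five queries.
baseline_rr = [1.0, 0.5, 1.0, 0.25, 0.5]
candidate_rr = [1.0, 1.0, 1.0, 0.5, 0.25]
w_plus, w_minus, n, z = wilcoxon_signed_rank(baseline_rr, candidate_rr)
```

Pairing by query matters because the same 50 queries are scored on both systems; the test looks only at queries whose score changed and asks whether improvements outweigh regressions.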
Every metric and method this audit relies on, with a link to the foundational source. Auto-collected from the audit text.
Manning, Raghavan, Schütze (2008), IIR §8.4
Voorhees (1999), TREC-8 QA Report
Järvelin & Kekäläinen (2002)
Cormack, Clarke, Büttcher (2009), SIGIR
Wilcoxon (1945), Biometrics Bulletin