Mean Reciprocal Rank
Voorhees (1999), TREC-8 QA Report
This is the Phase 1.8 version-to-version comparison KPI specified in issue #830, applied to the v1.1.0 → v1.2.0 upgrade. The measurement is end-to-end: both the binary and the DB are swapped between arms, so it captures the full delta a user would feel rather than a slice in which the binary or the schema is held constant.
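Mean Reciprocal Rank averages, over all queries, 1/rank of the first relevant result, counting 0 for a query that returns nothing relevant. The sketch below is minimal and rests on several assumptions. Relevance for each query is decided by a compiled regex over result URIs. MRR is computed separately for each arm (v1.1.0 and v1.2.0). The function names and the optional `cutoff` parameter are hypothetical and are not taken from scripts/eval/search-quality-phase1.py.

```python
from __future__ import annotations

import re
from typing import Optional, Sequence


def reciprocal_rank(
    results: Sequence[str], pattern: re.Pattern[str], cutoff: Optional[int] = None
) -> float:
    """1/rank of the first result URI matched by `pattern`; 0.0 if none match.

    `cutoff` is a hypothetical option that only looks at the top-k results.
    """
    ranked = results if cutoff is None else results[:cutoff]
    for rank, uri in enumerate(ranked, start=1):  # ranks are 1-based
        if pattern.match(uri):
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Sequence[Sequence[str]], patterns: Sequence[re.Pattern[str]]
) -> float:
    """Average reciprocal rank over queries; runs[i] is the ranked list for query i."""
    if len(runs) != len(patterns):
        raise ValueError("need exactly one pattern per query")
    if not runs:
        raise ValueError("empty query set")
    return sum(reciprocal_rank(r, p) for r, p in zip(runs, patterns)) / len(runs)
```

You would call `mean_reciprocal_rank` once on each arm's result lists. The version-to-version delta is then the v1.2.0 value minus the v1.1.0 value. Because the binary and the DB both change between arms, that difference reflects the end-to-end change, not either component alone.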
Each card opens its own page. The headline and charts above are all you need at a glance; the cards are for the why and how.
Query set: single-token canonical-lookup queries (Class A) plus framework-root queries (Class B), drawn from the CANONICAL_QUERIES corpus in scripts/eval/search-quality-phase1.py.
Caveats: same as search-quality-versiondiff-v1.0.2-to-v1.2.0.md.
flowchart TD
    Q["50 canonical-lookup query strings<br/>(Class A + B, in-source)"]:::input
    R["per-query regex pattern<br/>apple-docs://<framework>/<concept>($|/...)"]:::input
    Q --> H[scripts/eval/search-quality…
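The flowchart pairs each query string with a regex of the form apple-docs://<framework>/<concept>($|/...). Below is a minimal sketch of building and applying such a pattern. It assumes that `<framework>` and `<concept>` are literal path segments and that `...` stands for any deeper path. `relevance_pattern` and `is_relevant` are hypothetical names, not functions from the evaluation script.

```python
from __future__ import annotations

import re


def relevance_pattern(framework: str, concept: str) -> re.Pattern[str]:
    """Compile apple-docs://<framework>/<concept>($|/...) for one query (hypothetical helper).

    The segments are escaped so characters such as '.' are matched literally.
    """
    prefix = f"apple-docs://{re.escape(framework)}/{re.escape(concept)}"
    # The concept must be followed by end-of-string or a '/' and any deeper path.
    return re.compile(prefix + r"(?:$|/.*)")


def is_relevant(uri: str, pattern: re.Pattern[str]) -> bool:
    # fullmatch anchors both ends, so the URI must start exactly at the scheme.
    return pattern.fullmatch(uri) is not None
```

The `($|/...)` tail is the important design choice here. It stops a query for `uiview` from also crediting a hit on `uiviewcontroller`, while still accepting child pages such as `uiview/frame`.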
Every metric and method this audit relies on, with a link to the foundational source. Auto-collected from the audit text.
Voorhees (1999), TREC-8 QA Report
Manning, Raghavan & Schütze (2008), Introduction to Information Retrieval, §8.4
Järvelin & Kekäläinen (2002), ACM Transactions on Information Systems
Wilcoxon (1945), Biometrics Bulletin
McNemar (1947), Psychometrika
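The audit cites Wilcoxon (1945) and McNemar (1947), but this page does not say exactly how either test is applied. The sketch below assumes the usual paired design: every query is scored under both arms. Wilcoxon's signed-rank test then runs on per-query reciprocal-rank differences. McNemar's exact test runs on a per-query binary outcome, for example "relevant result at rank 1". The function names and the hit definition are hypothetical. The Wilcoxon p-value uses a normal approximation, which is rough for small samples; in practice a library routine such as SciPy's `scipy.stats.wilcoxon` would be preferable.

```python
from __future__ import annotations

import math
from typing import Sequence


def wilcoxon_signed_rank(old: Sequence[float], new: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test on paired per-query scores.

    Zero differences are dropped; tied |d| values get average ranks. Uses the
    normal approximation with tie correction, no continuity correction.
    Returns (W+, p_value).
    """
    if len(old) != len(new):
        raise ValueError("scores must be paired per query")
    diffs = [b - a for a, b in zip(old, new) if b != a]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no query changed between arms
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        t = j - i + 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def mcnemar_exact(old_hits: Sequence[bool], new_hits: Sequence[bool]) -> tuple[int, int, float]:
    """Exact (binomial) McNemar test on paired per-query hit/miss outcomes.

    Returns (b, c, p_value): b = hit only in the old arm, c = hit only in the new arm.
    """
    if len(old_hits) != len(new_hits):
        raise ValueError("outcomes must be paired per query")
    b = sum(1 for o, nw in zip(old_hits, new_hits) if o and not nw)
    c = sum(1 for o, nw in zip(old_hits, new_hits) if nw and not o)
    m = b + c
    if m == 0:
        return b, c, 1.0  # no discordant queries
    tail = sum(math.comb(m, k) for k in range(min(b, c) + 1)) / 2 ** m
    return b, c, min(1.0, 2 * tail)  # two-sided: double the smaller tail
```

Only the discordant queries (b and c) enter McNemar's test. Queries that hit or miss in both arms carry no information about which version is better.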