`docs/design/search-quality-eval.md` Phase 1 (Class A canonical lookup + Class B framework-root, …

Search-quality version diff: v1.1.0 → v1.2.0

This is the Phase 1.8 version-to-version comparison KPI specified in issue #830, applied to the v1.1.0 → v1.2.0 jump. The measurement is end-to-end: both the binary and the DB are swapped between arms, so it captures the full user-felt delta rather than a binary-held-constant or schema-held-constant slice.

Measured 2026-05-21 · Strong

Headline result

+20 / 50 queries newly rank-1
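As a rough illustration of how a "newly rank-1" count could be derived, the sketch below compares per-query ranks from the two arms. It assumes "newly rank-1" means the first matching result sits at rank 1 in the newer arm but not in the older one; the function name, the dictionary layout, and the sample queries are hypothetical and are not taken from the eval script.

```python
from typing import Dict, List, Optional


def newly_rank1(before: Dict[str, Optional[int]],
                after: Dict[str, Optional[int]]) -> List[str]:
    """Return queries that are rank 1 in `after` but were not rank 1 in `before`.

    Ranks are 1-based; None means no matching result was returned.
    """
    return sorted(q for q, rank in after.items()
                  if rank == 1 and before.get(q) != 1)


# Hypothetical sample data, not measured results.
before = {"URLSession": 3, "SwiftUI": 1, "Combine": None}
after = {"URLSession": 1, "SwiftUI": 1, "Combine": 1}

gained = newly_rank1(before, after)
print(f"+{len(gained)} / {len(after)} queries newly rank-1: {gained}")
```

A query that was already rank 1 in the older arm does not count as a gain under this reading, which is why "SwiftUI" is excluded in the sample.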

Method & source: Mean Reciprocal Rank. Voorhees (1999), TREC-8 QA Report
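For readers unfamiliar with the metric, Mean Reciprocal Rank averages 1/rank of the first relevant result over all queries, and a query with no relevant result contributes 0. The sketch below is a minimal, generic implementation; the function name and the sample ranks are illustrative assumptions, not the eval script's own code or data.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Compute MRR from 1-based ranks of the first relevant hit per query.

    A rank of None (no relevant hit) contributes 0 to the sum.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical ranks for four queries: hits at 1, 2, none, and 4.
sample_ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(sample_ranks)  # (1 + 0.5 + 0 + 0.25) / 4
print(f"MRR = {mrr:.4f}")
```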

Read in detail

Each card opens its own page. The headline and charts above are all you need at a glance; the cards are for the why and how.

Method Recap

Single-token canonical-lookup queries (Class A) plus framework-root queries (Class B), drawn from the `CANONICAL_QUERIES` corpus in `scripts/eval/search-quality-phase1.py`.

Read details →

What This Measurement Does Not Capture

Same caveats as search-quality-versiondiff-v1.0.2-to-v1.2.0.md:

Read details →

Pipeline

flowchart TD
    Q["50 canonical-lookup query strings<br/>(Class A + B, in-source)"]:::input
    R["per-query regex pattern<br/>apple-docs://<framework>/<concept>($|/...)"]:::input
    Q --> H[scripts/eval/search-quality…
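The per-query regex step in the flowchart can be sketched as follows. This is a minimal illustration that assumes the `($|/...)` suffix means "end of URI, or a slash followed by a sub-path", and that a query's rank is the 1-based position of the first result URI matching its pattern. The helper names and the sample URIs are hypothetical and are not taken from the eval script.

```python
import re
from typing import List, Optional


def build_pattern(framework: str, concept: str) -> re.Pattern:
    """Build a pattern for apple-docs://<framework>/<concept>, allowing sub-paths."""
    return re.compile(
        r"^apple-docs://" + re.escape(framework) + "/" + re.escape(concept) + r"($|/)"
    )


def first_matching_rank(pattern: re.Pattern, result_uris: List[str]) -> Optional[int]:
    """Return the 1-based rank of the first URI matching `pattern`, or None."""
    for rank, uri in enumerate(result_uris, start=1):
        if pattern.search(uri):
            return rank
    return None


# Hypothetical result list for a single query.
results = [
    "apple-docs://foundation/urlsessiontask",       # different concept, no match
    "apple-docs://foundation/urlsession/datatask",  # sub-path of the concept, match
    "apple-docs://foundation/urlsession",           # exact concept, match
]
pattern = build_pattern("foundation", "urlsession")
print(first_matching_rank(pattern, results))
```

Anchoring on `($|/)` keeps a longer sibling name such as `urlsessiontask` from being counted as a hit for `urlsession`.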

Read details →