`docs/design/search-quality-eval.md` Phase 1 (single-system mode, no Phase 2 human judging)

Search-quality baseline: v1.2.0 candidate `search.db`

This audit records how the v1.2.0 candidate database performs on Criterion 1 (good search), restricted to query classes A (canonical lookup) and B (framework-root) per the design's §1.4 taxonomy. It is an absolute baseline: future ranking changes are measured against this single-system snapshot using the same harness in paired mode. Classes C-H of the taxonomy are out of scope per design §3 (NG6).

Measured 2026-05-20 · Strong

Sources cited in this measurement

Every metric and method this audit relies on, with a link to the foundational source. Auto-collected from the audit text.

P@k (Precision at k)

Manning, Raghavan, Schütze (2008) IIR §8.4

Mean Reciprocal Rank

Voorhees (1999), TREC-8 QA Report

Reciprocal Rank Fusion (k=60)

Cormack, Clarke, Büttcher (2009), SIGIR

Wilcoxon signed-rank test

Wilcoxon (1945), Biometrics Bulletin

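
The sources above name the metrics without spelling out how they are computed. As a rough illustration only, the following Python sketch gives textbook-style definitions of P@k, Mean Reciprocal Rank, and Reciprocal Rank Fusion with k=60. The function names, the document-id inputs, and the tie-breaking rule are assumptions made for this example; the audit's actual harness is not shown here and may differ.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k results that are relevant (denominator is k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average reciprocal rank over (ranked list, relevant set) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def reciprocal_rank_fusion(rankings: Iterable[Sequence[Hashable]], k: int = 60) -> list:
    """Fuse several rankings: each document scores sum(1 / (k + rank)) over the lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Assumption: ties are broken by the string form of the id, for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], str(doc)))
```

For example, `precision_at_k(["a", "b", "c"], {"a", "c"}, 2)` counts one relevant hit in the top two results, and `reciprocal_rank_fusion` rewards documents that rank well in more than one input list.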