What This Baseline Does Not Measure · Search-quality baseline: v1.2.0 candidate `search.db`

Per docs/design/search-quality-eval.md §1.5 (the two-criteria framing):

Criterion 1 classes C-H (acronym, CamelCase fragment, deprecation-aware, cross-source canonical, prose, symbol-attribute). Each needs its own corpus and metric.
Criterion 2 (anti-hallucination): does an LLM agent given cupertino's top-K results actually produce correct Swift? This is the actual success measure; this baseline is at best a precondition. The Phase 1.7 agent-eval (design §14.4, not yet written) is where Criterion 2 gets measured.

An MRR of 0.9467 on this baseline is necessary but not sufficient for high-quality agent grounding: an agent can still hallucinate even when the right doc is at rank 1.
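
To make the limitation concrete, here is a minimal sketch of how a mean reciprocal rank is typically computed. The function name `mean_reciprocal_rank` and the example ranks are hypothetical illustrations, not cupertino's evaluation harness or its data.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based rank of the first relevant result for query i,
    or None when no relevant result appears in the top-K (scored as 0).
    """
    if not ranks:
        raise ValueError("need at least one query")
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for five queries (illustrative only, not baseline data):
# three hits at rank 1, one at rank 2, one miss.
example_ranks = [1, 1, 2, 1, None]
score = mean_reciprocal_rank(example_ranks)
print(f"MRR = {score:.4f}")
```

The metric takes only the rank of the correct document as input. Nothing in it observes what an agent does with that document, which is why Criterion 2 needs a separate evaluation.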