| Baseline | Class | Headline |
|---|---|---|
| search-quality-baseline-v1.2.0.md | A + B | MRR 0.9467, P@1 perfect 46/50 |
| search-quality-deprecation-baseline-v1.2.0.md | E | Swift 30/30, p = 0.0078 |
| search-quality-crosssource-baseline-v1.2.0.md | F | 19/19 conditional, p = 1.9 × 10⁻⁶ |
| search-quality-fragment-baseline-v1.2.0.md | D | P@1 = 1.0, P@5 = 0.92 |
| search-quality-acronym-baseline-v1.2.0.md | C | 4/22 (18%); mechanism not effective |
| search-quality-prose-baseline-v1.2.0.md (this doc) | G | 4/15 any-top-3 (26.7%) strict; estimated 8-10/15 (53-67%) human-adjusted. Hardest class to evaluate programmatically; honest measurement requires Phase 2 human qrels. |

Seven of the eight Phase 1.x classes from §1.4 (A through G) now have documented baselines, spread across the six documents above. Two items remain: class H (symbol-attribute) and Phase 1.7 (anti-hallucination, agent end-to-end).