Search-quality baseline: prose / conceptual (Phase 1.5, v1.2.0 candidate)

What This Audit Measures Vs What It Doesn't

Measures:

  • Strict programmatic-ground-truth match rate on top-3 (26.7%)
  • That the ranker often surfaces pages only tangentially related to the question (visible in the miss listing)
  • The methodology limit for class G specifically
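
To make the first measure concrete, here is a minimal sketch of how a strict top-3 match rate could be computed. It assumes each question carries one ground-truth regex and the ranker returns an ordered list of page titles. The function name, question IDs, page titles, and patterns are all hypothetical toy values, not the audit's actual harness or data.

```python
import re


def top_k_match_rate(results, patterns, k=3):
    """Return the fraction of questions with a regex hit in the top-k results.

    results  -- dict mapping question id to a ranked list of page titles
    patterns -- dict mapping question id to a ground-truth regex string
    (Both structures are assumptions made for this illustration.)
    """
    if not results:
        return 0.0
    hits = 0
    for question, ranked_pages in results.items():
        pattern = re.compile(patterns[question])
        # Strict criterion: only the first k pages count, and a page counts
        # only if the regex matches it. No credit for "close" pages.
        if any(pattern.search(page) for page in ranked_pages[:k]):
            hits += 1
    return hits / len(results)


# Hypothetical toy data, for illustration only.
results = {
    "q1": ["Actors and isolation", "Sendable types", "Task groups"],
    "q2": ["Layout basics", "Animation", "Gestures", "State and data flow"],
    "q3": ["Networking overview", "URLSession", "Codable"],
}
patterns = {
    "q1": r"(?i)\bactors?\b",
    "q2": r"(?i)\bstate\b",
    "q3": r"(?i)\bkeychain\b",
}

rate = top_k_match_rate(results, patterns)
print(f"strict top-3 match rate: {rate:.1%}")
```

In this toy run, q2 counts as a miss even though a matching page sits at rank 4, and q3 counts as a miss even though its pages may be topically adjacent. Both cases show how a strict regex criterion can understate usefulness.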

Does not measure:

  • Whether the surfaced pages, taken together, would be useful for an LLM agent constructing a Swift code answer
  • Whether human-judged relevance differs materially from the regex-based judgment
  • Whether re-running with a broadened regex would substantially change the result (worth doing as a follow-up if the test is rerun later)