Back to Search-quality baseline: cross-source canonical (Phase 1.2, v1.2.0 candidate)
Search-quality baseline: cross-source canonical (Phase 1.2, v1.2.0 candidate)

Method Recap

25 (query, expected_source, rationale) triples. For each: run cupertino search "<query>" --limit 10, extract top-1 URI, derive its source via the URI prefix, compare against expected. Aggregate: count matches; binomial test on the subset where expected source IS present in top-10.

Harness source: /tmp/cupertino-search-eval-crosssource.py (not yet versioned in repo). Full JSON dump (all 25 top-10 lists + sources): /tmp/cupertino-search-eval-crosssource-20260520.json.