Aggregate · Search-quality baseline: acronym / synonym recall (Phase 1.4, v1.2.0 candidate)

Metric	Value
Synonyms tested	22
Canonical framework at top-1	4 / 22 (18.2%)
Canonical in top-5	11 / 22
Canonical in top-10	13 / 22
Canonical missing from top-10	9 / 22
MRR	0.2562

Binomial test (top-1 == canonical vs chance 0.5): k = 4 of n = 22, one-sided p = 0.9996. The null hypothesis (synonyms-mechanism has no effect beyond chance) is not rejected; if anything, the data is consistent with the synonyms-mechanism actively producing worse results than random for top-1 retrieval.

Compare to the other class baselines on the same DB:

Class	Top-1 hit rate
A canonical lookup (`search-quality-baseline`)	46/50 (92%)
E deprecation-aware (`search-quality-deprecation`)	30/30 (100%)
F cross-source canonical (`search-quality-crosssource`)	19/19 conditional (100%)
D CamelCase fragment (`search-quality-fragment`)	20/20 (100%) at any-match
C acronym / synonym (this doc)	4/22 (18%)

This is the worst-performing class baseline by a wide margin.