| Metric | Value |
|---|---|
| Queries (N) | 50 |
| Queries with the right answer at rank 1 | 46 / 50 (92%) |
| Queries with the right answer outside the top 10 | 1 / 50 (2%) |
| MRR | 0.9467 |
| P@1 | 0.9200 |
| P@5 | 0.3280 |
| NDCG@10 | 1.7385 |

P@5 looks low next to MRR because each of the 50 queries has exactly one canonical right answer in this design. If at most one result in the top 5 matches, P@5 for that query can be at most 1/5 = 0.2. The observed 0.328 is above that ceiling because some right-answer regexes also match lower-ranked sibling URIs. For example, framework-root patterns such as `apple-docs://swiftui($|/[^/]*$)` legitimately match many pages. The metric is reported correctly, but it is not the headline number.
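
The ceiling argument can be checked with a small P@k sketch. It assumes each query's right answer is a regex tested against ranked result URIs with search-style matching (the `$` anchors in the pattern above suggest this, but it is an assumption). `precision_at_k` and the example URIs are hypothetical and are not taken from the evaluation harness.

```python
import re


def precision_at_k(ranked_uris: list[str], answer_pattern: str, k: int = 5) -> float:
    """Fraction of the top-k results whose URI matches the answer regex.

    Every matching URI counts as a hit, so a broad pattern can score
    above 1/k even though the query has one canonical answer.
    """
    pattern = re.compile(answer_pattern)
    hits = sum(1 for uri in ranked_uris[:k] if pattern.search(uri))
    return hits / k


# Hypothetical ranked results for one query.
ranked = [
    "apple-docs://swiftui",
    "apple-docs://swiftui/view",
    "apple-docs://swiftui/text",
    "apple-docs://uikit/uiview",
    "apple-docs://swiftui/view/body",
]

# Narrow pattern: only one URI can match, so P@5 <= 0.2.
print(precision_at_k(ranked, r"^apple-docs://swiftui/view$"))  # 0.2
# Framework-root pattern: matches the root and its direct children (not grandchildren).
print(precision_at_k(ranked, r"apple-docs://swiftui($|/[^/]*$)"))  # 0.6
```

A handful of broad patterns like the second one is enough to lift the corpus-wide mean above 0.2.
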
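
NDCG@10 > 1 is possible for the same reason: a multi-match pattern adds a gain for every matching URI in the top 10, while the normalizing ideal DCG does not account for those extra matches. Per design §8.2, this is a known accounting quirk. The metric remains useful for paired comparison between runs on the same queries, but not as an absolute score on the [0, 1] scale.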
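
The quirk can be reproduced with a binary-gain NDCG@k sketch. Here the DCG credits every matching URI, but the ideal DCG assumes a single relevant result at rank 1 (IDCG = 1). That normalization is an assumption inferred from the reported value exceeding 1, not a transcription of design §8.2. `ndcg_at_k` and the URIs are hypothetical.

```python
import math
import re


def ndcg_at_k(ranked_uris: list[str], answer_pattern: str, k: int = 10) -> float:
    """Binary-gain NDCG@k normalized against one canonical answer (assumed).

    DCG adds 1 / log2(rank + 1) for every top-k URI matching the pattern.
    The ideal DCG assumes exactly one relevant result at rank 1,
    i.e. 1 / log2(2) = 1.0, so multi-match patterns can exceed 1.
    """
    pattern = re.compile(answer_pattern)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, uri in enumerate(ranked_uris[:k], start=1)
        if pattern.search(uri)
    )
    ideal_dcg = 1.0  # single canonical answer placed at rank 1
    return dcg / ideal_dcg


ranked = [
    "apple-docs://swiftui",
    "apple-docs://swiftui/view",
    "apple-docs://uikit/uiview",
]
# Only the root URI matches: DCG = 1.0, so NDCG = 1.0.
print(ndcg_at_k(ranked, r"^apple-docs://swiftui$"))  # 1.0
# Root and a direct child match: 1 + 1/log2(3), which is above 1.
print(round(ndcg_at_k(ranked, r"apple-docs://swiftui($|/[^/]*$)"), 4))  # 1.6309
```

Normalizing by the ideal DCG of the full set of URIs a pattern can match would cap the score at 1. The cost is that every matching URI must be enumerated up front.
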

The headline number is MRR = 0.9467. Any ranking change must match or exceed it on the same 50-query corpus to claim no regression.
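
A minimal sketch of that regression gate follows. It assumes per-query first-match ranks are available for the baseline and candidate runs over the same queries, with `None` meaning the right answer was not in the top 10 (scored as 0). The function names and the four-query data are hypothetical and are not the real corpus.

```python
from typing import Optional, Sequence


def reciprocal_rank(first_match_rank: Optional[int]) -> float:
    """1 / rank of the first matching result, or 0.0 if none was in the top 10."""
    return 0.0 if first_match_rank is None else 1.0 / first_match_rank


def mean_reciprocal_rank(first_match_ranks: Sequence[Optional[int]]) -> float:
    """Average reciprocal rank over a query set."""
    return sum(reciprocal_rank(r) for r in first_match_ranks) / len(first_match_ranks)


def passes_regression_gate(
    baseline_ranks: Sequence[Optional[int]],
    candidate_ranks: Sequence[Optional[int]],
) -> bool:
    """True if the candidate's MRR matches or exceeds the baseline's on the same queries."""
    if len(baseline_ranks) != len(candidate_ranks):
        raise ValueError("baseline and candidate must be scored on the same query set")
    return mean_reciprocal_rank(candidate_ranks) >= mean_reciprocal_rank(baseline_ranks)


# Toy 4-query example; None = right answer not in the top 10.
baseline = [1, 1, 2, None]  # MRR = 2.5 / 4 = 0.625
better = [1, 1, 3, 5]       # MRR = (2 + 1/3 + 1/5) / 4, about 0.633
worse = [1, 2, 2, None]     # MRR = 2.0 / 4 = 0.5
print(passes_regression_gate(baseline, better))  # True
print(passes_regression_gate(baseline, worse))   # False
```

Because both runs score the same queries, the per-query reciprocal ranks can also be compared pairwise to see which queries moved. This is more informative than the mean alone.
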