Following the `feedback_code_changes_as_ideas_for_future` rule:
- Broaden the harness's regex. Many of the misses above are arguably acceptable results that the regex rejected. A second pass with looser per-query patterns would tighten the methodology and give a more honest number. Future audit work.
- Add a `--profile prose` ranking mode. A user (or agent) issuing a prose query could opt into BM25F weights that favour `content` over `title`/`symbols`. Not a default-behavior change; an opt-in.
- Phase 2 pooled human judgments specifically for prose. This is the design's own recommendation; would replace the regex with TREC-style qrels for the 15 queries here. Cost: a few hours of human time.
None proposed as immediate work.
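
To make the `--profile prose` idea concrete, here is a minimal sketch of how a field-weight profile could change a BM25F ranking. It is not the harness's or the ranker's actual code: the `PROFILES` table, the weight values, and the `bm25f_score` and `rank` helpers are all hypothetical, and the scoring is a simplified BM25F variant chosen for illustration.

```python
import math

# Hypothetical field-weight profiles. The numbers are illustrative only,
# not tuned or measured values.
PROFILES = {
    "default": {"title": 3.0, "symbols": 2.0, "content": 1.0},
    "prose": {"title": 1.0, "symbols": 0.5, "content": 3.0},
}


def bm25f_score(query_terms, doc, docs, weights, k1=1.2, b=0.75):
    """Score one document (field name -> token list) with a simplified BM25F."""
    n_docs = len(docs)
    # Average length of each weighted field across the collection.
    avg_len = {f: sum(len(d.get(f, [])) for d in docs) / n_docs for f in weights}
    score = 0.0
    for term in query_terms:
        # Document frequency: documents containing the term in any weighted field.
        df = sum(1 for d in docs if any(term in d.get(f, []) for f in weights))
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Weighted, length-normalised term frequency summed over fields.
        tf = 0.0
        for field, weight in weights.items():
            tokens = doc.get(field, [])
            if not tokens or avg_len[field] == 0:
                continue
            norm = 1 - b + b * len(tokens) / avg_len[field]
            tf += weight * tokens.count(term) / norm
        # Saturate the combined frequency once, after mixing the fields.
        score += idf * tf / (k1 + tf)
    return score


def rank(query, docs, profile="default"):
    """Return document ids ordered by descending score under the chosen profile."""
    terms = query.lower().split()
    weights = PROFILES[profile]
    scored = [(bm25f_score(terms, d, docs, weights), d["id"]) for d in docs]
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: -pair[0])]


docs = [
    {"id": "a", "title": ["retry", "policy"], "symbols": ["retry_policy"],
     "content": ["see", "the", "api", "reference"]},
    {"id": "b", "title": ["design", "notes"], "symbols": [],
     "content": ["we", "retry", "failed", "requests"]},
]
default_order = rank("retry", docs, profile="default")
prose_order = rank("retry", docs, profile="prose")
```

In this toy collection the title match wins under the default weights, while the opt-in profile lets the body-text match overtake it. The default path stays unchanged unless a caller asks for the other profile.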