Grep-first was the wrong instinct

Two weeks ago a routine `just ctx-browse` blew up on me.

the failing command

Loading context docs from "~\stardust-engine\docs/context"...
Failed to read index.json: The system cannot find the file specified. (os error 2)
error: process didn't exit successfully: `target\debug\context_browser.exe` (exit code: 1)
error: Recipe `ctx-browse` failed on line 29 with exit code 1

Named file. Named binary. Named recipe. Looks like a broken script. My first instinct was to debug it as one — open the justfile, find the recipe, grep for `index.json`, trace what produces it.

That instinct cost me five tool calls and was heading for another five before I caught myself. The answer was sitting in my own context docs the whole time and would have been a two-call retrieval. This is a post-mortem of that detour — what each path costs, what each returns, and the heuristic I should have already internalized.

What I actually ran

Before I caught the mistake, here's what my agent ran, in order:

1. `ls docs/context/ | head -30`: confirm the directory, check for `index.json`. Missing.
2. `ls justfile Justfile`: locate the recipe file.
3. `Grep ctx-browse|context_browser` in `justfile`: find the recipe body. It runs `cargo run -p claude-tools --bin context_browser`.
4. `Grep index\.json|load_index|read_index` in `context_browser.rs`: find where the binary expects the file. Line 256 in a function called `load_graph_data`.
5. `ls meta-index.json && git log -- docs/context/index.json`: check whether the file was once tracked.

Five calls in, I had the local mechanics but none of the story. If I had continued (and I was about to), these were the minimum remaining calls needed to reach the same architectural picture:

6. `git log --diff-filter=D -- docs/context/index.json` to find the deletion commit.
7. `git show <sha>` to read the deletion commit message.
8. `Grep "index.json"` repo-wide to find any remaining producer.
9. Search for the now-deleted binary that used to generate the file. Grep + `git log` again.
10. Read whatever README or CLAUDE.md fragment finally explained the architectural shift.

Call ten, best case. And I would still only know **what changed** — not **what replaced it**, or **why the new system is shaped the way it is**. That context lives somewhere no `git log` line has ever lived: in human-written docs that explain intent, not delta.

What two MCP calls gave me instead

After catching the mistake I went back to the retrieval path:

two calls, ~8K tokens

context_browse "context browser index.json generation docs/context"
  → 15 ranked sections with trigger phrases (~2K tokens)

context_fetch "context browser binary index.json visual graph D3 ctx-browse, JSON docs compile to Lance table, lancedb context engine key files"
  → ~12 sections of full content + 1-hop graph expansion (~6K tokens)

That's it. Two calls — plus a one-time schema warmup for the MCP tools. ~8K tokens total. What came back:

**The editorial fact I could not have grepped.** *"The old `context_build` standalone binary has been removed."* Written by a human, explaining intent, not derivable from a diff without the surrounding architectural narrative.
**The replacement architecture in one shot.** `stardust-context-server` on port 3031 with LanceDB + BM25 full-text search + vector ANN + RRF reranker. `stardust-context-client` as the thin HTTP/MCP frontend. `meta-index.json` as the surviving cluster-assignment file. Role of each component spelled out.
**A third remediation option I would not have proposed.** Repoint the D3 browser at the running `context_server`'s HTTP API instead of expecting a file on disk. From a bash-only investigation I'd have offered only "regenerate `index.json`" or "delete the recipe." The docs knew there was a better architecture hiding in the question.
**Adjacent knowledge for free.** The rebuild command (`cargo run -p stardust-context-client -- rebuild`), the LanceDB table location (`.context_db/context_sections.lance`), the fact that 200+ curated JSON docs and 6,000+ hand-authored trigger phrases back the retrieval.

Grep could have told me what `index.json` **was**. It could not have told me what the system **is now**. That distinction is the whole argument.

The numbers, side by side

tool-call cost comparison

path                                    calls       tokens     reasoning hops to diagnosis
--------------------------------------  ----------  ---------  ---------------------------
Bash / Grep / Read (first instinct)     ~8–12       5–15K      ~6–10
context_browse + context_fetch          2 (+1 warm) ~8K        2

What I actually did: 5 grep/bash calls, then 2 MCP calls after I caught the mistake, then synthesis. ~7 calls instead of ~4, plus the cost of reconstructing my mental model midway through.

What I got wrong, diagnosed

The symptom looked **local and concrete** — a missing file, a known recipe, a known binary. I pattern-matched to "debug a broken script" rather than "understand a subsystem." My own project's CLAUDE.md says browse-first for any codebase question. I implicitly excused myself because the error message looked self-explanatory.

It wasn't. The reason `index.json` is missing is an **architectural fact** — the file's producer was deliberately removed during a system migration — not a local bug. Any task whose root cause is *"the system around this code changed"* is a context-query task, even when it presents as a one-line error.

There's a conditioning trap here that I suspect applies to most engineers who got fluent on grep before they got fluent on retrieval: an error that names a specific file and a specific binary feels like it has a local fix. But the question *"why was this file expected and why is it gone"* has no local answer. It lives in the architecture story.

Honest cons of context-query

Before this reads as a sales pitch, the limits matter.

**Docs lag disk state.** The context engine describes the world as documented, not as currently in the filesystem. If someone committed an hour ago and hasn't updated the relevant context doc, the engine will confidently tell you the old truth.
**Coverage is uneven.** Newly added or undocumented systems won't show up. Trigger phrases are authored by hand; if nobody wrote one, queries miss. The engine is only as good as the docs behind it.
**It can't read a specific file for you.** It gives you pointers, not current contents. For "what does this exact line do right now," grep and read are still correct.

The right play is a **hybrid in the opposite order from the one I ran**:

1. `context_browse` + `context_fetch` first: learn the system. (~8K tokens, 2 calls.)
2. One targeted Grep + Read afterwards: confirm the specific failing call site is what the architecture story says it should be. (~1-2K tokens, 1-2 calls.)

Total: ~3-4 calls, ~10K tokens, complete diagnosis with three informed remediation options. That is roughly half the calls I actually spent, and it comes with the overarching "what moved and why" context that grep-first would have missed entirely.
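To make that arithmetic checkable, here is a small tally using only the rough estimates quoted in this post. The variable names are mine, the one-time MCP schema warmup is left out, and the token figures are the same approximations as above, not measurements.

```python
# Call and token figures are this post's own rough estimates (tokens in thousands).
actual_calls = 5 + 2                      # 5 bash/grep calls, then 2 MCP calls

hybrid_calls = (2 + 1, 2 + 2)             # 2 MCP calls + 1-2 targeted grep/read calls
hybrid_tokens_k = (8 + 1, 8 + 2)          # ~8K retrieval + ~1-2K confirmation

# Fraction of the actual call count that the browse-first hybrid would have used.
call_ratio = tuple(round(c / actual_calls, 2) for c in hybrid_calls)

print(f"actual: {actual_calls} calls")
print(f"hybrid: {hybrid_calls[0]}-{hybrid_calls[1]} calls, "
      f"{hybrid_tokens_k[0]}-{hybrid_tokens_k[1]}K tokens")
print(f"hybrid / actual calls: {call_ratio[0]}-{call_ratio[1]}")
```

The ratio lands between roughly 0.43 and 0.57, which is where "about half" comes from.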

How the retrieval layer actually works

Briefly, because it matters for why this wins.

**200+ structured JSON docs** in `docs/context/`, each with sections, each section with authored triggers. The authoring format is the source of truth; LanceDB tables are compiled from these docs on rebuild.
**BM25 full-text search on triggers**, not on content and not on filenames. Triggers are hand-written phrases like "move entity to position", and a single trigger string can pack a whole synonym group ("change modify velocity speed"). They are the primary routing surface.
**Vector ANN on trigger embeddings** (via AllMiniLmL6V2) catches queries whose vocabulary doesn't overlap with the hand-written triggers.
**RRF reranker** merges FTS + vector results. Sections appearing in both lists get boosted.
**1-hop graph expansion.** After the top-N direct hits, follow each doc's hand-curated `related` field, pull the Overview section of each neighbor, append as `=== RELATED (1 hop) ===`.
**Section-level retrieval, not chunk-level.** Each section is one authored unit. No sliding-window chunks that slice a concept in half.
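The merge and expansion steps in that list can be sketched in a few lines. This is a minimal illustration, not the engine's code: the section ids, the `related` map, the `#overview` naming, and the constant `k=60` (a commonly used RRF default) are all assumptions made for the example.

```python
from collections import defaultdict

def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion: score(s) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, section_id in enumerate(ranking, start=1):
            scores[section_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))

def expand_one_hop(hits, related, top_n=2):
    """For the top-N hits, collect the Overview of each related doc not already hit."""
    seen_docs = {section_id.split("#")[0] for section_id in hits}
    extras = []
    for section_id in hits[:top_n]:
        doc = section_id.split("#")[0]
        for neighbor in related.get(doc, []):
            if neighbor not in seen_docs:
                seen_docs.add(neighbor)
                extras.append(f"{neighbor}#overview")
    return extras

# Hypothetical ranked lists in "doc#section" form, not taken from the real docs.
fts_hits = ["browser#index", "engine#rebuild", "server#api"]
vector_hits = ["engine#rebuild", "client#mcp", "browser#index"]
related = {"engine": ["server", "lancedb"], "browser": ["d3"]}

merged = rrf_merge([fts_hits, vector_hits])
extras = expand_one_hop(merged, related)

print(merged)
print("=== RELATED (1 hop) ===", extras)
```

Sections that appear in both lists collect two reciprocal-rank terms, which is why `engine#rebuild` and `browser#index` rise above the sections found by only one retriever.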

None of this is novel individually. The combination matters. BM25 + vector + RRF + section-level is a 2024 hybrid-search playbook. What makes it work on a 136K-line Rust codebase is the 6,000+ human-authored trigger phrases, because **the query vocabulary usually doesn't match the code vocabulary**. A human asking "why is this broken" doesn't type the symbol names. Triggers bridge the gap; embeddings catch the rest.
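A toy version of the vocabulary gap shows why scoring triggers beats scoring content. The sketch below is a plain Okapi BM25 in pure Python, not LanceDB's full-text search. The two trigger strings come from this post; the section ids and the code-flavored `content` strings are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against the tokenized query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical sections: the content uses symbol names, the triggers use human words.
sections = [
    {"id": "physics#velocity",
     "trigger": "change modify velocity speed",
     "content": "set_linear_velocity(entity, vec3)"},
    {"id": "transform#move",
     "trigger": "move entity to position",
     "content": "Transform::translate_to(target)"},
]

query = "change speed".split()
trigger_scores = bm25_scores(query, [s["trigger"].split() for s in sections])
content_scores = bm25_scores(query, [s["content"].lower().split() for s in sections])

for s, t, c in zip(sections, trigger_scores, content_scores):
    print(f"{s['id']}: trigger={t:.3f} content={c:.3f}")
```

The query "change speed" scores against the first trigger and against neither content string, because nobody asking the question types `set_linear_velocity`.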

The heuristic

If the plausible answer to *"why is this broken?"* is *"because something else moved,"* browse first.

Local-only errors — typo, off-by-one, null deref in code I'm currently editing — stay grep-first.

The tell: **would the explanation make sense to someone who only sees this one file?** If no, browse. If yes, grep.

In my failing `ctx-browse` case, no explanation of `Failed to read index.json` makes sense without the LanceDB migration context. That was the signal I missed.

Why this is worth writing down

I suspect the grep-first instinct is common and under-examined. We spend so long training those reflexes that even an error message pointing at a file in a directory called `docs` feels like it has a local fix. The framing I needed was different: every retrieval question has a disk layer and a design layer, and most debugging sessions start on the wrong one.

The context engine described here runs locally as part of the Stardust Engine project — a 136K-line Rust game engine where autonomous agents plan, implement, review, and merge changes against production code.