
# Context Management — Cross-Tool Synthesis

Research synthesis across all analyzed tools and papers. Updated as individual ANALYSIS-*.md files are added and promoted.


| Tool / Paper | Approach | Compression | Token budget model | Benchmarks | Notes | Overlap & recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| context-mode | MCP-layer output interception + FTS5 knowledge base | 95–100% (summarization, verified); 44–93% (retrieval, as reported) | Implicit: agent selects tool | Partially verified; cold start 1–4s/call undisclosed | PreCompact hook extends session ~30 min → ~3 hr (as reported); ELv2 license | No direct peer for MCP output sandboxing. Overlaps n2-arachne on budget enforcement — prefer this (better license, two-speed retrieval). Pair with codebase-memory-mcp for structural navigation. |
| codebase-memory-mcp | AST-to-SQLite knowledge graph; structural graph queries replace file reads | ~90–99% vs grep (directional; 5 live queries ~1,095 tokens verified) | None — result set size is the bound | No runnable harness; live queries verified | Dynamic language edges heuristic; no auth on MCP/UI; MIT | Overlaps code-review-graph, codegraph, jcodemunch-mcp (AST-graph family). Prefer code-review-graph for breadth (22 tools, community detection). Use this for a pure-SQLite graph with no Python/Docker dependency. |
| code-review-graph | Tree-sitter AST → SQLite; blast-radius + community detection + hybrid search; 22 tools | 8.2× average (as reported, range 0.7×–16.4×); 49× “daily tasks” unverified | None — result set size | evaluate/ runner exists; not reproduced; MRR 0.35 (stated, low) | 7,624 stars; Python 3.10+; active community; MIT | Best of the AST-graph family for breadth. Overlaps codebase-memory-mcp, codegraph, jcodemunch-mcp. Prefer over codegraph (no README integrity issue) and jcodemunch-mcp (MIT vs non-OSI, richer toolset). |
| codegraph | Tree-sitter AST → SQLite; single codegraph_explore blast-radius tool | 94% fewer tool calls / 77% faster (as reported, own eval runner — unverified) | None — traversal result set varies | evaluation/runner.ts exists; not reproduced; 8.2× table is CRG’s data | WASM bundled; zero native deps; README integrity issue; 412 stars; MIT | Overlaps code-review-graph and jcodemunch-mcp. Prefer code-review-graph unless a zero-dep WASM bundle is a hard requirement. README integrity issue (benchmark table copied from CRG) warrants caution. |
| graphify | Prompt-orchestrated multi-modal knowledge graph (skill.md drives Python CLI); Tree-sitter AST + LLM semantic extraction; Leiden community detection | 71.5× token reduction (as reported; single curated 52-file corpus; extreme baseline) | None — graph query cost vs raw file reads | No standalone harness; computed inline during /graphify runs | Multi-modal: code + PDF + image + video; persistent graph.json; 7-tool MCP server mode; 3.7k+ stars; MIT | Overlaps code-review-graph and codebase-memory-mcp for code graph building. Unique for mixed-media corpora (code + PDFs + images). Prefer the dedicated code-graph tools for pure-code use cases; prefer graphify only if multi-modal ingestion is required. |
| Understand-Anything | Multi-agent LLM pipeline → structural + domain graph dashboard | N/A — developer comprehension focus; no token reduction claim | None | None documented | 8,081 stars; TypeScript/Node.js; MIT | No overlap with token-reduction tools — orthogonal value proposition. Choose only if the goal is domain mapping and comprehension, not context compression. |
| git-semantic-bun | Local vector index over git commit messages | N/A — retrieval, not summarization | None | gsb benchmark requires user-provided queries; no published figures | No MCP; pre-stable; 3 stars; MIT | No overlap — semantic git-history search is unique in this survey. Use only if querying commit history by meaning is the specific requirement. |
| qmd | 8-step hybrid query: BM25 probe → LLM query expansion → vec search → RRF fusion → chunk selection → reranking → score blend → dedup | N/A — retrieval, not output compression | Implicit: caller sets result limit | Full qmd bench harness; no published results; vitest eval suite (6 docs, 24 queries) | 20.3k stars; custom 1.7B query expansion model (no training artifacts); no HTTP auth; MIT | Overlaps jdocmunch-mcp for markdown section retrieval. Prefer this for dynamic query workloads — most sophisticated pipeline in the survey. Prefer jdocmunch-mcp only for O(1) access to known-section structured docs (and only if a non-commercial license is acceptable). |
| caveman | Claude Code skill enforcing caveman-speak output + compress sub-tool for explicit compression | ~75% output-token reduction; ~45% input-token reduction (as reported; updated from 65% triage) | Implicit: agent output style + budget | Offline evals/measure.py against committed snapshot (10 prompts, reproducible); benchmarks/run.py requires API key | 6 intensity levels incl. Wenyan variants; auto-clarity escape for security warnings | No direct peer — output-style compression is unique in this survey. Use when output verbosity is the token budget bottleneck, not input context size. Complementary (not competing) with all input-side tools. |
| n2-arachne | MCP server assembling token-budgeted payloads with fixed % allocations (10% structural / 30% dep / 40% semantic / 20% recent) | Budget enforced (chars/3.5 heuristic — not a real tokenizer) | Explicit: fixed % allocations across 4 layers | None — test script is a placeholder; CHANGELOG references missing harness file | Non-commercial-only license (NOASSERTION SPDX); 19.9× speedup headline describes a non-hot-path function | Overlaps context-mode on budget enforcement. Prefer context-mode (ELv2 vs non-commercial, verified savings). Use n2-arachne only if the fixed-% allocation model is specifically required and non-commercial terms are acceptable. |
| jdocmunch-mcp | Section-level markdown indexing; O(1) byte-offset retrieval | 110× byte reduction on structured docs (as reported; bytes, not tokens; savings accounting flaw) | None — returns matched sections | No harness; 3 narrative case studies generated by Claude | Opt-out telemetry to j.gravelle.us; v1.7.1; non-commercial dual license ($79–$1,999 tiers) | Overlaps qmd for markdown retrieval. Prefer qmd for general workloads (MIT, richer pipeline). Use this for O(1) access to stable, known-section documents only; the non-commercial license and telemetry are material risks. |
| serena | LSP-backed symbol-path retrieval + editing; two backends (LSP + JetBrains); progressive fallback on oversized results | Not quantified — qualitative only | Implicit: _limit_length + shortened_result_factories progressive fallback | None (analytics.py tracks usage; no baseline comparison) | ~30 tools; 55 LSP language servers; novel fallback mechanism; flat markdown memory (no TTL/search) | No direct peer for LSP-backed symbol editing. Orthogonal to graph tools (editing precision vs graph traversal). Prefer for multi-language refactoring workflows where symbol-level accuracy matters; no overlap with context-mode or rtk. |
| jcodemunch-mcp | Tree-sitter AST → SQLite WAL; exact byte-span retrieval; 9 MCP tools | 95% token reduction (as reported; 3 small repos; range 79.7–99.8%) | None — result set size | benchmarks/harness/run_benchmark.py runnable; not independently reproduced | Non-OSI license (paid commercial use); optional AI summarization sends code to external APIs | Overlaps code-review-graph and codegraph (AST-graph family). Prefer code-review-graph (MIT, 22 tools, broader benchmark). Use this only if byte-span precision is required and a paid non-OSI license is acceptable. |
| rtk | Claude Code hook-based CLI proxy; two-track filter pipeline (69 Rust handlers + 58 TOML filters) | 60–90% on dev commands (as reported; chars/4 heuristic) | None — passthrough proxy | scripts/benchmark.sh runnable; live fixtures; 80% improvement CI gate | v0.35.0; Apache-2.0; TOML filter correctness enforced at compile time | No direct peer — the transparent CLI-proxy hook architecture is unique. Use if the goal is passive token reduction on Claude Code dev commands without changing agent behavior. Complementary to all MCP-layer tools. |
| socraticode | Qdrant-backed hybrid search (dense + BM25 via Qdrant RRF); AST-aware chunking (18+ languages); polyglot dependency graph | 61.5% (as reported; bytes, not tokens; single live session; no harness) | None | No harness; 3 narrative case studies | Docker required; Qdrant v1.15.2+ required; MIT | Overlaps codebase-memory-mcp and code-review-graph for code search. Prefer code-review-graph (no Docker/Qdrant dep, stronger benchmarks). Use socraticode only if Qdrant is already in the infrastructure stack and RRF hybrid search is needed. |
| sdl-mcp | LadybugDB knowledge graph; Symbol Cards (~100 tokens/symbol, LLM summary, ETag re-fetch); Iris Gate Ladder 4-rung escalation; Delta Packs blast-radius; SCIP compiler-grade edges; 38 tool surfaces | 81% tools/list overhead reduction (gateway mode, as reported); no end-to-end session figure | Implicit: Iris Gate Ladder prompts cheapest-first retrieval; max-cards on slices | No harness; no end-to-end benchmark; all figures author-run | LLM summary cost at index time undisclosed; LadybugDB opaque (no schema/SQL/Cypher); source-available license; 12 languages (Rust indexer); 125 stars | Watch. Iris Gate Ladder + ETag conditional re-fetch are architecturally novel — the most disciplined context-escalation model in this survey. But no end-to-end token savings figure exists, LadybugDB is opaque, and LLM index-time costs are undisclosed. Overlaps codebase-memory-mcp and code-review-graph for graph-based code intelligence; prefer those (MIT, transparent storage, broader languages) until SDL-MCP provides a reproducible end-to-end benchmark and documents summary costs. |
| osgrep | npm CLI; LanceDB vector store; Granite 30M dense + mxbai ColBERT 17M (int8) late-interaction reranking; tree-sitter AST chunking; FTS + vector → RRF → two-stage ColBERT pipeline | ~20% cost reduction / ~30% speedup (as reported; 10-query, single-codebase CSV; no answer quality assessment) | None — result set size is the bound | 10-query CSV (opencode corpus, cost-only); internal MRR harness (eval.ts, 70+ cases, self-referential); not reproduced | MCP server is a non-functional stub as of commit 9f2faf7; last push 2026-01-17; 1,128 stars; Apache-2.0 | Overlaps socraticode and codebase-memory-mcp for semantic code search. Unique: richest hybrid retrieval pipeline in the survey (dense + FTS + two-stage ColBERT) in a zero-external-service npm package. The MCP stub rules out MCP-framework integration currently. Prefer socraticode if Qdrant is already in the stack and RRF hybrid search is sufficient; prefer codebase-memory-mcp if graph queries are needed. Use osgrep if the call-stack trace + skeleton compression commands are the target use case or if a pure-npm zero-dep installation is required. |
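The FTS5-backed knowledge-base pattern in the context-mode row works by storing full tool outputs out of band and letting the agent pull back only matching rows. A minimal sketch, assuming nothing about context-mode's actual schema (table and column names here are illustrative):

```python
import sqlite3

# In-memory stand-in for an FTS5 knowledge base of captured tool outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE kb USING fts5(tool_output)")
conn.executemany("INSERT INTO kb VALUES (?)", [
    ("npm install finished with 3 warnings",),
    ("pytest: 41 passed, 2 failed in test_auth.py",),
    ("git status: working tree clean",),
])

# The agent retrieves only rows matching its full-text query, instead of
# keeping every verbatim tool output resident in the context window.
rows = conn.execute(
    "SELECT tool_output FROM kb WHERE kb MATCH ? ORDER BY rank",
    ("pytest",),
).fetchall()
print(rows[0][0])  # prints the pytest line only
```

This is the "searchable index" half of the two-speed retrieval distinction; the summarization half replaces the output with a short digest instead.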
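The RRF fusion step that appears in qmd, socraticode, and osgrep merges several ranked lists (e.g. BM25 and vector results) by summing reciprocal ranks. A self-contained sketch; `k=60` is the conventional constant from the original RRF formulation, and the function name is illustrative:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranking is a list of doc ids,
    best first. A document's fused score is the sum of 1/(k + rank)
    over every list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["a", "b", "c"]   # lexical ranking
vector_hits = ["c", "a", "d"] # dense ranking
print(rrf_fuse([bm25_hits, vector_hits]))  # ['a', 'c', 'b', 'd']
```

RRF needs only ranks, not comparable scores, which is why it is the default way to combine a BM25 list with a vector-search list.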
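n2-arachne's budget model combines a fixed percentage split across four layers with a chars/3.5 token estimate. The shares and the heuristic below come from the table row; the code itself is an illustrative sketch, not n2-arachne's implementation:

```python
def estimate_tokens(text: str) -> int:
    # The chars/3.5 heuristic described for n2-arachne: an
    # approximation, not a real tokenizer (actual counts vary
    # by model and content).
    return int(len(text) / 3.5)

# Fixed allocations: 10% structural / 30% dependency / 40% semantic / 20% recent.
LAYER_SHARES = {"structural": 0.10, "dependency": 0.30,
                "semantic": 0.40, "recent": 0.20}

def allocate(budget_tokens: int) -> dict:
    """Split a total token budget into fixed per-layer ceilings."""
    return {layer: int(budget_tokens * share)
            for layer, share in LAYER_SHARES.items()}

print(allocate(8000))
# {'structural': 800, 'dependency': 2400, 'semantic': 3200, 'recent': 1600}
```

The rigidity is the point of the critique in the table: a layer's share is spent (or wasted) regardless of whether that layer has anything useful to contribute for the current query.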
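The O(1) byte-offset retrieval shared by jdocmunch-mcp (section level) and jcodemunch-mcp (byte-span level) amounts to one seek plus one read against a precomputed index. A minimal sketch under assumed data; the index layout is hypothetical, not either tool's storage format:

```python
import io

# Index built once at indexing time: section name -> (byte offset, byte length).
doc = b"# Intro\nhello\n# Usage\nrun it\n# License\nMIT\n"
index = {"Usage": (14, 15)}  # spans the "# Usage\nrun it\n" bytes

def get_section(f, name: str) -> bytes:
    """One seek + one read, independent of document size; no scan or
    re-parse of the whole file on the retrieval path."""
    offset, length = index[name]
    f.seek(offset)
    return f.read(length)

print(get_section(io.BytesIO(doc), "Usage").decode(), end="")
```

Note the unit caveat from the table applies here too: the savings this pattern yields are naturally measured in bytes, and bytes are not tokens.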

Suggested reading order (populated as analysis matures):


  1. context-mode — read first; establishes the MCP-layer interception pattern and the two-speed retrieval distinction (summarization vs searchable index) that all subsequent tool comparisons should reference.
  2. codebase-memory-mcp — read second; graph queries are complementary to context-mode (structural navigation vs output sandboxing); together they cover the two main sources of context bloat in coding sessions.
  3. code-review-graph — canonical AST-graph tool; community detection, wiki generation, and blast-radius analysis; use as the benchmark baseline for all other graph-based tools.
  4. codegraph — read alongside code-review-graph; architecturally similar but WASM-bundled and single-tool; important README integrity caveat (benchmark table copied from code-review-graph).
  5. graphify — read next in the graph family; prompt-orchestrated multi-modal variant (LLM drives Python CLI); 71.5× headline figure is on a curated 52-file corpus with an extreme baseline — understand the methodology before comparing to peers.
  6. sdl-mcp — read after graphify; caps the graph-intelligence family with the most disciplined context-escalation model (Iris Gate Ladder + ETag conditional re-fetch + Delta Packs). Watch verdict: novel architecture, no end-to-end benchmark, opaque proprietary storage.
  7. serena — LSP-backed approach is orthogonal to AST-graph tools; the progressive fallback on oversized results is a novel pattern worth understanding for any tool that returns variable-size context.
  8. jcodemunch-mcp — same AST-graph family as code-review-graph; narrowest benchmark (3 repos, range 79.7–99.8%); non-OSI license is a material risk.
  9. rtk — representative of the CLI-proxy category; simplest architecture in the survey; important caveat that all figures use a chars/4 heuristic, not a real tokenizer.
  10. n2-arachne — read for the fixed-percentage budget allocation model; chars/3.5 heuristic and non-commercial license are the two primary risks.
  11. jdocmunch-mcp — read for the O(1) byte-offset retrieval pattern; note the savings accounting flaw (reported savings count all indexed sections, not only the sections actually returned) and the opt-out telemetry before adopting.
  12. socraticode — Qdrant-backed hybrid search (dense + BM25 via RRF); Docker required; the 61.5% figure is bytes not tokens from a single session — important methodological caveat shared with jdocmunch.
  13. osgrep — read alongside socraticode; richest hybrid retrieval pipeline in the survey (FTS + dense + two-stage ColBERT reranking) with zero external service dependencies; key caveat: MCP server is a non-functional stub as of last commit, and the 10-query benchmark covers cost only with no answer quality assessment.
  14. qmd — most sophisticated retrieval pipeline in this survey (8 steps: BM25 probe → LLM query expansion → vec → RRF → rerank); relevant primarily when the agent’s knowledge base is markdown, not code.
  15. caveman — output-style compression is a different category from all others; useful when output verbosity rather than input retrieval is the token budget bottleneck.
  16. Understand-Anything — read if developer comprehension (domain mapping, not token reduction) is the target; different value proposition from every other tool in this list.
  17. git-semantic-bun — borderline scope; read only if semantic retrieval from git commit history is specifically needed.
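Serena's progressive fallback on oversized results (item 7 above) is worth a concrete sketch for any tool returning variable-size context: try the richest rendering first and step down to cheaper forms when a result would blow the budget, instead of truncating mid-result. Everything below is illustrative, not Serena's implementation (its internals use `_limit_length` and `shortened_result_factories`); the chars/4 estimate is the same kind of stand-in heuristic rtk uses:

```python
def estimate_tokens(text: str) -> int:
    # Crude chars/4 heuristic; a stand-in, not a real tokenizer.
    return len(text) // 4

def answer_with_fallback(query: str, renderers, max_tokens: int = 200) -> str:
    """Walk renderers from richest to leanest and return the first
    rendering that fits the budget whole."""
    for render in renderers:
        text = render(query)
        if estimate_tokens(text) <= max_tokens:
            return text
    return f"Result for {query!r} too large at every detail level; narrow the query."

# Hypothetical detail levels: full bodies -> signatures only -> names only.
levels = [
    lambda q: "def f():\n    ...body...\n" * 100,  # ~2,400 chars: over budget
    lambda q: "f(a, b) -> int\n" * 10,             # ~150 chars: fits
    lambda q: "f\n" * 10,
]
print(answer_with_fallback("f", levels), end="")   # signatures-only rendering
```

The key property is that every level is internally complete: the agent always receives a well-formed (if less detailed) answer, never an arbitrarily cut-off one.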