Analysis — jcodemunch-mcp
Summary
jcodemunch-mcp is a local-first MCP server that indexes a codebase once with tree-sitter AST parsing and serves symbol-level retrieval to AI agents, avoiding whole-file reads. The author has updated the published benchmark numbers since the original triage: the current canonical figure is a 99.6% aggregate reduction (not 95%) across 15 task-runs on 3 repos, with larger repo snapshots than originally reported (expressjs/express: 165 files, fastapi/fastapi: 951 files, gin-gonic/gin: 98 files). This figure is verified from benchmarks/results.md in the vendored source. The benchmark methodology is verified from benchmarks/harness/run_benchmark.py and benchmarks/METHODOLOGY.md: the baseline is all source files concatenated (a lower bound), the jcodemunch workflow is search_symbols (top 5) plus get_symbol_source × 3 per query, and both sides are tokenised with tiktoken cl100k_base. All numbers are still author-produced; no independent third-party reproduction exists.
The architecture is sound: tree-sitter extracts deterministic, language-aware symbol metadata including byte offsets; SQLite WAL stores the index; MCP tools serve exact source spans on demand. The tool set has grown substantially beyond basic retrieval into import-graph analysis, dead-code detection, session routing, and complexity metrics — a broad surface that raises maintenance risk but also distinguishes the tool from simple file-reading wrappers.
Source review note (2026-04-13): The vendored source is v1.36.0 (up from v1.24.5 at the time of initial analysis). Several implementation details from the original triage and earlier analysis were corrected during source review — see the “Source review” section below.
What it does (verified from source)
Core mechanism
The retrieval loop has three layers (verified from ARCHITECTURE.md and pyproject.toml):
- Parse layer: `tree-sitter-language-pack>=0.7.0` provides pre-compiled grammars for Python, JavaScript, TypeScript, TSX, Go, Rust, Java, C, C++, C#, Ruby, PHP, Swift, Kotlin, Scala, and others (text-search-only for some). Each file is parsed to an AST; a per-language registry maps node types to symbol kinds (function, class, method, constant, type). Extracted per symbol: kind, name, qualified name, byte-start, byte-end, signature, and docstring (with a `docstring -> AI batch -> signature` fallback chain).
- Storage layer: One SQLite database per indexed repository, stored at `~/.code-index/` (configurable via `CODE_INDEX_PATH`). WAL mode allows concurrent reads during writes. Schema tables (verified from `sqlite_store.py`): `meta`, `symbols`, `files`. The original triage listed `imports`, `raw_cache`, and `content_blob` as additional tables; these do not exist in the current schema. Raw source files are cached in a flat content directory alongside the DB; symbol byte offsets point into these cached files, enabling exact retrieval by direct byte-seeking without reparsing.
- Retrieval layer: MCP tool surface (see Interface / API). The token reduction mechanism is simple: instead of returning an entire file, the server returns only the matched symbol source span plus a JSON metadata envelope. `get_ranked_context` assembles multi-symbol bundles within a caller-supplied token budget using Weighted Reciprocal Rank (WRR) fusion across four channels: lexical BM25, structural PageRank, embedding cosine similarity, and identity/exact-match (verified from `src/jcodemunch_mcp/retrieval/signal_fusion.py`). Default channel weights: identity=2.0, lexical=1.0, similarity=0.8, structural=0.4. The earlier description of “BM25 + PageRank” was a simplification; the full fusion pipeline uses all four channels.
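The fusion step in the retrieval layer can be illustrated with a minimal sketch. It uses the default weights listed above and the WRR formula quoted in the source review section (`score(s) = sum(weight[c] / (k + rank(c, s)))`, k=60). The function name `wrr_fuse`, the symbol identifiers, and the 1-based rank convention are assumptions for illustration; this is not the code in `signal_fusion.py`.

```python
from collections import defaultdict

# Default channel weights as reported in this analysis.
DEFAULT_WEIGHTS = {"identity": 2.0, "lexical": 1.0, "similarity": 0.8, "structural": 0.4}


def wrr_fuse(channels, weights=DEFAULT_WEIGHTS, k=60):
    """Fuse per-channel rankings with score(s) = sum_c weight[c] / (k + rank(c, s))."""
    scores = defaultdict(float)
    for channel, ranked_ids in channels.items():
        weight = weights.get(channel, 0.0)
        # Assumption: ranks are 1-based, and a symbol missing from a channel
        # simply contributes nothing for that channel.
        for rank, symbol_id in enumerate(ranked_ids, start=1):
            scores[symbol_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from three of the four channels.
channels = {
    "lexical": ["app.get", "router.use", "app.listen"],
    "structural": ["router.use", "app.get"],
    "identity": ["app.listen"],
}
fused = wrr_fuse(channels)
```

In this example `app.listen` ranks first despite being last in the lexical channel, because the identity channel carries twice the lexical weight.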
The benchmark methodology (verified from benchmarks/METHODOLOGY.md and benchmarks/harness/run_benchmark.py) defines:
- Baseline: all indexed source files concatenated, tokenised with `tiktoken cl100k_base`. This is explicitly a lower bound; real agents that re-read files produce higher actual baselines.
- jcodemunch workflow: `search_symbols` (top 5 results) + `get_symbol_source` x 3 hits. Token count = serialised JSON response tokens, which includes field-name overhead and so slightly understates the reduction relative to raw-source comparisons.
- Reduction formula: `(1 - jmunch_tokens / baseline_tokens) * 100`.
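The reduction formula is simple enough to state as a function. The sketch below takes already-counted token totals as inputs; the function name and the example numbers are hypothetical, and the actual harness counts tokens with tiktoken before applying the formula.

```python
def reduction_pct(jmunch_tokens: int, baseline_tokens: int) -> float:
    """Percentage of baseline tokens avoided by the jcodemunch workflow."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (1 - jmunch_tokens / baseline_tokens) * 100


# Hypothetical query: a 1,000-token response against a 200,000-token baseline.
example = reduction_pct(1_000, 200_000)
```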
Interface / API
MCP tool groups (verified from ARCHITECTURE.md):
| Group | Key tools |
|---|---|
| Indexing / repo management | index_repo, index_folder, index_file, list_repos, resolve_repo, invalidate_cache |
| Discovery / outlines | get_repo_outline, get_file_tree, get_file_outline, suggest_queries |
| Retrieval | get_file_content, get_symbol_source, get_context_bundle, get_ranked_context |
| Search | search_symbols, search_text, search_columns |
| Relationship / impact | find_importers, find_references, get_dependency_graph, check_references, get_related_symbols, get_class_hierarchy, get_blast_radius, get_symbol_diff |
| Session / routing | plan_turn, get_symbol_importance, find_dead_code, get_untested_symbols, get_changed_symbols |
CLI mirrors the MCP surface: list, index, outline, search, get, text, file, invalidate, watch, watch-claude.
Dependencies
Core (verified from pyproject.toml):
- Python >= 3.10
- `mcp>=1.10.0,<2.0.0`
- `tree-sitter-language-pack>=0.7.0,<1.0.0`
- `httpx>=0.27.0`
- `pathspec>=0.12.0`, `pyyaml>=6.0`
Optional extras: anthropic, gemini, openai (AI-assisted summarisation); watchfiles (watch mode); uvicorn/starlette (HTTP/SSE transport); sentence-transformers (semantic/hybrid search).
Scope / limitations
- Macro-generated or dynamically defined symbols are invisible to the parser.
- Anonymous arrow functions without assigned names are not indexed in JavaScript.
- Deep inner-class nesting may be flattened in Java.
- AI-assisted summarisation backends (Anthropic, Gemini, OpenAI) send code to external APIs — relevant for private-codebase deployments.
- No versioned grammar lockfile beyond the semver range in `pyproject.toml`; tree-sitter-language-pack updates can silently alter extraction behaviour.
- Source now vendored at `tools/jgravelle-jcodemunch-mcp/`; the prior caveat about 404s on raw.githubusercontent.com no longer applies.
Benchmark claims — verified vs as-reported
Important: The benchmark figures in the original triage (95% aggregate) and the in-source results.md (99.6% aggregate) differ substantially. Both sets of numbers are from the same author; the difference reflects either an updated index state or re-running the harness on larger repo snapshots. The figures below are from the vendored benchmarks/results.md (verified from source, run 2026-03-28).
| Metric | Value | Status |
|---|---|---|
| expressjs/express: files indexed | 165 | verified from results.md (triage claimed 34 — contradicted) |
| expressjs/express: symbols | 181 | verified from results.md (triage claimed 117 — contradicted) |
| expressjs/express: baseline tokens | 137,978 | verified from results.md (triage claimed 73,838 — contradicted) |
| expressjs/express: avg reduction | 99.4% | verified from results.md (triage claimed 98.4% — superseded) |
| fastapi/fastapi: files indexed | 951 | verified from results.md (triage claimed 156 — contradicted) |
| fastapi/fastapi: symbols | 5,325 | verified from results.md (triage claimed 1,359 — contradicted) |
| fastapi/fastapi: baseline tokens | 699,425 | verified from results.md (triage claimed 214,312 — contradicted) |
| fastapi/fastapi: avg reduction | 99.8% | verified from results.md (triage claimed 92.7% — superseded) |
| gin-gonic/gin: files indexed | 98 | verified from results.md (triage claimed 40 — contradicted) |
| gin-gonic/gin: symbols | 1,489 | verified from results.md (triage claimed 805 — contradicted) |
| gin-gonic/gin: baseline tokens | 187,018 | verified from results.md (triage claimed 84,892 — contradicted) |
| gin-gonic/gin: avg reduction | 99.4% | verified from results.md (triage claimed 98.0% — superseded) |
| Grand total aggregate | 99.6% (5,122,105 -> 19,406 tokens, 15 runs) | verified from results.md (triage claimed 95% / 1,865,210 tokens — contradicted) |
| Per-query range | 99.2% – 99.9% (current results.md) | verified from results.md; triage range 79.7%–99.8% is outdated |
| Tokeniser | tiktoken cl100k_base | verified from source (METHODOLOGY.md, run_benchmark.py) |
| Benchmark harness exists and is public | benchmarks/harness/run_benchmark.py + benchmarks/tasks.json | verified from source |
| Baseline definition | lower bound (single-pass full concatenation) | verified from METHODOLOGY.md |
| Query corpus size | 5 queries x 3 repos = 15 task-runs | verified from tasks.json |
| Real-world A/B test result | 20% token savings (Wilcoxon p=0.0074) on a Vue3+Firebase codebase, 50-iteration test by @Mharbulous | verified from results.md; independent contributor, not author |
| Independent third-party reproduction of headline figure | none found | unverified |
| Retrieval quality / precision measurement | not in harness; tracked separately in jMunchWorkbench | verified from METHODOLOGY.md |
The benchmark methodology is honest about its own limits: the baseline is a lower bound, the query corpus is small (5 queries), and retrieval quality is not measured. The current 99.6% figure comes from larger repo snapshots than the original triage. The real-world A/B test (20% savings, p=0.0074) is a more meaningful production signal — it reflects end-to-end cost on a real codebase including all fixed overhead, not just symbol retrieval.
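The grand-total row can be cross-checked against the per-repo rows. The sketch below assumes that each of the 5 queries per repo is charged the full per-repo baseline once, which is what the table's totals imply. All numbers are taken from the table above.

```python
QUERIES_PER_REPO = 5

# Per-repo baseline tokens from the table above.
baseline_tokens = {
    "expressjs/express": 137_978,
    "fastapi/fastapi": 699_425,
    "gin-gonic/gin": 187_018,
}

# Assumption: every query is compared against the full per-repo baseline.
total_baseline = QUERIES_PER_REPO * sum(baseline_tokens.values())
total_jcodemunch = 19_406  # grand-total jcodemunch tokens from the table

aggregate = (1 - total_jcodemunch / total_baseline) * 100
print(total_baseline, round(aggregate, 1))
```

The computed baseline total matches the 5,122,105 tokens in the grand-total row, and the aggregate rounds to the reported 99.6%.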
Architectural assessment
What’s genuinely novel
- Byte-offset symbol retrieval: storing start/end byte offsets at index time and using direct byte-seeking at retrieval time eliminates reparsing. The exact-span guarantee matters for agents that must not hallucinate code.
- Token-budgeted context assembly (`get_ranked_context`): four-channel WRR fusion scoring (BM25 and PageRank among the channels) within a caller-supplied token budget is a principled approach to context window management that goes beyond naive top-k retrieval.
- Session-aware routing (`plan_turn`, turn budgets, negative evidence): framing context assembly as a per-turn budget problem with negative-evidence avoidance is a more sophisticated abstraction than most retrieval tools expose.
- AST-derived call graph and import graph: `find_importers`, `get_blast_radius`, `get_dependency_graph`, and class hierarchy traversal from tree-sitter parse results, without requiring an LSP process, is a meaningful differentiator for codebases that cannot run a language server.
- watch-claude mode: automatic Claude Code worktree discovery and incremental hot-reindexing provides practical integration that raw file-reading tools lack.
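The byte-offset retrieval idea in the first bullet can be sketched in a few lines. The sketch assumes a symbol record carries a byte offset and byte length into a cached copy of the file; `read_symbol_span` and the sample file are hypothetical and stand in for whatever the server does internally.

```python
import tempfile
from pathlib import Path


def read_symbol_span(path, byte_offset, byte_length):
    """Return exactly the bytes of one symbol, without parsing the file again."""
    with open(path, "rb") as f:
        f.seek(byte_offset)  # jump straight to the symbol start
        return f.read(byte_length).decode("utf-8")


# Hypothetical cached source file and the offsets an indexer would have stored.
source = "import os\n\ndef greet(name):\n    return f'hi {name}'\n\nX = 1\n"
data = source.encode("utf-8")
start = data.index(b"def greet")
end = data.index(b"\n\nX = 1")

with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "example.py"
    cached.write_bytes(data)
    span = read_symbol_span(cached, start, end - start)
```

Offsets are in bytes rather than characters, so the file is opened in binary mode and decoded only after the span has been read.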
Gaps and risks
- Self-reported benchmarks only: the harness is public, but all canonical numbers come from the author. The three benchmark repos are small-to-medium (98–951 files), and the 99.6% headline figure comes from the author’s own index state. The real-world A/B test by @Mharbulous (20% savings, p=0.0074) is a better signal, but still a single codebase. Large monorepos where task queries require broad cross-file context are not represented.
- Benchmark measures tokens, not quality: a tool that returns 100% fewer tokens but misses the relevant symbol produces worse outcomes than the baseline. Retrieval precision is tracked in a separate tool (jMunchWorkbench) and not reported alongside token savings.
- Broad capability surface: the tool set has grown to 40+ tools including complexity metrics, dead-code detection, hotspot analysis, and architectural enforcement hooks. Each additional tool adds schema tokens per turn (mitigated by `disabled_tools` config) and maintenance surface.
- Commercial license terms (verified from `LICENSE` in source): the `LICENSE` file is present and defines three commercial tiers: Builder $79 (1 developer), Studio $349 (up to 5 developers), Platform $1,999 (org-wide). The triage figure of $2,249 is incorrect. The PyPI `license` metadata field may still be `None`; the license text is in the repo file, not the package metadata.
- Anonymous telemetry (verified from `src/jcodemunch_mcp/storage/token_tracker.py`): the tool sends an anonymous `{delta, anon_id}` payload to `https://j.gravelle.us/APIs/savings/post.php` on each session flush (every 3 tool calls by default). Only byte-approximated token-savings counts and a UUID are sent, with no code or paths. Opt-out via `JCODEMUNCH_SHARE_SAVINGS=0`. This was not noted in the original triage or analysis.
- Private-codebase data exposure: optional AI summarisation backends (Anthropic, Gemini, OpenAI, MiniMax, ZhipuAI, OpenRouter) send symbol text to external APIs. This is documented but easy to activate inadvertently. The auto-detect order checks for API keys in the environment at startup (verified from `src/jcodemunch_mcp/summarizer/batch_summarize.py`).
- Grammar version drift: the `tree-sitter-language-pack>=0.7.0,<1.0.0` range permits minor-version bumps that may change AST node types and silently break symbol extraction.
Recommendation
Adopt with caveats for retrieval-heavy workflows on medium-sized codebases. The architecture is sound, the benchmark methodology is transparent, and the harness is reproducible. Token savings of 99%+ on structured web-framework codebases are credible for symbol-targeted queries. The 99.6% headline figure reflects larger repo snapshots than originally reported and should be read against the real-world A/B test (20% end-to-end savings, p=0.0074), which is a more representative production figure.
Do not deploy on private commercial codebases without a paid license, explicit opt-out of AI summarisation backends, and opt-out of anonymous telemetry (JCODEMUNCH_SHARE_SAVINGS=0). The non-commercial restriction, optional external-API summarisation, and default-on telemetry are three distinct compliance risks.
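One way to make the telemetry and summarisation opt-outs explicit is to launch the server with a scrubbed environment. The sketch below is illustrative only: `hardened_env` and `ASSUMED_PROVIDER_KEYS` are hypothetical names, and the provider key names are conventional guesses that have not been checked against the jcodemunch source. Only `JCODEMUNCH_SHARE_SAVINGS=0` is taken from this analysis.

```python
import os

# Assumption: conventional API-key variable names for three of the providers.
# The names the summariser actually checks should be confirmed in
# src/jcodemunch_mcp/summarizer/batch_summarize.py before relying on this.
ASSUMED_PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY")


def hardened_env(base=None):
    """Copy an environment with telemetry disabled and provider keys removed."""
    env = dict(os.environ if base is None else base)
    env["JCODEMUNCH_SHARE_SAVINGS"] = "0"  # telemetry opt-out named in this analysis
    for key in ASSUMED_PROVIDER_KEYS:
        env.pop(key, None)  # no key present, so auto-detection has nothing to find
    return env


# Example with an explicit base environment instead of os.environ.
env = hardened_env({"PATH": "/usr/bin", "OPENAI_API_KEY": "example-key"})
```

The returned mapping can be passed as the `env` argument of `subprocess.run` or to whatever launcher starts the server.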
Independent benchmark reproduction is the outstanding gap. The harness is runnable; reproducing against the same three repos with an independent install would confirm whether the 99.6% figure holds on a clean environment distinct from the author’s own index state.
Source review (2026-04-13)
Source at tools/jgravelle-jcodemunch-mcp/ is v1.36.0 (pyproject.toml). Key findings from primary source inspection:
Architecture — critical path
- Entry point: `src/jcodemunch_mcp/server.py` (201 KB), an async MCP dispatcher with CLI subcommand routing and auth/rate-limit middleware.
- Parse: `src/jcodemunch_mcp/parser/extractor.py` (285 KB) imports `tree_sitter_language_pack.get_parser` and dispatches to a per-language `LanguageSpec` (defined in `languages.py`). Each spec maps AST node types to symbol kinds and defines a docstring extraction strategy (Python: `next_sibling_string`; JS/TS: `first_child_comment`; Go/Rust/Java: `preceding_comment`). Custom regex extractors for Erlang, Fortran, SQL/dbt, and Razor bypass tree-sitter for those languages.
- Storage: `src/jcodemunch_mcp/storage/sqlite_store.py` applies `PRAGMA journal_mode = WAL` once at DB creation; ongoing connections use `PRAGMA synchronous = NORMAL`, `PRAGMA wal_autocheckpoint = 1000`, and a 256 MB mmap. Schema: 3 tables (`meta`, `symbols`, `files`) plus indexes, not 6 tables as stated in the original triage. The `raw_cache` and `content_blob` tables from the triage do not exist in the current schema. File content for byte-offset retrieval is stored in a flat content directory alongside the DB, referenced via paths in `CodeIndex.source_files`.
- Index version: `INDEX_VERSION = 8` (verified from `index_store.py`). Migrations v4→v8 are implemented in `sqlite_store.py`.
- Symbol lookup: O(1) via the `CodeIndex._symbol_index` dict built in the `__post_init__` method (verified from `index_store.py`).
- Retrieval ranking: `signal_fusion.py` implements Weighted Reciprocal Rank (WRR) across 4 channels. The description “BM25 + PageRank” in earlier analysis was a simplification. WRR formula: `score(s) = sum(weight[c] / (k + rank(c, s)))` with default smoothing k=60.
- Summarization: `summarizer/batch_summarize.py` is 3-tier: docstring extraction → AI provider (Anthropic/Gemini/OpenAI/MiniMax/ZhipuAI/OpenRouter) → signature fallback. Auto-detect order checks env vars at startup.
- Telemetry: `storage/token_tracker.py` posts anonymous `{delta, anon_id}` batches to `https://j.gravelle.us/APIs/savings/post.php` via a background daemon thread. Default-on; disable with `JCODEMUNCH_SHARE_SAVINGS=0`.
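The storage settings listed above map onto standard SQLite pragmas. The sketch below applies them with Python's `sqlite3` module; `open_index_db` is a hypothetical helper, and for brevity it sets all pragmas on one connection, whereas the source reportedly applies WAL mode only once at DB creation.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_index_db(path):
    """Open a SQLite database with the pragmas described in this review."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")         # readers do not block the writer
    conn.execute("PRAGMA synchronous = NORMAL")       # fewer fsyncs than FULL
    conn.execute("PRAGMA wal_autocheckpoint = 1000")  # checkpoint every 1000 WAL pages
    conn.execute(f"PRAGMA mmap_size = {256 * 1024 * 1024}")  # request a 256 MB mmap
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_index_db(Path(tmp) / "example.db")
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    conn.close()
```

WAL mode is persistent in the database file, which is why setting it once at creation is sufficient; the other pragmas are per-connection and have to be reapplied on each open.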
Key data structures
- `CodeIndex` (`index_store.py`): dataclass with `symbols: list[dict]`, `source_files: list[str]`, `file_hashes`, `imports`, `file_mtimes`. Post-init builds `_symbol_index: dict[str, dict]` for O(1) lookup and `_bm25_cache` (lazy).
- `Symbol` (`parser/symbols.py`): id, name, kind, file, byte_offset, byte_length, signature, docstring, qualified_name, language, decorators, keywords, cyclomatic, max_nesting, param_count.
- `ChannelResult` / `FusedResult` (`retrieval/signal_fusion.py`): per-channel ranked list merged by WRR.
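The O(1) lookup pattern described for `CodeIndex` can be sketched as a dataclass that builds a dict index after initialisation. `CodeIndexSketch`, `get_symbol`, and the symbol id format are hypothetical; only the field names `symbols`, `source_files`, and `_symbol_index` come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class CodeIndexSketch:
    symbols: list[dict]
    source_files: list[str] = field(default_factory=list)
    # Derived index, excluded from __init__ and built in __post_init__.
    _symbol_index: dict[str, dict] = field(init=False, repr=False, default_factory=dict)

    def __post_init__(self):
        # One pass over the symbol list gives constant-time lookup by id.
        self._symbol_index = {s["id"]: s for s in self.symbols}

    def get_symbol(self, symbol_id):
        """Return the symbol dict for an id, or None if it is not indexed."""
        return self._symbol_index.get(symbol_id)


# Hypothetical symbol record using a subset of the Symbol fields.
index = CodeIndexSketch(
    symbols=[{"id": "app.py::greet", "name": "greet", "byte_offset": 11, "byte_length": 40}]
)
found = index.get_symbol("app.py::greet")
```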
Claims corrected by source
| Claim (original triage / prior analysis) | Source finding | Status |
|---|---|---|
| SQLite schema has 6 tables: meta, symbols, files, imports, raw_cache, content_blob | Schema SQL defines 3 tables: meta, symbols, files. No raw_cache or content_blob table. | contradicted |
| Retrieval uses BM25 + PageRank | Retrieval uses WRR fusion: lexical BM25 + structural PageRank + embedding similarity + identity (4 channels) | partially contradicted (BM25+PageRank are present but are 2 of 4 channels) |
| Commercial tiers: $79–$2,249 | LICENSE file: $79 (Builder), $349 (Studio), $1,999 (Platform) | corrected — $2,249 figure is wrong |
| No mention of telemetry | Anonymous savings telemetry is enabled by default; opt-out via JCODEMUNCH_SHARE_SAVINGS=0 | new finding |
| Benchmark: express=34 files/117 symbols, fastapi=156/1,359, gin=40/805 | results.md: express=165/181, fastapi=951/5,325, gin=98/1,489 | contradicted — repo snapshots are substantially larger |
| Benchmark aggregate: 95% (1,865,210 → 92,515 tokens) | results.md: 99.6% (5,122,105 → 19,406 tokens) | contradicted — superseded by larger repo run |
Comparison hooks (for ANALYSIS.md matrix)
| Dimension | jcodemunch-mcp |
|---|---|
| Approach | Tree-sitter AST parse -> SQLite symbol index -> byte-offset retrieval |
| Compression | Symbol-span extraction (only matched code spans returned) |
| Token budget model | Explicit per-call budget in get_ranked_context; per-turn budget via plan_turn |
| Injection strategy | MCP tool responses replace whole-file reads; agent controls what to fetch |
| Eviction | No eviction; index is persistent; incremental reindex on file change |
| Benchmark harness | Public (benchmarks/harness/run_benchmark.py); tiktoken cl100k_base; 5 queries x 3 repos |
| License | Free non-commercial; paid tiers $79/$349/$1,999 for commercial use (verified from LICENSE) |
| Maturity | v1.36.0 (vendored source); active development |