Analysis — CodeGraph
ANALYSIS: CodeGraph
Section titled “ANALYSIS: CodeGraph”Summary
Section titled “Summary”CodeGraph is a Node.js MCP server that pre-indexes a codebase into a Tree-sitter AST-backed SQLite knowledge graph. It exposes 8 MCP tools (not 1 as previously stated) including codegraph_explore for deep exploration, codegraph_context as the primary lightweight tool, plus codegraph_search, codegraph_callers, codegraph_callees, codegraph_impact, codegraph_node, codegraph_status, and codegraph_files. The original triage claim of “a single MCP tool” is incorrect — source review contradicts this.
Self-reported metrics of 94% fewer tool calls and 77% faster exploration are published in a detailed README benchmark table with per-codebase data. The README has been substantially updated since the original triage: the 8.2× token-reduction table and CRG eval command previously flagged as copied content are no longer present in the vendored source. The README now contains CodeGraph-specific benchmark data with named codebases and token counts. However, these figures remain unverified by independent reproduction.
README integrity re-assessment (source review)
Section titled “README integrity re-assessment (source review)”The original triage flagged the README for copying from tirth8205/code-review-graph. Source review of the vendored commit (19532a81) shows:
- No 8.2× token-reduction table — absent from README. (verified from source)
- No
code-review-graph eval --allcommand — absent from README. (verified from source) - No
diagrams/image references — no broken image links found. (verified from source) - The README now contains a per-codebase benchmark table with named open-source repos (VS Code, Excalidraw, Alamofire, Swift Compiler) and specific token counts, which are plausible and internally consistent.
The integrity issue documented in the triage appears to have been resolved in this or a prior commit. The original README damage notice should not be carried forward as a current concern for this version of the tool.
What it does (verified from source)
Section titled “What it does (verified from source)”Indexing pipeline (verified)
Section titled “Indexing pipeline (verified)”The repository is parsed by WASM-bundled web-tree-sitter + tree-sitter-wasms (verified from package.json and src/extraction/ directory). No native C extension is required — the SQLite layer also uses node-sqlite3-wasm with better-sqlite3 as an optional native alternative. The graph is persisted to .codegraph/codegraph.db in the project root. (verified)
Node kinds (verified from src/types.ts): file, module, class, struct, interface, trait, protocol, function, method, property, field, variable, constant, enum, enum_member, type_alias, namespace, parameter, import, export, route, component
Edge kinds (verified from src/types.ts): contains, calls, imports, exports, extends, implements, references, type_of, returns, instantiates, overrides, decorates
Edge provenance field (verified): edges carry a provenance attribute (one of tree-sitter, scip, or heuristic) so downstream tooling can distinguish high-confidence AST edges from heuristic inferences.
Database schema (verified from src/db/schema.sql)
Section titled “Database schema (verified from src/db/schema.sql)”Four core tables:
nodes— symbols with full location, visibility, async/static/abstract flags, decorators, type paramsedges— relationships with source/target FK cascade delete, kind, line/column, provenancefiles— tracked files with content hash for incremental change detectionunresolved_refs— cross-file references pending resolution post-indexing
FTS5 virtual table nodes_fts on (name, qualified_name, docstring, signature) with INSERT/UPDATE/DELETE triggers to keep it in sync. Symbol search is backed by SQLite FTS5 — no external embedding server. (verified)
Language support (verified from src/types.ts and DEFAULT_CONFIG)
Section titled “Language support (verified from src/types.ts and DEFAULT_CONFIG)”19 languages: TypeScript, JavaScript, TSX, JSX, Python, Go, Rust, Java, C, C++, C# (csharp), PHP, Ruby, Swift, Kotlin, Dart, Svelte, Liquid (Shopify), Pascal/Delphi. The unknown language value is also valid in the type system. (verified)
MCP interface (verified — corrects prior triage)
Section titled “MCP interface (verified — corrects prior triage)”The tool exposes 9 MCP tools, not 1. (verified from src/mcp/tools.ts)
| Tool | Purpose |
|---|---|
codegraph_explore | Deep exploration — returns source sections + relationship map in one call |
codegraph_context | PRIMARY tool — lightweight context for a task |
codegraph_search | Symbol lookup by name (returns locations, no code) |
codegraph_callers | Find all callers of a symbol |
codegraph_callees | Find all callees of a symbol |
codegraph_impact | Impact radius analysis for a symbol |
codegraph_node | Details + optional source for one symbol |
codegraph_status | Index statistics |
codegraph_files | File tree from the index (replaces Glob/ls) |
All tools support a projectPath parameter for cross-project queries — CodeGraph instances for other projects are opened on-demand and cached. (verified from ToolHandler.getCodeGraph())
The codegraph_explore tool dynamically adjusts its description to include a call budget scaled to project file count (e.g., 1 call for <500 files, up to 5 for >25k files). (verified from getExploreBudget())
Output is capped at 35,000 characters for codegraph_explore and 15,000 characters for other tools to prevent context bloat. (verified from EXPLORE_MAX_OUTPUT and MAX_OUTPUT_LENGTH constants)
Critical path: agent call → token-reduced output (verified)
Section titled “Critical path: agent call → token-reduced output (verified)”For codegraph_explore:
ToolHandler.handleExplore()receives query string and optionalmaxFiles(default 12).- Calls
cg.findRelevantContext(query, {searchLimit:8, traversalDepth:3, maxNodes:200, minScore:0.2}). findRelevantContext(inContextBuilder) runs hybrid search: a. Extracts potential symbol names from query via regex (CamelCase, snake_case, SCREAMING_SNAKE, acronyms, dot.notation, lowercase identifiers ≥3 chars) with a stopword filter. b. Exact name lookup againstnodestable for extracted symbols. c. FTS5 semantic search onnodes_ftsfor remaining budget. d. Merges results, applies co-location boosting. e. BFS traversal from entry-point nodes (prioritisingcontainsedges, thencalls, then others).- Groups resulting nodes by file; scores files (entry-point nodes = 10pts, directly-connected = 3pts, others = 1pt); filters to files with score ≥ 3.
- Sorts files: query-term matches first, deprioritises test/icon/i18n files, then by score.
- Clusters nearby symbol ranges within 15-line gap threshold; reads contiguous file sections with 3-line padding.
- Emits markdown: relationship map grouped by edge kind, then per-file code fences.
- Hard-truncates at 35k characters.
For codegraph_context (lighter path):
- Calls
cg.buildContext(task, {maxNodes:20, maxCodeBlocks:5, maxCodeBlockSize:1500, traversalDepth:1, searchLimit:3}). - Same hybrid search but shallower traversal (depth 1 vs 3) and tighter node budget.
- Outputs markdown summary with code snippets for up to 5 key nodes.
- Appends
"Ask user: UX preferences, edge cases, acceptance criteria"reminder if query appears to be a feature request (heuristic keyword match). (verified)
Incremental sync (verified)
Section titled “Incremental sync (verified)”FileWatcher (in src/sync/) uses native OS file-system events (FSEvents on macOS, inotify on Linux, ReadDirectoryChangesW on Windows) via Node.js fs.watch, with debouncing. The MCP server calls cg.watch() on startup and logs sync activity to stderr. Content-hash-based change detection in the files table avoids re-indexing unchanged files. (verified from MCP server startWatching() and README)
Dependencies (verified from package.json)
Section titled “Dependencies (verified from package.json)”Runtime: web-tree-sitter, tree-sitter-wasms, node-sqlite3-wasm, commander, @clack/prompts, picomatch. Optional: better-sqlite3 (native, faster SQLite). No cloud or embedding dependency. (verified)
Node.js engine constraint: >=18.0.0 <25.0.0. (verified)
Benchmark claims — verified vs as-reported
Section titled “Benchmark claims — verified vs as-reported”README benchmark table (as reported — not independently reproduced)
Section titled “README benchmark table (as reported — not independently reproduced)”The vendored README (19532a81) presents a per-codebase benchmark table for 6 codebases. Key figures:
| Metric | Value | Status |
|---|---|---|
| Fewer tool calls vs no graph (average) | 92% | as reported — not independently reproduced |
| Faster exploration (average) | 71% | as reported — not independently reproduced |
| Per-repo tool call reduction | 84–96% fewer | as reported — not independently reproduced |
| VS Code TypeScript (4,002 files, 59,377 nodes) | 3 calls vs 52 | as reported — not independently reproduced |
| Swift Compiler (25,874 files, 272,898 nodes) | 6 calls vs 37 | as reported — not independently reproduced |
| Token usage with CG (typical query) | 40.8k–77.4k | as reported — not independently reproduced |
| Token usage without CG (typical query) | 52.4k–99.1k | as reported — not independently reproduced |
The README claims agents “never fell back to reading files” when using CodeGraph. This is consistent with the 35k character explore output cap and the instruction in the installed CLAUDE.md that tells agents not to re-read files returned by codegraph_explore. (verified that the mechanism exists; outcome claim as reported)
Note: the average figures in the headline (94% / 77%) differ slightly from the per-row averages (92% / 71%) in the detailed table — the headline appears to use a different aggregation. This inconsistency is as-reported.
Own eval runner (verified harness exists — output not reproduced)
Section titled “Own eval runner (verified harness exists — output not reproduced)”The eval runner is at tests/evaluation/runner.ts. (verified)
- Runs against a pre-indexed codebase (
EVAL_CODEBASEenv var or CLI arg). - Requires
.codegraph/codegraph.dbto exist. - Executes 12 test cases: 6
searchNodescases (exact symbol lookup) and 6findRelevantContextcases (exploration quality). - Metrics: recall (found/expected symbols), MRR (mean reciprocal rank for search), edge density.
- Pass threshold: recall ≥ 0.5. (verified from
scoring.ts) - Saves JSON report to
tests/evaluation/results/. - Test cases are hardcoded against an Elasticsearch/OpenSearch-like codebase (symbols:
TransportService,RestController,AllocationService, etc.) — the harness is not a general benchmark; it is a regression harness for one specific target codebase. (verified fromtest-cases.ts)
No stored results exist in the vendored snapshot (results/ directory absent). The harness measures recall and MRR for symbol retrieval — it does not directly measure tool-call count reduction or exploration speed (those figures come from the README benchmark, not this runner). (verified)
What the 8.2× table is (update)
Section titled “What the 8.2× table is (update)”The 8.2× token-reduction table that was flagged in the original triage as copied CRG content is not present in the vendored source (19532a81). The README has been updated. The prior warning about CRG-copied content is no longer applicable to this version. (verified from source)
Source review
Section titled “Source review”Architecture (verified)
Section titled “Architecture (verified)”src/ index.ts Main CodeGraph class — public API types.ts All TypeScript interfaces (Node, Edge, Subgraph, TaskContext, …) db/ schema.sql 4 tables + FTS5 virtual table + 13 indexes queries.ts QueryBuilder with prepared statements extraction/ index.ts ExtractionOrchestrator (41.7k — largest file) tree-sitter.ts Universal AST parser (86k — language dispatch) grammars.ts Grammar detection + WASM loading languages/ Per-language grammar WASM files resolution/ index.ts ReferenceResolver (post-indexing cross-file resolution) frameworks/ Framework-specific patterns (React, Express, Laravel, …) graph/ traversal.ts BFS + DFS + getCallers/getCallees/getCallGraph/getImpactRadius queries.ts High-level graph queries context/ index.ts ContextBuilder — hybrid search + BFS expansion + formatter formatter.ts Markdown/JSON output sync/ FileWatcher (debounced fs.watch) mcp/ tools.ts 9 tool definitions + ToolHandler execution index.ts MCPServer — JSON-RPC over stdio bin/codegraph.ts CLI entry pointKey data structures (verified from source)
Section titled “Key data structures (verified from source)”Node: id (hash of path+qualifiedName), kind, name, qualifiedName, filePath, language, startLine, endLine, docstring, signature, visibility, isExported, isAsync, isStatic, isAbstract, decorators (JSON array), typeParameters (JSON array), updatedAt.Edge: source (nodeId), target (nodeId), kind (EdgeKind), metadata (JSON), line, column, provenance (one oftree-sitter,scip, orheuristic).Subgraph: nodes (Map<id, Node>), edges (Edge[]), roots (string[]).TaskContext: query, subgraph, entryPoints (Node[]), codeBlocks (CodeBlock[]), relatedFiles (string[]), summary, stats.
Context compression mechanism (verified)
Section titled “Context compression mechanism (verified)”CodeGraph does not use token counting or semantic similarity embeddings. Compression is structural:
- Graph traversal prunes the result to symbols reachable from query entry points within a depth budget.
- Node-kind filters exclude
import/exportnodes from output (high noise, low information density). - File scoring (entry 10pts, depth-1 3pts, others 1pt) ensures only structurally-relevant files appear.
- Symbol clustering merges nearby code ranges, skipping large gaps, so output is dense sections not sparse snippets.
- Hard output caps (35k explore, 15k others) prevent unbounded context inflation.
There is no embedding model, vector store, or LLM-based summarisation in the pipeline — the entire pipeline is deterministic FTS5 + BFS traversal + file I/O. (verified)
Architectural assessment
Section titled “Architectural assessment”What’s genuinely novel (updated from source)
Section titled “What’s genuinely novel (updated from source)”-
WASM-bundled tree-sitter: zero native module dependencies. (verified)
-
Tiered tool surface: The claim of “single-tool surface” in the triage was incorrect. The tool surface is 9 tools with a clear hierarchy:
codegraph_explorefor deep agent use (via spawned Explore subagents),codegraph_contextfor inline main-session use, and focused tools (callers, callees, impact, node, search, files, status) for targeted lookups. This is a richer and more ergonomic design than the triage described. -
Context-budget-aware explore:
codegraph_exploredynamically adjusts its call budget recommendation based on project file count — the MCP tool description itself contains “Budget: make at most N calls for this project (X files indexed)”. This allows agents to self-regulate without hardcoded limits. (verified fromgetExploreBudget()) -
Contiguous section reading over snippet extraction:
codegraph_explorereads contiguous file sections (with a 15-line gap merge threshold) rather than extracting individual function bodies. This preserves surrounding context and avoids the “decontextualised snippet” failure mode. (verified) -
Hybrid search combining exact symbol lookup with FTS5: The
findRelevantContextpipeline first extracts potential symbol names from the query via regex, performs exact DB lookups for those symbols, then supplements with FTS5 search for concept matching. Exact matches are prioritised. (verified) -
Installed CLAUDE.md injection: The installer adds instructions to
~/.claude/CLAUDE.mdthat tell the main session to never callcodegraph_exploredirectly (to prevent context bloat) and to instead spawn Explore subagents. The main session is restricted to lightweight tools. This is a deliberate architectural pattern for managing context pressure at the session level. (verified from README and installer)
Gaps and risks (updated from source)
Section titled “Gaps and risks (updated from source)”- Eval harness measures retrieval quality, not token reduction: The bundled eval runner measures symbol recall and MRR against a fixed Elasticsearch-like codebase. The 92%/94% tool-call reduction figures in the README come from a separate manual benchmark, not the automated runner. The runner cannot validate the headline claims without running CodeGraph against the same codebases used for those benchmarks.
- Test cases hardcoded to one codebase:
test-cases.tsuses Elasticsearch/OpenSearch-specific symbol names. The harness is a regression guard for one target, not a general benchmark suite. - No stored eval results: The
results/directory is absent from the vendored snapshot. The runner has never been publicly executed against the benchmark codebases with output preserved. - Incremental update latency on large monorepos:
startWatching()fires sync on every file change. On a codebase with 25k files (Swift Compiler), debounce and batch size behaviour is undocumented. The README claims “<4 minutes for initial index” but ongoing sync latency is unquantified. - No MRR or recall metric for explore queries: The eval harness measures recall for
findRelevantContextbut only tests 6 hardcoded queries. There is no reported aggregate recall figure forcodegraph_exploreover a broad query distribution. - Cross-project cache has no eviction policy:
ToolHandler.projectCacheaccumulates open CodeGraph instances with no LRU eviction. Long-running MCP sessions with many cross-project queries will hold all instances open indefinitely. (verified from source)
Recommendation
Section titled “Recommendation”Conditionally adopt — the tool is more capable than the triage suggested. The 9-tool surface, contiguous section reading, dynamic call budgets, and CLAUDE.md injection pattern are all genuine engineering decisions that address context management at multiple levels. The WASM-bundled install is the lowest-friction AST-backed graph tool in this survey.
The README integrity concern from the original triage is resolved in this version. The benchmark table is CodeGraph-specific. However, no token-reduction figures are independently verified — use the bundled eval runner against target codebases before deploying in token-sensitive pipelines.
Prefer codebase-memory-mcp if ad-hoc graph query access is required. codebase-memory-mcp provides Cypher query access and a richer tool surface for complex structural analysis. CodeGraph’s advantage is zero-native-dep install, the contiguous section reading strategy, and the CLAUDE.md-injection context management pattern.
Comparison hooks (for ANALYSIS.md matrix)
Section titled “Comparison hooks (for ANALYSIS.md matrix)”| Dimension | CodeGraph |
|---|---|
| Approach | Tree-sitter AST → SQLite (FTS5) graph; hybrid FTS + BFS traversal |
| Compression (tool calls) | 92% fewer avg across 6 codebases (as reported — not independently reproduced) |
| Compression (speed) | 71% faster avg (as reported — not independently reproduced) |
| Token budget model | Dynamic per-project call budget in tool description; hard output caps (35k explore, 15k others) |
| Injection strategy | Main-session CLAUDE.md injection restricts agent to lightweight tools; explore calls delegated to subagents |
| Eviction | N/A — no context injection pipeline; output cap + node budget limit result size |
| Benchmark harness | Yes — tests/evaluation/runner.ts; recall+MRR; hardcoded to one Elasticsearch-like codebase; no stored results |
| License | MIT |
| Maturity | v0.7.2; 412 stars; created 2026-01-18 |
| README integrity | Resolved in v0.7.2 — CRG-copied content removed; own benchmark data present |
| MCP tools | 9 (explore, context, search, callers, callees, impact, node, status, files) |
| Native deps | Zero — WASM-bundled tree-sitter + SQLite |