# Analysis — socraticode

## Summary
Section titled “Summary”SocratiCode is a TypeScript MCP server that wraps Qdrant’s native hybrid query API (dense vector + BM25 sparse vector, fused with RRF) behind an auto-managed Docker deployment and an AST-aware chunking pipeline powered by ast-grep. The hybrid search mechanism is verified from source: qdrant.ts issues a single query() call with two prefetch legs (dense cosine + BM25 text model) and fusion: "rrf". AST-aware chunking is also verified: indexer.ts dispatches 18+ language grammars via @ast-grep/napi to extract function/class-level declaration boundaries before falling back to line-based chunking for unsupported languages.
The 61.5% token reduction claim is directionally plausible but unverified independently: the README benchmark table header explicitly labels the columns “bytes” (not tokens), confirming the prior finding that this figure measures raw bytes exchanged during a live Claude Opus 4.6 session — not LLM input tokens. The README itself states “61.5% less data consumed” and separately notes this “directly reduces token costs,” but token costs are not measured directly. No benchmark harness is committed to the repository. The polyglot dependency graph (20+ languages with full AST support) and the non-code context artifact indexing are genuine differentiators absent from competing tools in this survey.
The vendored source is v1.4.1 (not v1.3.2 as originally triaged). Version 1.4.0 added three significant features after the original triage: linked-project multi-collection search with client-side RRF fusion, branch-aware collection naming, and JVM multi-module import resolution. These are documented below.
## What it does (verified from source)

### Core mechanism

#### Hybrid search pipeline (verified from `src/services/qdrant.ts`)

SocratiCode delegates vector storage entirely to Qdrant. Each indexed chunk is upserted as a Qdrant point with two vector fields:
- `dense`: a float32 cosine vector generated by the local embedding provider (default: `nomic-embed-text` via Ollama, 768 dimensions).
- `bm25`: a Qdrant-native sparse vector populated server-side via the `qdrant/bm25` model; text is passed as a payload field at upsert time and scored against BM25-IDF at query time.
At query time, `searchChunks()` issues a single Qdrant `query()` call with two prefetch legs (each fetching `max(limit * 3, 30)` candidates) and `query: { fusion: "rrf" }`. RRF merging and result deduplication happen inside Qdrant, not in the Node.js process. This means single-collection RRF is a thin client wrapper around Qdrant’s built-in hybrid query feature introduced in Qdrant v1.15.2 — not a custom implementation.
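The described call can be sketched as a request-body builder. The field names (`prefetch`, `using`, `fusion`) follow Qdrant's documented Query API; the helper itself is illustrative, not SocratiCode's actual code:

```typescript
// Sketch of the request body for Qdrant's hybrid Query API (server-side RRF).
// Field names follow Qdrant's Query API; the helper and its names are ours.
interface HybridQueryBody {
  prefetch: Array<{ query: unknown; using: string; limit: number }>;
  query: { fusion: "rrf" };
  limit: number;
  with_payload: boolean;
}

function buildHybridQuery(
  denseVector: number[],
  queryText: string,
  limit: number,
): HybridQueryBody {
  const prefetchLimit = Math.max(limit * 3, 30); // mirrors max(limit * 3, 30)
  return {
    prefetch: [
      // dense leg: pre-computed embedding, cosine similarity
      { query: denseVector, using: "dense", limit: prefetchLimit },
      // BM25 leg: text is tokenized and scored server-side by the qdrant/bm25 model
      { query: { text: queryText, model: "qdrant/bm25" }, using: "bm25", limit: prefetchLimit },
    ],
    query: { fusion: "rrf" }, // fusion happens inside Qdrant, not in Node.js
    limit,
    with_payload: true,
  };
}
```

Because both legs and the fusion directive travel in one request, the client never sees the intermediate candidate lists.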
Multi-collection search (verified from `src/services/qdrant.ts`, added in v1.4.0): When `includeLinked: true` is passed to `codebase_search`, `searchMultipleCollections()` queries each linked-project collection independently in parallel, sharing a single pre-computed dense embedding vector, then merges results client-side using a custom RRF implementation (`mergeMultiCollectionResults()`, `RRF_K = 60`). This second RRF layer runs in Node.js, not in Qdrant, and deduplicates by `label::relativePath` key. This is a genuine custom RRF implementation, not delegated to Qdrant.
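A minimal sketch of what such a client-side RRF merge looks like, assuming the behaviour described above (`RRF_K = 60`, dedupe key `label::relativePath`); the names and details here are ours, not lifted from the source:

```typescript
// Illustrative client-side reciprocal-rank-fusion merge over per-collection
// result lists. Only each hit's rank within its list matters to RRF; the
// original per-collection scores are discarded in favour of the fused score.
interface RankedHit {
  label: string;        // linked-project label
  relativePath: string;
  score: number;
}

const RRF_K = 60;

function rrfMerge(resultLists: RankedHit[][], limit: number): RankedHit[] {
  const fused = new Map<string, { hit: RankedHit; score: number }>();
  for (const list of resultLists) {
    list.forEach((hit, rank) => {
      const key = `${hit.label}::${hit.relativePath}`; // dedupe key
      const contribution = 1 / (RRF_K + rank + 1);     // standard RRF term
      const existing = fused.get(key);
      if (existing) existing.score += contribution;
      else fused.set(key, { hit, score: contribution });
    });
  }
  return [...fused.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((e) => ({ ...e.hit, score: e.score }));
}
```

A hit that appears in several collections accumulates one RRF term per appearance, so cross-project agreement boosts its fused rank.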
#### AST-aware chunking (verified from `src/services/indexer.ts`)

`chunkFileContent()` dispatches through three strategies in order:

1. **Character-based chunking** for minified or bundled content (detected by average line length exceeding `MAX_AVG_LINE_LENGTH = 500` characters). Splits at safe token boundaries (newline, space, tab, semicolon, comma).
2. **AST-aware chunking** (`findAstBoundaries()`) when a grammar is available. Uses `@ast-grep/napi` with per-language `TOP_LEVEL_KINDS` maps (function declarations, class declarations, interface/type/enum declarations, etc.) across 18+ languages. Declaration regions are extracted, merged to avoid overlaps, then packed into chunks that respect a soft minimum (`MIN_CHUNK_LINES = 5`) and hard maximum (`MAX_CHUNK_LINES = 150`) line count. Small adjacent declarations are merged; large ones are sub-chunked with `CHUNK_SIZE = 100` lines and `CHUNK_OVERLAP = 10` lines. A preamble region (imports, constants, comments before the first declaration) and an epilogue region (code after the last declaration) are emitted as their own chunks.
3. **Line-based fallback** for unsupported file types, using the same `CHUNK_SIZE`/`CHUNK_OVERLAP` parameters.
All three strategies apply a hard character cap (`applyCharCap()`, `MAX_CHUNK_CHARS = 2000`) as a final safety net before embedding. Note: 2000 characters is a tight cap that may truncate large function bodies at the embedding stage.
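As an illustration of the line-based fallback plus the final character cap, here is a simplified chunker using the reported constants (this is a sketch under those constants, not the real `indexer.ts` logic):

```typescript
// Toy line-based fallback chunker using the constants reported from indexer.ts:
// CHUNK_SIZE = 100 lines, CHUNK_OVERLAP = 10 lines, MAX_CHUNK_CHARS = 2000.
// The packing details are a simplification of the real implementation.
const CHUNK_SIZE = 100;
const CHUNK_OVERLAP = 10;
const MAX_CHUNK_CHARS = 2000;

function lineChunks(source: string): string[] {
  const lines = source.split("\n");
  const chunks: string[] = [];
  // stride of 90 lines gives each chunk a 10-line overlap with its predecessor
  for (let start = 0; start < lines.length; start += CHUNK_SIZE - CHUNK_OVERLAP) {
    const body = lines.slice(start, start + CHUNK_SIZE).join("\n");
    // final safety net: hard character cap before embedding (truncation)
    chunks.push(body.length > MAX_CHUNK_CHARS ? body.slice(0, MAX_CHUNK_CHARS) : body);
    if (start + CHUNK_SIZE >= lines.length) break;
  }
  return chunks;
}
```

The cap is a truncation, not a split, which is why long single declarations can lose their tails at the embedding stage.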
#### Polyglot dependency graph (verified from `src/services/code-graph.ts` and `src/services/graph-imports.ts`)

`buildCodeGraph()` walks the project tree using the same ignore filter as the indexer, then calls `extractImports()` per file via the ast-grep grammar when available, or per-language regex fallbacks (Dart, Lua, R, TOML) when not. Supported languages with AST-based extraction include JS/TS/TSX, Python, Java, Kotlin, Scala, Go, Rust, C#, PHP, Ruby, Swift, C/C++, Bash, CSS/SCSS/Stylus, HTML, Svelte, and Vue. Svelte and Vue `<script>` blocks are re-parsed as TypeScript; CSS `@import` is extracted from `<style>` blocks. TypeScript path aliases (`tsconfig.json`/`jsconfig.json` `compilerOptions.paths`, including `extends` chains) are resolved via a separate `loadPathAliases()` pass. The resulting `CodeGraph` (nodes + edges) is serialized as a JSON payload in the Qdrant `socraticode_metadata` collection — not a separate graph collection.
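The node bookkeeping this produces can be illustrated with a toy builder over extracted import edges. The `dependencies`/`dependents` fields mirror the payload shape described later in this analysis; the builder itself is hypothetical:

```typescript
// Toy construction of a CodeGraph-style node map from import edges
// (from-file -> to-file). Shapes follow this analysis; code is illustrative.
interface GraphNode {
  relativePath: string;
  dependencies: string[]; // files this file imports
  dependents: string[];   // files that import this file
}

function buildGraph(edges: Array<[from: string, to: string]>): Map<string, GraphNode> {
  const nodes = new Map<string, GraphNode>();
  // fetch-or-create a node for a path
  const node = (p: string): GraphNode => {
    let n = nodes.get(p);
    if (!n) {
      n = { relativePath: p, dependencies: [], dependents: [] };
      nodes.set(p, n);
    }
    return n;
  };
  for (const [from, to] of edges) {
    node(from).dependencies.push(to); // forward edge
    node(to).dependents.push(from);   // reverse edge, kept for O(1) lookup
  }
  return nodes;
}
```

Storing both edge directions on each node is what lets a single lookup answer "who imports this file" without scanning all edges.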
#### Context artifacts (verified from `src/services/context-artifacts.ts`)

Users declare non-code artifacts (database schemas, OpenAPI specs, Terraform configs, architecture docs) in `.socraticodecontextartifacts.json`. Each artifact path is globbed, chunked, and embedded into a separate Qdrant collection (`context_{projectId}`) using the same hybrid dense + BM25 approach as code search. Staleness detection via content hashing triggers automatic re-indexing on the next search.
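This analysis does not reproduce the schema of `.socraticodecontextartifacts.json`, so the fragment below is a hypothetical illustration of the kind of declarations it would hold; every key shown is an assumption, not the documented format:

```json
{
  "artifacts": [
    { "name": "db-schema", "paths": ["db/schema.sql", "db/migrations/*.sql"] },
    { "name": "api-spec", "paths": ["openapi/*.yaml"] },
    { "name": "infra", "paths": ["terraform/**/*.tf"] }
  ]
}
```

Whatever the actual key names, the mechanism described above is the same: each matched path is globbed, chunked, and hybrid-indexed into `context_{projectId}`.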
### Interface / API

Twenty MCP tools (verified from `src/tools/` and `src/index.ts`):

- `codebase_index` — full index with optional path, file extension, and watcher configuration.
- `codebase_update` — incremental re-index of changed files only.
- `codebase_status` — index health, chunk count, phase progress, active watcher state.
- `codebase_search` — hybrid semantic + BM25 search with optional `fileFilter`, `languageFilter`, `minScore`, and `includeLinked` (v1.4.0+: cross-project search against linked collections).
- `codebase_stop` — graceful cancellation of an in-flight indexing operation (stops at the next batch boundary).
- `codebase_graph_build` / `codebase_graph_query` / `codebase_graph_stats` / `codebase_graph_circular` / `codebase_graph_visualize` / `codebase_graph_status` / `codebase_graph_remove` — dependency graph lifecycle and query.
- `codebase_context` / `codebase_context_search` / `codebase_context_index` — context artifact management.
- `codebase_watch` — start/stop filesystem watcher (debounced 2 s, via `@parcel/watcher`).
- `codebase_list_projects` / `codebase_remove` — multi-project management.
- `codebase_health` / `codebase_about` — diagnostics.
### Dependencies

Runtime (from `package.json`, verified — v1.4.1):

- `@ast-grep/napi` ^0.40.5 and 13 language extension packages (`@ast-grep/lang-bash`, `-c`, `-cpp`, `-csharp`, `-go`, `-java`, `-kotlin`, `-php`, `-python`, `-ruby`, `-rust`, `-scala`, `-swift`) — AST parsing.
- `@qdrant/js-client-rest` ^1.17.0 — Qdrant REST client. The pinned Docker image is `qdrant/qdrant:v1.17.0`, which is newer than the v1.15.2 minimum required for BM25 hybrid queries.
- `ollama` ^0.5.14, `openai` ^6.22.0, `@google/generative-ai` ^0.24.1 — embedding backends.
- `@modelcontextprotocol/sdk` ^1.26.0 — MCP server.
- `@parcel/watcher` ^2.5.6 — cross-platform filesystem watching.
- `proper-lockfile` ^4.1.2 — cross-process file locking for multi-agent index coordination.
- `glob` ^11.0.1, `ignore` ^7.0.3 — file traversal and gitignore handling.
- `zod` ^3.24.2 — runtime validation.
Infrastructure (default deployment): Docker daemon with two auto-managed containers — Qdrant (vector store) and Ollama (embedding server). Both containers are started automatically on first use; no manual configuration is required. Ollama mode defaults to `auto`: it probes `localhost:11434` first (native Ollama, GPU-accelerated on Mac/Windows) and falls back to a Docker container on port 11435 if no native instance is found. Qdrant always requires Docker in managed mode; `QDRANT_MODE=external` enables self-hosted or cloud Qdrant.
### Scope / limitations

- **Static analysis only**: no runtime tracing or dynamic call-graph edges.
- **Qdrant v1.15.2+ required**: BM25 hybrid query support is a relatively recent Qdrant feature. The auto-managed Docker container pins `qdrant/qdrant:v1.17.0`; self-hosted Qdrant instances must be at v1.15.2 or later.
- **Graph build is asynchronous**: `buildCodeGraph()` runs in the background and requires polling `codebase_graph_status` for completion on large repos. No streaming progress.
- **Recursive DFS for circular dependency detection**: `findCircularDependencies()` in `graph-analysis.ts` uses a recursive DFS. It is guarded by a `visited` set (preventing re-entry), but the call stack grows proportionally to the longest dependency chain and will overflow on very deep cycles in large monorepos.
- **Graph stored as a JSON payload in the Qdrant metadata collection, not as a graph database**: `codebase_graph_query` returns only direct imports/dependents for a given file path — there is no multi-hop traversal equivalent to `trace_call_path` in codebase-memory-mcp.
- **Docker is required for the default mode**: a pure in-process fallback is not available in v1.4.1. A native Ollama install can replace the Ollama container (auto-detected); Qdrant always requires Docker or a self-hosted external instance.
- **AGPL-3.0**: commercial embedding in proprietary products without source disclosure requires a separate commercial licence. The repo ships `LICENSE-COMMERCIAL` but does not link to pricing or terms.
- **Hard 2000-character chunk cap**: `MAX_CHUNK_CHARS = 2000` truncates all chunk payloads regardless of strategy. Large function bodies (common in generated code, tests, or verbose languages) may be truncated before embedding.
- **BM25 text also capped at 32,000 characters** (`MAX_BM25_TEXT_CHARS`) before being forwarded to Qdrant's server-side tokenizer. This is a separate limit from the chunk content cap.
## Benchmark claims — verified vs as-reported

| Metric | Value | Status |
|---|---|---|
| “Token” reduction vs grep baseline | 61.5% (250,510 → 96,485 bytes across 5 questions) | partially verified — byte figures confirmed from README table; README labels columns “bytes”, not “tokens”; the claim that this “directly reduces token costs” is inference, not measurement |
| Tool call reduction | 84% (31 → 5 calls across 5 questions) | as reported |
| Search latency | 60–90 ms vs 2–3.5 s (grep) | as reported |
| Test repo | VS Code, 2.45M lines, 5,300+ files, 55,437 chunks | as reported (README); no independent measurement |
| Benchmark model | Claude Opus 4.6 (live session, not scripted harness) | verified — README states “tested live with Claude Opus 4.6” |
| Scripted benchmark harness exists | No — no benchmark/, eval/, or evals/ directory; no benchmark script | verified from source tree |
| Hybrid search implemented as described | Yes — Qdrant prefetch + fusion: "rrf" for single-collection queries | verified from src/services/qdrant.ts |
| AST-aware chunking implemented as described | Yes — ast-grep boundaries + char/line fallbacks; 2000-char hard cap | verified from src/services/indexer.ts |
| RRF implemented client-side (single collection) | No — delegated to Qdrant’s built-in query API | verified from source |
| RRF implemented client-side (multi-collection, v1.4.0+) | Yes — mergeMultiCollectionResults() with RRF_K = 60 in Node.js | verified from src/services/qdrant.ts |
| Docker required | Yes — Qdrant always requires Docker (or external QDRANT_URL); Ollama is auto-detected and Docker is the fallback only if native Ollama absent | verified from src/services/startup.ts, src/constants.ts |
| SQLite for local mode | No — all persistence is in Qdrant; no SQLite anywhere in the codebase | verified — the triage claim of “SQLite + in-process HNSW” is incorrect |
| RepoEval / SWE-bench citations | Referenced in README but no direct paper links provided | unverifiable as cited |
**Key correction from prior triage:** The original triage described a “local (SQLite + in-process HNSW)” mode. This is incorrect. Qdrant is the sole persistence layer for all data (chunks, graph, metadata, context artifacts). There is no SQLite or in-process HNSW in the codebase. The only “local” option is Docker-managed Qdrant.
The benchmark measures raw bytes (not LLM tokens) exchanged between Claude and the tools during a live session. The README table column headers are explicitly labeled “Grep (bytes)” and “SocratiCode (bytes)”. The summary line claims “61.5% less data consumed — The AI agent processes ~150KB less context, which directly reduces token costs with any LLM” — this equates bytes with token costs without measuring actual tokenizer output. The grep baseline uses `grep -rl` to discover files, then reads them in 200-line chunks — realistic but not adversarially optimized. A focused ripgrep or targeted file-read approach would consume fewer bytes, making the real savings vs optimized grep lower than 61.5%. The five questions (workspace trust, diff editor, extension lifecycle, terminal shells, command palette) are architectural queries well-suited to semantic search — they deliberately favor hybrid retrieval over exact-match grep.
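The headline percentage does follow from the README's byte counts, which is easy to confirm:

```typescript
// Reproducing the README's byte arithmetic: 250,510 bytes (grep baseline)
// vs 96,485 bytes (SocratiCode) across five questions.
const grepBytes = 250_510;
const socratiBytes = 96_485;
const reductionPct = (1 - socratiBytes / grepBytes) * 100;
console.log(reductionPct.toFixed(1)); // 61.5 — matches the headline figure
```

The arithmetic checks out for bytes; whether the same ratio holds for tokenizer output is the unverified step.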
## Architectural assessment

### What’s genuinely novel

- **Qdrant as the sole backing store for all indexes.** Code chunks, dependency graph, context artifacts, and project metadata all use a single Qdrant instance. Chunks land in `codebase_{projectId}` collections; graph and metadata land as JSON payloads in the `socraticode_metadata` collection. This eliminates the SQLite + separate vector DB split common in competing tools. The tradeoff is Docker as a hard infrastructure dependency.
- **Zero-config auto-provisioning with intelligent Ollama detection.** The server probes `localhost:11434` for a native Ollama install first; if found, it uses it (GPU-accelerated on Mac/Windows). If not, it pulls Docker images, starts Qdrant and Ollama containers, and downloads `nomic-embed-text` on first run with no user action required. Among tools in this survey this is the lowest-friction stateful local deployment.
- **Context artifact indexing as a first-class feature.** Treating database schemas, OpenAPI specs, Terraform configs, and architecture docs as searchable, hybrid-indexed artifacts alongside code is a distinct capability not present in any other tool in this survey. This directly addresses the common agent failure mode of lacking schema context when writing queries or migrations.
- **Multi-agent coordination via file locking.** `proper-lockfile` coordinates cross-process access so multiple concurrent agent sessions share one index without corruption. One session indexes; all sessions search; stale locks are reclaimed automatically.
- **Svelte and Vue import extraction.** Re-parsing `<script>` blocks as TypeScript and extracting CSS `@import` from `<style>` blocks covers frontend framework files that simpler regex-based import extractors miss.
- **Linked-project cross-collection search (v1.4.0).** Projects declare dependencies via `.socraticode.json` or `SOCRATICODE_LINKED_PROJECTS`. When `includeLinked: true` is passed, a single `codebase_search` call queries all linked projects in parallel, merges results client-side with RRF, and labels each result with its source project. This enables monorepo and multi-repo search from a single agent call.
- **Branch-aware collection naming (v1.4.0).** When `SOCRATICODE_BRANCH_AWARE=true`, the project’s git branch is appended to the collection name hash, giving each branch an isolated index. Linked-project cross-references use the branch-agnostic base hash so inter-repo links remain stable across branches.
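To make the naming scheme concrete, here is a hypothetical sketch. The analysis states only that the branch is appended to the collection name hash, so the hashing and sanitization details below are assumptions, not the source's scheme:

```typescript
import { createHash } from "node:crypto";

// Hypothetical branch-aware collection naming. The base hash stays stable
// across branches (so linked-project references keep working); the branch
// suffix isolates each branch's index. Details are assumptions for illustration.
function collectionName(projectPath: string, branch?: string): string {
  const baseHash = createHash("sha256").update(projectPath).digest("hex").slice(0, 12);
  if (!branch) return `codebase_${baseHash}`;
  // sanitize: git branch names can contain '/', '.', '-', etc.
  const safeBranch = branch.replace(/[^a-zA-Z0-9_]/g, "_");
  return `codebase_${baseHash}_${safeBranch}`;
}
```

The key property is that `collectionName(p, b)` always extends `collectionName(p)`, which is what keeps branch-agnostic cross-references stable.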
### Gaps and risks

- **No multi-hop graph traversal.** `codebase_graph_query` returns only direct imports and dependents for a given file (`getFileDependencies()` in `graph-analysis.ts`). There is no equivalent to `trace_call_path` (multi-hop call graph) or `detect_changes` (blast radius from a git diff) as found in codebase-memory-mcp.
- **Benchmark is a single-scenario, author-run session.** Five architectural questions on VS Code (TypeScript-heavy, well-structured) is not a representative sample. Performance on polyglot monorepos, small repos, or dynamically typed Python/Ruby codebases is unreported. No scripted harness exists to reproduce the result.
- **“61% token reduction” is bytes, not tokens.** The README benchmark table measures raw bytes. The claim that fewer bytes “directly reduces token costs” is unverified — tokenized output depends on the model’s tokenizer and is not proportional to raw bytes for all content types. The actual LLM token savings figure is unknown.
- **The 2000-character chunk cap is aggressive.** Functions longer than ~40 lines at typical column widths will be truncated at the embedding stage. This is not documented prominently and may silently degrade retrieval quality for large classes or generated code.
- **BM25 quality is opaque.** The `qdrant/bm25` text model runs inside the Qdrant container; its tokenization and IDF corpus are not exposed. For non-English identifiers or heavily abbreviated codebases, BM25 quality may degrade unpredictably.
- **Recursive circular dependency DFS.** `findCircularDependencies()` in `graph-analysis.ts` uses recursive DFS. The `visited` set prevents infinite loops, but call-stack depth grows with the longest path; deep monorepo dependency chains may hit Node.js stack limits.
- **Graph queries are O(nodes) linear scans.** `getFileDependencies()` uses `Array.find()` over the full node list. On large repos this will degrade; there is no node index or hash map.
- **AGPL-3.0 and undisclosed commercial licence terms.** Teams that cannot accept AGPL must negotiate a commercial licence whose pricing and conditions are not publicly available.
- **RepoEval and SWE-bench figures are unlinked.** The README cites recall and accuracy improvements from AST-aware chunking research without DOIs or paper titles. These figures cannot be verified as cited.
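The stack-depth risk noted above has a standard remedy: an explicit-stack (iterative) depth-first search. A sketch of such a replacement for cycle detection follows; this is illustrative, not the project's code:

```typescript
// Iterative three-color DFS for cycle detection: "visiting" marks nodes on the
// current path, "done" marks fully explored nodes. An explicit frame stack
// replaces recursion, so depth is bounded by heap, not the call stack.
function hasCycle(deps: Map<string, string[]>): boolean {
  const state = new Map<string, "visiting" | "done">();
  for (const start of deps.keys()) {
    if (state.has(start)) continue;
    const stack: Array<{ node: string; next: number }> = [{ node: start, next: 0 }];
    state.set(start, "visiting");
    while (stack.length > 0) {
      const frame = stack[stack.length - 1];
      const children = deps.get(frame.node) ?? [];
      if (frame.next < children.length) {
        const child = children[frame.next++];
        const s = state.get(child);
        if (s === "visiting") return true; // back edge to the current path: cycle
        if (s === undefined) {
          state.set(child, "visiting");
          stack.push({ node: child, next: 0 });
        }
      } else {
        state.set(frame.node, "done"); // fully explored, off the current path
        stack.pop();
      }
    }
  }
  return false;
}
```

The same frame-stack pattern generalizes to recovering the actual cycle members rather than just detecting one.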
## Recommendation

Adopt for semantic codebase search in privacy-sensitive or air-gapped environments. The zero-config local deployment, hybrid search quality, and context artifact indexing make SocratiCode a strong choice for teams that cannot use cloud-hosted code intelligence. The benchmark figures are not independently reproducible, but the underlying mechanism (Qdrant hybrid query with RRF) is sound and the implementation is clean.
Pair with codebase-memory-mcp for structural analysis. SocratiCode’s dependency graph supports only direct import/dependent queries. For multi-hop call graphs, blast radius analysis, or symbol-level structural queries, codebase-memory-mcp remains necessary.
Treat the 61.5% figure as an upper bound measured under favorable conditions (TypeScript monorepo, architectural queries, non-optimized grep baseline). Independent measurement on representative workloads is required before using this figure for resource planning.
Flag AGPL-3.0 for legal review before embedding in any commercial product. The dual-licence path exists but terms are not publicly disclosed.
## Source review

Reviewed version: v1.4.1 (vendored at `tools/giancarloerra-socraticode/`). The original triage targeted v1.3.2.
### Architecture: critical path from agent call to token-reduced output

```text
Agent → MCP tool call: codebase_search(query, limit=10)
└─ query-tools.ts: handleQueryTool()
   ├─ generateQueryEmbedding(query)              # embeddings.ts → Ollama/OpenAI/Google
   ├─ [if includeLinked] searchMultipleCollections()
   │   ├─ for each collection: searchChunksWithVector() in parallel
   │   └─ mergeMultiCollectionResults()          # client-side RRF (Node.js, RRF_K=60)
   └─ [single collection] searchChunks()
       └─ qdrant.query(prefetch=[dense, bm25], fusion: "rrf")   # server-side RRF
→ returns top-N FileChunk[] with filePath, relativePath, content, startLine, endLine, score
```

Chunk content at return time is the raw stored payload — truncated to 2000 characters at index time by `applyCharCap()`. The agent receives code snippets with file path and line range, not full file contents, which is the primary mechanism for token reduction.
### Data structures

- **Qdrant point (code chunk):** `{ id: UUID, vector: { dense: float32[], bm25: { text, model } }, payload: { filePath, relativePath, content, startLine, endLine, language, type, contentHash } }`
- **Qdrant metadata point (project/graph/context metadata):** `{ id: SHA256-derived UUID, vector: [0], payload: { collectionName, projectPath, lastIndexedAt, filesTotal, filesIndexed, fileHashes (JSON), indexingStatus } }`
- **CodeGraph:** `{ nodes: CodeGraphNode[], edges: CodeGraphEdge[] }` — serialized as a single JSON string in the metadata collection payload. `CodeGraphNode` holds `{ relativePath, language, dependencies[], dependents[] }`.
- **FileChunk:** `{ id, filePath, relativePath, content, startLine, endLine, language, type }` — the internal representation before upsert.
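Transcribed into TypeScript declarations for clarity; field names come from the shapes above, while the interface declarations and the sample values are our transcription, not code from the repository:

```typescript
// Transcription of the documented shapes; sample values are placeholders.
interface FileChunk {
  id: string;          // UUID in the real system
  filePath: string;
  relativePath: string;
  content: string;
  startLine: number;
  endLine: number;
  language: string;
  type: string;        // chunk-type discriminator; concrete values not documented here
}

interface CodeGraphNode {
  relativePath: string;
  language: string;
  dependencies: string[];
  dependents: string[];
}

const example: FileChunk = {
  id: "example-id",    // placeholder, not a real UUID
  filePath: "/repo/src/auth.ts",
  relativePath: "src/auth.ts",
  content: "export function login() { /* ... */ }",
  startLine: 1,
  endLine: 3,
  language: "typescript",
  type: "ast",         // placeholder value
};
```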
### Key files

| File | Purpose |
|---|---|
| `src/services/qdrant.ts` | All Qdrant operations: upsert, search, metadata CRUD; client-side RRF for multi-collection |
| `src/services/indexer.ts` | Chunking pipeline: char-based, AST-aware, line-based; full/incremental index |
| `src/services/code-graph.ts` | Graph build orchestration, progress tracking, graph cache |
| `src/services/graph-imports.ts` | Per-language import extraction via ast-grep or regex |
| `src/services/graph-analysis.ts` | Graph query: direct deps, circular DFS, stats, Mermaid diagram |
| `src/services/context-artifacts.ts` | Non-code artifact indexing from `.socraticodecontextartifacts.json` |
| `src/services/docker.ts` | Docker lifecycle: pull images, start/stop/check Qdrant and Ollama containers |
| `src/services/embedding-config.ts` | Embedding provider config: ollama (auto/docker/external), openai, google |
| `src/config.ts` | Project ID hashing, branch-aware naming, linked-project resolution |
| `src/index.ts` | MCP server entry point, tool schema registration |
### Test coverage

Three test tiers (verified from `tests/`):

- **Unit tests** (`tests/unit/`, 21 files): pure function tests, no Docker required. Covers chunking, config, constants, graph analysis, graph imports, path resolution, watcher logic.
- **Integration tests** (`tests/integration/`, 8 files): require Docker plus running Qdrant and Ollama. Covers the full index/search/update cycle, context artifacts, embeddings, graph build, and the tools API.
- **E2E test** (`tests/e2e/full-workflow.test.ts`): exercises the complete 12-step lifecycle through the tool handler API with a fixture project.
Test runner: Vitest v4. Tests run sequentially (`fileParallelism: false`) because Docker resources and Qdrant collections are shared. Timeout: 120 s per test.
## Comparison hooks (for ANALYSIS.md matrix)

| Dimension | socraticode |
|---|---|
| Approach | Hybrid dense + BM25 (RRF) via Qdrant; AST-aware chunking; polyglot import graph |
| Compression (vs grep) | 61.5% bytes (verified as bytes measurement, not tokens; single TypeScript repo scenario) |
| Token budget model | None — result set bounded by limit parameter (default 10 chunks, 2000 chars each) |
| Injection strategy | On-demand MCP tool calls; no session-level injection |
| Eviction | N/A — no context injection pipeline |
| Benchmark harness | None — author-run live session documented in README; no scripted repro |
| License | AGPL-3.0 (dual-licence commercial option available, terms undisclosed) |
| Maturity | v1.4.1; actively maintained (last commit 2026-04-12); unit + integration + e2e test suite via Vitest |