# Analysis — socraticode

## Summary
Section titled “Summary”SocratiCode is a TypeScript MCP server that wraps Qdrant’s native hybrid query API (dense vector + BM25 sparse vector, fused with RRF) behind an auto-managed Docker deployment and an AST-aware chunking pipeline powered by ast-grep. The hybrid search mechanism is verified from source: qdrant.ts issues a single query() call with two prefetch legs (dense cosine + BM25 text model) and fusion: "rrf". AST-aware chunking is also verified: indexer.ts dispatches 18+ language grammars via @ast-grep/napi to extract function/class-level declaration boundaries before falling back to line-based chunking for unsupported languages.
The 61.5% token reduction claim is directionally plausible but unverified independently: the README benchmark table header explicitly labels the columns “bytes” (not tokens), confirming the prior finding that this figure measures raw bytes exchanged during a live Claude Opus 4.6 session — not LLM input tokens. The README itself states “61.5% less data consumed” and separately notes this “directly reduces token costs,” but token costs are not measured directly. No benchmark harness is committed to the repository. The polyglot dependency graph (20+ languages with full AST support) and the non-code context artifact indexing are genuine differentiators absent from competing tools in this survey.
The vendored source is v1.4.1 (not v1.3.2 as originally triaged). Version 1.4.0 added three significant features after the original triage: linked-project multi-collection search with client-side RRF fusion, branch-aware collection naming, and JVM multi-module import resolution. These are documented below.
## What it does (verified from source)

### Core mechanism

#### Hybrid search pipeline (verified from `src/services/qdrant.ts`)

SocratiCode delegates vector storage entirely to Qdrant. Each indexed chunk is upserted as a Qdrant point with two vector fields:
- `dense`: a float32 cosine vector generated by the local embedding provider (default: `nomic-embed-text` via Ollama, 768 dimensions).
- `bm25`: a Qdrant-native sparse vector populated server-side via the `qdrant/bm25` model; text is passed as a payload field at upsert time and scored against BM25-IDF at query time.
At query time, `searchChunks()` issues a single Qdrant `query()` call with two prefetch legs (each fetching `max(limit * 3, 30)` candidates) and `query: { fusion: "rrf" }`. RRF merging and result deduplication happen inside Qdrant, not in the Node.js process. This means single-collection RRF is a thin client wrapper around Qdrant’s built-in hybrid query feature introduced in Qdrant v1.15.2 — not a custom implementation.
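The described call can be sketched as a request-body builder. The field names (`prefetch`, `using`, `fusion`) follow Qdrant's documented Query API; the helper itself is illustrative, not SocratiCode's actual code:

```typescript
// Sketch of the request body for Qdrant's hybrid Query API (server-side RRF).
// Field names follow Qdrant's Query API; the helper and its names are ours.
interface HybridQueryBody {
  prefetch: Array<{ query: unknown; using: string; limit: number }>;
  query: { fusion: "rrf" };
  limit: number;
  with_payload: boolean;
}

function buildHybridQuery(
  denseVector: number[],
  queryText: string,
  limit: number,
): HybridQueryBody {
  const prefetchLimit = Math.max(limit * 3, 30); // mirrors max(limit * 3, 30)
  return {
    prefetch: [
      // dense leg: pre-computed embedding, cosine similarity
      { query: denseVector, using: "dense", limit: prefetchLimit },
      // BM25 leg: text is tokenized and scored server-side by the qdrant/bm25 model
      { query: { text: queryText, model: "qdrant/bm25" }, using: "bm25", limit: prefetchLimit },
    ],
    query: { fusion: "rrf" }, // fusion happens inside Qdrant, not in Node.js
    limit,
    with_payload: true,
  };
}
```

Because both legs and the fusion directive travel in one request, the client never sees the intermediate candidate lists.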
Multi-collection search (verified from `src/services/qdrant.ts`, added in v1.4.0): When `includeLinked: true` is passed to `codebase_search`, `searchMultipleCollections()` queries each linked-project collection independently in parallel, sharing a single pre-computed dense embedding vector, then merges results client-side using a custom RRF implementation (`mergeMultiCollectionResults()`, `RRF_K = 60`). This second RRF layer runs in Node.js, not in Qdrant, and deduplicates by `label::relativePath` key. This is a genuine custom RRF implementation, not delegated to Qdrant.
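A minimal sketch of what such a client-side RRF merge looks like, assuming the behaviour described above (`RRF_K = 60`, dedupe key `label::relativePath`); the names and details here are ours, not lifted from the source:

```typescript
// Illustrative client-side reciprocal-rank-fusion merge over per-collection
// result lists. Only each hit's rank within its list matters to RRF; the
// original per-collection scores are discarded in favour of the fused score.
interface RankedHit {
  label: string;        // linked-project label
  relativePath: string;
  score: number;
}

const RRF_K = 60;

function rrfMerge(resultLists: RankedHit[][], limit: number): RankedHit[] {
  const fused = new Map<string, { hit: RankedHit; score: number }>();
  for (const list of resultLists) {
    list.forEach((hit, rank) => {
      const key = `${hit.label}::${hit.relativePath}`; // dedupe key
      const contribution = 1 / (RRF_K + rank + 1);     // standard RRF term
      const existing = fused.get(key);
      if (existing) existing.score += contribution;
      else fused.set(key, { hit, score: contribution });
    });
  }
  return [...fused.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((e) => ({ ...e.hit, score: e.score }));
}
```

A hit that appears in several collections accumulates one RRF term per appearance, so cross-project agreement boosts its fused rank.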
#### AST-aware chunking (verified from `src/services/indexer.ts`)

`chunkFileContent()` dispatches through three strategies in order:

1. **Character-based chunking** for minified or bundled content (detected by average line length exceeding `MAX_AVG_LINE_LENGTH = 500` characters). Splits at safe token boundaries (newline, space, tab, semicolon, comma).
2. **AST-aware chunking** (`findAstBoundaries()`) when a grammar is available. Uses `@ast-grep/napi` with per-language `TOP_LEVEL_KINDS` maps (function declarations, class declarations, interface/type/enum declarations, etc.) across 18+ languages. Declaration regions are extracted, merged to avoid overlaps, then packed into chunks that respect a soft minimum (`MIN_CHUNK_LINES = 5`) and hard maximum (`MAX_CHUNK_LINES = 150`) line count. Small adjacent declarations are merged; large ones are sub-chunked with `CHUNK_SIZE = 100` lines and `CHUNK_OVERLAP = 10` lines. A preamble region (imports, constants, comments before the first declaration) and an epilogue region (code after the last declaration) are emitted as their own chunks.
3. **Line-based fallback** for unsupported file types, using the same `CHUNK_SIZE`/`CHUNK_OVERLAP` parameters.
All three strategies apply a hard character cap (`applyCharCap()`, `MAX_CHUNK_CHARS = 2000`) as a final safety net before embedding. Note: 2000 characters is a tight cap that may truncate large function bodies at the embedding stage.
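As an illustration of the line-based fallback plus the final character cap, here is a simplified chunker using the reported constants (this is a sketch under those constants, not the real `indexer.ts` logic):

```typescript
// Toy line-based fallback chunker using the constants reported from indexer.ts:
// CHUNK_SIZE = 100 lines, CHUNK_OVERLAP = 10 lines, MAX_CHUNK_CHARS = 2000.
// The packing details are a simplification of the real implementation.
const CHUNK_SIZE = 100;
const CHUNK_OVERLAP = 10;
const MAX_CHUNK_CHARS = 2000;

function lineChunks(source: string): string[] {
  const lines = source.split("\n");
  const chunks: string[] = [];
  // stride of 90 lines gives each chunk a 10-line overlap with its predecessor
  for (let start = 0; start < lines.length; start += CHUNK_SIZE - CHUNK_OVERLAP) {
    const body = lines.slice(start, start + CHUNK_SIZE).join("\n");
    // final safety net: hard character cap before embedding (truncation)
    chunks.push(body.length > MAX_CHUNK_CHARS ? body.slice(0, MAX_CHUNK_CHARS) : body);
    if (start + CHUNK_SIZE >= lines.length) break;
  }
  return chunks;
}
```

The cap is a truncation, not a split, which is why long single declarations can lose their tails at the embedding stage.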
#### Polyglot dependency graph (verified from `src/services/code-graph.ts` and `src/services/graph-imports.ts`)

`buildCodeGraph()` walks the project tree using the same ignore filter as the indexer, then calls `extractImports()` per file via the ast-grep grammar when available, or per-language regex fallbacks (Dart, Lua, R, TOML) when not. Supported languages with AST-based extraction include JS/TS/TSX, Python, Java, Kotlin, Scala, Go, Rust, C#, PHP, Ruby, Swift, C/C++, Bash, CSS/SCSS/Stylus, HTML, Svelte, and Vue. Svelte and Vue `<script>` blocks are re-parsed as TypeScript; CSS `@import` is extracted from `<style>` blocks. TypeScript path aliases (`tsconfig.json`/`jsconfig.json` `compilerOptions.paths`, including `extends` chains) are resolved via a separate `loadPathAliases()` pass. The resulting `CodeGraph` (nodes + edges) is serialized as a JSON payload in the Qdrant `socraticode_metadata` collection — not a separate graph collection.
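The node bookkeeping this produces can be illustrated with a toy builder over extracted import edges. The `dependencies`/`dependents` fields mirror the payload shape described later in this analysis; the builder itself is hypothetical:

```typescript
// Toy construction of a CodeGraph-style node map from import edges
// (from-file -> to-file). Shapes follow this analysis; code is illustrative.
interface GraphNode {
  relativePath: string;
  dependencies: string[]; // files this file imports
  dependents: string[];   // files that import this file
}

function buildGraph(edges: Array<[from: string, to: string]>): Map<string, GraphNode> {
  const nodes = new Map<string, GraphNode>();
  // fetch-or-create a node for a path
  const node = (p: string): GraphNode => {
    let n = nodes.get(p);
    if (!n) {
      n = { relativePath: p, dependencies: [], dependents: [] };
      nodes.set(p, n);
    }
    return n;
  };
  for (const [from, to] of edges) {
    node(from).dependencies.push(to); // forward edge
    node(to).dependents.push(from);   // reverse edge, kept for O(1) lookup
  }
  return nodes;
}
```

Storing both edge directions on each node is what lets a single lookup answer "who imports this file" without scanning all edges.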
#### Context artifacts (verified from `src/services/context-artifacts.ts`)

Users declare non-code artifacts (database schemas, OpenAPI specs, Terraform configs, architecture docs) in `.socraticodecontextartifacts.json`. Each artifact path is globbed, chunked, and embedded into a separate Qdrant collection (`context_{projectId}`) using the same hybrid dense + BM25 approach as code search. Staleness detection via content hashing triggers automatic re-indexing on the next search.
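This analysis does not reproduce the schema of `.socraticodecontextartifacts.json`, so the fragment below is a hypothetical illustration of the kind of declarations it would hold; every key shown is an assumption, not the documented format:

```json
{
  "artifacts": [
    { "name": "db-schema", "paths": ["db/schema.sql", "db/migrations/*.sql"] },
    { "name": "api-spec", "paths": ["openapi/*.yaml"] },
    { "name": "infra", "paths": ["terraform/**/*.tf"] }
  ]
}
```

Whatever the actual key names, the mechanism described above is the same: each matched path is globbed, chunked, and hybrid-indexed into `context_{projectId}`.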
### Interface / API

Twenty MCP tools (verified from `src/tools/` and `src/index.ts`):

- `codebase_index` — full index with optional path, file extension, and watcher configuration.
- `codebase_update` — incremental re-index of changed files only.
- `codebase_status` — index health, chunk count, phase progress, active watcher state.
- `codebase_search` — hybrid semantic + BM25 search with optional `fileFilter`, `languageFilter`, `minScore`, and `includeLinked` (v1.4.0+: cross-project search against linked collections).
- `codebase_stop` — graceful cancellation of an in-flight indexing operation (stops at the next batch boundary).
- `codebase_graph_build` / `codebase_graph_query` / `codebase_graph_stats` / `codebase_graph_circular` / `codebase_graph_visualize` / `codebase_graph_status` / `codebase_graph_remove` — dependency graph lifecycle and query.
- `codebase_context` / `codebase_context_search` / `codebase_context_index` — context artifact management.
- `codebase_watch` — start/stop filesystem watcher (debounced 2 s, via `@parcel/watcher`).
- `codebase_list_projects` / `codebase_remove` — multi-project management.
- `codebase_health` / `codebase_about` — diagnostics.
### Dependencies

Runtime (from `package.json`, verified — v1.4.1):

- `@ast-grep/napi` ^0.40.5 and 13 language extension packages (`@ast-grep/lang-bash`, `-c`, `-cpp`, `-csharp`, `-go`, `-java`, `-kotlin`, `-php`, `-python`, `-ruby`, `-rust`, `-scala`, `-swift`) — AST parsing.
- `@qdrant/js-client-rest` ^1.17.0 — Qdrant REST client. The pinned Docker image is `qdrant/qdrant:v1.17.0`, which is newer than the v1.15.2 minimum required for BM25 hybrid queries.
- `ollama` ^0.5.14, `openai` ^6.22.0, `@google/generative-ai` ^0.24.1 — embedding backends.
- `@modelcontextprotocol/sdk` ^1.26.0 — MCP server.
- `@parcel/watcher` ^2.5.6 — cross-platform filesystem watching.
- `proper-lockfile` ^4.1.2 — cross-process file locking for multi-agent index coordination.
- `glob` ^11.0.1, `ignore` ^7.0.3 — file traversal and gitignore handling.
- `zod` ^3.24.2 — runtime validation.
Infrastructure (default deployment): Docker daemon with two auto-managed containers — Qdrant (vector store) and Ollama (embedding server). Both containers are started automatically on first use; no manual configuration is required. Ollama mode defaults to `auto`: it probes `localhost:11434` first (native Ollama, GPU-accelerated on Mac/Windows) and falls back to a Docker container on port 11435 if no native instance is found. Qdrant always requires Docker in managed mode; `QDRANT_MODE=external` enables self-hosted or cloud Qdrant.
### Scope / limitations

- **Static analysis only**: no runtime tracing or dynamic call-graph edges.
- **Qdrant v1.15.2+ required**: BM25 hybrid query support is a relatively recent Qdrant feature. The auto-managed Docker container pins `qdrant/qdrant:v1.17.0`; self-hosted Qdrant instances must be at v1.15.2 or later.
- **Graph build is asynchronous**: `buildCodeGraph()` runs in the background and requires polling `codebase_graph_status` for completion on large repos. No streaming progress.
- **Recursive DFS for circular dependency detection**: `findCircularDependencies()` in `graph-analysis.ts` uses a recursive DFS. It is guarded by a `visited` set (preventing re-entry), but the call stack grows proportionally to the longest dependency chain and will overflow on very deep cycles in large monorepos.
- **Graph stored as a JSON payload in the Qdrant metadata collection, not as a graph database**: `codebase_graph_query` returns only direct imports/dependents for a given file path — there is no multi-hop traversal equivalent to `trace_call_path` in codebase-memory-mcp.
- **Docker is required for the default mode**: a pure in-process fallback is not available in v1.4.1. A native Ollama install can replace the Ollama container (auto-detected); Qdrant always requires Docker or a self-hosted external instance.
- **AGPL-3.0**: commercial embedding in proprietary products without source disclosure requires a separate commercial licence. The repo ships `LICENSE-COMMERCIAL` but does not link to pricing or terms.
- **Hard 2000-character chunk cap**: `MAX_CHUNK_CHARS = 2000` truncates all chunk payloads regardless of strategy. Large function bodies (common in generated code, tests, or verbose languages) may be truncated before embedding.
- **BM25 text also capped at 32,000 characters** (`MAX_BM25_TEXT_CHARS`) before being forwarded to Qdrant's server-side tokenizer. This is a separate limit from the chunk content cap.
## Benchmark claims — verified vs as-reported

| Metric | Value | Status |
|---|---|---|
| “Token” reduction vs grep baseline | 61.5% (250,510 → 96,485 bytes across 5 questions) | partially verified — byte figures confirmed from README table; README labels columns “bytes”, not “tokens”; the claim that this “directly reduces token costs” is inference, not measurement |
| Tool call reduction | 84% (31 → 5 calls across 5 questions) | as reported |
| Search latency | 60–90 ms vs 2–3.5 s (grep) | as reported |
| Test repo | VS Code, 2.45M lines, 5,300+ files, 55,437 chunks | as reported (README); no independent measurement |
| Benchmark model | Claude Opus 4.6 (live session, not scripted harness) | verified — README states “tested live with Claude Opus 4.6” |
| Scripted benchmark harness exists | No — no benchmark/, eval/, or evals/ directory; no benchmark script | verified from source tree |
| Hybrid search implemented as described | Yes — Qdrant prefetch + fusion: "rrf" for single-collection queries | verified from src/services/qdrant.ts |
| AST-aware chunking implemented as described | Yes — ast-grep boundaries + char/line fallbacks; 2000-char hard cap | verified from src/services/indexer.ts |
| RRF implemented client-side (single collection) | No — delegated to Qdrant’s built-in query API | verified from source |
| RRF implemented client-side (multi-collection, v1.4.0+) | Yes — mergeMultiCollectionResults() with RRF_K = 60 in Node.js | verified from src/services/qdrant.ts |
| Docker required | Yes — Qdrant always requires Docker (or external QDRANT_URL); Ollama is auto-detected and Docker is the fallback only if native Ollama absent | verified from src/services/startup.ts, src/constants.ts |
| SQLite for local mode | No — all persistence is in Qdrant; no SQLite anywhere in the codebase | verified — the triage claim of “SQLite + in-process HNSW” is incorrect |
| RepoEval / SWE-bench citations | Referenced in README but no direct paper links provided | unverifiable as cited |
**Key correction from prior triage:** The original triage described a “local (SQLite + in-process HNSW)” mode. This is incorrect. Qdrant is the sole persistence layer for all data (chunks, graph, metadata, context artifacts). There is no SQLite or in-process HNSW in the codebase. The only “local” option is Docker-managed Qdrant.
The benchmark measures raw bytes (not LLM tokens) exchanged between Claude and the tools during a live session. The README table column headers are explicitly labeled “Grep (bytes)” and “SocratiCode (bytes)”. The summary line claims “61.5% less data consumed — The AI agent processes ~150KB less context, which directly reduces token costs with any LLM” — this equates bytes with token costs without measuring actual tokenizer output. The grep baseline uses `grep -rl` to discover files, then reads them in 200-line chunks — realistic but not adversarially optimized. A focused ripgrep or targeted file-read approach would consume fewer bytes, making the real savings vs optimized grep lower than 61.5%. The five questions (workspace trust, diff editor, extension lifecycle, terminal shells, command palette) are architectural queries well-suited to semantic search — they deliberately favor hybrid retrieval over exact-match grep.
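The headline percentage does follow from the README's byte counts, which is easy to confirm:

```typescript
// Reproducing the README's byte arithmetic: 250,510 bytes (grep baseline)
// vs 96,485 bytes (SocratiCode) across five questions.
const grepBytes = 250_510;
const socratiBytes = 96_485;
const reductionPct = (1 - socratiBytes / grepBytes) * 100;
console.log(reductionPct.toFixed(1)); // 61.5 — matches the headline figure
```

The arithmetic checks out for bytes; whether the same ratio holds for tokenizer output is the unverified step.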
## Architectural assessment

### What’s genuinely novel

- **Qdrant as the sole backing store for all indexes.** Code chunks, dependency graph, context artifacts, and project metadata all use a single Qdrant instance. Chunks land in `codebase_{projectId}` collections; graph and metadata land as JSON payloads in the `socraticode_metadata` collection. This eliminates the SQLite + separate vector DB split common in competing tools. The tradeoff is Docker as a hard infrastructure dependency.
- **Zero-config auto-provisioning with intelligent Ollama detection.** The server probes `localhost:11434` for a native Ollama install first; if found, it uses it (GPU-accelerated on Mac/Windows). If not, it pulls Docker images, starts Qdrant and Ollama containers, and downloads `nomic-embed-text` on first run with no user action required. Among tools in this survey this is the lowest-friction stateful local deployment.
- **Context artifact indexing as a first-class feature.** Treating database schemas, OpenAPI specs, Terraform configs, and architecture docs as searchable, hybrid-indexed artifacts alongside code is a distinct capability not present in any other tool in this survey. This directly addresses the common agent failure mode of lacking schema context when writing queries or migrations.
- **Multi-agent coordination via file locking.** `proper-lockfile` coordinates cross-process access so multiple concurrent agent sessions share one index without corruption. One session indexes; all sessions search; stale locks are reclaimed automatically.
- **Svelte and Vue import extraction.** Re-parsing `<script>` blocks as TypeScript and extracting CSS `@import` from `<style>` blocks covers frontend framework files that simpler regex-based import extractors miss.
- **Linked-project cross-collection search (v1.4.0).** Projects declare dependencies via `.socraticode.json` or `SOCRATICODE_LINKED_PROJECTS`. When `includeLinked: true` is passed, a single `codebase_search` call queries all linked projects in parallel, merges results client-side with RRF, and labels each result with its source project. This enables monorepo and multi-repo search from a single agent call.
- **Branch-aware collection naming (v1.4.0).** When `SOCRATICODE_BRANCH_AWARE=true`, the project’s git branch is appended to the collection name hash, giving each branch an isolated index. Linked-project cross-references use the branch-agnostic base hash so inter-repo links remain stable across branches.
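To make the naming scheme concrete, here is a hypothetical sketch. The analysis states only that the branch is appended to the collection name hash, so the hashing and sanitization details below are assumptions, not the source's scheme:

```typescript
import { createHash } from "node:crypto";

// Hypothetical branch-aware collection naming. The base hash stays stable
// across branches (so linked-project references keep working); the branch
// suffix isolates each branch's index. Details are assumptions for illustration.
function collectionName(projectPath: string, branch?: string): string {
  const baseHash = createHash("sha256").update(projectPath).digest("hex").slice(0, 12);
  if (!branch) return `codebase_${baseHash}`;
  // sanitize: git branch names can contain '/', '.', '-', etc.
  const safeBranch = branch.replace(/[^a-zA-Z0-9_]/g, "_");
  return `codebase_${baseHash}_${safeBranch}`;
}
```

The key property is that `collectionName(p, b)` always extends `collectionName(p)`, which is what keeps branch-agnostic cross-references stable.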
### Gaps and risks

- **No multi-hop graph traversal.** `codebase_graph_query` returns only direct imports and dependents for a given file (`getFileDependencies()` in `graph-analysis.ts`). There is no equivalent to `trace_call_path` (multi-hop call graph) or `detect_changes` (blast radius from a git diff) as found in codebase-memory-mcp.
- **Benchmark is a single-scenario, author-run session.** Five architectural questions on VS Code (TypeScript-heavy, well-structured) is not a representative sample. Performance on polyglot monorepos, small repos, or dynamically typed Python/Ruby codebases is unreported. No scripted harness exists to reproduce the result.
- **“61% token reduction” is bytes, not tokens.** The README benchmark table measures raw bytes. The claim that fewer bytes “directly reduces token costs” is unverified — tokenized output depends on the model’s tokenizer and is not proportional to raw bytes for all content types. The actual LLM token savings figure is unknown.
- **The 2000-character chunk cap is aggressive.** Functions longer than ~40 lines at typical column widths will be truncated at the embedding stage. This is not documented prominently and may silently degrade retrieval quality for large classes or generated code.
- **BM25 quality is opaque.** The `qdrant/bm25` text model runs inside the Qdrant container; its tokenization and IDF corpus are not exposed. For non-English identifiers or heavily abbreviated codebases, BM25 quality may degrade unpredictably.
- **Recursive circular dependency DFS.** `findCircularDependencies()` in `graph-analysis.ts` uses recursive DFS. The `visited` set prevents infinite loops, but call-stack depth grows with the longest path; deep monorepo dependency chains may hit Node.js stack limits.
- **Graph queries are O(nodes) linear scans.** `getFileDependencies()` uses `Array.find()` over the full node list. On large repos this will degrade; there is no node index or hash map.
- **AGPL-3.0 and undisclosed commercial licence terms.** Teams that cannot accept AGPL must negotiate a commercial licence whose pricing and conditions are not publicly available.
- **RepoEval and SWE-bench figures are unlinked.** The README cites recall and accuracy improvements from AST-aware chunking research without DOIs or paper titles. These figures cannot be verified as cited.
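The stack-depth risk noted above has a standard remedy: an explicit-stack (iterative) depth-first search. A sketch of such a replacement for cycle detection follows; this is illustrative, not the project's code:

```typescript
// Iterative three-color DFS for cycle detection: "visiting" marks nodes on the
// current path, "done" marks fully explored nodes. An explicit frame stack
// replaces recursion, so depth is bounded by heap, not the call stack.
function hasCycle(deps: Map<string, string[]>): boolean {
  const state = new Map<string, "visiting" | "done">();
  for (const start of deps.keys()) {
    if (state.has(start)) continue;
    const stack: Array<{ node: string; next: number }> = [{ node: start, next: 0 }];
    state.set(start, "visiting");
    while (stack.length > 0) {
      const frame = stack[stack.length - 1];
      const children = deps.get(frame.node) ?? [];
      if (frame.next < children.length) {
        const child = children[frame.next++];
        const s = state.get(child);
        if (s === "visiting") return true; // back edge to the current path: cycle
        if (s === undefined) {
          state.set(child, "visiting");
          stack.push({ node: child, next: 0 });
        }
      } else {
        state.set(frame.node, "done"); // fully explored, off the current path
        stack.pop();
      }
    }
  }
  return false;
}
```

The same frame-stack pattern generalizes to recovering the actual cycle members rather than just detecting one.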
## Recommendation

Adopt for semantic codebase search in privacy-sensitive or air-gapped environments. The zero-config local deployment, hybrid search quality, and context artifact indexing make SocratiCode a strong choice for teams that cannot use cloud-hosted code intelligence. The benchmark figures are not independently reproducible, but the underlying mechanism (Qdrant hybrid query with RRF) is sound and the implementation is clean.
Pair with codebase-memory-mcp for structural analysis. SocratiCode’s dependency graph supports only direct import/dependent queries. For multi-hop call graphs, blast radius analysis, or symbol-level structural queries, codebase-memory-mcp remains necessary.
Treat the 61.5% figure as an upper bound measured under favorable conditions (TypeScript monorepo, architectural queries, non-optimized grep baseline). Independent measurement on representative workloads is required before using this figure for resource planning.
Flag AGPL-3.0 for legal review before embedding in any commercial product. The dual-licence path exists but terms are not publicly disclosed.
## Source review

Reviewed version: v1.4.1 (vendored at `tools/giancarloerra-socraticode/`). The original triage targeted v1.3.2.
### Architecture: critical path from agent call to token-reduced output

```text
Agent → MCP tool call: codebase_search(query, limit=10)
└─ query-tools.ts: handleQueryTool()
   ├─ generateQueryEmbedding(query)              # embeddings.ts → Ollama/OpenAI/Google
   ├─ [if includeLinked] searchMultipleCollections()
   │   ├─ for each collection: searchChunksWithVector() in parallel
   │   └─ mergeMultiCollectionResults()          # client-side RRF (Node.js, RRF_K=60)
   └─ [single collection] searchChunks()
       └─ qdrant.query(prefetch=[dense, bm25], fusion: "rrf")   # server-side RRF
→ returns top-N FileChunk[] with filePath, relativePath, content, startLine, endLine, score
```

Chunk content at return time is the raw stored payload — truncated to 2000 characters at index time by `applyCharCap()`. The agent receives code snippets with file path and line range, not full file contents, which is the primary mechanism for token reduction.
### Data structures

- **Qdrant point (code chunk):** `{ id: UUID, vector: { dense: float32[], bm25: { text, model } }, payload: { filePath, relativePath, content, startLine, endLine, language, type, contentHash } }`
- **Qdrant metadata point (project/graph/context metadata):** `{ id: SHA256-derived UUID, vector: [0], payload: { collectionName, projectPath, lastIndexedAt, filesTotal, filesIndexed, fileHashes (JSON), indexingStatus } }`
- **CodeGraph:** `{ nodes: CodeGraphNode[], edges: CodeGraphEdge[] }` — serialized as a single JSON string in the metadata collection payload. `CodeGraphNode` holds `{ relativePath, language, dependencies[], dependents[] }`.
- **FileChunk:** `{ id, filePath, relativePath, content, startLine, endLine, language, type }` — the internal representation before upsert.
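Transcribed into TypeScript declarations for clarity; field names come from the shapes above, while the interface declarations and the sample values are our transcription, not code from the repository:

```typescript
// Transcription of the documented shapes; sample values are placeholders.
interface FileChunk {
  id: string;          // UUID in the real system
  filePath: string;
  relativePath: string;
  content: string;
  startLine: number;
  endLine: number;
  language: string;
  type: string;        // chunk-type discriminator; concrete values not documented here
}

interface CodeGraphNode {
  relativePath: string;
  language: string;
  dependencies: string[];
  dependents: string[];
}

const example: FileChunk = {
  id: "example-id",    // placeholder, not a real UUID
  filePath: "/repo/src/auth.ts",
  relativePath: "src/auth.ts",
  content: "export function login() { /* ... */ }",
  startLine: 1,
  endLine: 3,
  language: "typescript",
  type: "ast",         // placeholder value
};
```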
### Key files

| File | Purpose |
|---|---|
| `src/services/qdrant.ts` | All Qdrant operations: upsert, search, metadata CRUD; client-side RRF for multi-collection |
| `src/services/indexer.ts` | Chunking pipeline: char-based, AST-aware, line-based; full/incremental index |
| `src/services/code-graph.ts` | Graph build orchestration, progress tracking, graph cache |
| `src/services/graph-imports.ts` | Per-language import extraction via ast-grep or regex |
| `src/services/graph-analysis.ts` | Graph query: direct deps, circular DFS, stats, Mermaid diagram |
| `src/services/context-artifacts.ts` | Non-code artifact indexing from `.socraticodecontextartifacts.json` |
| `src/services/docker.ts` | Docker lifecycle: pull images, start/stop/check Qdrant and Ollama containers |
| `src/services/embedding-config.ts` | Embedding provider config: ollama (auto/docker/external), openai, google |
| `src/config.ts` | Project ID hashing, branch-aware naming, linked-project resolution |
| `src/index.ts` | MCP server entry point, tool schema registration |
### Test coverage

Three test tiers (verified from `tests/`):

- **Unit tests** (`tests/unit/`, 21 files): pure function tests, no Docker required. Covers chunking, config, constants, graph analysis, graph imports, path resolution, watcher logic.
- **Integration tests** (`tests/integration/`, 8 files): require Docker plus running Qdrant and Ollama. Covers the full index/search/update cycle, context artifacts, embeddings, graph build, and the tools API.
- **E2E test** (`tests/e2e/full-workflow.test.ts`): exercises the complete 12-step lifecycle through the tool handler API with a fixture project.
Test runner: Vitest v4. Tests run sequentially (`fileParallelism: false`) because Docker resources and Qdrant collections are shared. Timeout: 120 s per test.
## Comparison hooks (for ANALYSIS.md matrix)

| Dimension | socraticode |
|---|---|
| Approach | Hybrid dense + BM25 (RRF) via Qdrant; AST-aware chunking; polyglot import graph |
| Compression (vs grep) | 61.5% bytes (verified as bytes measurement, not tokens; single TypeScript repo scenario) |
| Token budget model | None — result set bounded by limit parameter (default 10 chunks, 2000 chars each) |
| Injection strategy | On-demand MCP tool calls; no session-level injection |
| Eviction | N/A — no context injection pipeline |
| Benchmark harness | None — author-run live session documented in README; no scripted repro |
| License | AGPL-3.0 (dual-licence commercial option available, terms undisclosed) |
| Maturity | v1.4.1; actively maintained (last commit 2026-04-12); unit + integration + e2e test suite via Vitest |