Analysis — osgrep
Summary
Section titled “Summary”osgrep is a Node.js CLI (and partial Claude Code plugin) that indexes a codebase into a local LanceDB vector store and answers natural-language queries against it. Its search pipeline chains three models: Granite 30M for dense 384-dim embeddings, mxbai-edge-colbert-v0 17M (int8 ONNX) for late-interaction reranking, and LanceDB’s FTS for keyword recall. Results are merged via Reciprocal Rank Fusion before two-stage ColBERT reranking. Call-graph traversal (osgrep trace) and structural file views (osgrep skeleton) round out the agent-facing surface.
Critical correction from triage: the README describes an MCP server. Source inspection reveals src/commands/mcp.ts returns tools: [] for ListToolsRequestSchema and "Not implemented" for all tool calls — the MCP server is a scaffold, not a working surface.
What it does (verified from source)
Section titled “What it does (verified from source)”Indexing pipeline
Section titled “Indexing pipeline”Indexing is driven by src/lib/index/syncer.ts. A file walker scans the repo respecting .gitignore and .osgrepignore; files are skipped by extension allowlist, 2 MB size limit, or null-byte detection. Unchanged files are skipped via mtime/size hash in an LMDB metadata cache.
Chunking (src/lib/index/chunker.ts) splits files using web-tree-sitter at function/class boundaries defined per language (src/lib/core/languages.ts). Chunks exceeding 75 lines or 2,000 chars are split with configurable line/char overlap. An anchor chunk is built per file containing imports, exports, and top-level comments as a metadata-rich entry point.
Embedding runs in a process pool (not threads) via piscina-managed workers to isolate ONNX runtime crashes. Two models are loaded from ~/.osgrep/models/:
| Model | ID | Output |
|---|---|---|
| Dense | onnx-community/granite-embedding-30m-english-ONNX | 384-dim float32, mean-pooled |
| ColBERT | ryandono/mxbai-edge-colbert-v0-17m-onnx-int8 | Per-token int8 grid + 48-dim pooled summary |
Each chunk is stored in LanceDB with: vector (384-dim), colbert (Binary int8 token grid), colbert_scale (quantization scale factor), pooled_colbert_48d (48-dim cosine summary), plus metadata: defined_symbols, referenced_symbols, imports, exports, role, complexity, is_exported, parent_symbol, file_skeleton.
Search pipeline
Section titled “Search pipeline”src/lib/search/searcher.ts implements a three-stage retrieval pipeline:
- Dual retrieval: LanceDB vector search (dense ANN) + LanceDB FTS (BM25), both with
PRE_RERANK_K = max(top_k × 5, 500)candidates. - RRF merge (k=60): fuses the two candidate lists into a deduplicated ranked set, truncated to
STAGE1_K = 200. - Stage 1 — cheap cosine filter: ranks by
pooled_colbert_48dcosine similarity; keeps topSTAGE2_K = 40. - Stage 2 — full ColBERT rerank: runs maxSim (
src/lib/workers/colbert-math.ts) on topRERANK_TOP = 20candidates; blends with RRF score atFUSED_WEIGHT = 0.5.
Structural boosting post-rerank:
ORCH(orchestration, high fan-out/complexity) andDEF(type/class definition) chunks receive a mild upward boost.DOCSchunks are penalized 0.6×.- Test/spec paths (directories named
tests/,specs/,benchmark/; files matching.test.tsor.spec.ts) are penalized 0.5× (configurable viaOSGREP_TEST_PENALTY).
Results are deduplicated by ID, then by >50% line-range overlap, then diversified to MAX_PER_FILE = 3 results per file.
All thresholds are overridable via env vars: OSGREP_PRE_K, OSGREP_STAGE1_K, OSGREP_STAGE2_K, OSGREP_RERANK_TOP, OSGREP_RERANK_BLEND, OSGREP_TEST_PENALTY, OSGREP_MAX_PER_FILE.
Language support
Section titled “Language support”Tree-sitter grammars are downloaded on demand to ~/.osgrep/grammars/. Languages with grammars (function/class boundary chunking):
| Language | Extensions |
|---|---|
| TypeScript | .ts |
| TSX | .tsx |
| JavaScript / JSX | .js, .jsx, .mjs, .cjs |
| Python | .py |
| Go | .go |
| Rust | .rs |
| C++ | .cpp, .hpp, .cc, .cxx |
| C | .c, .h |
| Ruby | .rb |
| PHP | .php |
Languages in INDEXABLE_EXTENSIONS but without a grammar (line-based chunking fallback, no AST boundary detection): Java, C#, Swift, Kotlin, Scala, Lua, Dart, Elixir, Clojure, and all config/markup formats (JSON, YAML, TOML, XML, Markdown, SQL, HTML, CSS, shell scripts).
Trace command
Section titled “Trace command”osgrep trace <symbol> builds a 1-hop call graph from the LanceDB index. It queries:
- Callers: chunks where
array_contains(referenced_symbols, symbol)— up to 100 results. - Callees:
referenced_symbolsof the definition chunk forsymbol.
Depth is fixed at 1 hop; there is no multi-hop traversal option. Call edges are derived from the referenced_symbols field populated during chunking, not from a static analysis pass — accuracy depends on symbol resolution quality at index time.
Skeleton command
Section titled “Skeleton command”osgrep skeleton <target> (where target is a file path, symbol name, or free-text query) emits a compressed structural view of a file. Skeletons are stored in the file_skeleton field per chunk at index time; src/lib/skeleton/retriever.ts reads them from cache. Fresh generation falls back to src/lib/skeleton/skeletonizer.ts. Output includes function/class signatures with role annotations (ORCH, DEF, IMPL) and elided bodies. Token estimate uses Math.ceil(length / 4) — a character-count heuristic, not an actual tokenizer.
MCP server — confirmed stub
Section titled “MCP server — confirmed stub”src/commands/mcp.ts registers with the MCP SDK but returns:
tools: []forListToolsRequestSchema{ content: [{ type: "text", text: "Not implemented" }], isError: true }for allCallToolRequestSchemacalls
The server boots, connects via stdio transport, and starts a background sync — but exposes zero callable tools. This contradicts the README’s plugin section. The opencode plugin (src/commands/opencode.ts) and Claude Code plugin (src/commands/claude-code.ts) are separate CLI entrypoints that inject context via prompt injection, not MCP.
Internal eval harness
Section titled “Internal eval harness”src/eval.ts defines 70+ EvalCase records with { query, expectedPath } pairs — each asserting a specific source file should rank first for a given natural-language query. This is a retrieval MRR regression suite run against osgrep’s own codebase. It is not the benchmark referenced in the README (which covers token usage on a third-party codebase).
Benchmark claims — verified vs as-reported
Section titled “Benchmark claims — verified vs as-reported”Source reviewed; harness not executed.
| Claim | Value | Scope | Assessment |
|---|---|---|---|
| Token reduction | ~20% | Cost proxy, 10 queries | benchmark_opencode.csv: avg baseline $0.36 vs osgrep $0.29 (−19%). 10 queries against the opencode codebase. Methodology matches claim directionally. |
| Speedup | ~30% | Response time, 10 queries | Same CSV: avg baseline 157 s vs osgrep 100 s (−36%). Consistent with claim. |
| osgrep wins | 6/10 queries | Per-query winner column | Confirmed from CSV. Baseline wins 2/10; 2 ties. |
| Cost sometimes increases | e.g. row 2 | 2/10 queries cost more with osgrep | CSV shows 3 queries where osgrep cost ≥ baseline cost. Not mentioned in README. |
| Benchmark corpus | opencode codebase | Single third-party repo | Corpus is not identified in README. Confirmed from CSV column headers and query content. |
| No quality evaluation | — | Answers not judged for correctness | CSV records only time and cost. Whether osgrep answers were correct or equivalent to baseline is not assessed. |
| Internal MRR harness | eval.ts | osgrep’s own codebase | A 70+ case retrieval regression suite exists, testing that osgrep’s search ranks correct source files for dev queries — independent of the README benchmark. |
Verdict: the 20%/30% headline figures are directionally supported by the CSV data, but the benchmark is a 10-query, single-codebase, cost-only evaluation with no answer quality assessment. The figures are real observations, not fabricated, but are not sufficient for a reproducible performance claim.
Architectural assessment
Section titled “Architectural assessment”Strengths
Section titled “Strengths”-
Three-model search stack is source-confirmed and implemented. Dense (Granite 30M) + FTS + ColBERT (mxbai-edge-colbert-v0 17M) reranking is fully wired in
searcher.tsandorchestrator.ts. The two-stage rerank (cheap pooled cosine → full maxSim) is a practical efficiency tradeoff. -
Per-chunk symbol metadata enables call-graph queries without a separate graph store.
defined_symbolsandreferenced_symbolsare stored per chunk in LanceDB, makingosgrep tracepossible without a second database. Scope is limited to 1-hop but the mechanism is sound. -
All pipeline thresholds are env-var overridable.
OSGREP_PRE_K,OSGREP_STAGE1_K,OSGREP_STAGE2_K,OSGREP_RERANK_TOP,OSGREP_RERANK_BLEND,OSGREP_TEST_PENALTY,OSGREP_MAX_PER_FILE— tunable without code changes. -
Internal regression harness exists. The 70+ MRR eval cases in
eval.tsprovide a repeatable signal for search quality changes against the codebase itself. -
ONNX process pool is crash-isolated. Workers run as child processes (not threads), so a stuck or crashing ONNX session doesn’t hang the whole CLI. SIGTERM/SIGKILL fallback confirmed in
pool.ts.
Weaknesses / risks
Section titled “Weaknesses / risks”-
MCP server is a non-functional stub. The README’s MCP section is misleading. Zero tools are callable via MCP as of commit 9f2faf7. This removes the primary integration path for many agent frameworks.
-
Grammar coverage gap vs. indexable extension list. Java, C#, Swift, Kotlin, Scala and all config/markup formats lack tree-sitter grammars and fall back to line-based chunking. Boundary quality for these is significantly worse.
-
Trace depth is fixed at 1 hop. There is no
--depthflag; multi-hop impact analysis is not supported. -
Token estimate in skeleton is a character heuristic.
Math.ceil(length / 4)is reported as a token count but is not based on a real tokenizer. Users relying on this figure for context budget decisions will get inaccurate numbers. -
Benchmark covers one codebase, 10 queries, no quality assessment. Generalizability is unknown. Cost savings could reverse on different codebases or query types.
-
Maintenance pace. Last commit 2026-01-17 — ~3 months stale. The MCP stub suggests the tool is mid-development. Significant advertised features (MCP) remain unimplemented.
Recommendation
Section titled “Recommendation”Use when:
- Semantic code search via CLI is the primary requirement and no external service (Qdrant, Docker) is acceptable.
- The richest hybrid retrieval pipeline (FTS + dense + two-stage ColBERT reranking) is needed in a single npm install.
osgrep skeleton(compressed structural views with role annotations) orosgrep trace(1-hop call-graph) are the specific agent primitives required.
Avoid when:
- MCP framework integration is required — the MCP server is a non-functional stub as of
9f2faf7; zero tools are callable. - The codebase is primarily Java, C#, Swift, or Kotlin — these lack tree-sitter grammars and fall back to line-based chunking with significantly lower boundary quality.
- Reproducible performance evidence is required before adoption — the benchmark harness (
run-benchmark.sh,src/bench/) is missing from the repo. - Active maintenance is required — last commit 2026-01-17, ~3 months stale.
vs socraticode: osgrep’s pipeline is richer (two-stage ColBERT reranking vs single RRF pass) and requires no external services; socraticode requires Docker + Qdrant but its MCP server is functional. Prefer osgrep for standalone CLI use; prefer socraticode if Qdrant is already in the stack.
vs codebase-memory-mcp: codebase-memory-mcp stores a graph structure enabling relationship traversal across symbols; osgrep is a flat vector index with no graph edges. Prefer codebase-memory-mcp when graph queries (call-graph depth > 1, dependency paths) are needed.
vs serena: serena uses LSP for symbol-level precision and supports editing; osgrep offers broader language indexing via embeddings but lower precision on symbol boundaries. Prefer serena for polyglot refactoring workflows requiring exact edits; prefer osgrep for free-text code discovery.
Open questions after source review
Section titled “Open questions after source review”- Skeleton freshness:
file_skeletonis stored at index time. If a file changes between re-indexes, the cached skeleton will be stale. The validator invalidateSchemachecks forcomplexityandis_exportedonly; no skeleton staleness check found. - ColBERT model provenance:
ryandono/mxbai-edge-colbert-v0-17m-onnx-int8is a community-uploaded int8 quantization of mxbai-edge-colbert. Licensing and quantization quality relative to the official mxbai release are not documented. - Symbol resolution quality:
referenced_symbolsis populated during chunking from tree-sitter. The accuracy of cross-file call edges (e.g. method calls on imported types) is undocumented and likely incomplete. - Daemon (
osgrep serve) fault tolerance: the source was not inspected in this pass; restart policy and socket/port behavior are unknown. - MCP roadmap: no issue or PR found discussing MCP implementation status. Whether this is planned, deprioritized, or abandoned is unclear.
Comparison hooks (for ANALYSIS.md matrix)
Section titled “Comparison hooks (for ANALYSIS.md matrix)”- Retrieval model: FTS (LanceDB BM25) + dense Granite 30M (384-dim) + mxbai-edge-colbert-v0 17M int8 late-interaction reranking; merged via RRF (k=60); two-stage rerank (pooled cosine filter → full maxSim).
- Token saving scope: ~20% cost / ~30% time (as reported; 10-query, single-codebase CSV, cost-proxy only; no answer quality assessment; benchmark harness absent from repo).
- Storage: LanceDB (embedded, per-repo isolation derived from git remote URL; no external service or server).
- Models: local ONNX, ~150 MB download to
~/.osgrep/models/; no API keys required. - MCP status: stub —
ListToolsRequestSchemareturnstools: []; allCallToolRequestSchemacalls return"Not implemented"(confirmed fromsrc/commands/mcp.ts, commit9f2faf7). - Language coverage gap: 10 tree-sitter grammars (TS, JS, Python, Go, Rust, C++, C, Ruby, PHP, TSX); Java, C#, Swift, Kotlin, Scala and all config/markup formats fall back to line-based chunking.
- Agentic primitives:
osgrep skeleton(compressed structural view, role annotations ORCH/DEF/IMPL, LanceDB-cached);osgrep trace(1-hop call graph fromdefined_symbols/referenced_symbolsfields). - Maintenance signal: last push 2026-01-17 (~3 months stale at analysis date); 1,128 stars; Apache-2.0.
- Source-corrected: README describes MCP server as a working integration — source confirms it is a non-functional scaffold returning zero tools.