# danjdewhurst-git-semantic-bun — Benchmark Reproduction

Source: tools/danjdewhurst-git-semantic-bun/ (pinned: 1743d3e9, 2026-02-26)
Date: 2026-04-13
Environment: not yet run — this is a repro guide only
Outcome: not attempted
## Harness location

Two distinct benchmark harnesses are present, alongside two performance unit tests:

```
scripts/perf-ci.ts                 # CI performance regression suite
src/commands/benchmark.ts          # gsb benchmark command (ranking latency + ANN recall)
test/performance-baseline.test.ts  # Bun unit test: performance smoke test
test/performance-smoke.test.ts     # Bun unit test: warm search smoke test
```

npm scripts (from package.json):

```
bun run perf:ci        # scripts/perf-ci.ts --baseline .github/perf-baseline.json --output perf-artifacts/perf-snapshot.json
bun run perf:baseline  # scripts/perf-ci.ts --write-baseline ... (writes a new baseline)
bun test               # full test suite including performance smoke tests
```

## What the harnesses measure
### bun run perf:ci (scripts/perf-ci.ts)

- Builds a synthetic 5,000-commit index with 32-dimension fake vectors (`GSB_FAKE_EMBEDDINGS=1`).
- Runs three suites: cold search (load + embed + search, 15 iterations), warm search (model pre-loaded, 30 iterations), and index load (20 iterations).
- Compares results against `.github/perf-baseline.json` and fails if any metric exceeds the allowed regression threshold (300% of baseline by default).
- Does not test real embedding model throughput; the vectors are synthetic.
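The regression gate described above can be sketched as a simple comparison of a fresh snapshot against the committed baseline. This is an illustrative sketch only: the type names, metric layout, and `checkGuardrails` function are assumptions for explanation, not the actual perf-ci.ts implementation.

```typescript
// Hedged sketch of the perf guardrail idea: for each suite, fail if any
// metric exceeds the allowed multiple of the committed baseline value.
// The shape of the baseline/snapshot records is assumed, not taken from
// perf-ci.ts.
type Metrics = { p50: number; p95: number; mean: number };

function checkGuardrails(
  baseline: Record<string, Metrics>,
  snapshot: Record<string, Metrics>,
  maxRatio = 3.0, // 300% of baseline, matching the documented default
): string[] {
  const failures: string[] = [];
  for (const [suite, base] of Object.entries(baseline)) {
    const current = snapshot[suite];
    if (!current) continue; // missing suite: not a regression in this sketch
    for (const key of ["p50", "p95", "mean"] as const) {
      const limit = base[key] * maxRatio;
      if (current[key] > limit) {
        failures.push(`${suite}.${key}: ${current[key]}ms > ${limit.toFixed(1)}ms`);
      }
    }
  }
  return failures; // empty array corresponds to "Perf guardrails: PASS"
}
```

An empty return value maps to a passing CI run; any entry maps to a failed guardrail.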
Committed baseline (.github/perf-baseline.json):
| Suite | p50 ms | p95 ms | mean ms |
|---|---|---|---|
| cold (load+embed+search) | 8.7 | 26.8 | 10.3 |
| warm (model pre-loaded) | 1.5 | 2.3 | 1.6 |
| index load | 4.0 | 6.3 | 4.6 |
Note: these figures use 32-dim fake vectors, not 384-dim Xenova/all-MiniLM-L6-v2 embeddings.
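The p50/p95/mean columns above are summary statistics over the per-iteration timings. A minimal sketch of how such figures can be derived follows; the nearest-rank percentile method used here is an assumption, and perf-ci.ts may compute percentiles differently (e.g. with interpolation).

```typescript
// Summarise raw per-iteration timings (in ms) into p50/p95/mean.
// Nearest-rank percentile is assumed for illustration.
function summarise(timings: number[]): { p50: number; p95: number; mean: number } {
  const sorted = [...timings].sort((a, b) => a - b);
  const pct = (p: number) =>
    sorted[Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)];
  const mean = sorted.reduce((sum, t) => sum + t, 0) / sorted.length;
  return { p50: pct(50), p95: pct(95), mean };
}
```

With only 15–30 iterations per suite, p95 is effectively the worst or second-worst observation, which is why it is the noisiest column on the table.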
### gsb benchmark <query> (src/commands/benchmark.ts)

- Runs against a real indexed repository.
- Measures ranking latency: full O(n log n) sort vs heap-based O(n log k) top-K selection.
- With `--ann`: measures ANN (HNSW) recall@k against exact search, plus the latency speedup.
- Supports `--save`/`--history` to track results in `benchmarks.jsonl`.
- Does not measure semantic recall; no relevance labels are required or used.
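The recall@k metric reported by `--ann` is the overlap between the ANN result set and the exact top-k. A hedged sketch of that definition (not the benchmark.ts implementation; result IDs are illustrative):

```typescript
// recall@k: fraction of the exact top-k results that the ANN index also
// returned. 1.0 means the HNSW index recovered every true neighbour.
function recallAtK(exactTopK: string[], annTopK: string[]): number {
  const ann = new Set(annTopK);
  const hits = exactTopK.filter((id) => ann.has(id)).length;
  return exactTopK.length === 0 ? 0 : hits / exactTopK.length;
}
```

Because recall is measured against exact search over the same index, it isolates ANN approximation error from embedding quality, which is consistent with the note that semantic recall is not measured.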
## Environment requirements

- Bun >= 1.3.9 (hard requirement; this is a Bun-native project, not Node-compatible)
- git (for gsb commands; not needed for perf-ci.ts on its own)
- usearch (optional, `bun add usearch`, needed for the ANN benchmark)

The perf CI script (`bun run perf:ci`) creates and destroys a temp directory; no real git repo is needed. The `gsb benchmark` command requires an initialised and indexed git repository.
## How to run the CI performance harness

```
cd tools/danjdewhurst-git-semantic-bun
bun install
bun run perf:ci
```

Expected output (if no regression):

```
Perf snapshot written: perf-artifacts/perf-snapshot.json
cold: p50=...ms p95=...ms mean=...ms
warm: p50=...ms p95=...ms mean=...ms
indexLoad: p50=...ms p95=...ms mean=...ms
Perf guardrails: PASS
```

To write a new baseline from the current machine:

```
bun run perf:baseline
```

## How to run the unit test suite
```
cd tools/danjdewhurst-git-semantic-bun
bun install
bun test
```

The test suite includes test/performance-baseline.test.ts and test/performance-smoke.test.ts, which assert on warm search latency using fake embeddings against a small fixture index. The golden ranking test (test/ranking-golden.test.ts) validates the hybrid scoring formula against test/fixtures/ranking-golden.json.
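Since the smoke tests assert on warm search latency, the general pattern is worth sketching: time repeated calls to a warmed-up operation and assert the median stays under a budget. This is an illustrative sketch only; `warmSearch`, the iteration count, and the 50 ms budget below are placeholders, not the repository's actual test code.

```typescript
// Sketch of a latency smoke check: measure each call with
// performance.now() and take the median, which is robust to one-off
// GC pauses that would make a max- or mean-based assertion flaky.
function medianLatencyMs(op: () => void, iterations: number): number {
  const timings: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    op();
    timings.push(performance.now() - start);
  }
  timings.sort((a, b) => a - b);
  return timings[Math.floor(timings.length / 2)];
}

// Inside a Bun test, usage would look roughly like (placeholders):
//   expect(medianLatencyMs(() => warmSearch("query"), 30)).toBeLessThan(50);
```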
## How to run gsb benchmark against a real repo

```
cd <any git repository>
bun add -g github:danjdewhurst/git-semantic-bun   # or use local clone
gsb init
gsb index
gsb benchmark "fix authentication timeout" -i 50 -n 10
gsb benchmark "fix authentication timeout" -i 50 -n 10 --ann   # if usearch installed
gsb benchmark "fix authentication timeout" --save
gsb benchmark --history
```

## Repro status
Not attempted. This guide covers how to run the harnesses; actual reproduction (comparing measured figures to the committed baseline on the same synthetic dataset) was not performed.
To reproduce, run bun run perf:ci as described above. The baseline in .github/perf-baseline.json was produced on the author’s machine; expect different absolute numbers on different hardware, but relative regressions should be consistent within the same machine.