Benchmark repro guide — n2-arachne
This document records the state of the benchmark harness for n2-arachne as found in the vendored source at tools/choihyunsus-n2-arachne/.
Harness status
No executable benchmark harness is available in the vendored source.
The following benchmark script paths are referenced in the repository but are absent from the tree:
| Reference location | Path referenced | Present in repo |
|---|---|---|
| CHANGELOG.md (v4.0.0) | test/test-benchmark.js | No |
| README.md (Run benchmarks section) | test/bench-hybrid-engine.js | No |
| README.md (Run benchmarks section) | test/bench-10mb.js | No |
The `package.json` test script is `echo 'Tests run via CI pipeline'`, so no runnable test or benchmark is publicly available. The `data-hybrid-bench/benchmark-report.json` output path referenced in the README also does not exist in the vendored tree.
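The presence check behind the table above can be repeated with a few lines of Python. This is a minimal sketch, not part of the n2-arachne repository: the function names `check_paths` and `format_report` are hypothetical, and the path list is copied from the references noted above.

```python
from pathlib import Path

# Paths referenced by the README and CHANGELOG, as listed in this guide.
REFERENCED_PATHS = [
    "test/test-benchmark.js",
    "test/bench-hybrid-engine.js",
    "test/bench-10mb.js",
    "data-hybrid-bench/benchmark-report.json",
]


def check_paths(repo_root, paths=REFERENCED_PATHS):
    """Return {relative_path: bool} saying whether each path exists under repo_root."""
    root = Path(repo_root)
    return {p: (root / p).exists() for p in paths}


def format_report(status):
    """Render the status mapping as one 'present'/'absent' line per path."""
    return "\n".join(
        f"{'present' if ok else 'absent':8} {path}" for path, ok in status.items()
    )
```

Running `print(format_report(check_paths("tools/choihyunsus-n2-arachne")))` from the directory that contains `tools/` lists each referenced path as present or absent, which makes it easy to re-check after a future vendoring update.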
Claimed benchmark figures (as reported)
All figures below are from README.md and CHANGELOG.md. None have been independently reproduced.
Real-world compression benchmark
| Metric | Claimed value | Source |
|---|---|---|
| Project size | 3,219 files / 4.68 M tokens | README benchmark table |
| Arachne output | 14,074 tokens | README benchmark table |
| Compression ratio | 333x (99.7% reduction) | README benchmark table |
| Initial index time | 627 ms | README benchmark table |
| Incremental index time | 0 ms | README benchmark table |
| SQLite DB size | 24 MB | README benchmark table |
Benchmark subject: N2 Browser project (the author’s own production project). No independent dataset is provided.
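The ratio and the percentage in the table can at least be checked against each other. The sketch below only does arithmetic on the reported figures; it measures nothing, and it treats "4.68 M tokens" as exactly 4,680,000, which is an assumption because the README figure is rounded.

```python
# Figures as reported in the README benchmark table (not independently measured).
input_tokens = 4_680_000   # "4.68 M tokens"; a rounded figure, assumed exact here
output_tokens = 14_074     # reported Arachne output

ratio = input_tokens / output_tokens
reduction_pct = (1 - output_tokens / input_tokens) * 100

print(f"compression ratio: {ratio:.1f}x")
print(f"reduction: {reduction_pct:.2f}%")
```

The result is roughly 332.5x and 99.70%, so the claimed "333x (99.7% reduction)" is internally consistent with the token counts. That confirms only the arithmetic, not the underlying measurement.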
Search engine performance (v4.0)
Hardware: AMD Ryzen 5 5600G, Node v24, Windows x64 (as stated in README).
| Search Mode | Engine | Claimed performance | Notes |
|---|---|---|---|
| Keyword | Rust BM25 (memchr + rayon) | 4.98 ms / query | 1.3x faster than TS fallback |
| Keyword | SQLite LIKE | 0.021 ms / query | DB index path |
| Semantic KNN | sqlite-vec (C++ SIMD) | 29.52 ms / query | 10K × 768D vectors |
| Batch Cosine | Rust (napi-rs) | 4.91 ms / query | Retired — causes GC/OOM at scale |
Note: The README intro callout also states “25ms” for the sqlite-vec scan. This is internally inconsistent with the 29.52 ms figure in the benchmark table. Both values claim the same test conditions (10,000 × 768D vectors). Neither is reproducible without the benchmark scripts.
The BatchCosine (Rust) path is labelled “Legacy” in the README and has been retired from the production code path in v4.0 due to V8 heap OOM on large corpora. The 19.9x speedup figure in the CHANGELOG (and 22.3x in the README table) refers to this retired path.
The headline “1GB codebase search in 0.54 seconds” appears on line 12 of the README but is not supported by any benchmark table entry.
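To put the discrepancies above in numbers, the following sketch computes the gap between the two sqlite-vec figures and the throughput that the headline would imply. It assumes "1GB" means a decimal gigabyte and that the 0.54 seconds covers a full scan; the README states neither, so the second figure is illustrative only.

```python
# sqlite-vec KNN latency, as stated in two places in the README.
callout_ms = 25.0   # intro callout
table_ms = 29.52    # benchmark table

gap_ms = table_ms - callout_ms
gap_pct = gap_ms / callout_ms * 100

# Headline claim: "1GB codebase search in 0.54 seconds".
# Assumption: 1 GB is a decimal gigabyte and the whole corpus is scanned.
headline_gb = 1.0
headline_seconds = 0.54
implied_gb_per_s = headline_gb / headline_seconds

print(f"table vs callout: +{gap_ms:.2f} ms ({gap_pct:.1f}% higher)")
print(f"implied headline throughput: {implied_gb_per_s:.2f} GB/s")
```

The table figure is about 18% higher than the callout, and the headline would correspond to about 1.85 GB/s under the stated assumptions. Neither number can be validated without the missing scripts.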
How to run (if scripts become available)
The README specifies:
```
cd tools/choihyunsus-n2-arachne
npm run build
node test/bench-hybrid-engine.js   # Engine comparison (BM25 vs sqlite-vec vs TS)
node test/bench-10mb.js            # Memory scale test
```

Output would be written to `data-hybrid-bench/benchmark-report.json`.
Environment requirements (as stated in README)
- Node.js >= 18 (tested with Node v24)
- npm or npx
- Ollama running locally on `http://localhost:11434` (for semantic/hybrid benchmarks)
- `nomic-embed-text` model pulled in Ollama
- The Rust native module must be built or the prebuilt `.node` must be compatible with the host platform/arch
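Some of the requirements above can be verified before attempting a run. The helpers below are a sketch under assumptions: the function names are hypothetical, the Node version is expected in the usual `node --version` form (for example `v24.1.0`), and the model lookup relies on Ollama's `/api/tags` endpoint returning a `models` list whose entries carry a `name` such as `nomic-embed-text:latest`.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"   # endpoint stated in the README
REQUIRED_MODEL = "nomic-embed-text"     # embedding model stated in the README
MIN_NODE_MAJOR = 18                     # minimum Node.js major version


def parse_node_major(version_text):
    """Extract the major version from `node --version` output, or None if unparseable."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


def has_model(tags_payload, model=REQUIRED_MODEL):
    """Check a decoded /api/tags response for a model, ignoring any ':tag' suffix."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return any(name.split(":")[0] == model for name in names)


def fetch_ollama_tags(base_url=OLLAMA_URL, timeout=3):
    """Ask the local Ollama server which models are installed (raises if unreachable)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

A typical check passes the output of `subprocess.run(["node", "--version"], capture_output=True, text=True).stdout` to `parse_node_major` and compares the result with `MIN_NODE_MAJOR`, then calls `has_model(fetch_ollama_tags())`. The native `.node` module's compatibility is not covered here.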
Building the Rust module from source
```
cd tools/choihyunsus-n2-arachne/native
cargo build --release
# The .node file will be output to native/ after napi-rs post-build
```

Requires: Rust toolchain (stable), napi-build crate dependencies.
Reproduction notes
This guide was written based on source inspection only. The benchmark scripts were not executed. The repro status is:
- Compression claim (333x): cannot reproduce; no harness and no test dataset are provided.
- Index time (627 ms): cannot reproduce; no harness and no test dataset are provided.
- BM25 search time (4.98 ms): cannot reproduce; `test/bench-hybrid-engine.js` is absent.
- sqlite-vec KNN time (25 ms / 29.52 ms): cannot reproduce; the same harness is absent, and the figure is internally inconsistent.
- BatchCosine speedup (19.9x / 22.3x): cannot reproduce; the path is retired from production code and labelled “Legacy”.
If the benchmark scripts are published by the author, running them on a representative codebase (ideally not the author’s own N2 Browser project) would be the minimum needed to validate the compression and performance claims.
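Until then, per-query latency on an independent codebase could be measured with a generic timing loop such as the one below. This is not the author's harness: `search_fn` is a hypothetical stand-in for whatever callable wraps the engine under test, and the median and nearest-rank 95th percentile are one reasonable choice of summary, not the statistic the README uses (which is not stated).

```python
import statistics
import time


def time_queries(search_fn, queries, repeats=5):
    """Call search_fn on every query `repeats` times; return latencies in milliseconds."""
    samples_ms = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search_fn(query)
            samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms


def summarize(samples_ms):
    """Summarize latencies with the median and the nearest-rank 95th percentile."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100          # ceil(0.95 * n) using integer arithmetic
    return {
        "n": n,
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[max(rank, 1) - 1],
    }
```

Reporting a median alongside a tail percentile, together with the hardware and corpus size, would make any reproduced figures comparable with the per-query numbers claimed in the README.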