Benchmarks
About These Benchmarks
Section titled “About These Benchmarks”The data in this document comes from real-world usage across hundreds of hours of development work with Claude Desktop and Claude Code on production codebases. This includes Go projects (50,000+ lines), React/TypeScript frontends, database migrations, and log file analysis. Numbers represent typical results, not best-case scenarios.
For a deeper discussion of token optimization strategies and session-level cost analysis, see Performance and Tokens.
Throughput
Section titled “Throughput”Operations Per Second
Section titled “Operations Per Second”| Metric | Value | Conditions |
|---|---|---|
| Sustained throughput | 2,016 ops/sec | Mixed read/write/edit workload |
| Peak throughput | 3,500+ ops/sec | Read-heavy workload, all files cached |
| Concurrent operations | 8-16 | Depends on --parallel-ops setting |
Measured on a standard development machine (Intel i7, NVMe SSD, 32 GB RAM). Performance scales with hardware — faster storage and more CPU cores yield proportionally higher throughput.
Latency by Operation Type
Section titled “Latency by Operation Type”| Operation | Cached | Uncached |
|---|---|---|
| File read (small, < 100 KB) | < 1 ms | 1-5 ms |
| File read (medium, 100-500 KB) | < 1 ms | 5-15 ms |
| File read (large, 500 KB - 5 MB) | 1-3 ms | 15-50 ms |
| Directory listing | < 1 ms | 2-10 ms |
| Search (simple) | 1-5 ms | 10-50 ms |
| Search (regex with context) | 5-20 ms | 20-100 ms |
| File edit (small) | 2-5 ms | 5-15 ms |
| File write (small) | 3-8 ms | 5-15 ms |
Cached latency is under 1 ms for most operations because the 3-tier caching system serves content directly from memory.
Cache Performance
Section titled “Cache Performance”Hit Rates
Section titled “Hit Rates”| Metric | Value |
|---|---|
| Cache hit rate (cold start) | 0% |
| Cache hit rate (after 5 minutes) | 85-90% |
| Cache hit rate (after warmup, typical session) | 98.9% |
| Cache hit rate (read-heavy sessions) | 99%+ |
High hit rates are typical because AI agents tend to read the same files multiple times during a session — checking context, verifying changes, reading related files, etc.
Cache Configuration vs. Memory
Section titled “Cache Configuration vs. Memory”| Configuration | Cache Size | Typical Process Memory | Notes |
|---|---|---|---|
| Minimal | 50 MB | 50-80 MB | Small projects, constrained environments |
| Default | 100 MB | 80-120 MB | Most development work |
| Large | 200 MB | 150-250 MB | Large codebases |
| Enterprise | 500 MB | 200-400 MB | Enterprise projects, log analysis |
Memory usage is stable — the cache has a fixed maximum size and evicts old entries automatically using LRU (Least Recently Used) policy.
Cache Tiers
Section titled “Cache Tiers”The 3-tier caching system uses different strategies for different data types:
| Tier | Technology | TTL | What It Caches |
|---|---|---|---|
| File content | BigCache | 3 minutes | Full file contents (read operations) |
| Directory listings | go-cache | 2 minutes | Directory entries (list operations) |
| File metadata | go-cache | 10 minutes | File size, modification time, permissions |
Cache invalidation is handled by fsnotify file watching — when a file changes on disk, its cache entry is invalidated immediately.
Memory Usage
Section titled “Memory Usage”Base Memory
Section titled “Base Memory”| Component | Memory |
|---|---|
| Go runtime + server framework | ~15 MB |
| Worker pool (ants) | ~2 MB |
| Compiled regex cache | ~1 MB |
| Base overhead | ~20 MB |
Under Load
Section titled “Under Load”| Scenario | Total Memory |
|---|---|
| Idle (no operations) | ~20 MB |
| Active session, default cache (100 MB) | 80-120 MB |
| Active session, large cache (200 MB) | 150-250 MB |
| Peak during streaming operation | ~100 MB above baseline |
| Pipeline with 20 steps, 50 files | 120-180 MB |
Memory usage during streaming operations is bounded by the 64 KB buffer size — even when processing a 50 MB file, only 64 KB of buffer memory is allocated at any point.
File Size Threshold Performance
Section titled “File Size Threshold Performance”Each I/O strategy is optimized for its file size tier:
| File Size | Strategy | Read Time | Edit Time | Memory Overhead |
|---|---|---|---|---|
| 10 KB | Direct I/O | < 1 ms | 2 ms | ~10 KB |
| 50 KB | Direct I/O | < 1 ms | 3 ms | ~50 KB |
| 100 KB | Streaming | 1 ms | 5 ms | ~64 KB buffer |
| 300 KB | Streaming | 2 ms | 8 ms | ~64 KB buffer |
| 1 MB | Chunked | 5 ms | 15 ms | ~64 KB buffer |
| 5 MB | Chunked | 20 ms | 50 ms | ~64 KB buffer |
| 10 MB | Special | 40 ms | 100 ms | ~64 KB buffer |
| 50 MB | Special | 200 ms | Rejected | ~64 KB buffer |
Files above 50 MB are rejected for edit operations to prevent accidental massive changes. Read operations continue to work at any file size.
The thresholds (100 KB, 500 KB, 5 MB, 50 MB) are compiled constants in core/config.go. They were chosen based on benchmarking across typical development workloads — source code files, configuration files, data files, and logs.
Token Reduction
Section titled “Token Reduction”Compact Mode
Section titled “Compact Mode”With --compact-mode enabled:
| Operation | Without Compact | With Compact | Savings |
|---|---|---|---|
| Directory listing (per entry) | ~50 tokens | ~3 tokens | 94% |
| File read response | ~15,000 tokens | ~800 tokens | 95% |
| File edit response | ~25,000 tokens | ~1,200 tokens | 95% |
| Search results (10 matches) | ~5,000 tokens | ~800 tokens | 84% |
| Aggregate session (100 ops) | ~2,100,000 tokens | ~480,000 tokens | 77% |
Surgical Editing vs. Full Rewrites
Section titled “Surgical Editing vs. Full Rewrites”The biggest token savings come from targeted operations instead of full file rewrites:
| Approach | Tokens | Savings vs. Full Rewrite |
|---|---|---|
| Full read-modify-write (3,000-line file) | ~150,000 | Baseline |
| Search + range read + edit | ~1,200 | 99% |
This 99% reduction is not an exaggeration — it has been measured repeatedly across real projects. The key is to use search_files to find the exact location, read_file with start_line/end_line to read only the relevant section, and edit_file to apply the change.
Pipeline Token Reduction
Section titled “Pipeline Token Reduction”The pipeline system (batch_operations with pipeline_json) eliminates MCP round-trip overhead for multi-step workflows:
| Component | Tokens Per MCP Call |
|---|---|
| Request JSON serialization | ~100-150 |
| Claude’s reasoning about the call | ~100-200 |
| Response parsing + processing | ~100-200 |
| Total overhead per call | ~300-600 |
Measured Benchmark: Search + Read + Count (2 files)
Section titled “Measured Benchmark: Search + Read + Count (2 files)”| Method | MCP Calls | Server Time | Overhead Tokens |
|---|---|---|---|
| Individual calls | 5 | ~5 ms | ~2,100 |
| Pipeline (1 call) | 1 | 3.4 ms | ~400 |
| Reduction | 5x | 1.5x | 5x |
Measured Benchmark: Dry-Run Refactor (19 files, 261 occurrences)
Section titled “Measured Benchmark: Dry-Run Refactor (19 files, 261 occurrences)”| Method | MCP Calls | Overhead Tokens |
|---|---|---|
| Individual calls | ~22 | ~9,000 |
| Pipeline (1 call) | 1 | ~400 |
| Reduction | 22x | 22x |
When Pipelines Help Most
Section titled “When Pipelines Help Most”| Scenario | Calls Saved | Typical Reduction |
|---|---|---|
| Search + Edit + Verify | 3-7 calls reduced to 1 | 3-7x |
| Search + Read N files | N+1 calls reduced to 1 | Up to 10x |
| Search + Edit N + Count N | 2N+1 calls reduced to 1 | Up to 22x |
| Single file read | 1 call remains 1 | No benefit |
Pipeline output modes also affect token usage:
- Compact (
verbose: false): ~30 tokens —OK: 3/3 steps | 2 files | 0 edits - Verbose (
verbose: true): Full file contents (truncated at 50 lines per file), per-file counts, complete file lists
Real Session Data
Section titled “Real Session Data”From an actual 2-hour coding session on a Go project (47 files, 12,000 lines of code):
| Metric | Value |
|---|---|
| File reads | 156 |
| File edits | 43 |
| Searches | 28 |
| Directory listings | 12 |
| Cache hit rate | 94.2% |
| Mode | Tokens Used | Approximate Cost |
|---|---|---|
| Without MCP optimizations | ~2,100,000 | $6.30 |
| With MCP + compact mode | ~480,000 | $1.44 |
| Savings | 77% | 77% |
This is a typical session. Some sessions show higher savings (90%+) when performing many surgical edits. Some show lower savings (60%) when genuinely needing to read many full files.
Internal Engine Optimizations
Section titled “Internal Engine Optimizations”Three optimizations in the engine reduce overhead on every MCP operation, with no configuration changes needed:
AllowedPaths Pre-Resolution
Section titled “AllowedPaths Pre-Resolution”The access control check (isPathAllowed) previously called filepath.EvalSymlinks() — a real I/O syscall — for every allowed base path on every operation. With 5 allowed paths, that was 5 syscalls per read, write, edit, delete, or list.
Now the allowed paths are resolved once at server startup. The per-operation check iterates pre-resolved strings with zero I/O.
| Metric | Before | After |
|---|---|---|
EvalSymlinks calls per operation | N (one per allowed path) | 0 |
EvalSymlinks calls at startup | 0 | N (once) |
Regex Compilation Cache
Section titled “Regex Compilation Cache”Search functions compiled regex patterns from scratch on every call. Now they use the engine’s CompileRegex() cache, which stores compiled patterns in a sync.RWMutex-protected map (up to 100 patterns). Repeated searches with the same pattern skip compilation entirely.
Extension Lookup Maps
Section titled “Extension Lookup Maps”File type detection (isTextFile, isBinaryFile) previously used O(n) linear scans through 45-entry slices. Replaced with O(1) map lookups against textExtensionsMap (70+ entries) and binaryExtensionsMap. Both maps are initialized once at package load time.
Measuring Your Own Performance
Section titled “Measuring Your Own Performance”Use server_info to see real-time metrics during a session:
server_info({ action: "stats" })Returns:
Performance Statistics
Uptime: 2h 15mTotal operations: 892Operations/sec: 2,016 (peak), 110 (average)Cache hit rate: 94.2%Memory usage: 156 MB / 200 MB
Token estimates (this session): Without optimization: ~1,800,000 With optimization: ~420,000 Estimated savings: 76.7%For historical data, enable --log-dir and use the Dashboard binary to view detailed operation logs, timing distributions, and backup history.
Summary
Section titled “Summary”| Claim | Evidence |
|---|---|
| 2,016 ops/sec sustained throughput | Mixed workload benchmark on standard hardware |
| 98.9% cache hit rate | Typical session after warmup period |
| < 1 ms cached latency | 3-tier cache serves from memory |
| 77% token reduction | Aggregate across mixed session, measured over hundreds of hours |
| 99% savings on surgical edits | Targeted search + range read + edit vs. full rewrite |
| 5-22x fewer MCP calls with pipelines | Pipeline vs. individual calls, measured on real codebase |
| 20 MB base memory | Go runtime + server framework, no cache loaded |
These numbers are honest representations of what you can expect in real use. Individual results vary based on workload, hardware, and file types.
See Also
Section titled “See Also”- Performance and Tokens — Detailed token optimization strategies and session analysis
- Configuration — Server flags that affect performance
- Automatic I/O Strategy — How file size thresholds work
- Pipeline System — Multi-step workflows for token reduction
Last updated: March 2026 Version: 4.0.0