
Benchmarks

The data in this document comes from real-world usage across hundreds of hours of development work with Claude Desktop and Claude Code on production codebases. This includes Go projects (50,000+ lines), React/TypeScript frontends, database migrations, and log file analysis. Numbers represent typical results, not best-case scenarios.

For a deeper discussion of token optimization strategies and session-level cost analysis, see Performance and Tokens.


| Metric | Value | Conditions |
| --- | --- | --- |
| Sustained throughput | 2,016 ops/sec | Mixed read/write/edit workload |
| Peak throughput | 3,500+ ops/sec | Read-heavy workload, all files cached |
| Concurrent operations | 8-16 | Depends on `--parallel-ops` setting |

Measured on a standard development machine (Intel i7, NVMe SSD, 32 GB RAM). Performance scales with hardware — faster storage and more CPU cores yield proportionally higher throughput.

| Operation | Cached | Uncached |
| --- | --- | --- |
| File read (small, < 100 KB) | < 1 ms | 1-5 ms |
| File read (medium, 100-500 KB) | < 1 ms | 5-15 ms |
| File read (large, 500 KB - 5 MB) | 1-3 ms | 15-50 ms |
| Directory listing | < 1 ms | 2-10 ms |
| Search (simple) | 1-5 ms | 10-50 ms |
| Search (regex with context) | 5-20 ms | 20-100 ms |
| File edit (small) | 2-5 ms | 5-15 ms |
| File write (small) | 3-8 ms | 5-15 ms |

Cached latency is under 1 ms for most operations because the 3-tier caching system serves content directly from memory.


| Metric | Value |
| --- | --- |
| Cache hit rate (cold start) | 0% |
| Cache hit rate (after 5 minutes) | 85-90% |
| Cache hit rate (after warmup, typical session) | 98.9% |
| Cache hit rate (read-heavy sessions) | 99%+ |

High hit rates are typical because AI agents tend to read the same files multiple times during a session — checking context, verifying changes, reading related files, etc.

| Configuration | Cache Size | Typical Process Memory | Notes |
| --- | --- | --- | --- |
| Minimal | 50 MB | 50-80 MB | Small projects, constrained environments |
| Default | 100 MB | 80-120 MB | Most development work |
| Large | 200 MB | 150-250 MB | Large codebases |
| Enterprise | 500 MB | 200-400 MB | Enterprise projects, log analysis |

Memory usage is stable — the cache has a fixed maximum size and automatically evicts old entries using an LRU (Least Recently Used) policy.
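
The bounded-cache behavior can be illustrated with a minimal LRU sketch. The real server uses BigCache and go-cache; this toy version (all names here are illustrative, not the server's API) only shows the eviction principle:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a toy bounded cache: once capacity is exceeded, the least
// recently used entry is evicted, so memory never grows past the cap.
type lruCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> list element
}

type entry struct {
	key string
	val []byte
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // mark as recently used
	return el.Value.(*entry).val, true
}

func (c *lruCache) Put(key string, val []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, val})
	if c.order.Len() > c.cap {
		oldest := c.order.Back() // least recently used entry
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRUCache(2)
	c.Put("a.go", []byte("package a"))
	c.Put("b.go", []byte("package b"))
	c.Get("a.go")                      // touch a.go so it is most recent
	c.Put("c.go", []byte("package c")) // capacity exceeded: evicts b.go
	_, ok := c.Get("b.go")
	fmt.Println("b.go cached:", ok) // false: evicted
}
```

The same idea, applied per tier with different TTLs and size limits, keeps process memory within the ranges shown above.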

The 3-tier caching system uses different strategies for different data types:

| Tier | Technology | TTL | What It Caches |
| --- | --- | --- | --- |
| File content | BigCache | 3 minutes | Full file contents (read operations) |
| Directory listings | go-cache | 2 minutes | Directory entries (list operations) |
| File metadata | go-cache | 10 minutes | File size, modification time, permissions |

Cache invalidation is handled by fsnotify file watching — when a file changes on disk, its cache entry is invalidated immediately.
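
The invalidation path is simple to model. In the real server, a `github.com/fsnotify/fsnotify` watcher delivers change events; this self-contained sketch models such an event with a plain struct and shows the cache entry being dropped so the next read goes to disk:

```go
package main

import "fmt"

// event models the subset of an fsnotify event the invalidator cares about.
// (The real server receives these from an fsnotify watcher's event channel.)
type event struct {
	Name string // path that changed on disk
}

// invalidateOnChange drops the cache entry for any path that changed,
// forcing the next read of that file to miss the cache and hit disk.
func invalidateOnChange(cache map[string][]byte, events <-chan event) {
	for ev := range events {
		delete(cache, ev.Name)
	}
}

func main() {
	cache := map[string][]byte{"main.go": []byte("package main")}
	events := make(chan event, 1)
	events <- event{Name: "main.go"} // a write to main.go was observed
	close(events)
	invalidateOnChange(cache, events)
	fmt.Println(len(cache)) // 0: the stale entry is gone
}
```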


| Component | Memory |
| --- | --- |
| Go runtime + server framework | ~15 MB |
| Worker pool (ants) | ~2 MB |
| Compiled regex cache | ~1 MB |
| Base overhead | ~20 MB |

| Scenario | Total Memory |
| --- | --- |
| Idle (no operations) | ~20 MB |
| Active session, default cache (100 MB) | 80-120 MB |
| Active session, large cache (200 MB) | 150-250 MB |
| Peak during streaming operation | ~100 MB above baseline |
| Pipeline with 20 steps, 50 files | 120-180 MB |

Memory usage during streaming operations is bounded by the 64 KB buffer size — even when processing a 50 MB file, only 64 KB of buffer memory is allocated at any point.


Each I/O strategy is optimized for its file size tier:

| File Size | Strategy | Read Time | Edit Time | Memory Overhead |
| --- | --- | --- | --- | --- |
| 10 KB | Direct I/O | < 1 ms | 2 ms | ~10 KB |
| 50 KB | Direct I/O | < 1 ms | 3 ms | ~50 KB |
| 100 KB | Streaming | 1 ms | 5 ms | ~64 KB buffer |
| 300 KB | Streaming | 2 ms | 8 ms | ~64 KB buffer |
| 1 MB | Chunked | 5 ms | 15 ms | ~64 KB buffer |
| 5 MB | Chunked | 20 ms | 50 ms | ~64 KB buffer |
| 10 MB | Special | 40 ms | 100 ms | ~64 KB buffer |
| 50 MB | Special | 200 ms | Rejected | ~64 KB buffer |

Files above 50 MB are rejected for edit operations to prevent accidental massive changes. Read operations continue to work at any file size.

The thresholds (100 KB, 500 KB, 5 MB, 50 MB) are compiled constants in core/config.go. They were chosen based on benchmarking across typical development workloads — source code files, configuration files, data files, and logs.
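The tier selection can be sketched as a simple switch over those thresholds. The constant names and function names below are illustrative, not the actual identifiers in core/config.go:

```go
package main

import "fmt"

// Size-tier thresholds, mirroring the compiled constants described above
// (the names here are assumptions for illustration).
const (
	directMax    = 100 * 1024       // <= 100 KB: read the whole file at once
	streamingMax = 500 * 1024       // <= 500 KB: streaming with 64 KB buffer
	chunkedMax   = 5 * 1024 * 1024  // <= 5 MB: chunked reads
	editMax      = 50 * 1024 * 1024 // above 50 MB: edit operations rejected
)

// strategyFor picks the I/O strategy for a file of the given size.
func strategyFor(size int64) string {
	switch {
	case size <= directMax:
		return "direct"
	case size <= streamingMax:
		return "streaming"
	case size <= chunkedMax:
		return "chunked"
	default:
		return "special"
	}
}

// editAllowed reports whether edit operations are permitted at this size;
// reads are always allowed.
func editAllowed(size int64) bool { return size <= editMax }

func main() {
	fmt.Println(strategyFor(50 * 1024))        // direct
	fmt.Println(strategyFor(2 * 1024 * 1024))  // chunked
	fmt.Println(editAllowed(60 * 1024 * 1024)) // false
}
```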


With --compact-mode enabled:

| Operation | Without Compact | With Compact | Savings |
| --- | --- | --- | --- |
| Directory listing (per entry) | ~50 tokens | ~3 tokens | 94% |
| File read response | ~15,000 tokens | ~800 tokens | 95% |
| File edit response | ~25,000 tokens | ~1,200 tokens | 95% |
| Search results (10 matches) | ~5,000 tokens | ~800 tokens | 84% |
| Aggregate session (100 ops) | ~2,100,000 tokens | ~480,000 tokens | 77% |

The biggest token savings come from targeted operations instead of full file rewrites:

| Approach | Tokens | Savings vs. Full Rewrite |
| --- | --- | --- |
| Full read-modify-write (3,000-line file) | ~150,000 | Baseline |
| Search + range read + edit | ~1,200 | 99% |

This 99% reduction is not an exaggeration — it has been measured repeatedly across real projects. The key is to use search_files to find the exact location, read_file with start_line/end_line to read only the relevant section, and edit_file to apply the change.
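The three-step workflow looks roughly like the sequence below. The tool names and the `start_line`/`end_line` parameters come from this document; the other parameter names and values shown are illustrative assumptions, not the tools' exact schemas:

```
// 1. Locate the target instead of reading the whole file
search_files({ pattern: "func calculateTotals", path: "./internal" })

// 2. Read only the relevant range (line numbers taken from the search result)
read_file({ path: "internal/billing.go", start_line: 412, end_line: 440 })

// 3. Apply a targeted edit to just that section
edit_file({ path: "internal/billing.go", old_text: "...", new_text: "..." })
```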

The pipeline system (batch_operations with pipeline_json) eliminates MCP round-trip overhead for multi-step workflows:

| Component | Tokens Per MCP Call |
| --- | --- |
| Request JSON serialization | ~100-150 |
| Claude’s reasoning about the call | ~100-200 |
| Response parsing + processing | ~100-200 |
| Total overhead per call | ~300-600 |

Measured Benchmark: Search + Read + Count (2 files)

| Method | MCP Calls | Server Time | Overhead Tokens |
| --- | --- | --- | --- |
| Individual calls | 5 | ~5 ms | ~2,100 |
| Pipeline (1 call) | 1 | 3.4 ms | ~400 |
| Reduction | 5x | 1.5x | 5x |

Measured Benchmark: Dry-Run Refactor (19 files, 261 occurrences)

| Method | MCP Calls | Overhead Tokens |
| --- | --- | --- |
| Individual calls | ~22 | ~9,000 |
| Pipeline (1 call) | 1 | ~400 |
| Reduction | 22x | 22x |

| Scenario | Calls Saved | Typical Reduction |
| --- | --- | --- |
| Search + Edit + Verify | 3-7 calls reduced to 1 | 3-7x |
| Search + Read N files | N+1 calls reduced to 1 | Up to 10x |
| Search + Edit N + Count N | 2N+1 calls reduced to 1 | Up to 22x |
| Single file read | 1 call remains 1 | No benefit |
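A pipeline call might look like the sketch below. Only the tool name `batch_operations` and the `pipeline_json`/`verbose` parameters come from this document; the step schema (`op`, `pattern`, `find`, `replace`, `dry_run`) is an illustrative assumption, not the tool's actual format:

```
batch_operations({
  pipeline_json: JSON.stringify([
    { op: "search", pattern: "OldName", path: "./src" },
    { op: "edit", find: "OldName", replace: "NewName", dry_run: true },
    { op: "count", pattern: "NewName" }
  ]),
  verbose: false
})
```

One call like this replaces the search, per-file edit, and per-file verification calls counted in the tables above.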

Pipeline output modes also affect token usage:

  • Compact (`verbose: false`): ~30 tokens — `OK: 3/3 steps | 2 files | 0 edits`
  • Verbose (`verbose: true`): full file contents (truncated at 50 lines per file), per-file counts, complete file lists

From an actual 2-hour coding session on a Go project (47 files, 12,000 lines of code):

| Metric | Value |
| --- | --- |
| File reads | 156 |
| File edits | 43 |
| Searches | 28 |
| Directory listings | 12 |
| Cache hit rate | 94.2% |

| Mode | Tokens Used | Approximate Cost |
| --- | --- | --- |
| Without MCP optimizations | ~2,100,000 | $6.30 |
| With MCP + compact mode | ~480,000 | $1.44 |
| Savings | 77% | 77% |

This is a typical session. Some sessions show higher savings (90%+) when performing many surgical edits. Some show lower savings (60%) when genuinely needing to read many full files.


Three optimizations in the engine reduce overhead on every MCP operation, with no configuration changes needed:

The access control check (isPathAllowed) previously called filepath.EvalSymlinks() — a real I/O syscall — for every allowed base path on every operation. With 5 allowed paths, that was 5 syscalls per read, write, edit, delete, or list.

Now the allowed paths are resolved once at server startup. The per-operation check iterates pre-resolved strings with zero I/O.
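The hot-path check reduces to prefix comparison against the pre-resolved list. This sketch is illustrative (the real check also canonicalizes the candidate path before comparing):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// accessControl holds base paths that were symlink-resolved once at startup
// (via filepath.EvalSymlinks in the real server). After that, every check
// works on plain strings.
type accessControl struct {
	resolved []string // pre-resolved allowed base paths
}

// isPathAllowed is the per-operation hot path: string comparison only,
// zero I/O syscalls. The separator check prevents "/allowed2" from
// matching the base "/allowed".
func (a *accessControl) isPathAllowed(p string) bool {
	clean := filepath.Clean(p)
	for _, base := range a.resolved {
		if clean == base || strings.HasPrefix(clean, base+string(filepath.Separator)) {
			return true
		}
	}
	return false
}

func main() {
	ac := &accessControl{resolved: []string{"/home/dev/project"}}
	fmt.Println(ac.isPathAllowed("/home/dev/project/main.go")) // true
	fmt.Println(ac.isPathAllowed("/etc/passwd"))               // false
}
```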

| Metric | Before | After |
| --- | --- | --- |
| EvalSymlinks calls per operation | N (one per allowed path) | 0 |
| EvalSymlinks calls at startup | 0 | N (once) |

Search functions compiled regex patterns from scratch on every call. Now they use the engine’s CompileRegex() cache, which stores compiled patterns in a sync.RWMutex-protected map (up to 100 patterns). Repeated searches with the same pattern skip compilation entirely.

File type detection (isTextFile, isBinaryFile) previously used O(n) linear scans through 45-entry slices. Replaced with O(1) map lookups against textExtensionsMap (70+ entries) and binaryExtensionsMap. Both maps are initialized once at package load time.
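The lookup pattern is straightforward; the entries below are a small illustrative subset, not the server's full 70+ entry maps:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Initialized once at package load; membership tests are O(1) map lookups
// instead of O(n) scans through a slice of extensions.
var textExtensionsMap = map[string]bool{
	".go": true, ".md": true, ".json": true, ".ts": true, ".yaml": true,
}

var binaryExtensionsMap = map[string]bool{
	".png": true, ".zip": true, ".exe": true, ".so": true,
}

func isTextFile(path string) bool {
	return textExtensionsMap[strings.ToLower(filepath.Ext(path))]
}

func isBinaryFile(path string) bool {
	return binaryExtensionsMap[strings.ToLower(filepath.Ext(path))]
}

func main() {
	fmt.Println(isTextFile("cmd/main.go"))  // true
	fmt.Println(isBinaryFile("logo.png"))   // true
	fmt.Println(isTextFile("logo.png"))     // false
}
```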


Use server_info to see real-time metrics during a session:

```
server_info({ action: "stats" })
```

Returns:

```
Performance Statistics
Uptime: 2h 15m
Total operations: 892
Operations/sec: 2,016 (peak), 110 (average)
Cache hit rate: 94.2%
Memory usage: 156 MB / 200 MB
Token estimates (this session):
  Without optimization: ~1,800,000
  With optimization: ~420,000
  Estimated savings: 76.7%
```

For historical data, enable --log-dir and use the Dashboard binary to view detailed operation logs, timing distributions, and backup history.


| Claim | Evidence |
| --- | --- |
| 2,016 ops/sec sustained throughput | Mixed workload benchmark on standard hardware |
| 98.9% cache hit rate | Typical session after warmup period |
| < 1 ms cached latency | 3-tier cache serves from memory |
| 77% token reduction | Aggregate across mixed session, measured over hundreds of hours |
| 99% savings on surgical edits | Targeted search + range read + edit vs. full rewrite |
| 5-22x fewer MCP calls with pipelines | Pipeline vs. individual calls, measured on real codebase |
| 20 MB base memory | Go runtime + server framework, no cache loaded |

These numbers are honest representations of what you can expect in real use. Individual results vary based on workload, hardware, and file types.



Last updated: March 2026 · Version: 4.0.0