
Benchmarks

The data in this document comes from real-world usage across hundreds of hours of development work with Claude Desktop and Claude Code on production codebases. This includes Go projects (50,000+ lines), React/TypeScript frontends, database migrations, and log file analysis. Numbers represent typical results, not best-case scenarios.

For a deeper discussion of token optimization strategies and session-level cost analysis, see Performance and Tokens.


| Metric | Value | Conditions |
| --- | --- | --- |
| Sustained throughput | 2,016 ops/sec | Mixed read/write/edit workload |
| Peak throughput | 3,500+ ops/sec | Read-heavy workload, all files cached |
| Concurrent operations | 8-16 | Depends on `--parallel-ops` setting |

Measured on a standard development machine (Intel i7, NVMe SSD, 32 GB RAM). Performance scales with hardware — faster storage and more CPU cores yield proportionally higher throughput.

| Operation | Cached | Uncached |
| --- | --- | --- |
| File read (small, < 100 KB) | < 1 ms | 1-5 ms |
| File read (medium, 100-500 KB) | < 1 ms | 5-15 ms |
| File read (large, 500 KB - 5 MB) | 1-3 ms | 15-50 ms |
| Directory listing | < 1 ms | 2-10 ms |
| Search (simple) | 1-5 ms | 10-50 ms |
| Search (regex with context) | 5-20 ms | 20-100 ms |
| File edit (small) | 2-5 ms | 5-15 ms |
| File write (small) | 3-8 ms | 5-15 ms |

Cached latency is under 1 ms for most operations because the 3-tier caching system serves content directly from memory.


| Metric | Value |
| --- | --- |
| Cache hit rate (cold start) | 0% |
| Cache hit rate (after 5 minutes) | 85-90% |
| Cache hit rate (after warmup, typical session) | 98.9% |
| Cache hit rate (read-heavy sessions) | 99%+ |

High hit rates are typical because AI agents tend to read the same files multiple times during a session — checking context, verifying changes, reading related files, etc.

| Configuration | Cache Size | Typical Process Memory | Notes |
| --- | --- | --- | --- |
| Minimal | 50 MB | 50-80 MB | Small projects, constrained environments |
| Default | 100 MB | 80-120 MB | Most development work |
| Large | 200 MB | 150-250 MB | Large codebases |
| Enterprise | 500 MB | 200-400 MB | Enterprise projects, log analysis |

Memory usage is stable — the cache has a fixed maximum size and automatically evicts old entries using an LRU (Least Recently Used) policy.
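
The bounded-cache behavior can be illustrated with a minimal LRU sketch. The real server uses BigCache and go-cache; this toy version (all names here are illustrative, not the server's API) only shows the eviction principle:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a toy bounded cache: once capacity is exceeded, the least
// recently used entry is evicted, so memory never grows past the cap.
type lruCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> list element
}

type entry struct {
	key string
	val []byte
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // mark as recently used
	return el.Value.(*entry).val, true
}

func (c *lruCache) Put(key string, val []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, val})
	if c.order.Len() > c.cap {
		oldest := c.order.Back() // least recently used entry
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRUCache(2)
	c.Put("a.go", []byte("package a"))
	c.Put("b.go", []byte("package b"))
	c.Get("a.go")                      // touch a.go so it is most recent
	c.Put("c.go", []byte("package c")) // capacity exceeded: evicts b.go
	_, ok := c.Get("b.go")
	fmt.Println("b.go cached:", ok) // false: evicted
}
```

The same idea, applied per tier with different TTLs and size limits, keeps process memory within the ranges shown above.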

The 3-tier caching system uses different strategies for different data types:

| Tier | Technology | TTL | What It Caches |
| --- | --- | --- | --- |
| File content | BigCache | 3 minutes | Full file contents (read operations) |
| Directory listings | go-cache | 2 minutes | Directory entries (list operations) |
| File metadata | go-cache | 10 minutes | File size, modification time, permissions |

Cache invalidation is handled by fsnotify file watching — when a file changes on disk, its cache entry is invalidated immediately.
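
The invalidation path is simple to model. In the real server, a `github.com/fsnotify/fsnotify` watcher delivers change events; this self-contained sketch models such an event with a plain struct and shows the cache entry being dropped so the next read goes to disk:

```go
package main

import "fmt"

// event models the subset of an fsnotify event the invalidator cares about.
// (The real server receives these from an fsnotify watcher's event channel.)
type event struct {
	Name string // path that changed on disk
}

// invalidateOnChange drops the cache entry for any path that changed,
// forcing the next read of that file to miss the cache and hit disk.
func invalidateOnChange(cache map[string][]byte, events <-chan event) {
	for ev := range events {
		delete(cache, ev.Name)
	}
}

func main() {
	cache := map[string][]byte{"main.go": []byte("package main")}
	events := make(chan event, 1)
	events <- event{Name: "main.go"} // a write to main.go was observed
	close(events)
	invalidateOnChange(cache, events)
	fmt.Println(len(cache)) // 0: the stale entry is gone
}
```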


| Component | Memory |
| --- | --- |
| Go runtime + server framework | ~15 MB |
| Worker pool (ants) | ~2 MB |
| Compiled regex cache | ~1 MB |
| Base overhead | ~20 MB |

| Scenario | Total Memory |
| --- | --- |
| Idle (no operations) | ~20 MB |
| Active session, default cache (100 MB) | 80-120 MB |
| Active session, large cache (200 MB) | 150-250 MB |
| Peak during streaming operation | ~100 MB above baseline |
| Pipeline with 20 steps, 50 files | 120-180 MB |

Memory usage during streaming operations is bounded by the 64 KB buffer size — even when processing a 50 MB file, only 64 KB of buffer memory is allocated at any point.


Each I/O strategy is optimized for its file size tier:

| File Size | Strategy | Read Time | Edit Time | Memory Overhead |
| --- | --- | --- | --- | --- |
| 10 KB | Direct I/O | < 1 ms | 2 ms | ~10 KB |
| 50 KB | Direct I/O | < 1 ms | 3 ms | ~50 KB |
| 100 KB | Streaming | 1 ms | 5 ms | ~64 KB buffer |
| 300 KB | Streaming | 2 ms | 8 ms | ~64 KB buffer |
| 1 MB | Chunked | 5 ms | 15 ms | ~64 KB buffer |
| 5 MB | Chunked | 20 ms | 50 ms | ~64 KB buffer |
| 10 MB | Special | 40 ms | 100 ms | ~64 KB buffer |
| 50 MB | Special | 200 ms | Rejected | ~64 KB buffer |

Files above 50 MB are rejected for edit operations to prevent accidental massive changes. Read operations continue to work at any file size.

The thresholds (100 KB, 500 KB, 5 MB, 50 MB) are compiled constants in core/config.go. They were chosen based on benchmarking across typical development workloads — source code files, configuration files, data files, and logs.
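The tier selection can be sketched as a simple switch over those thresholds. The constant names and function names below are illustrative, not the actual identifiers in core/config.go:

```go
package main

import "fmt"

// Size-tier thresholds, mirroring the compiled constants described above
// (the names here are assumptions for illustration).
const (
	directMax    = 100 * 1024       // <= 100 KB: read the whole file at once
	streamingMax = 500 * 1024       // <= 500 KB: streaming with 64 KB buffer
	chunkedMax   = 5 * 1024 * 1024  // <= 5 MB: chunked reads
	editMax      = 50 * 1024 * 1024 // above 50 MB: edit operations rejected
)

// strategyFor picks the I/O strategy for a file of the given size.
func strategyFor(size int64) string {
	switch {
	case size <= directMax:
		return "direct"
	case size <= streamingMax:
		return "streaming"
	case size <= chunkedMax:
		return "chunked"
	default:
		return "special"
	}
}

// editAllowed reports whether edit operations are permitted at this size;
// reads are always allowed.
func editAllowed(size int64) bool { return size <= editMax }

func main() {
	fmt.Println(strategyFor(50 * 1024))        // direct
	fmt.Println(strategyFor(2 * 1024 * 1024))  // chunked
	fmt.Println(editAllowed(60 * 1024 * 1024)) // false
}
```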


With --compact-mode enabled:

| Operation | Without Compact | With Compact | Savings |
| --- | --- | --- | --- |
| Directory listing (per entry) | ~50 tokens | ~3 tokens | 94% |
| File read response | ~15,000 tokens | ~800 tokens | 95% |
| File edit response | ~25,000 tokens | ~1,200 tokens | 95% |
| Search results (10 matches) | ~5,000 tokens | ~800 tokens | 84% |
| Aggregate session (100 ops) | ~2,100,000 tokens | ~480,000 tokens | 77% |

The biggest token savings come from targeted operations instead of full file rewrites:

| Approach | Tokens | Savings vs. Full Rewrite |
| --- | --- | --- |
| Full read-modify-write (3,000-line file) | ~150,000 | Baseline |
| Search + range read + edit | ~1,200 | 99% |

This 99% reduction is not an exaggeration — it has been measured repeatedly across real projects. The key is to use search_files to find the exact location, read_file with start_line/end_line to read only the relevant section, and edit_file to apply the change.
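The three-step workflow looks roughly like the sequence below. The tool names and the `start_line`/`end_line` parameters come from this document; the other parameter names and values shown are illustrative assumptions, not the tools' exact schemas:

```
// 1. Locate the target instead of reading the whole file
search_files({ pattern: "func calculateTotals", path: "./internal" })

// 2. Read only the relevant range (line numbers taken from the search result)
read_file({ path: "internal/billing.go", start_line: 412, end_line: 440 })

// 3. Apply a targeted edit to just that section
edit_file({ path: "internal/billing.go", old_text: "...", new_text: "..." })
```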

The pipeline system (batch_operations with pipeline_json) eliminates MCP round-trip overhead for multi-step workflows:

| Component | Tokens Per MCP Call |
| --- | --- |
| Request JSON serialization | ~100-150 |
| Claude’s reasoning about the call | ~100-200 |
| Response parsing + processing | ~100-200 |
| Total overhead per call | ~300-600 |

Measured Benchmark: Search + Read + Count (2 files)

| Method | MCP Calls | Server Time | Overhead Tokens |
| --- | --- | --- | --- |
| Individual calls | 5 | ~5 ms | ~2,100 |
| Pipeline (1 call) | 1 | 3.4 ms | ~400 |
| Reduction | 5x | 1.5x | 5x |

Measured Benchmark: Dry-Run Refactor (19 files, 261 occurrences)

| Method | MCP Calls | Overhead Tokens |
| --- | --- | --- |
| Individual calls | ~22 | ~9,000 |
| Pipeline (1 call) | 1 | ~400 |
| Reduction | 22x | 22x |

| Scenario | Calls Saved | Typical Reduction |
| --- | --- | --- |
| Search + Edit + Verify | 3-7 calls reduced to 1 | 3-7x |
| Search + Read N files | N+1 calls reduced to 1 | Up to 10x |
| Search + Edit N + Count N | 2N+1 calls reduced to 1 | Up to 22x |
| Single file read | 1 call remains 1 | No benefit |
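A pipeline call might look like the sketch below. Only the tool name `batch_operations` and the `pipeline_json`/`verbose` parameters come from this document; the step schema (`op`, `pattern`, `find`, `replace`, `dry_run`) is an illustrative assumption, not the tool's actual format:

```
batch_operations({
  pipeline_json: JSON.stringify([
    { op: "search", pattern: "OldName", path: "./src" },
    { op: "edit", find: "OldName", replace: "NewName", dry_run: true },
    { op: "count", pattern: "NewName" }
  ]),
  verbose: false
})
```

One call like this replaces the search, per-file edit, and per-file verification calls counted in the tables above.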

Pipeline output modes also affect token usage:

  • Compact (`verbose: false`): ~30 tokens — `OK: 3/3 steps | 2 files | 0 edits`
  • Verbose (`verbose: true`): full file contents (truncated at 50 lines per file), per-file counts, complete file lists

From an actual 2-hour coding session on a Go project (47 files, 12,000 lines of code):

| Metric | Value |
| --- | --- |
| File reads | 156 |
| File edits | 43 |
| Searches | 28 |
| Directory listings | 12 |
| Cache hit rate | 94.2% |

| Mode | Tokens Used | Approximate Cost |
| --- | --- | --- |
| Without MCP optimizations | ~2,100,000 | $6.30 |
| With MCP + compact mode | ~480,000 | $1.44 |
| Savings | 77% | 77% |

This is a typical session. Some sessions show higher savings (90%+) when performing many surgical edits. Some show lower savings (60%) when genuinely needing to read many full files.


Three optimizations in the engine reduce overhead on every MCP operation, with no configuration changes needed:

The access control check (isPathAllowed) previously called filepath.EvalSymlinks() — a real I/O syscall — for every allowed base path on every operation. With 5 allowed paths, that was 5 syscalls per read, write, edit, delete, or list.

Now the allowed paths are resolved once at server startup. The per-operation check iterates pre-resolved strings with zero I/O.
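The hot-path check reduces to prefix comparison against the pre-resolved list. This sketch is illustrative (the real check also canonicalizes the candidate path before comparing):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// accessControl holds base paths that were symlink-resolved once at startup
// (via filepath.EvalSymlinks in the real server). After that, every check
// works on plain strings.
type accessControl struct {
	resolved []string // pre-resolved allowed base paths
}

// isPathAllowed is the per-operation hot path: string comparison only,
// zero I/O syscalls. The separator check prevents "/allowed2" from
// matching the base "/allowed".
func (a *accessControl) isPathAllowed(p string) bool {
	clean := filepath.Clean(p)
	for _, base := range a.resolved {
		if clean == base || strings.HasPrefix(clean, base+string(filepath.Separator)) {
			return true
		}
	}
	return false
}

func main() {
	ac := &accessControl{resolved: []string{"/home/dev/project"}}
	fmt.Println(ac.isPathAllowed("/home/dev/project/main.go")) // true
	fmt.Println(ac.isPathAllowed("/etc/passwd"))               // false
}
```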

| Metric | Before | After |
| --- | --- | --- |
| EvalSymlinks calls per operation | N (one per allowed path) | 0 |
| EvalSymlinks calls at startup | 0 | N (once) |

Search functions compiled regex patterns from scratch on every call. Now they use the engine’s CompileRegex() cache, which stores compiled patterns in a sync.RWMutex-protected map (up to 100 patterns). Repeated searches with the same pattern skip compilation entirely.

File type detection (isTextFile, isBinaryFile) previously used O(n) linear scans through 45-entry slices. Replaced with O(1) map lookups against textExtensionsMap (70+ entries) and binaryExtensionsMap. Both maps are initialized once at package load time.
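The lookup pattern is straightforward; the entries below are a small illustrative subset, not the server's full 70+ entry maps:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Initialized once at package load; membership tests are O(1) map lookups
// instead of O(n) scans through a slice of extensions.
var textExtensionsMap = map[string]bool{
	".go": true, ".md": true, ".json": true, ".ts": true, ".yaml": true,
}

var binaryExtensionsMap = map[string]bool{
	".png": true, ".zip": true, ".exe": true, ".so": true,
}

func isTextFile(path string) bool {
	return textExtensionsMap[strings.ToLower(filepath.Ext(path))]
}

func isBinaryFile(path string) bool {
	return binaryExtensionsMap[strings.ToLower(filepath.Ext(path))]
}

func main() {
	fmt.Println(isTextFile("cmd/main.go"))  // true
	fmt.Println(isBinaryFile("logo.png"))   // true
	fmt.Println(isTextFile("logo.png"))     // false
}
```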


Use server_info to see real-time metrics during a session:

```
server_info({ action: "stats" })
```

Returns:

```
Performance Statistics
Uptime: 2h 15m
Total operations: 892
Operations/sec: 2,016 (peak), 110 (average)
Cache hit rate: 94.2%
Memory usage: 156 MB / 200 MB
Token estimates (this session):
  Without optimization: ~1,800,000
  With optimization: ~420,000
  Estimated savings: 76.7%
```

For historical data, enable --log-dir and use the Dashboard binary to view detailed operation logs, timing distributions, and backup history.


| Claim | Evidence |
| --- | --- |
| 2,016 ops/sec sustained throughput | Mixed workload benchmark on standard hardware |
| 98.9% cache hit rate | Typical session after warmup period |
| < 1 ms cached latency | 3-tier cache serves from memory |
| 77% token reduction | Aggregate across mixed session, measured over hundreds of hours |
| 99% savings on surgical edits | Targeted search + range read + edit vs. full rewrite |
| 5-22x fewer MCP calls with pipelines | Pipeline vs. individual calls, measured on real codebase |
| 20 MB base memory | Go runtime + server framework, no cache loaded |

These numbers are honest representations of what you can expect in real use. Individual results vary based on workload, hardware, and file types.



Last updated: March 2026 · Version: 4.0.0