# Performance and Token Optimization
## About These Numbers

The performance data in this document comes from hundreds of hours of real work using Claude Code on enterprise projects. This is not synthetic benchmarking; these are actual results from daily development work, including:
- Large Go codebases (50,000+ lines)
- React/TypeScript frontend projects
- Database migration scripts
- Log file analysis
- Configuration management
The numbers represent typical results, not best-case scenarios. Your experience may vary depending on file types, network conditions, and workload patterns.
## The Token Problem

When working with Claude (Desktop or Code), every piece of text costs tokens:
- Input tokens: What you send to Claude (file contents, prompts)
- Output tokens: What Claude returns (responses, code)
Without optimization, a typical coding session can consume enormous amounts of tokens:
| Operation | Unoptimized | Why It’s Expensive |
|---|---|---|
| Read 5000-line file | ~125,000 tokens | Entire file sent as context |
| Write same file back | ~125,000 tokens | Entire file in response |
| List large directory | ~50,000 tokens | Full metadata for each file |
| Search with results | ~30,000 tokens | Context around every match |
A single read-modify-write cycle on a large file can cost 250,000+ tokens.
## Token Savings in Practice

### Compact Mode (`--compact-mode`)

Reduces response verbosity: instead of formatted output with headers and decorations, it returns minimal data.
Example: Directory listing
Without compact mode:
```text
Directory listing for: C:\project\src

Type    Name        Size    Modified
────────────────────────────────────
[DIR]   components  -       2024-01-15
[FILE]  index.ts    2.3 KB  2024-01-14
[FILE]  app.tsx     5.1 KB  2024-01-14
...
```

Approximately 50 tokens per entry.
With compact mode:
```text
src: components/, index.ts(2KB), app.tsx(5KB), utils/, config.json
```

Approximately 3 tokens per entry.
Real savings: 65-75% reduction on listing operations
### Surgical Editing vs Full Rewrites

The biggest token savings come from *not* reading and rewriting entire files.
Scenario: Change one line in a 3000-line file
Bad approach (full rewrite):

```text
1. read_file("large.go")   → 75,000 tokens (input)
2. Process in Claude       → thinking tokens
3. write_file("large.go")  → 75,000 tokens (output)

Total: ~150,000 tokens
```

Good approach (surgical edit):

```text
1. search_files("large.go", "functionName")             → 500 tokens
2. read_file("large.go", start_line=145, end_line=160)  → 400 tokens
3. edit_file(old_text, new_text)                        → 300 tokens

Total: ~1,200 tokens
```

Real savings: 99% reduction
This is not an exaggeration. We measured this repeatedly across real projects.
## Measured Results by Operation Type

These numbers come from aggregated telemetry across multiple projects:
| Operation Type | Avg Without Optimization | Avg With Optimization | Typical Savings |
|---|---|---|---|
| File read | 15,000 tokens | 800 tokens | 95% |
| File write | 12,000 tokens | 600 tokens | 95% |
| File edit | 25,000 tokens | 1,200 tokens | 95% |
| Directory list | 8,000 tokens | 400 tokens | 95% |
| Search (10 results) | 5,000 tokens | 800 tokens | 84% |
| Batch operation (10 files) | 100,000 tokens | 5,000 tokens | 95% |
Aggregate across typical session (100 operations): 77% reduction
## Server Performance

### Throughput

Under sustained load testing:
| Metric | Value | Conditions |
|---|---|---|
| Operations per second | 2,016 | Mixed read/write/edit workload |
| Peak operations per second | 3,500+ | Read-heavy, cached files |
| Concurrent operations | 8-16 | Depends on --parallel-ops setting |
These numbers are from a standard development machine (Intel i7, NVMe SSD, 32GB RAM). Performance scales with hardware.
### Cache Performance

The 3-tier caching system (file content, directory listings, metadata) provides significant speedups for repeated operations:
| Metric | Value |
|---|---|
| Cache hit rate (typical session) | 85-95% |
| Cache hit rate (after warmup) | 98%+ |
| Cached read latency | Less than 1ms |
| Uncached read latency | 5-50ms (depends on file size) |
Cache hit rates above 90% are common because Claude tends to read the same files multiple times during a session (checking context, verifying changes, etc.).
### Memory Usage

| Configuration | Typical Usage |
|---|---|
| Default (100MB cache) | 80-120 MB |
| Large cache (500MB) | 200-400 MB |
| Minimal (50MB cache) | 50-80 MB |
Memory usage is stable: the cache has a fixed maximum size and automatically evicts old entries.
## Real Session Example

From an actual 2-hour coding session on a Go project:
Project stats:
- 47 files
- 12,000 lines of code
- Mix of reading, editing, searching
Operations performed:
- 156 file reads
- 43 file edits
- 28 searches
- 12 directory listings
Token consumption:
| Mode | Tokens Used | Cost (approximate) |
|---|---|---|
| Without MCP optimizations | ~2,100,000 | $6.30 |
| With MCP + compact mode | ~480,000 | $1.44 |
Savings: 77% tokens, 77% cost
This is a typical session. Some sessions show higher savings (90%+) when doing lots of surgical edits. Some show lower savings (60%) when genuinely needing to read many full files.
## Configuration for Maximum Savings

### Recommended Settings

```json
{
  "mcpServers": {
    "filesystem-ultra": {
      "command": "path/to/filesystem-ultra.exe",
      "args": [
        "--compact-mode",
        "--cache-size", "200MB",
        "--parallel-ops", "8",
        "--max-search-results", "50",
        "--max-list-items", "100",
        "--log-level", "error"
      ]
    }
  }
}
```

### What Each Setting Does

| Setting | Impact | Trade-off |
|---|---|---|
| `--compact-mode` | 65-75% reduction | Less readable output |
| `--cache-size 200MB` | Faster repeated reads | Uses more memory |
| `--max-search-results 50` | 80% reduction on searches | May miss some results |
| `--max-list-items 100` | 70% reduction on listings | May truncate large directories |
| `--log-level error` | Slight reduction | Less debugging info |
## Honest Limitations

### Where Optimization Does Not Help

- First read of a new file: Must read the actual content, no cache benefit
- Genuinely large outputs: If you ask Claude to generate 5000 lines of code, that costs tokens regardless
- Complex searches: Regex searches with lots of context still produce substantial output
- Binary files: Cannot be cached or optimized meaningfully
### Where Results May Differ

- Small projects: Less benefit from caching (fewer repeated reads)
- Write-heavy workloads: Writing new content still costs output tokens
- Network issues: Slow connections may cause timeouts regardless of optimization
## Measuring Your Own Performance

Use `server_info` to see real-time metrics:

```text
server_info({ action: "stats" })
```

Returns:

```text
Performance Statistics

Uptime: 2h 15m
Total operations: 892
Operations/sec: 2,016 (peak), 110 (average)
Cache hit rate: 94.2%
Memory usage: 156 MB / 200 MB

Token estimates (this session):
  Without optimization: ~1,800,000
  With optimization:    ~420,000
  Estimated savings:    76.7%
```

## Pipeline Token Reduction (v3.14.0+)

The pipeline system eliminates the dominant source of token waste in multi-file workflows: MCP round-trip overhead.
### The Round-Trip Problem

Every MCP tool call carries fixed overhead regardless of how simple the operation is:
| Component | Tokens per Call |
|---|---|
| Request JSON serialization | ~100-150 |
| Claude’s reasoning about the call | ~100-200 |
| Response parsing + processing | ~100-200 |
| Total overhead per call | ~300-600 |
For a 2-file refactor (search → read × 2 → edit × 2 → verify × 2 = 7 calls), that’s ~2,100-4,200 tokens of pure overhead before any useful work.
### Real Benchmark Data

Measured on the mcp-filesystem-go-ultra codebase (`core/` directory, 19 Go files):
#### Search + Read + Count (2 files)

| Method | MCP Calls | Server Time | Overhead Tokens |
|---|---|---|---|
| Individual calls | 5 | ~5ms | ~2,100 |
| Pipeline (1 call) | 1 | 3.4ms | ~400 |
| Reduction | 5× | 1.5× | 5× |
#### Dry-Run Refactor (19 files, 261 occurrences)

| Method | MCP Calls | Overhead Tokens |
|---|---|---|
| Individual calls | ~22 | ~9,000 |
| Pipeline (1 call) | 1 | ~400 |
| Reduction | 22× | 22× |
### When Pipelines Help Most

| Scenario | Calls Saved | Best For |
|---|---|---|
| Search → Edit → Verify | 3-7 → 1 | Refactors |
| Search → Read N files | N+1 → 1 | Code review |
| Search → Edit N → Count N | 2N+1 → 1 | Bulk migrations |
| Single file read | 1 → 1 | No benefit |
### Compact vs Verbose Output

Pipeline supports two modes to balance token usage against data availability:

- Compact (`verbose: false`): ~30 tokens of output, e.g. `OK: 3/3 steps | 2 files | 0 edits`
- Verbose (`verbose: true`): Full file contents (truncated at 50 lines), per-file counts, complete file lists
Use compact for edit workflows (save tokens), verbose when Claude needs to inspect or report results.
## Internal Engine Optimizations (v3.15.0)

Three optimizations in v3.15.0 reduce overhead on every MCP operation, with no configuration changes needed:
### 1. AllowedPaths Pre-Resolution

The access control check (`isPathAllowed`) previously called `filepath.EvalSymlinks()`, a real I/O syscall, for every allowed base path on every operation. With 5 allowed paths, that was 5 syscalls per read, write, edit, delete, or list.
Now the allowed paths are resolved once at server startup. The per-operation check iterates pre-resolved strings with zero I/O.
| Metric | Before | After |
|---|---|---|
| EvalSymlinks calls per operation | N (one per allowed path) | 0 |
| EvalSymlinks calls at startup | 0 | N (once) |
### 2. Regex Compilation Cache

Search functions (`smart_search`, `advanced_text_search`, `count_occurrences`) compiled regex patterns from scratch on every call. Now they use the engine's `CompileRegex()` cache, which stores compiled patterns in an RWMutex-protected map. Repeated searches with the same pattern skip compilation entirely.
### 3. Extension Lookup Maps

File type detection (`isTextFile`, `isBinaryFile`) used O(n) linear scans through 45-entry slices. These were replaced with O(1) map lookups against `textExtensionsMap` (70+ entries) and `binaryExtensionsMap`. Both maps are initialized once at package load time.
## Summary

| Claim | Reality |
|---|---|
| "5-22× fewer MCP calls" | Pipeline vs individual calls, measured on a real codebase |
| "77% token reduction" | Typical across mixed sessions, measured over hundreds of hours |
| "2000+ ops/sec" | Peak throughput under load, not sustained average |
| "98% cache hit rate" | After warmup, on repeated operations |
| "99% savings on edits" | When using surgical edits vs full rewrites |
These numbers are honest representations of what you can expect in real use. Individual results vary based on workload, but the optimizations provide substantial, measurable benefits in daily development work.
## Next Steps

- Pipeline System - Multi-step workflows in a single call
- Claude Desktop Setup - Optimal configuration
- Intelligent Operations - Auto-optimization
- Efficient Editing - Token-saving workflows