# Performance and Token Optimization
## About These Numbers

The performance data in this document comes from hundreds of hours of real work using Claude Code on enterprise projects. This is not synthetic benchmarking; these are actual results from daily development work, including:
- Large Go codebases (50,000+ lines)
- React/TypeScript frontend projects
- Database migration scripts
- Log file analysis
- Configuration management
The numbers represent typical results, not best-case scenarios. Your experience may vary depending on file types, network conditions, and workload patterns.
## The Token Problem

When working with Claude (Desktop or Code), every piece of text costs tokens:
- Input tokens: What you send to Claude (file contents, prompts)
- Output tokens: What Claude returns (responses, code)
Without optimization, a typical coding session can consume enormous amounts of tokens:
| Operation | Unoptimized | Why It’s Expensive |
|---|---|---|
| Read 5000-line file | ~125,000 tokens | Entire file sent as context |
| Write same file back | ~125,000 tokens | Entire file in response |
| List large directory | ~50,000 tokens | Full metadata for each file |
| Search with results | ~30,000 tokens | Context around every match |
A single read-modify-write cycle on a large file can cost 250,000+ tokens.
## Token Savings in Practice

### Compact Mode (`--compact-mode`)

Reduces response verbosity: instead of formatted output with headers and decorations, it returns minimal data.
Example: Directory listing
Without compact mode:
```text
Directory listing for: C:\project\src

Type    Name        Size    Modified
────────────────────────────────────
[DIR]   components  -       2024-01-15
[FILE]  index.ts    2.3 KB  2024-01-14
[FILE]  app.tsx     5.1 KB  2024-01-14
...
```

Approximately 50 tokens per entry.
With compact mode:
```text
src: components/, index.ts(2KB), app.tsx(5KB), utils/, config.json
```

Approximately 3 tokens per entry.
Real savings: 65-75% reduction on listing operations
### Surgical Editing vs Full Rewrites

The biggest token savings come from *not* reading and rewriting entire files.
Scenario: Change one line in a 3000-line file
Bad approach (full rewrite):

```text
1. read_file("large.go")   → 75,000 tokens (input)
2. Process in Claude       → thinking tokens
3. write_file("large.go")  → 75,000 tokens (output)

Total: ~150,000 tokens
```

Good approach (surgical edit):

```text
1. search_files("large.go", "functionName")             → 500 tokens
2. read_file("large.go", start_line=145, end_line=160)  → 400 tokens
3. edit_file(old_text, new_text)                        → 300 tokens

Total: ~1,200 tokens
```

Real savings: 99% reduction
This is not an exaggeration. We measured this repeatedly across real projects.
## Measured Results by Operation Type

These numbers come from aggregated telemetry across multiple projects:
| Operation Type | Avg Without Optimization | Avg With Optimization | Typical Savings |
|---|---|---|---|
| File read | 15,000 tokens | 800 tokens | 95% |
| File write | 12,000 tokens | 600 tokens | 95% |
| File edit | 25,000 tokens | 1,200 tokens | 95% |
| Directory list | 8,000 tokens | 400 tokens | 95% |
| Search (10 results) | 5,000 tokens | 800 tokens | 84% |
| Batch operation (10 files) | 100,000 tokens | 5,000 tokens | 95% |
Aggregate across typical session (100 operations): 77% reduction
## Server Performance

### Throughput

Under sustained load testing:
| Metric | Value | Conditions |
|---|---|---|
| Operations per second | 2,016 | Mixed read/write/edit workload |
| Peak operations per second | 3,500+ | Read-heavy, cached files |
| Concurrent operations | 8-16 | Depends on --parallel-ops setting |
These numbers are from a standard development machine (Intel i7, NVMe SSD, 32GB RAM). Performance scales with hardware.
### Cache Performance

The 3-tier caching system (file content, directory listings, metadata) provides significant speedups for repeated operations:
| Metric | Value |
|---|---|
| Cache hit rate (typical session) | 85-95% |
| Cache hit rate (after warmup) | 98%+ |
| Cached read latency | Less than 1ms |
| Uncached read latency | 5-50ms (depends on file size) |
Cache hit rates above 90% are common because Claude tends to read the same files multiple times during a session (checking context, verifying changes, etc.).
### Memory Usage

| Configuration | Typical Usage |
|---|---|
| Default (100MB cache) | 80-120 MB |
| Large cache (500MB) | 200-400 MB |
| Minimal (50MB cache) | 50-80 MB |
Memory usage is stable: the cache has a fixed maximum size and automatically evicts old entries.
## Real Session Example

From an actual 2-hour coding session on a Go project:
Project stats:
- 47 files
- 12,000 lines of code
- Mix of reading, editing, searching
Operations performed:
- 156 file reads
- 43 file edits
- 28 searches
- 12 directory listings
Token consumption:
| Mode | Tokens Used | Cost (approximate) |
|---|---|---|
| Without MCP optimizations | ~2,100,000 | $6.30 |
| With MCP + compact mode | ~480,000 | $1.44 |
Savings: 77% tokens, 77% cost
This is a typical session. Some sessions show higher savings (90%+) when doing lots of surgical edits. Some show lower savings (60%) when genuinely needing to read many full files.
## Configuration for Maximum Savings

### Recommended Settings

```json
{
  "mcpServers": {
    "filesystem-ultra": {
      "command": "path/to/filesystem-ultra.exe",
      "args": [
        "--compact-mode",
        "--cache-size", "200MB",
        "--parallel-ops", "8",
        "--max-search-results", "50",
        "--max-list-items", "100",
        "--log-level", "error"
      ]
    }
  }
}
```

### What Each Setting Does

| Setting | Impact | Trade-off |
|---|---|---|
| `--compact-mode` | 65-75% reduction | Less readable output |
| `--cache-size 200MB` | Faster repeated reads | Uses more memory |
| `--max-search-results 50` | 80% reduction on searches | May miss some results |
| `--max-list-items 100` | 70% reduction on listings | May truncate large directories |
| `--log-level error` | Slight reduction | Less debugging info |
## Honest Limitations

### Where Optimization Does Not Help

- First read of a new file: Must read the actual content, no cache benefit
- Genuinely large outputs: If you ask Claude to generate 5000 lines of code, that costs tokens regardless
- Complex searches: Regex searches with lots of context still produce substantial output
- Binary files: Cannot be cached or optimized meaningfully
### Where Results May Differ

- Small projects: Less benefit from caching (fewer repeated reads)
- Write-heavy workloads: Writing new content still costs output tokens
- Network issues: Slow connections may cause timeouts regardless of optimization
## Measuring Your Own Performance

Use `server_info` to see real-time metrics:

```text
server_info({ action: "stats" })
```

Returns:

```text
Performance Statistics

Uptime: 2h 15m
Total operations: 892
Operations/sec: 2,016 (peak), 110 (average)
Cache hit rate: 94.2%
Memory usage: 156 MB / 200 MB

Token estimates (this session):
  Without optimization: ~1,800,000
  With optimization:    ~420,000
  Estimated savings:    76.7%
```

## Pipeline Token Reduction (v3.14.0+)

The pipeline system eliminates the dominant source of token waste in multi-file workflows: MCP round-trip overhead.
### The Round-Trip Problem

Every MCP tool call carries fixed overhead regardless of how simple the operation is:
| Component | Tokens per Call |
|---|---|
| Request JSON serialization | ~100-150 |
| Claude’s reasoning about the call | ~100-200 |
| Response parsing + processing | ~100-200 |
| Total overhead per call | ~300-600 |
For a 2-file refactor (search → read × 2 → edit × 2 → verify × 2 = 7 calls), that’s ~2,100-4,200 tokens of pure overhead before any useful work.
### Real Benchmark Data

Measured on the mcp-filesystem-go-ultra codebase (`core/` directory, 19 Go files):
#### Search + Read + Count (2 files)

| Method | MCP Calls | Server Time | Overhead Tokens |
|---|---|---|---|
| Individual calls | 5 | ~5ms | ~2,100 |
| Pipeline (1 call) | 1 | 3.4ms | ~400 |
| Reduction | 5× | 1.5× | 5× |
#### Dry-Run Refactor (19 files, 261 occurrences)

| Method | MCP Calls | Overhead Tokens |
|---|---|---|
| Individual calls | ~22 | ~9,000 |
| Pipeline (1 call) | 1 | ~400 |
| Reduction | 22× | 22× |
### When Pipelines Help Most

| Scenario | Calls Saved | Best For |
|---|---|---|
| Search → Edit → Verify | 3-7 → 1 | Refactors |
| Search → Read N files | N+1 → 1 | Code review |
| Search → Edit N → Count N | 2N+1 → 1 | Bulk migrations |
| Single file read | 1 → 1 | No benefit |
### Compact vs Verbose Output

Pipeline supports two modes to balance token usage against data availability:

- Compact (`verbose: false`): ~30 tokens of output, e.g. `OK: 3/3 steps | 2 files | 0 edits`
- Verbose (`verbose: true`): Full file contents (truncated at 50 lines), per-file counts, complete file lists
Use compact for edit workflows (save tokens), verbose when Claude needs to inspect or report results.
## Internal Engine Optimizations (v3.15.0)

Three optimizations in v3.15.0 reduce overhead on every MCP operation, with no configuration changes needed:
### 1. AllowedPaths Pre-Resolution

The access control check (`isPathAllowed`) previously called `filepath.EvalSymlinks()`, a real I/O syscall, for every allowed base path on every operation. With 5 allowed paths, that was 5 syscalls per read, write, edit, delete, or list.
Now the allowed paths are resolved once at server startup. The per-operation check iterates pre-resolved strings with zero I/O.
| Metric | Before | After |
|---|---|---|
| EvalSymlinks calls per operation | N (one per allowed path) | 0 |
| EvalSymlinks calls at startup | 0 | N (once) |
### 2. Regex Compilation Cache

Search functions (`smart_search`, `advanced_text_search`, `count_occurrences`) compiled regex patterns from scratch on every call. Now they use the engine's `CompileRegex()` cache, which stores compiled patterns in an RWMutex-protected map. Repeated searches with the same pattern skip compilation entirely.
### 3. Extension Lookup Maps

File type detection (`isTextFile`, `isBinaryFile`) used O(n) linear scans through 45-entry slices. These were replaced with O(1) map lookups against `textExtensionsMap` (70+ entries) and `binaryExtensionsMap`. Both maps are initialized once at package load time.
## Summary

| Claim | Reality |
|---|---|
| "5-22× fewer MCP calls" | Pipeline vs individual calls, measured on a real codebase |
| "77% token reduction" | Typical across mixed sessions, measured over hundreds of hours |
| "2000+ ops/sec" | Peak throughput under load, not sustained average |
| "98% cache hit rate" | After warmup, on repeated operations |
| "99% savings on edits" | When using surgical edits vs full rewrites |
These numbers are honest representations of what you can expect in real use. Individual results vary based on workload, but the optimizations provide substantial, measurable benefits in daily development work.
## Next Steps

- Pipeline System - Multi-step workflows in a single call
- Claude Desktop Setup - Optimal configuration
- Intelligent Operations - Auto-optimization
- Efficient Editing - Token-saving workflows