# Bug #5: Token Efficiency
**Status:** Resolved in v3.5.0 - v3.7.0
**Category:** Token Optimization
**Severity:** High (cost impact)
**Resolution Date:** 2025
## Problem
Working with Claude Desktop on enterprise projects consumed enormous amounts of tokens. A typical 2-hour coding session could use 2+ million tokens (~$6-7 USD), making intensive AI-assisted development prohibitively expensive.
## Root Causes
- Verbose output: Every file listing, search result, and operation returned formatted, human-readable output with headers, separators, and decorations
- Full file operations: No way to read/write portions of files - always entire content
- No caching: Repeated reads of the same file consumed tokens each time
- Wasteful patterns: AI agents read entire files to make small changes
## Impact
| Operation | Tokens Used | Cost at Scale |
|---|---|---|
| List directory (50 files) | ~2,500 tokens | Adds up fast |
| Read 5,000-line file | ~125,000 tokens | Expensive |
| Write same file back | ~125,000 tokens | Very expensive |
| Single edit cycle | ~250,000 tokens | Unsustainable |
A single read-modify-write cycle on a large file could cost 250,000+ tokens.
## Solution: 4-Phase Optimization
### Phase 1: Compact Mode
Added a `--compact-mode` flag that dramatically reduces output verbosity.
Before (verbose mode):
```
Directory listing for: C:\project\src
Type    Name        Size    Modified
----------------------------------------------------
[DIR]   components  -       2024-01-15
[FILE]  index.ts    2.3 KB  2024-01-14
[FILE]  app.tsx     5.1 KB  2024-01-14
...
```

~50 tokens per entry
After (compact mode):
```
src: components/, index.ts(2KB), app.tsx(5KB), utils/, config.json
```

~3 tokens per entry
Result: 65-75% reduction on listing operations
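The compact formatter itself is not shown in this doc; the following is a minimal Go sketch of the idea, with a hypothetical `Entry` type standing in for `os.DirEntry` so the logic is self-contained:

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical stand-in for os.DirEntry, carrying just
// what the compact format needs: a name, a size, and a dir flag.
type Entry struct {
	Name  string
	Size  int64 // bytes; ignored for directories
	IsDir bool
}

// formatCompact renders a listing as a single comma-separated line:
// directories get a trailing "/", files get a rounded KB size.
func formatCompact(dir string, entries []Entry) string {
	parts := make([]string, 0, len(entries))
	for _, e := range entries {
		if e.IsDir {
			parts = append(parts, e.Name+"/")
			continue
		}
		parts = append(parts, fmt.Sprintf("%s(%dKB)", e.Name, e.Size/1024))
	}
	return dir + ": " + strings.Join(parts, ", ")
}

func main() {
	entries := []Entry{
		{Name: "components", IsDir: true},
		{Name: "index.ts", Size: 2355},
		{Name: "app.tsx", Size: 5222},
	}
	fmt.Println(formatCompact("src", entries))
	// src: components/, index.ts(2KB), app.tsx(5KB)
}
```

Dropping headers, separators, and per-entry rows is where the 65-75% saving comes from: the fixed overhead disappears and each entry shrinks to a few tokens.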
### Phase 2: Range Operations
Added tools to read and write specific portions of files.
New tools:
- `read_file_range` - Read a specific line range
- `chunked_read_file` - Read a file in chunks
- `smart_edit_file` - Edit without loading the full file
Before:
```
read_file({path: "large.go"})  // 125,000 tokens for 5,000 lines
```

After:

```
read_file_range({path: "large.go", start: 145, end: 160})  // 400 tokens
```

Result: 99% reduction for targeted reads
### Phase 3: Surgical Editing
Optimized edit operations to avoid full-file rewrites.
Before (full rewrite):
```
1. read_file("large.go")   → 75,000 tokens (input)
2. [Claude processes]      → thinking tokens
3. write_file("large.go")  → 75,000 tokens (output)
Total: ~150,000 tokens
```

After (surgical edit):

```
1. smart_search("functionName")    → 500 tokens
2. read_file_range(lines 145-160)  → 400 tokens
3. edit_file(old_text, new_text)   → 300 tokens
Total: ~1,200 tokens
```

Result: 99% reduction on edit operations
### Phase 4: Intelligent Caching
Implemented a three-tier caching system:
| Cache Tier | Purpose | Hit Rate |
|---|---|---|
| BigCache | File content | 85-95% |
| go-cache | Directory listings | 90%+ |
| go-cache | File metadata | 95%+ |
Repeated operations hit cache instead of re-processing.
Result: 85-95% cache hit rate after warmup
## Implementation
### Compact Mode Configuration
```json
{
  "mcpServers": {
    "filesystem-ultra": {
      "command": "filesystem-ultra.exe",
      "args": ["--compact-mode", "--cache-size", "200MB"]
    }
  }
}
```

### Code Changes
Compact formatting (`core/engine.go`):
```go
func (e *UltraFastEngine) formatDirectoryListing(entries []os.DirEntry) string {
	if e.config.CompactMode {
		// Compact: "file1.go(2KB), file2.go(5KB), dir/"
		return e.formatCompact(entries)
	}
	// Verbose: full table with headers
	return e.formatVerbose(entries)
}
```

Range reading (`core/engine.go`):
```go
func (e *UltraFastEngine) ReadFileRange(path string, start, end int) (string, error) {
	file, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer file.Close()

	// Only read the requested lines; stop scanning once past the range.
	var result strings.Builder
	scanner := bufio.NewScanner(file)
	for lineNum := 1; scanner.Scan(); lineNum++ {
		if lineNum > end {
			break
		}
		if lineNum >= start {
			result.WriteString(scanner.Text())
			result.WriteByte('\n')
		}
	}
	return result.String(), scanner.Err()
}
```

## Measured Results
### Real Session Comparison
From an actual 2-hour coding session on a Go project:
| Metric | Without Optimization | With Optimization |
|---|---|---|
| File reads | 156 operations | 156 operations |
| File edits | 43 operations | 43 operations |
| Searches | 28 operations | 28 operations |
| Total tokens | ~2,100,000 | ~480,000 |
| Cost | ~$6.30 | ~$1.44 |
Savings: 77% tokens, 77% cost
### By Operation Type
| Operation Type | Before | After | Savings |
|---|---|---|---|
| File read | 15,000 tokens | 800 tokens | 95% |
| File write | 12,000 tokens | 600 tokens | 95% |
| File edit | 25,000 tokens | 1,200 tokens | 95% |
| Directory list | 8,000 tokens | 400 tokens | 95% |
| Search (10 results) | 5,000 tokens | 800 tokens | 84% |
## New Capabilities
Section titled “New Capabilities”Telemetry Tool
Added `get_edit_telemetry` to monitor efficiency:
```
get_edit_telemetry()
```

Response:
```
Edit Telemetry Summary

Total edits:    43
Targeted edits: 38 (88%)
Full rewrites:   5 (12%)

Goal:   >80% targeted edits
Status: OPTIMAL
```
### Optimization Suggestions

Added `get_optimization_suggestion` for file-specific advice:
```
get_optimization_suggestion({path: "large.go"})
```

Response:

```
File: large.go
Size: 156 KB
Category: Medium

Recommendation: Use surgical operations
- For reading: read_file_range or intelligent_read
- For editing: edit_file or intelligent_edit
- Avoid: read_file (unnecessary token cost)
```
## Backward Compatibility

100% backward compatible:
- Compact mode is opt-in via the `--compact-mode` flag
- All original tools work unchanged
- New tools are additions, not replacements
## Resolution Timeline
- v3.5.0: Compact mode, range operations
- v3.6.0: Multi-edit, cache improvements
- v3.7.0: Telemetry, optimization suggestions
- Status: Production Ready
## Lessons Learned
- Measure before optimizing - Token telemetry revealed the biggest waste areas
- Optimize the common case - Most edits are small changes, not full rewrites
- Make efficiency the default - Compact mode should be on by default for AI clients
- Cache aggressively - AI agents read the same files repeatedly
- Provide guidance - Tools like `get_optimization_suggestion` help agents self-optimize