# Bug #5: Token Efficiency
**Status:** Resolved in v3.5.0 - v3.7.0
**Category:** Token Optimization
**Severity:** High (cost impact)
**Resolution Date:** 2025
## Problem
Working with Claude Desktop on enterprise projects consumed enormous amounts of tokens. A typical 2-hour coding session could use 2+ million tokens (~$6-7 USD), making intensive AI-assisted development prohibitively expensive.
## Root Causes
- Verbose output: Every file listing, search result, and operation returned formatted, human-readable output with headers, separators, and decorations
- Full file operations: No way to read/write portions of files - always entire content
- No caching: Repeated reads of the same file consumed tokens each time
- Wasteful patterns: AI agents read entire files to make small changes
## Impact
| Operation | Tokens Used | Cost at Scale |
|---|---|---|
| List directory (50 files) | ~2,500 tokens | Adds up fast |
| Read 5,000-line file | ~125,000 tokens | Expensive |
| Write same file back | ~125,000 tokens | Very expensive |
| Single edit cycle | ~250,000 tokens | Unsustainable |
A single read-modify-write cycle on a large file could cost 250,000+ tokens.
## Solution: 4-Phase Optimization
### Phase 1: Compact Mode
Added a `--compact-mode` flag that dramatically reduces output verbosity.
Before (verbose mode):
```
Directory listing for: C:\project\src
Type    Name        Size    Modified
----------------------------------------------------
[DIR]   components  -       2024-01-15
[FILE]  index.ts    2.3 KB  2024-01-14
[FILE]  app.tsx     5.1 KB  2024-01-14
...
```

~50 tokens per entry
After (compact mode):
```
src: components/, index.ts(2KB), app.tsx(5KB), utils/, config.json
```

~3 tokens per entry
Result: 65-75% reduction on listing operations
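The compact formatter itself is not shown in this doc; the following is a minimal Go sketch of the idea, with a hypothetical `Entry` type standing in for `os.DirEntry` so the logic is self-contained:

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical stand-in for os.DirEntry, carrying just
// what the compact format needs: a name, a size, and a dir flag.
type Entry struct {
	Name  string
	Size  int64 // bytes; ignored for directories
	IsDir bool
}

// formatCompact renders a listing as a single comma-separated line:
// directories get a trailing "/", files get a rounded KB size.
func formatCompact(dir string, entries []Entry) string {
	parts := make([]string, 0, len(entries))
	for _, e := range entries {
		if e.IsDir {
			parts = append(parts, e.Name+"/")
			continue
		}
		parts = append(parts, fmt.Sprintf("%s(%dKB)", e.Name, e.Size/1024))
	}
	return dir + ": " + strings.Join(parts, ", ")
}

func main() {
	entries := []Entry{
		{Name: "components", IsDir: true},
		{Name: "index.ts", Size: 2355},
		{Name: "app.tsx", Size: 5222},
	}
	fmt.Println(formatCompact("src", entries))
	// src: components/, index.ts(2KB), app.tsx(5KB)
}
```

Dropping headers, separators, and per-entry rows is where the 65-75% saving comes from: the fixed overhead disappears and each entry shrinks to a few tokens.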
### Phase 2: Range Operations
Added tools to read and write specific portions of files.
New tools:
- `read_file_range` - Read a specific line range
- `chunked_read_file` - Read a file in chunks
- `smart_edit_file` - Edit without loading the full file
Before:
```
read_file({path: "large.go"})  // 125,000 tokens for 5,000 lines
```

After:

```
read_file_range({path: "large.go", start: 145, end: 160})  // 400 tokens
```

Result: 99% reduction for targeted reads
### Phase 3: Surgical Editing
Optimized edit operations to avoid full-file rewrites.
Before (full rewrite):
```
1. read_file("large.go")   → 75,000 tokens (input)
2. [Claude processes]      → thinking tokens
3. write_file("large.go")  → 75,000 tokens (output)
Total: ~150,000 tokens
```

After (surgical edit):

```
1. smart_search("functionName")    → 500 tokens
2. read_file_range(lines 145-160)  → 400 tokens
3. edit_file(old_text, new_text)   → 300 tokens
Total: ~1,200 tokens
```

Result: 99% reduction on edit operations
### Phase 4: Intelligent Caching
Implemented a three-tier caching system:
| Cache Tier | Purpose | Hit Rate |
|---|---|---|
| BigCache | File content | 85-95% |
| go-cache | Directory listings | 90%+ |
| go-cache | File metadata | 95%+ |
Repeated operations hit cache instead of re-processing.
Result: 85-95% cache hit rate after warmup
## Implementation
### Compact Mode Configuration
```json
{
  "mcpServers": {
    "filesystem-ultra": {
      "command": "filesystem-ultra.exe",
      "args": ["--compact-mode", "--cache-size", "200MB"]
    }
  }
}
```

### Code Changes
Compact formatting (`core/engine.go`):
```go
func (e *UltraFastEngine) formatDirectoryListing(entries []os.DirEntry) string {
	if e.config.CompactMode {
		// Compact: "file1.go(2KB), file2.go(5KB), dir/"
		return e.formatCompact(entries)
	}
	// Verbose: full table with headers
	return e.formatVerbose(entries)
}
```

Range reading (`core/engine.go`):
```go
func (e *UltraFastEngine) ReadFileRange(path string, start, end int) (string, error) {
	file, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer file.Close()

	// Only read the requested lines; stop scanning once past the range.
	var result strings.Builder
	scanner := bufio.NewScanner(file)
	for lineNum := 1; scanner.Scan(); lineNum++ {
		if lineNum > end {
			break
		}
		if lineNum >= start {
			result.WriteString(scanner.Text())
			result.WriteByte('\n')
		}
	}
	return result.String(), scanner.Err()
}
```

## Measured Results
### Real Session Comparison
From an actual 2-hour coding session on a Go project:
| Metric | Without Optimization | With Optimization |
|---|---|---|
| File reads | 156 operations | 156 operations |
| File edits | 43 operations | 43 operations |
| Searches | 28 operations | 28 operations |
| Total tokens | ~2,100,000 | ~480,000 |
| Cost | ~$6.30 | ~$1.44 |
Savings: 77% tokens, 77% cost
### By Operation Type
| Operation Type | Before | After | Savings |
|---|---|---|---|
| File read | 15,000 tokens | 800 tokens | 95% |
| File write | 12,000 tokens | 600 tokens | 95% |
| File edit | 25,000 tokens | 1,200 tokens | 95% |
| Directory list | 8,000 tokens | 400 tokens | 95% |
| Search (10 results) | 5,000 tokens | 800 tokens | 84% |
## New Capabilities
Section titled “New Capabilities”Telemetry Tool
Added `get_edit_telemetry` to monitor efficiency:
```
get_edit_telemetry()
```

Response:
```
Edit Telemetry Summary

Total edits:    43
Targeted edits: 38 (88%)
Full rewrites:   5 (12%)

Goal:   >80% targeted edits
Status: OPTIMAL
```
### Optimization Suggestions

Added `get_optimization_suggestion` for file-specific advice:
```
get_optimization_suggestion({path: "large.go"})
```

Response:

```
File: large.go
Size: 156 KB
Category: Medium

Recommendation: Use surgical operations
- For reading: read_file_range or intelligent_read
- For editing: edit_file or intelligent_edit
- Avoid: read_file (unnecessary token cost)
```
## Backward Compatibility

100% backward compatible:
- Compact mode is opt-in via the `--compact-mode` flag
- All original tools work unchanged
- New tools are additions, not replacements
## Resolution Timeline
- v3.5.0: Compact mode, range operations
- v3.6.0: Multi-edit, cache improvements
- v3.7.0: Telemetry, optimization suggestions
- Status: Production Ready
## Lessons Learned
- Measure before optimizing - Token telemetry revealed the biggest waste areas
- Optimize the common case - Most edits are small changes, not full rewrites
- Make efficiency the default - Compact mode should be on by default for AI clients
- Cache aggressively - AI agents read the same files repeatedly
- Provide guidance - Tools like `get_optimization_suggestion` help agents self-optimize