Automatic I/O Strategy

In v4.0.0, there are no separate streaming tools. The three core tools — read_file, write_file, and edit_file — automatically select the optimal I/O strategy based on file size. You never need to choose between “direct read” and “chunked read” or between “write” and “streaming write.” The engine handles it transparently.

| v3 Tool | v4 Equivalent | What Happens Now |
|---|---|---|
| streaming_write_file | write_file | Streaming is automatic for large content |
| chunked_read_file | read_file | Chunking is automatic for large files |
| smart_edit_file | edit_file | Line-by-line processing is automatic for large files |
| intelligent_read | read_file | Intelligence is built-in |
| intelligent_write | write_file | Intelligence is built-in |
| streaming_read_file | read_file | Streaming is automatic when needed |

The engine classifies files into four tiers and selects the I/O strategy accordingly. These thresholds are defined in core/config.go and apply to all three core tools.

| Tier | Size Range | Strategy | Description |
|---|---|---|---|
| Small | < 100 KB | Direct I/O | File is read/written entirely in a single operation. Fastest for most source code files. |
| Medium | 100 KB — 500 KB | Streaming I/O | Uses buffered streaming with moderate memory allocation. Suitable for larger source files and small data files. |
| Large | 500 KB — 5 MB | Chunked Processing | File is processed in adaptive chunks. Suitable for large data files, logs, and generated code. |
| Very Large | > 5 MB | Special Handling | Lazy loading, pagination, and progress reporting. Edit operations are rejected above 50 MB to prevent accidental destructive changes. |

All streaming and chunked operations use a 64 KB buffer (DefaultBufferSize), which is optimal for most disk I/O patterns on modern hardware.
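
To make the tiering concrete, the sketch below shows how a classifier over these thresholds might look. DefaultBufferSize and the size boundaries come from the table above; the remaining constant and type names are illustrative assumptions, not the actual contents of core/config.go.

```go
// Illustrative sketch of size-tier classification; only DefaultBufferSize and
// the threshold values are documented, the other names are assumed here.
package core

const (
	SmallFileThreshold  = 100 * 1024       // 100 KB: direct I/O below this
	MediumFileThreshold = 500 * 1024       // 500 KB: streaming I/O below this
	LargeFileThreshold  = 5 * 1024 * 1024  // 5 MB: chunked processing below this
	MaxEditableSize     = 50 * 1024 * 1024 // 50 MB: edits rejected above this
	DefaultBufferSize   = 64 * 1024        // 64 KB buffer for streaming and chunked I/O
)

// Tier identifies which I/O strategy the engine uses for a given file size.
type Tier int

const (
	TierSmall     Tier = iota // direct I/O
	TierMedium                // streaming I/O
	TierLarge                 // chunked processing
	TierVeryLarge             // lazy loading, pagination, progress reporting
)

func classifyTier(size int64) Tier {
	switch {
	case size < SmallFileThreshold:
		return TierSmall
	case size < MediumFileThreshold:
		return TierMedium
	case size < LargeFileThreshold:
		return TierLarge
	default:
		return TierVeryLarge
	}
}
```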


For read_file, the behavior depends on file size:

| File Size | Behavior |
|---|---|
| < 100 KB | Reads the entire file into memory and returns it |
| 100 KB — 500 KB | Streams content with a buffered reader |
| 500 KB — 5 MB | Reads in chunks and assembles the result |
| > 5 MB | Uses pagination; consider start_line/end_line for specific ranges |

For very large files, the most token-efficient approach is to read only the lines you need:

```
// Read the entire file (auto-selects strategy)
read_file({ path: "large-dataset.csv" })

// Read only lines 100-150 (most efficient for large files)
read_file({ path: "large-dataset.csv", start_line: 100, end_line: 150 })

// Read the last 50 lines (log tailing)
read_file({ path: "server.log", max_lines: 50, mode: "tail" })
```

For write_file, the behavior depends on the size of the content being written:

| Content Size | Behavior |
|---|---|
| < 100 KB | Direct write with atomic rename |
| 100 KB — 500 KB | Streaming write with a buffered writer |
| 500 KB — 5 MB | Chunked write with progress tracking |
| > 5 MB | Streaming write with progress reporting |

All writes are atomic — content is written to a temporary file first, then renamed to the target path. This prevents partial writes on failure.
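
The temp-file-then-rename pattern can be sketched as below. This is illustrative only, not the engine's actual code, and the function name atomicWrite is an assumption.

```go
package core

import (
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temporary file in the target directory and then
// renames it over the destination, so readers never observe a partial file.
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".write-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; a no-op after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	// On POSIX filesystems the rename replaces the target atomically.
	return os.Rename(tmp.Name(), path)
}
```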

```
// Small file (direct write)
write_file({ path: "config.json", content: '{"key": "value"}' })

// Large file (streaming is automatic)
write_file({ path: "data.csv", content: largeCSVContent })
```

For edit_file, the behavior depends on file size:

| File Size | Behavior |
|---|---|
| < 100 KB | Loads the file into memory, applies the replacement, writes it back |
| 100 KB — 500 KB | Streaming read, in-memory replace, streaming write |
| 500 KB — 5 MB | Line-by-line processing with the LargeFileProcessor |
| > 50 MB | Rejected — file is too large for safe editing |

The 50 MB hard limit on edits exists to prevent accidental massive changes. If you need to modify files larger than 50 MB, split the operation or use external tools.

```
// Small file edit (direct)
edit_file({ path: "config.go", old_text: "v3.0.0", new_text: "v4.0.0" })

// Large file edit (line-by-line processing is automatic)
edit_file({ path: "large-module.go", old_text: "oldPattern", new_text: "newPattern" })
```

For files in the 500 KB — 5 MB range, the engine uses LargeFileProcessor (defined in core/large_file_processor.go), which supports three processing modes:

| Mode | When Used | Description |
|---|---|---|
| In-Memory | File < 500 KB | Loads the entire content and applies transformations |
| Line-by-Line | File 500 KB — 5 MB | Processes one line at a time with minimal memory |
| Chunk-Based | File > 5 MB (reads only) | Processes the file in fixed-size chunks |

The processor is also used by multi_edit and the pipeline system’s regex_transform action.
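
The line-by-line mode can be pictured as the sketch below, which keeps only the 64 KB buffer and the current line in memory. It reuses the documented DefaultBufferSize constant (defined in the earlier sketch); the function name and the transform callback are illustrative assumptions, not the actual core/large_file_processor.go.

```go
package core

import (
	"bufio"
	"io"
)

// processLines streams src to dst, transforming one line at a time, so peak
// memory stays near the buffer size plus the longest single line.
func processLines(src io.Reader, dst io.Writer, transform func(string) string) error {
	scanner := bufio.NewScanner(src)
	scanner.Buffer(make([]byte, DefaultBufferSize), 1024*1024) // tolerate lines up to 1 MB
	w := bufio.NewWriterSize(dst, DefaultBufferSize)

	for scanner.Scan() {
		if _, err := w.WriteString(transform(scanner.Text()) + "\n"); err != nil {
			return err
		}
	}
	if err := scanner.Err(); err != nil {
		return err
	}
	return w.Flush()
}
```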


The RegexTransformer (defined in core/regex_transformer.go) handles advanced regex-based edits. It applies to edit_file with mode:"regex" and the pipeline regex_transform action.

For large files, regex transformations use the same adaptive strategy:

```
// Regex transform (auto-selects strategy based on file size)
edit_file({
  path: "handlers.go",
  mode: "regex",
  patterns_json: JSON.stringify([
    {
      pattern: "func (\\w+)\\(\\)",
      replacement: "func $1(ctx context.Context)",
      limit: -1
    }
  ]),
  dry_run: true
})
```

When multiple regex patterns are applied, the transformer supports both sequential (pattern-by-pattern) and parallel execution modes, depending on whether the patterns are independent.
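
A rough sketch of the sequential mode follows: each compiled pattern runs over the output of the previous one. The struct and function names are assumptions and the limit handling is simplified, so this is not the actual core/regex_transformer.go.

```go
package core

import "regexp"

// regexPattern mirrors one entry of the patterns_json array shown above.
type regexPattern struct {
	Pattern     string
	Replacement string
	Limit       int // -1 means replace every match
}

// applySequential applies each pattern to the output of the previous one.
// A parallel mode would instead run independent patterns concurrently and
// merge their results.
func applySequential(content string, patterns []regexPattern) (string, error) {
	for _, p := range patterns {
		re, err := regexp.Compile(p.Pattern)
		if err != nil {
			return "", err
		}
		// Simplification: Limit is ignored here and every match is replaced.
		content = re.ReplaceAllString(content, p.Replacement)
	}
	return content, nil
}
```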


The adaptive I/O strategy keeps memory usage predictable:

| Scenario | Memory Overhead |
|---|---|
| Reading a 50 KB file (direct) | ~50 KB |
| Streaming a 300 KB file | ~64 KB buffer + partial content |
| Chunked read of a 3 MB file | ~64 KB buffer at any point |
| Editing a 2 MB file (line-by-line) | ~64 KB buffer + current line |

The engine’s cache (default 100 MB, configurable via --cache-size) stores recently accessed file contents, directory listings, and metadata. Cache eviction is automatic when the limit is reached.
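
The eviction behavior can be pictured as a size-bounded, least-recently-used cache that drops old entries until it fits back under its byte budget. The sketch below is purely illustrative; the types and field names are assumptions, not the engine's actual cache.

```go
package core

import "container/list"

// cacheEntry records the key and byte size of one cached item.
type cacheEntry struct {
	key  string
	size int64
}

// sizeBoundedCache tracks total bytes used and evicts the least recently used
// entries once the configured limit (100 MB by default) is exceeded.
type sizeBoundedCache struct {
	limit   int64
	used    int64
	order   *list.List               // front = most recently used
	entries map[string]*list.Element // key -> element holding a cacheEntry
}

func (c *sizeBoundedCache) evictIfNeeded() {
	for c.used > c.limit {
		oldest := c.order.Back()
		if oldest == nil {
			return
		}
		e := oldest.Value.(cacheEntry)
		c.used -= e.size
		delete(c.entries, e.key)
		c.order.Remove(oldest)
	}
}
```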


All file operations have a default timeout of 30 seconds (DefaultOperationTimeout). For very large files that take longer to process, the engine extends the timeout automatically.
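
One way to picture the extension is a deadline that grows with file size on top of the documented 30-second DefaultOperationTimeout, as in the sketch below. The per-megabyte scaling is purely an illustrative assumption; the engine's actual formula is not documented here.

```go
package core

import "time"

const DefaultOperationTimeout = 30 * time.Second

// operationTimeout grows the deadline with input size so very large files are
// not cut off at the default. The one-second-per-megabyte rate is illustrative.
func operationTimeout(fileSize int64) time.Duration {
	extra := time.Duration(fileSize/(1024*1024)) * time.Second
	return DefaultOperationTimeout + extra
}
```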

When --log-dir is configured, each operation logs its duration, file size, and strategy used. This data appears in the dashboard’s Operations page and can be used to identify bottlenecks.

```
// Check performance stats to see I/O strategy distribution
server_info({ action: "stats" })
```

Even though the engine handles large files automatically, reading only the lines you need is always more token-efficient. For files over 100 KB, consider using start_line/end_line instead of reading the full file:

```
// Instead of reading a 500 KB file entirely:
read_file({ path: "large-module.go" }) // ~12,500 tokens

// Read only the section you need:
search_files({ path: "large-module.go", pattern: "targetFunction", include_content: true })

// Then:
read_file({ path: "large-module.go", start_line: 200, end_line: 250 }) // ~1,250 tokens
```

This is the single biggest token optimization available — a 90%+ reduction for targeted reads on large files.


The size thresholds are compiled constants and cannot be changed at runtime. However, you can influence I/O behavior with these server flags:

| Flag | Default | Effect on I/O |
|---|---|---|
| --cache-size | 100 MB | A larger cache reduces disk reads for repeated access |
| --parallel-ops | 2x CPU cores (max 16) | More concurrent operations for batch workloads |
| --compact-mode | false | Reduces response size (65-75% token savings) |

See Configuration for the complete CLI reference.



Last updated: March 2026 · Version: 4.0.0