
Stateful Continuation for AI Agents: Why Transport Layers Now Matter

Anirudh Mendiratta

The world of AI coding agents is evolving rapidly, and with it, efficient data transmission is becoming more important. In this article, I'll examine why transport layers matter for AI agent workflows, particularly in the context of stateful continuation: how it can dramatically reduce overhead and improve performance, and what trade-offs and challenges it introduces.

The Evolution of AI Coding Agents

AI coding agents have gone from novelty to daily workflow for many organizations. Since December 2025, tools like Claude Code, OpenAI Codex, Cursor, and Cline have routinely performed multi-file edits, run test suites, and iterated on failing builds. OpenAI reports over 1.6 million weekly active users on Codex alone, with a typical engineer on the Codex team running 4-8 parallel agents.

The core of these agents is the "agent loop": a cycle of model inference and tool execution that repeats until the task is complete. A single turn typically reads several files to understand the codebase, edits some of them, and runs tests; that adds up to 10-15 tool calls, and often more for complex refactoring. A minimal sketch of the loop follows.
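To make the shape of the loop concrete, here is a minimal sketch in Python. `call_model` and `execute_tool` are hypothetical stand-ins, not a real SDK; the point is that the message list, and therefore every subsequent request, grows on each iteration.

```python
# Minimal agent-loop sketch. call_model and execute_tool are hypothetical
# stand-ins for a model API call and local tool execution.
from typing import Any

def call_model(messages: list[dict], tools: list[dict]) -> dict[str, Any]:
    """Stand-in for one model inference call."""
    raise NotImplementedError

def execute_tool(call: dict) -> str:
    """Stand-in for running one tool: read a file, apply an edit, run tests."""
    raise NotImplementedError

def agent_loop(task: str, tools: list[dict]) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_model(messages, tools)   # one inference step
        messages.append(reply)                # context grows every turn
        if not reply.get("tool_calls"):       # no tool requested: done
            return reply["content"]
        for call in reply["tool_calls"]:      # typically 10-15 per turn
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(call),
            })
```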

The HTTP Overhead Problem

With HTTP-based APIs, including OpenAI's Responses API over HTTP and the older Chat Completions API, each turn is a stateless request. The server doesn't remember what happened on the previous turn, so the client must resend everything: system instructions and tool definitions, the original user prompt, every prior model output, and every tool call result.

This means the per-request payload grows linearly with each turn, so the cumulative bytes a client uploads over a task grow quadratically. In our benchmarks, we measured the actual per-turn bytes sent by the client over HTTP versus WebSocket. By turn 9, HTTP is sending nearly 10x as much data per request as WebSocket.
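A toy model shows the shape of the curve. The byte counts below (4 KB of fixed instructions and tool definitions, 2 KB of new content per turn) are illustrative assumptions, not our measured values:

```python
# Toy model of client upload per turn. FIXED and PER_TURN are illustrative
# assumptions, not the benchmark's measured values.
FIXED = 4_000     # system instructions + tool definitions
PER_TURN = 2_000  # new model output + tool results added each turn

for turn in range(1, 11):
    http = FIXED + PER_TURN * (turn - 1)              # resend full history
    ws = FIXED + PER_TURN if turn == 1 else PER_TURN  # send only the delta
    print(f"turn {turn:2d}: HTTP {http:6,d} B   WebSocket {ws:5,d} B")
```

Under these assumptions, the HTTP payload at turn 9 is 20 KB against a 2 KB WebSocket delta, the same 10x shape we observed.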

The Transport Layer Matters

This experience highlighted something that's becoming increasingly relevant as AI coding agents mature: the transport layer matters more for agentic workflows than for simple chat. A single-turn chat completion sends a prompt and gets a response. An agentic coding session involves 10, 20, or sometimes 50+ sequential turns in which the model reads code, proposes changes, runs tests, reads error output, fixes issues, and iterates.

With each turn, the conversation context grows, and over HTTP, that entire growing context must be retransmitted every time. This growing payload is a bottleneck, especially over bandwidth-constrained links.

Stateful Continuation: The Solution

Stateful continuation cuts overhead dramatically. Caching context server-side can reduce client-sent data by 80%+ and improve execution time by 15–29%. The benefit is architectural, not protocol-specific. Any approach that avoids retransmitting context can achieve similar gains.

In February 2026, OpenAI introduced WebSocket mode for their Responses API, which caches the conversation history in server memory to solve this problem. I was excited to try it out and see how it performs compared to HTTP.
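The exact wire format isn't the point here, so treat the following as a hedged sketch: the endpoint URL and message shapes are assumptions for illustration, but the connection pattern, connect once and send only each turn's delta, is the core of the mode.

```python
# Hypothetical sketch of a stateful WebSocket session. The URL and message
# shapes are assumptions; only the connect-once, send-deltas pattern matters.
import asyncio
import json
import websockets

async def session() -> None:
    async with websockets.connect("wss://api.example.com/v1/responses") as ws:
        # Turn 1: full context goes up once; the server caches it.
        await ws.send(json.dumps({
            "type": "response.create",
            "instructions": "You are a coding agent.",
            "tools": [],  # tool definitions sent once, not on every turn
            "input": "Fix the failing test in utils_test.py",
        }))
        print(json.loads(await ws.recv()))

        # Later turns: only the delta; server-side state supplies the rest.
        await ws.send(json.dumps({
            "type": "response.create",
            "input": "Tests still fail with this output: ...",
        }))
        print(json.loads(await ws.recv()))

asyncio.run(session())
```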

Benchmarking the Claims

To validate these claims with controlled measurements, we built a benchmark harness that simulates realistic agentic coding workflows against OpenAI's Responses API. The harness is open source and available on GitHub.

We defined three coding tasks of varying complexity and measured TTFT (Time to First Token), bytes sent, bytes received, and total time for each task.
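As one example of the methodology, here is a simplified version of the HTTP-side measurement, assuming a streaming endpoint; the URL and payload are placeholders, and the full harness is in the linked repository.

```python
# Simplified HTTP-side measurement: TTFT via the first streamed chunk,
# bytes sent via the serialized request body. URL and payload are placeholders.
import json
import time
import httpx

def measure_turn(url: str, payload: dict, headers: dict) -> tuple[float, int]:
    body = json.dumps(payload).encode()
    start = time.perf_counter()
    ttft = 0.0
    with httpx.stream("POST", url, content=body, headers=headers) as resp:
        for _ in resp.iter_raw():          # first chunk of the streamed reply
            ttft = time.perf_counter() - start
            break
    return ttft, len(body)                 # (TTFT in seconds, bytes sent)
```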

Key Findings

  1. WebSocket consistently reduces client-sent data by 80-86%. This is the most reliable finding, independent of model, API variance, or task complexity. HTTP sends 153-176 KB per task; WebSocket sends 21-32 KB.

  2. WebSocket delivers 15-29% faster end-to-end execution. With GPT-5.4, WebSocket was 29% faster, directionally consistent with the 39% Cline reported on complex workflows.

  3. First-turn TTFT is similar across approaches. The WebSocket handshake doesn't add meaningful overhead: first-turn TTFT was within noise of HTTP for both models.

Why It's Faster: The Architecture

The performance difference is a direct consequence of eliminating redundant data transmission. HTTP is stateless by design, so every request must carry the full conversation; a persistent WebSocket connection lets the server hold that state and the client send only what's new.

The Bandwidth Math

Using our actual GPT-5.4 data for a typical 10-turn coding task, HTTP total bytes sent (client → server) is 176 KB per task, while WebSocket total bytes sent is 32 KB per task. That's an 82% reduction in client-sent bytes.
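The arithmetic checks out:

```python
# Reduction in client-sent bytes from the measured per-task totals.
http_kb, ws_kb = 176, 32
print(f"{1 - ws_kb / http_kb:.0%}")  # -> 82%
print(f"HTTP avg/turn: {http_kb / 10:.1f} KB, WS avg/turn: {ws_kb / 10:.1f} KB")
```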

Architectural Lessons

  1. API Compatibility vs Performance: The Protocol Tax

The OpenAI-compatible HTTP API is the de facto standard. But this compatibility comes at a cost: the API is inherently stateless, requiring full context to be retransmitted on every request.

  2. Protocol Overhead at Scale: When Bytes Per Turn Matter

At the scale of agentic coding in 2026, the overhead of resending context is significant. For a single major provider, we estimate roughly 1 million concurrent agentic coding sessions at peak.

  3. Server-Side State: The Real Innovation

The key insight is that WebSocket isn't faster because of the protocol — TCP-based WebSocket has similar framing overhead to HTTP/2. The speed comes from server-side state management.
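A toy illustration of the idea (not OpenAI's implementation): the server keys the growing history by session, so the client's upload per turn stays constant.

```python
# Toy illustration of server-side state: history is keyed by session id,
# so each client upload is a single new message.
from collections import defaultdict

sessions: dict[str, list[dict]] = defaultdict(list)

def handle_turn(session_id: str, delta: dict) -> list[dict]:
    """Append the client's new message; return full context for inference."""
    sessions[session_id].append(delta)
    return sessions[session_id]   # the full history never leaves the server

ctx = handle_turn("sess-1", {"role": "user", "content": "run the tests"})
```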

  4. The Statefulness Spectrum

Different approaches to the context accumulation problem offer different trade-offs: stateless HTTP resends everything; server-persisted conversations (store=true) avoid resending but retain data beyond the session; an in-memory WebSocket session holds state only for the life of the connection. The sweet spot for most agentic workflows is WebSocket + store=false.

  5. Parallel Execution: Multiple Connections, Not Multiplexing

For parallel tasks, you need separate WebSocket connections. The bandwidth savings from WebSocket still apply per-connection, but concurrent connections may hit API rate limits more aggressively than concurrent HTTP requests.
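A sketch of what that looks like in practice, again with a placeholder URL: each parallel task opens its own connection, since the server-side session state is bound to the connection.

```python
# Sketch: one WebSocket connection per parallel agent, since session state
# is bound to the connection. URL is a placeholder.
import asyncio
import websockets

async def run_agent(task: str) -> str:
    async with websockets.connect("wss://api.example.com/v1/responses") as ws:
        await ws.send(task)               # each connection has its own state
        return await ws.recv()

async def main() -> None:
    tasks = ["fix test A", "refactor module B", "add types to module C"]
    results = await asyncio.gather(*(run_agent(t) for t in tasks))
    for task, result in zip(tasks, results):
        print(task, "->", result)

asyncio.run(main())
```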

When HTTP Is Still the Right Choice

WebSocket mode isn't universally better. HTTP remains the right choice for simple, few-turn interactions, for multi-provider support, for stateless infrastructure, and when debugging and observability are priorities.

Conclusion

For agentic coding workflows, the move from stateless HTTP to stateful WebSocket connections delivers meaningful performance improvements. But the WebSocket advantage comes with a trade-off: it's currently OpenAI-specific, creating provider lock-in. The question is whether the industry converges on a standard for stateful LLM continuation, or whether this remains a provider-specific competitive advantage.

The benchmarking harness and all results are available on GitHub.
