Context Engineering: The Next Frontier in Multi-Agent AI Systems
December 8, 2025
The landscape of AI agent development has evolved dramatically. We've moved beyond simple chatbots to deploying sophisticated, autonomous agents capable of handling long-horizon tasks—automating complex workflows, conducting deep research, and maintaining intricate codebases.
But this ambition immediately hits a critical bottleneck: context.
As agents run longer, the amount of information they need to track explodes: chat history, tool outputs, external documents, intermediate reasoning steps. The industry's response has been to lean on ever-larger context windows. But simply giving agents more space to paste text cannot be the only scaling strategy.
The Context Scaling Problem
According to Google's recent announcement, the naive pattern of "append everything into one giant prompt" collapses under three-way pressure:
1. Cost and Latency Spirals
Model cost and time-to-first-token grow rapidly with context size. Shoveling raw history and verbose tool payloads into the window makes agents prohibitively slow and expensive in production environments.
2. Signal Degradation
A context window flooded with irrelevant logs, stale tool outputs, or deprecated state can distract the model. This "lost in the middle" problem causes agents to fixate on past patterns rather than immediate instructions, degrading decision-making quality.
3. Physical Limits
Real-world workloads—involving full RAG results, intermediate artifacts, and long conversation traces—eventually overflow even the largest fixed windows.
Throwing more tokens at the problem buys time, but it doesn't change the shape of the curve.
Context as a Compiled View
Google's Agent Development Kit (ADK) introduces a fundamental shift in thinking: Context is a compiled view over a richer stateful system.
In this paradigm:
- Sessions, memory, and artifacts are the sources – the full, structured state of the interaction
- Flows and processors are the compiler pipeline – transforming that state
- The working context is the compiled view shipped to the LLM for each invocation
This mental model transforms context engineering from prompt gymnastics into proper systems engineering. You're forced to ask standard systems questions: What is the intermediate representation? Where do we apply compaction? How do we make transformations observable?
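To make the compile-view idea concrete, here is a minimal sketch in plain Python. It is illustrative only, not ADK's actual API; every name in it is hypothetical.

from dataclasses import dataclass, field

@dataclass
class Session:
    """Source of truth: the durable event log of the interaction."""
    events: list[str] = field(default_factory=list)

@dataclass
class LlmRequest:
    """Intermediate representation that processors build up."""
    instructions: str = ""
    history: list[str] = field(default_factory=list)

def add_instructions(session: Session, request: LlmRequest) -> None:
    request.instructions = "You are a research agent. Answer concisely."

def add_recent_history(session: Session, request: LlmRequest) -> None:
    request.history = session.events[-10:]  # one obvious place to apply compaction

PIPELINE = [add_instructions, add_recent_history]  # the "compiler" passes, in order

def compile_working_context(session: Session) -> str:
    """Recompute the ephemeral working context from durable state for one call."""
    request = LlmRequest()
    for processor in PIPELINE:
        processor(session, request)
    return "\n".join([request.instructions, *request.history])

Each invocation calls compile_working_context again, so changing the prompt format or the compaction policy never requires migrating the stored Session.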
Three Core Principles
ADK's architecture is built on three design principles:
1. Separate Storage from Presentation
Distinguish between durable state (Sessions) and per-call views (working context). This allows independent evolution of storage schemas and prompt formats.
2. Explicit Transformations
Context is built through named, ordered processors—not ad-hoc string concatenation. This makes the "compilation" step observable and testable.
3. Scope by Default
Every model call and sub-agent sees only the minimum required context. Agents must explicitly reach for more information via tools, rather than being flooded by default.
The Tiered Context Model
ADK organizes context into distinct layers:
┌─────────────────────────────────┐
│ Working Context                 │  ← Immediate prompt for this call
├─────────────────────────────────┤
│ Session                         │  ← Durable log of interaction
├─────────────────────────────────┤
│ Memory                          │  ← Long-lived searchable knowledge
├─────────────────────────────────┤
│ Artifacts                       │  ← Large data addressed by reference
└─────────────────────────────────┘
Working Context: A Recomputed View
For each invocation, ADK rebuilds the working context from underlying state. It's:
- Ephemeral (thrown away after the call)
- Configurable (change formatting without migrating storage)
- Model-agnostic (works across different LLM providers)
Flows and Processors: The Pipeline
Every LLM-based agent is backed by an LLM Flow, which maintains ordered lists of processors. For example:
self.request_processors += [
    basic.request_processor,                    # baseline request setup
    auth_preprocessor.request_processor,        # tool authentication handling
    instructions.request_processor,             # inject the agent's instructions
    identity.request_processor,                 # inject the agent's identity
    contents.request_processor,                 # assemble conversation contents
    context_cache_processor.request_processor,  # configure context caching
]
These processors form ADK's machinery to compile context. The order matters—each builds on previous outputs, giving you natural insertion points for custom filtering, compaction, and routing.
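Because the pipeline is an ordered list, a custom step can be inserted at a precise point. The sketch below uses hypothetical, framework-free processor functions (plain dicts instead of ADK's request objects) purely to show the insertion-point idea.

def build_instructions(request: dict) -> dict:
    request["instructions"] = "Summarize the open tasks."
    return request

def compact_tool_outputs(request: dict) -> dict:
    # Custom compaction step: truncate verbose tool payloads before they
    # are rendered into the prompt by the contents step below.
    request["tool_outputs"] = [out[:500] for out in request.get("tool_outputs", [])]
    return request

def build_contents(request: dict) -> dict:
    request["prompt"] = "\n".join([request["instructions"], *request["tool_outputs"]])
    return request

# Order matters: compaction must run before contents are assembled.
pipeline = [build_instructions, compact_tool_outputs, build_contents]

request = {"tool_outputs": ["...very long tool payload..." * 200]}
for step in pipeline:
    request = step(request)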
Relevance: What Matters Now
Once structure is established, the challenge becomes relevance: Given a tiered context architecture, what specific information belongs in the model's active window right now?
ADK answers this through a combination of human domain knowledge and agentic decision-making: developers decide where each kind of information lives, and the agent decides at runtime what to pull into its active window.
Artifacts: Externalizing Large State
ADK treats large data as Artifacts—named, versioned objects managed by an ArtifactService. Instead of dumping a 5MB CSV into chat history, agents see only a lightweight reference. When needed, they use LoadArtifactsTool to temporarily load content.
This handle pattern turns "5MB of noise in every prompt" into a precise, on-demand resource.
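Here is a minimal sketch of the handle pattern in plain Python. The ArtifactStore below is a stand-in written for illustration, not the ADK ArtifactService interface.

class ArtifactStore:
    """Toy artifact store: large payloads live here, never in the prompt."""

    def __init__(self):
        self._data: dict[str, bytes] = {}
        self._versions: dict[str, int] = {}

    def save(self, name: str, payload: bytes) -> str:
        version = self._versions.get(name, 0) + 1
        self._versions[name] = version
        self._data[name] = payload
        return f"{name}#v{version}"  # lightweight handle, safe to put in context

    def load(self, name: str) -> bytes:
        return self._data[name]  # full payload fetched only on demand

store = ArtifactStore()
handle = store.save("sales_q3.csv", b"...")  # imagine ~5MB of CSV here

# The working context carries only the reference, not the data.
context_line = f"Artifact available: {handle} (load only if needed)"

# Later, a tool call resolves the handle to the actual content.
csv_bytes = store.load("sales_q3.csv")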
Memory: Long-Term Knowledge on Demand
The MemoryService manages long-lived semantic knowledge through two patterns:
- Reactive recall: The agent recognizes a knowledge gap and explicitly calls load_memory_tool
- Proactive recall: The system runs a similarity search before each model invocation
This replaces "context stuffing" with a "memory-based" workflow.
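Both patterns can be sketched with a toy memory store. The naive keyword match below stands in for real similarity search, and none of these names are ADK's.

class MemoryStore:
    """Toy long-term memory with keyword matching in place of vector search."""

    def __init__(self, entries: list[str]):
        self.entries = entries

    def search(self, query: str, k: int = 3) -> list[str]:
        words = query.lower().split()
        hits = [e for e in self.entries if any(w in e.lower() for w in words)]
        return hits[:k]

memory = MemoryStore([
    "User prefers metric units.",
    "Project deadline is March 14.",
])

# Reactive recall: exposed as a tool the agent chooses to call when it
# notices a knowledge gap.
def load_memory(query: str) -> list[str]:
    return memory.search(query)

# Proactive recall: the framework searches before every model call and
# prepends the hits to the working context.
def build_prompt(user_message: str) -> str:
    recalled = memory.search(user_message)
    return "\n".join(["Relevant memories:", *recalled, "", user_message])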
Multi-Agent Context Management
Single-agent systems struggle with context bloat; multi-agent systems amplify it. If a root agent passes its full history to a sub-agent, you trigger context explosion.
Two Interaction Patterns
- Agents as Tools: The root agent treats specialized agents as functions: call with a focused prompt, get a result, move on. The callee sees only its specific instructions (see the sketch below).
- Agent Transfer: Control is fully handed off. The sub-agent inherits a view over the Session and drives the workflow.
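A hedged sketch of the agents-as-tools pattern, assuming ADK's Python API roughly as publicly documented (LlmAgent and AgentTool); treat the exact signatures as an assumption and check the current ADK reference.

from google.adk.agents import LlmAgent
from google.adk.tools.agent_tool import AgentTool

# Specialist agent: only ever sees the focused request it is handed.
summarizer = LlmAgent(
    name="summarizer",
    model="gemini-2.0-flash",
    instruction="Summarize the provided text in three bullet points.",
)

# The root agent calls the specialist like a function via AgentTool; the
# specialist does not inherit the root agent's conversation history.
root_agent = LlmAgent(
    name="coordinator",
    model="gemini-2.0-flash",
    instruction="Delegate summarization to the summarizer tool.",
    tools=[AgentTool(agent=summarizer)],
)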
Scoped Handoffs
ADK's include_contents parameter controls how much context flows from root to sub-agent:
- Default mode: Pass full working context
- None mode: Sub-agent sees no prior history, only new prompt
Because sub-agent context is also built via processors, handoff rules plug into the same pipeline. No separate machinery needed.
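A matching sketch of the transfer pattern with a scoped handoff, again assuming the documented LlmAgent fields (sub_agents and include_contents) rather than guaranteeing exact current signatures:

from google.adk.agents import LlmAgent

# Sub-agent that starts from a clean slate: with include_contents="none" it
# sees no inherited history, only its instruction and the new request.
citation_checker = LlmAgent(
    name="citation_checker",
    model="gemini-2.0-flash",
    instruction="Check that every claim in the given draft has a citation.",
    include_contents="none",  # "default" would pass the full working context
)

# Root agent that can transfer control to the sub-agent.
root_agent = LlmAgent(
    name="research_root",
    model="gemini-2.0-flash",
    instruction="Write the report; hand off to citation_checker for review.",
    sub_agents=[citation_checker],
)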
Production Implications
This architecture shift has real production consequences:
Cost efficiency: Precise context management can reduce token usage by 50-70% in multi-turn conversations.
Latency optimization: Context caching divides the window into a stable prefix and a variable suffix, reducing recomputation (sketched below).
Reliability: Explicit transformations make context flow observable and debuggable.
Scalability: Tiered storage + on-demand retrieval breaks through fixed window limits.
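As an illustration of the caching point above, a prompt can be organized so the stable part always comes first. This is a generic sketch of the idea, independent of any particular provider's caching API.

def split_for_caching(system_instructions: str, stable_context: str, latest_turn: str) -> tuple[str, str]:
    """Keep the byte-identical stable prefix separate from the per-turn suffix.

    Providers with prefix caching can reuse work for the prefix across turns
    and only recompute the (much smaller) suffix.
    """
    prefix = "\n".join([system_instructions, stable_context])  # unchanged across turns
    suffix = latest_turn                                        # changes every call
    return prefix, suffix

prefix, suffix = split_for_caching(
    "You are a support agent.",
    "Product manual (v3, unchanged):\n...",
    "User: my device won't turn on.",
)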
The Path Forward
As we push agents to tackle longer horizons, "context management" can no longer mean "string manipulation." It must be treated as an architectural concern alongside storage and compute.
Context engineering is systems engineering for the LLM era.
Google's ADK represents a maturation of the field—moving from interesting prototypes to scalable, reliable production systems. The tiered storage model, compiled views, and explicit processing pipelines demonstrate that building robust AI agents requires the same rigor we apply to distributed systems and databases.
The question isn't whether your agents need context engineering. It's whether you'll do it explicitly and systematically, or implicitly and chaotically.
Based on Google's announcement of their Agent Development Kit approach to production multi-agent systems. Read the full technical deep-dive.