World Context Protocol
The bridge between AI agents and the physical world, evolving from today's Model Context Protocol to enable true world-aware intelligence.
Model Context Protocol Today
Understanding how we currently interface with Language Foundation Models
How we construct, manage, and inject context into LLM sessions today
Current Characteristics
- Linear and stateless: past context is packed into each prompt
- Token-limited: the entire context must fit within the token window
- Text-based: structured through language and JSON
- Explicit injection: context must be included manually
Current Use Cases
- Prompt engineering and template management (see the sketch after this list)
- RAG (Retrieval-Augmented Generation) systems
- Tool calling and function execution
- Memory injection via vector search
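To make the current model concrete, here is a minimal sketch of MCP-era context assembly: retrieved snippets and a tool schema are manually serialized into a single token-limited prompt. The `build_prompt` helper, the `get_weather` tool, and the template are illustrative assumptions, not a specific framework's API.

```python
# Minimal sketch of today's MCP-style context assembly: retrieved
# snippets and tool schemas are packed into one token-limited prompt.
import json

PROMPT_TEMPLATE = """You are a helpful assistant.

Relevant context:
{context}

Available tools (JSON schema):
{tools}

User: {question}"""

# Hypothetical tool schema, in the JSON style common to tool-calling APIs.
TOOLS = [{
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
    },
}]

def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    # Explicit injection: everything the model will know about the task
    # has to be serialized into this one string.
    return PROMPT_TEMPLATE.format(
        context="\n".join(retrieved_docs),
        tools=json.dumps(TOOLS, indent=2),
        question=question,
    )

print(build_prompt("Is it raining in Oslo?",
                   ["Oslo forecast: light rain, 9°C."]))
```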
The Four Phases of Context Protocol Evolution
From static prompts to dynamic world interfaces
Phase 1: LLMs like GPT-4, Claude 3, Gemini 1.5
- Function: Serial, stateless communication ("prompt → output")
- Context: Token-based windows, manual injection
- Tools: LangChain, RAG systems, prompt templates
Phase 2: LLMs with memory, agents, and tool usage
- Function: State ↔ Query ↔ Tools ↔ Model ↔ Feedback (sketched below)
- Context: External tools, long-term memory, perceptual inputs
- Tools: Agents, function calling, memory APIs
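A minimal sketch of that Phase 2 loop, assuming a toy `call_model` stand-in for the LLM and a one-entry tool registry; the point is that tool feedback is folded back into persistent session state before the next model call.

```python
# Sketch of the Phase 2 loop: state <-> query <-> tools <-> model <-> feedback.
def call_model(state: dict, observation: str) -> dict:
    # Stand-in for an LLM call returning either a tool request or an answer.
    if "weather" in observation and "weather_result" not in state:
        return {"tool": "get_weather", "args": {"city": "Oslo"}}
    return {"answer": f"Done. Known facts: {state}"}

TOOLS = {"get_weather": lambda city: f"{city}: light rain, 9C"}

def agent_loop(query: str, max_steps: int = 5) -> str:
    state: dict = {"query": query}          # persistent per-session state
    observation = query
    for _ in range(max_steps):
        decision = call_model(state, observation)
        if "answer" in decision:            # model signals it is done
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        state["weather_result"] = result    # feedback folded into state
        observation = result
    return "step budget exhausted"

print(agent_loop("What's the weather in Oslo?"))
```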
Phase 3: Early World Foundation Model (WFM) systems with simulation and sensors
- Function: Spatial world models, simulation state, physical causality
- Context: Multimodal streams, time-aware memory (see the snapshot sketch below)
- Tools: 3D spatial maps, sensor fusion, interactive planners
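A sketch of what Phase 3 context might look like in code: a time-stamped spatial snapshot that fuses vision detections into a map of world objects. The `WorldObject` and `SpatialSnapshot` types are illustrative assumptions; the field names deliberately mirror the WCP example later on this page, and the fusion logic is a toy.

```python
# Sketch of Phase 3 context: a time-stamped spatial snapshot fusing a
# vision channel into one structure a planner can consume.
from dataclasses import dataclass, field
import time

@dataclass
class WorldObject:
    id: str
    type: str
    location: tuple[float, float, float]  # metres, in the map frame
    state: str

@dataclass
class SpatialSnapshot:
    timestamp: float = field(default_factory=time.time)
    objects: dict[str, WorldObject] = field(default_factory=dict)

    def fuse_vision(self, detections: list[WorldObject]) -> None:
        # Newer detections overwrite stale object poses.
        for det in detections:
            self.objects[det.id] = det

snap = SpatialSnapshot()
snap.fuse_vision([WorldObject("mug01", "mug", (1.2, 0.8, 0.9), "on_table")])
print(snap.objects["mug01"].location)
```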
Phase 4: True WFM systems with embodied experience
- Function: Ongoing, living interface with continuous world models (loop sketched below)
- Context: Real-time sensory inputs, social understanding
- Tools: ROS-like AI protocols, multi-agent collaboration
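A toy sketch of that "ongoing, living interface": a perceive, update, act loop running against a persistent world model rather than one-shot prompts. The sensor stream and action policy here are stand-ins, not any real protocol.

```python
# Sketch of the Phase 4 loop: the agent never "ends a session"; it
# continuously folds sensory events into a persistent world model
# and acts from that model.
import itertools
import time

def sensor_stream():
    # Stand-in for real-time sensory input (vision, audio, etc.).
    for t in itertools.count():
        yield {"tick": t, "sound": "clean the kitchen" if t == 2 else None}

world_model: dict = {"pending_goals": []}   # persists across iterations

def act(world: dict) -> None:
    if world["pending_goals"]:
        print("executing:", world["pending_goals"].pop(0))

for event in itertools.islice(sensor_stream(), 5):
    if event["sound"]:                       # perception updates the model
        world_model["pending_goals"].append(event["sound"])
    act(world_model)                         # action reads from the model
    time.sleep(0.01)
```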
WCP Technical Specification
Phase 4 World Context Protocol JSON structure and implementation
```json
{
  "timestamp": "2025-06-03T17:30:00Z",
  "agent": {
    "id": "wfm-007",
    "name": "EVA",
    "role": "Home Assistant",
    "memory_snapshot": {
      "user_preferences": { "music_genre": "ambient" },
      "object_locations": { "keys": "entry_table" }
    }
  },
  "environment": {
    "location": "home_kitchen",
    "map": {
      "objects": [
        {
          "id": "mug01",
          "type": "mug",
          "location": [1.2, 0.8, 0.9],
          "state": "on_table"
        }
      ]
    },
    "sensors": {
      "vision": { "active_objects": ["mug01"] },
      "audio": { "last_transcript": "Clean the kitchen" }
    }
  },
  "intent": {
    "user_command": "clean under the table",
    "parsed_goal": {
      "action": "clean_area",
      "target": "under_table"
    }
  },
  "planning": {
    "current_plan": [
      { "step": "locate vacuum_bot", "status": "complete" },
      { "step": "navigate to table", "status": "in_progress" }
    ]
  },
  "simulation": {
    "predicted_outcome": "success likely",
    "confidence": 0.92
  },
  "response": {
    "text": "Starting cleaning under the table now.",
    "speech": "playing"
  }
}
```

Top-level sections of the message:

- `agent`: Identity, memory snapshot, and internal state of the AI agent operating in the world.
- `environment`: Physical and sensory description of the world, including object locations and sensor data.
- `intent`: Parsed user instruction and the AI's interpretation of goals and objectives.
- `planning`: Current task breakdown and execution state with progress tracking.
- `simulation`: Predicted outcomes and confidence levels from internal world modeling.
- `response`: Multi-modal output including text, speech, and physical actions.
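As a rough illustration of how a consumer might validate this envelope, the sketch below checks that all top-level sections are present, using only the Python standard library and assuming the example above is saved as `wcp_message.json` (a filename chosen here for illustration). A production system would more likely use JSON Schema or Pydantic.

```python
# Minimal structural check of a WCP message against the sections
# documented above.
import json
from typing import Any

REQUIRED_SECTIONS = ("timestamp", "agent", "environment",
                     "intent", "planning", "simulation", "response")

def parse_wcp(raw: str) -> dict[str, Any]:
    msg = json.loads(raw)
    missing = [key for key in REQUIRED_SECTIONS if key not in msg]
    if missing:
        raise ValueError(f"WCP message missing sections: {missing}")
    return msg

with open("wcp_message.json") as f:   # the example document shown above
    wcp = parse_wcp(f.read())
print(wcp["planning"]["current_plan"][0]["status"])   # -> "complete"
```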
WCP ⇄ Agent Bridge Architecture
How World Foundation Models interface with real and simulated environments
Diagram: inputs from IoT sensors and simulation environments flow through the WCP bridge into the World Foundation Model agent, and actions flow back out.
World Context Collector
Transforms raw inputs into structured WCP messages
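One possible shape for the collector, normalizing vision detections and an audio transcript into the WCP envelope defined above; every function and parameter name here is an illustrative assumption.

```python
# Sketch of a World Context Collector: heterogeneous raw inputs in,
# a partial WCP message out.
from datetime import datetime, timezone

def collect_context(agent_id: str, detections: list[dict],
                    transcript: str | None) -> dict:
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": {"id": agent_id},
        "environment": {
            "map": {"objects": detections},
            "sensors": {
                "vision": {"active_objects": [d["id"] for d in detections]},
                "audio": {"last_transcript": transcript},
            },
        },
    }

msg = collect_context("wfm-007",
                      [{"id": "mug01", "type": "mug",
                        "location": [1.2, 0.8, 0.9], "state": "on_table"}],
                      "Clean the kitchen")
print(msg["environment"]["sensors"]["vision"])
```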
Intent + Planning Pipeline
Converts commands into structured goals and plans
Agent Loop Hook
Connects WCP to core World Foundation Model
Action Dispatch
Executes decisions in real or simulated environments
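The remaining stages can be sketched end to end: a toy rule-based intent parser produces a structured goal, a planner expands it into steps, and a dispatcher maps each step onto (here, simulated) actuators. In a real system the parser and planner would be model-driven; the rules and actuator table below are assumptions.

```python
# End-to-end sketch: intent pipeline -> plan -> action dispatch.
def parse_intent(command: str) -> dict:
    # Toy rule-based parser standing in for a model-driven pipeline.
    if "clean" in command:
        return {"action": "clean_area",
                "target": "under_table" if "under" in command else "room"}
    raise ValueError(f"unrecognized command: {command}")

def make_plan(goal: dict) -> list[dict]:
    return [{"step": "locate vacuum_bot", "status": "pending"},
            {"step": f"navigate {goal['target']}", "status": "pending"}]

# Simulated actuators; a real dispatcher would target ROS2, a sim, etc.
ACTUATORS = {"locate": lambda arg: print("scanning map for", arg),
             "navigate": lambda arg: print("driving to", arg)}

def dispatch(plan: list[dict]) -> None:
    for item in plan:
        verb, _, arg = item["step"].partition(" ")
        ACTUATORS[verb](arg)               # execute in sim or real world
        item["status"] = "complete"

dispatch(make_plan(parse_intent("clean under the table")))
```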
State Synchronization
Redis, Convex, Kafka Streams
Memory & Timeline
Temporal.io, EventStore, Pinecone
Simulation & Planning
Three.js, Unity, IsaacSim, ROS2
Agent Interface
OpenAI Functions, LangGraph, Transformers
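As one concrete option from the state-synchronization row, here is a sketch using Redis pub/sub to broadcast WCP snapshots between components. The channel name `wcp:state` is an assumption; this requires the `redis` Python package and a running Redis server.

```python
# Sketch of the state-synchronization layer over Redis pub/sub.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def publish_wcp(msg: dict) -> None:
    # Broadcast the latest world snapshot to all subscribed components.
    r.publish("wcp:state", json.dumps(msg))

def follow_wcp() -> None:
    sub = r.pubsub()
    sub.subscribe("wcp:state")
    for event in sub.listen():            # blocks; run in its own process
        if event["type"] == "message":
            snapshot = json.loads(event["data"])
            print("new snapshot at", snapshot.get("timestamp"))
```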
MCP vs WCP Comparison
Understanding the fundamental differences in approach and capability
| Feature | MCP (Today) | WCP (Future) |
|---|---|---|
| Context Format | Text + JSON tool calls | Structured multimodal world graph |
| Session Model | Stateless or short-term context | Persistent agent state across time |
| Interaction | Request/response (chat turn) | Continuous interaction loop |
| Input Types | Text, structured data | Vision, audio, sensors, spatial state |
| Memory | Vector search (RAG) | Embedded memory + world map |
| Grounding | Weak (statistical patterns) | Strong (sensorimotor feedback) |
Build the Future Interface
Start exploring how the World Context Protocol will enable the next generation of AI systems that truly understand and interact with our world.