Zep - Context Engineering & Memory for Agents Cheatsheet
Zep is a memory and context-engineering layer for AI agents. Built on the Graphiti temporal knowledge graph engine, it ingests conversation history and business data, fuses them into a queryable graph that tracks how facts change over time, and returns relevant, governed context with low latency to ground agent responses. It offers an open-source core and a managed cloud service (SOC 2 / HIPAA), with SDKs for Python, TypeScript, and Go.
Installation / Setup
| Target | Command |
|---|---|
| Python SDK (cloud) | `pip install zep-cloud` |
| TypeScript SDK | `npm install @getzep/zep-cloud` |
| Self-hosted (Community Edition) | Run via the project’s Docker Compose |
| API key | `export ZEP_API_KEY=...` |
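A minimal sketch of reading the key at startup and failing fast when it is missing. It assumes the `ZEP_API_KEY` variable name from the table above; the helper name `load_zep_api_key` is hypothetical and not part of the SDK.

```python
import os


def load_zep_api_key(env=None):
    """Return the Zep API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper testable.
    """
    env = os.environ if env is None else env
    key = env.get("ZEP_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ZEP_API_KEY is not set; export it before starting the agent")
    return key
```

The returned value can then be passed to the client as `Zep(api_key=load_zep_api_key())`.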
Core Concepts
| Term | Meaning |
|---|---|
| User | An end user the agent serves |
| Thread | A conversation session for a user |
| Graph | The temporal knowledge graph of a user/group |
| Fact | A time-aware relationship in the graph |
| Context block | Assembled, ready-to-inject context string |
Users & Threads
from zep_cloud.client import Zep
zep = Zep(api_key="...")
zep.user.add(user_id="nick", email="nick@example.com")
zep.thread.create(thread_id="t1", user_id="nick")
| Call | Description |
|---|---|
| `user.add(...)` | Create a user |
| `thread.create(...)` | Start a conversation thread |
| `thread.add_messages(...)` | Append messages (auto-ingested to the graph) |
| `user.delete(...)` | Remove a user and their data |
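The two setup calls are often wrapped so that every conversation gets its own thread id. The sketch below assumes `zep` is a client exposing `user.add` and `thread.create` as shown above; the helper name `start_thread` and the use of a random UUID as the thread id are illustrative choices, not SDK requirements.

```python
import uuid


def start_thread(zep, user_id, email=None, create_user=True):
    """Optionally create the user, then open a new thread and return its id."""
    if create_user:
        user_fields = {"user_id": user_id}
        if email is not None:  # only send the email when one was given
            user_fields["email"] = email
        zep.user.add(**user_fields)
    thread_id = uuid.uuid4().hex  # unique id per conversation session
    zep.thread.create(thread_id=thread_id, user_id=user_id)
    return thread_id
```

For a returning user, pass `create_user=False` so only the thread is created.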
Adding Memory
zep.thread.add_messages(
    thread_id="t1",
    messages=[{"role": "user", "content": "I moved to Berlin.", "name": "Nick"}],
)
# Add non-chat business data directly to the graph
zep.graph.add(
    user_id="nick",
    type="text",
    data="Nick's subscription tier is Pro.",
)
| Call | Description |
|---|---|
| `thread.add_messages(...)` | Ingest conversation turns |
| `graph.add(...)` | Add arbitrary text/JSON to the graph |
| Ingestion | Entities/facts extracted and time-stamped automatically |
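Long documents usually need to be split before they are sent through `graph.add`. The sketch below assumes a per-call size cap; the 10,000-character default is an assumption to check against the current Zep limits, and the helper name `chunk_text` is hypothetical.

```python
def chunk_text(text, max_chars=10_000):
    """Split text into pieces of at most max_chars, preferring paragraph breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Hard-split any single paragraph that is itself longer than the cap.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)] or [""]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # still fits, keep accumulating
            else:
                chunks.append(current)  # flush and start a new chunk
                current = piece
    if current:
        chunks.append(current)
    return chunks


# Intended use (requires a real client, so it is left as a comment):
# for chunk in chunk_text(document):
#     zep.graph.add(user_id="nick", type="text", data=chunk)
```

Splitting on blank lines keeps related sentences together, which should give the extractor more coherent input than fixed-width slicing alone.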
Retrieving Context
# Get an assembled context block for the prompt
memory = zep.thread.get_user_context(thread_id="t1")
print(memory.context) # ready-to-inject string of relevant facts
# Or query the graph directly; matching facts are returned on results.edges
results = zep.graph.search(user_id="nick", query="where does Nick live?", scope="edges")
| Call | Returns |
|---|---|
| `thread.get_user_context(...)` | A synthesized context block |
| `graph.search(...)` | Facts/edges or nodes matching a query |
| Search scope | `edges` (facts), `nodes` (entities), or `episodes` |
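When querying the graph directly, the results still have to be turned into prompt text. The sketch below assumes the search response exposes an `edges` list whose items carry a `fact` string and an optional `invalid_at` timestamp; those attribute names are assumptions to verify against the SDK's response types, and `facts_to_prompt` is a hypothetical helper.

```python
from types import SimpleNamespace


def facts_to_prompt(results, limit=5):
    """Render up to `limit` still-valid edge facts as a bulleted block."""
    lines = []
    for edge in getattr(results, "edges", None) or []:
        if len(lines) >= limit:
            break
        if getattr(edge, "invalid_at", None):  # skip facts that were superseded
            continue
        lines.append(f"- {edge.fact}")
    return "\n".join(lines)


# Stand-in for a search response, with made-up data for illustration only.
demo = SimpleNamespace(edges=[
    SimpleNamespace(fact="Nick lives in Berlin", invalid_at=None),
    SimpleNamespace(fact="Nick lives in Paris", invalid_at="2024-01-01T00:00:00Z"),
])
print(facts_to_prompt(demo))
```

Filtering on `invalid_at` is a design choice: drop the filter when the agent needs to reason about what used to be true.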
Why Temporal
Because Zep is graph-based and time-aware, a contradictory update does not blindly overwrite what came before. The old fact is marked invalid with a timestamp and the new one is recorded, so the agent sees the current truth while the history stays queryable.
| Capability | Benefit |
|---|---|
| Fact invalidation | Current context stays accurate |
| Provenance | Trace facts to their source |
| Governed retrieval | Low-latency, permissioned context |
| Cross-session | Memory persists across threads |
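To make fact invalidation concrete, here is a toy in-memory model. It is not how Graphiti is implemented; the names `Fact`, `FactStore`, `valid_at`, and `invalid_at` are illustrative assumptions, and the Paris entry is made-up sample data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means the fact still holds


class FactStore:
    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, predicate, value, at):
        """Record a fact and close out any current fact it contradicts."""
        for fact in self.facts:
            if (fact.subject, fact.predicate) != (subject, predicate):
                continue
            if fact.invalid_at is None:
                if fact.value == value:
                    return  # already known and current, nothing to do
                fact.invalid_at = at  # keep the old fact, note when it stopped holding
        self.facts.append(Fact(subject, predicate, value, valid_at=at))

    def current(self, subject, predicate):
        """Return the value that holds now, or None."""
        for fact in reversed(self.facts):
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.invalid_at is None:
                return fact.value
        return None

    def as_of(self, subject, predicate, when):
        """Return the value that held at time `when`, or None."""
        for fact in self.facts:
            if (fact.subject, fact.predicate) != (subject, predicate):
                continue
            if fact.valid_at <= when and (fact.invalid_at is None or when < fact.invalid_at):
                return fact.value
        return None


store = FactStore()
store.assert_fact("Nick", "lives_in", "Paris", datetime(2023, 1, 1, tzinfo=timezone.utc))
store.assert_fact("Nick", "lives_in", "Berlin", datetime(2024, 6, 1, tzinfo=timezone.utc))
```

After the second call, `store.current("Nick", "lives_in")` yields the Berlin fact, while `store.as_of(...)` with a 2023 date still recovers the earlier one.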
Common Workflows
# The agent loop with Zep memory
# user_turn is a list of message dicts, e.g. [{"role": "user", "content": "..."}]
zep.thread.add_messages(thread_id="t1", messages=user_turn)
context = zep.thread.get_user_context(thread_id="t1").context
# prepend `context` to your LLM system prompt, then generate
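The loop above can be sketched as a single function. It assumes `zep` exposes `thread.add_messages` and `thread.get_user_context` as used in this cheatsheet; `run_turn` and the `generate(system_prompt, user_text)` callable are hypothetical stand-ins for your own LLM call, and storing the assistant reply is an added step, not something the snippet above does.

```python
def run_turn(zep, thread_id, user_text, generate,
             base_prompt="You are a helpful assistant."):
    """Store the user turn, fetch the context block, and call the model with it."""
    zep.thread.add_messages(
        thread_id=thread_id,
        messages=[{"role": "user", "content": user_text}],
    )
    context = zep.thread.get_user_context(thread_id=thread_id).context or ""
    # Prepend the context block to the system prompt when there is one.
    system_prompt = f"{context}\n\n{base_prompt}" if context else base_prompt
    reply = generate(system_prompt, user_text)
    # Persist the assistant reply as well, so later turns can draw on it.
    zep.thread.add_messages(
        thread_id=thread_id,
        messages=[{"role": "assistant", "content": reply}],
    )
    return reply
```

Passing `generate` in as a parameter keeps the memory logic independent of any particular model provider and makes the loop easy to test with a stub.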
Zep vs Other Memory Layers
| Aspect | Zep | Mem0 | Raw vector store |
|---|---|---|---|
| Model | Temporal graph (Graphiti) | Multi-tier | Embeddings only |
| Temporal facts | Yes | Limited | No |
| Context assembly | Built-in block | Retrieval | Manual |
| Best for | Production agent memory | Personalization | Simple recall |
Resources