MemGPT / Letta - OS-Style Agent Memory Cheatsheet
MemGPT is a technique for giving LLM agents operating-system-style memory management; Letta is the framework that grew from it and the name under which the project is now developed. The core idea is to treat the context window like RAM (fast but small) and to add “disk” in the form of searchable archival memory. The agent itself decides, via tool calls, what to keep in main context and what to page out to storage, which lets it maintain coherent long-term memory well beyond the raw context limit.
Installation
| Method | Command |
|---|---|
| pip | pip install letta |
| Run the server | letta server |
| Docker | docker run -p 8283:8283 letta/letta:latest |
| ADE (web UI) | connect the Agent Development Environment to the server |
| Verify | letta version |
Memory Architecture
| Tier | Analogy | Contents |
|---|---|---|
| Main context (core memory) | RAM | Persona + key facts always in the prompt |
| Recall memory | Recent files | Conversation history, searchable |
| Archival memory | Disk | Arbitrary long-term facts, searchable |
| The agent | The OS | Decides what to page in/out via tools |
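The tiers in the table can be sketched as a toy data structure. This is only an illustration of the paging idea, not Letta's actual implementation: the class name `ToyAgentMemory`, the `context_budget` parameter, and the message-count budget are all hypothetical, and a real system would typically count tokens and summarize evicted messages rather than simply dropping them from the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgentMemory:
    """Toy model of the memory tiers (hypothetical, not Letta's data structures)."""

    context_budget: int = 3                       # assumed: max messages kept in the prompt
    core: dict = field(default_factory=dict)      # always-in-context blocks
    context: list = field(default_factory=list)   # messages currently in the prompt
    recall: list = field(default_factory=list)    # full conversation history
    archival: list = field(default_factory=list)  # long-term facts (unused in this sketch)

    def add_message(self, text: str) -> None:
        # Every message is logged to recall; only the newest stay in main context.
        self.recall.append(text)
        self.context.append(text)
        while len(self.context) > self.context_budget:
            self.context.pop(0)  # "page out": gone from the prompt, still in recall

    def build_prompt(self) -> str:
        # Core blocks come first, followed by whatever is still in main context.
        blocks = [f"<{label}>{value}</{label}>" for label, value in self.core.items()]
        return "\n".join(blocks + self.context)


memory = ToyAgentMemory(core={"persona": "Concise assistant.", "human": "Name: Nick."})
for i in range(5):
    memory.add_message(f"message {i}")

prompt = memory.build_prompt()
print(prompt)
```

After five messages with a budget of three, the two oldest messages no longer appear in the prompt but remain in `memory.recall`, where a search tool could find them again.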
Core Memory (Always In-Context)
| Block | Purpose |
|---|---|
| persona | Who the agent is / how it behaves |
| human | What it knows about the user |
| Custom blocks | Domain-specific always-present facts |
The agent edits these blocks with tools (core_memory_append, core_memory_replace) as it learns.
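A rough sketch of what these two edits amount to, using a plain dictionary of blocks. The class `ToyCoreMemory`, the argument names (`label`, `content`, `old_content`, `new_content`), and the 200-character limit are assumptions made for illustration; the real tools run server-side and their exact signatures may differ between versions.

```python
class ToyCoreMemory:
    """Hypothetical stand-in for core memory blocks; not Letta's implementation."""

    def __init__(self, blocks: dict, limit: int = 200):
        self.blocks = dict(blocks)
        self.limit = limit  # assumed per-block character limit

    def core_memory_append(self, label: str, content: str) -> None:
        # Append new content on its own line at the end of the block.
        self._set(label, f"{self.blocks[label]}\n{content}")

    def core_memory_replace(self, label: str, old_content: str, new_content: str) -> None:
        # Replace an exact substring; fail loudly if it is not present.
        current = self.blocks[label]
        if old_content not in current:
            raise ValueError(f"{old_content!r} not found in block {label!r}")
        self._set(label, current.replace(old_content, new_content))

    def _set(self, label: str, value: str) -> None:
        # Core memory lives in the prompt, so each block has to stay small.
        if len(value) > self.limit:
            raise ValueError(f"block {label!r} would exceed {self.limit} characters")
        self.blocks[label] = value


core = ToyCoreMemory({
    "persona": "I am a concise, helpful assistant.",
    "human": "The user's name is Nick.",
})
core.core_memory_append("human", "Prefers dark mode.")
core.core_memory_replace("human", "Nick", "Nicholas")
print(core.blocks["human"])
```

The size check reflects the design constraint behind core memory: because every block is always in the prompt, facts that do not fit belong in archival memory instead.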
Creating an Agent
# Requires the Python client package: pip install letta-client
from letta_client import Letta
# Connect to a locally running Letta server
client = Letta(base_url="http://localhost:8283")
# Create a stateful agent with two core memory blocks
agent = client.agents.create(
    name="assistant",
    memory_blocks=[
        {"label": "persona", "value": "I am a concise, helpful assistant."},
        {"label": "human", "value": "The user's name is Nick."},
    ],
    model="openai/gpt-4o",
    embedding="openai/text-embedding-3-small",
)
# Send a user message; the agent may update its memory while responding
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Remember I prefer dark mode."}],
)
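To read the reply, one option is to filter the returned messages by type. The helper below is a sketch under assumptions: it supposes the response exposes a `messages` list whose items carry a `message_type` attribute, with assistant replies tagged `"assistant_message"` and holding their text in `content`. Check these names against the client version in use. The `SimpleNamespace` objects are stand-ins so the example runs without a server.

```python
from types import SimpleNamespace


def assistant_text(response) -> list:
    """Collect the text of assistant messages from a response-like object."""
    texts = []
    for message in getattr(response, "messages", []):
        # Assumed type tag; reasoning and tool-call messages are skipped.
        if getattr(message, "message_type", None) == "assistant_message":
            texts.append(message.content)
    return texts


# Stand-in response (hypothetical shape) so the sketch runs offline.
fake_response = SimpleNamespace(messages=[
    SimpleNamespace(message_type="reasoning_message", reasoning="User stated a preference."),
    SimpleNamespace(message_type="tool_call_message", tool_call=None),
    SimpleNamespace(message_type="assistant_message", content="Got it, dark mode noted."),
])

replies = assistant_text(fake_response)
print(replies)
```

With a live server, the same helper would be called as `assistant_text(response)` on the object returned by `client.agents.messages.create(...)`.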
| Tool (agent-invoked) | Action |
|---|---|
| core_memory_append | Add to an always-in-context block |
| core_memory_replace | Update a core memory block |
| archival_memory_insert | Store a fact to archival (disk) |
| archival_memory_search | Retrieve from archival memory |
| conversation_search | Search recall memory |
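Since the agent invokes these tools itself, the runtime's job is essentially to route each tool call to a handler and feed the result back to the model. A minimal sketch of that routing, assuming tool calls arrive as JSON with `name` and `arguments` fields (a common convention, not necessarily Letta's wire format); the handlers, their parameter names, and the substring matching are simplified placeholders:

```python
import json

# Toy stores standing in for the server-side memory tiers.
archival = []
recall = ["hi", "I prefer dark mode", "what's the weather?"]


def archival_memory_insert(content: str) -> str:
    archival.append(content)
    return "OK"


def archival_memory_search(query: str) -> list:
    # Substring match for brevity; a real search would rank by similarity.
    return [p for p in archival if query.lower() in p.lower()]


def conversation_search(query: str) -> list:
    return [m for m in recall if query.lower() in m.lower()]


TOOLS = {
    "archival_memory_insert": archival_memory_insert,
    "archival_memory_search": archival_memory_search,
    "conversation_search": conversation_search,
}


def dispatch(tool_call_json: str):
    """Route one model-issued tool call to its handler and return the result."""
    call = json.loads(tool_call_json)
    handler = TOOLS.get(call["name"])
    if handler is None:
        return f"Error: unknown tool {call['name']!r}"
    return handler(**call["arguments"])


dispatch('{"name": "archival_memory_insert", "arguments": {"content": "User prefers dark mode."}}')
hits = dispatch('{"name": "archival_memory_search", "arguments": {"query": "dark mode"}}')
print(hits)
```

Returning an error string for unknown tools, rather than raising, lets the result go back to the model so it can correct itself on the next step.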
Archival Memory
| Command | Description |
|---|---|
| client.agents.passages.create(agent_id, text=...) | Insert an archival memory |
| client.agents.passages.list(agent_id) | List stored passages |
| Agent search | Agent calls archival_memory_search automatically when relevant |
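Archival search is similarity-based: passages and the query are embedded, then ranked by closeness. The sketch below illustrates that ranking step with a deliberately crude stand-in, bag-of-words counts instead of a real embedding model such as the one configured at agent creation. The function names `embed`, `cosine`, and `search` are hypothetical and say nothing about how Letta stores or indexes passages.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real setup would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(passages: list, query: str, top_k: int = 1) -> list:
    """Return the top_k passages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(embed(p), query_vec), reverse=True)
    return ranked[:top_k]


passages = [
    "user prefers dark mode in every editor",
    "user lives in Lisbon",
    "project deadline is in March",
]
best = search(passages, "which mode does the user prefer")
print(best)
```

Word counts only match exact tokens, so "prefer" and "prefers" do not overlap here; a learned embedding would capture that relationship, which is why an embedding model is configured for the agent.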
Persistence & State
| Feature | Note |
|---|---|
| Stateful agents | Agent state persists on the server across sessions |
| Storage | SQLite by default; PostgreSQL for production |
| Export/import | Serialize agents to move them between deployments |
| Multi-agent | Run and coordinate several stateful agents |
Common Workflows
# A long-running assistant that remembers across sessions
# 1) create the agent once with persona/human blocks
# 2) each session, just send messages; Letta manages memory paging
client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What do you remember about me?"}],
)
# the agent searches recall/archival memory and answers with persistent context
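Because the agent's state lives on the server, the application only needs to remember the agent's ID between sessions. One way to do this is a small local state file, sketched below. The helper `load_or_create_agent_id`, the file name, and the JSON layout are all hypothetical, and `fake_create_agent` stands in for a call like `client.agents.create(...).id` so the example runs offline.

```python
import json
import tempfile
from pathlib import Path


def load_or_create_agent_id(state_file: Path, create_agent) -> str:
    """Return the saved agent ID, creating the agent only on the first run."""
    if state_file.exists():
        return json.loads(state_file.read_text())["agent_id"]
    agent_id = create_agent()
    state_file.write_text(json.dumps({"agent_id": agent_id}))
    return agent_id


# Stand-in for the real creation call; records how often it is invoked.
created = []


def fake_create_agent() -> str:
    created.append("agent-123")
    return "agent-123"


state_file = Path(tempfile.mkdtemp()) / "agent_state.json"
first = load_or_create_agent_id(state_file, fake_create_agent)   # session 1: creates
second = load_or_create_agent_id(state_file, fake_create_agent)  # session 2: reuses
print(first, second, len(created))
```

Each later session then passes the stored ID as `agent_id` when sending messages, so the same server-side memory is reused instead of starting from a fresh agent.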
MemGPT/Letta vs Other Memory
| Aspect | Letta (MemGPT) | Mem0 | Zep |
|---|---|---|---|
| Model | OS-style paging, agent-managed | Multi-tier store | Temporal graph |
| Statefulness | Server-side agents | Library | Service |
| Control | Agent decides paging | App decides | Service manages |
| Best for | Long-running autonomous agents | Personalization | Temporal facts |
Resources