MemGPT / Letta - OS-Style Agent Memory Cheatsheet

MemGPT is the technique, and Letta the framework that grew from it, for giving LLM agents operating-system-style memory management. The core idea: treat the context window like RAM (fast but small) and add “disk” in the form of searchable archival memory. The agent itself decides, via tool calls, what to keep in main context and what to page out to storage, which lets it maintain coherent long-term memory well beyond the raw context limit.

Installation

Method	Command
pip	`pip install letta`
Run the server	`letta server`
Docker	`docker run -p 8283:8283 letta/letta:latest`
ADE (web UI)	Connect the Agent Development Environment (ADE) to the running server
Verify	`letta version`

Memory Architecture

Tier	Analogy	Contents
Main context (core memory)	RAM	Persona + key facts always in the prompt
Recall memory	Recent files	Conversation history, searchable
Archival memory	Disk	Arbitrary long-term facts, searchable
The agent	The OS	Decides what to page in/out via tools
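
To make the tiers concrete, here is a minimal toy model in plain Python. It is a sketch of the idea only, not Letta's data structures: the class name `ToyAgentMemory`, the `max_context_messages` budget, and the substring search are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgentMemory:
    """Toy model of the memory tiers; not Letta's real implementation."""
    core: dict[str, str] = field(default_factory=dict)   # always in the prompt
    context: list[str] = field(default_factory=list)     # messages currently in the window
    recall: list[str] = field(default_factory=list)      # full conversation history
    archival: list[str] = field(default_factory=list)    # long-term facts (unused here)
    max_context_messages: int = 3                        # hypothetical context budget

    def add_message(self, text: str) -> None:
        # Every message is logged to recall; only the newest stay in context.
        self.recall.append(text)
        self.context.append(text)
        while len(self.context) > self.max_context_messages:
            self.context.pop(0)  # "page out": still searchable via recall

    def conversation_search(self, query: str) -> list[str]:
        # Naive substring match standing in for real search.
        return [m for m in self.recall if query.lower() in m.lower()]

    def build_prompt(self) -> str:
        # Core blocks come first, followed by whatever is still in the window.
        blocks = "\n".join(f"[{label}] {value}" for label, value in self.core.items())
        return blocks + "\n" + "\n".join(self.context)


memory = ToyAgentMemory(core={"persona": "Concise assistant.", "human": "Name: Nick."})
for msg in ["I prefer dark mode.", "I use Linux.", "What's the weather?", "Thanks!"]:
    memory.add_message(msg)

print(memory.build_prompt())
print(memory.conversation_search("dark mode"))
```

The oldest message drops out of the prompt once the budget is exceeded, yet it can still be found through the recall search, which is the paging behavior the table describes.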

Core Memory (Always In-Context)

Block	Purpose
`persona`	Who the agent is / how it behaves
`human`	What it knows about the user
Custom blocks	Domain-specific always-present facts

The agent edits these blocks with tools (`core_memory_append`, `core_memory_replace`) as it learns.
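
The effect of these two tools can be sketched on a plain dictionary of blocks. The functions below borrow the tool names but are local stand-ins written for illustration; their signatures and error behavior are assumptions, not the framework's implementation.

```python
def core_memory_append(blocks: dict[str, str], label: str, content: str) -> None:
    """Append a line to an existing block (toy stand-in for the agent tool)."""
    if label not in blocks:
        raise KeyError(f"no core memory block labeled {label!r}")
    blocks[label] = (blocks[label] + "\n" + content).strip()


def core_memory_replace(blocks: dict[str, str], label: str,
                        old_content: str, new_content: str) -> None:
    """Swap an exact substring inside a block; fail loudly if it is absent."""
    if old_content not in blocks[label]:
        raise ValueError(f"{old_content!r} not found in block {label!r}")
    blocks[label] = blocks[label].replace(old_content, new_content)


blocks = {
    "persona": "I am a concise, helpful assistant.",
    "human": "The user's name is Nick.",
}
core_memory_append(blocks, "human", "Prefers dark mode.")
core_memory_replace(blocks, "human", "Nick", "Nicholas")
print(blocks["human"])
```

Because the blocks are always in the prompt, every edit like this changes what the model sees on the very next turn.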

Creating an Agent

from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

agent = client.agents.create(
    name="assistant",
    memory_blocks=[
        {"label": "persona", "value": "I am a concise, helpful assistant."},
        {"label": "human", "value": "The user's name is Nick."},
    ],
    model="openai/gpt-4o",
    embedding="openai/text-embedding-3-small",
)
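
Custom blocks use the same label/value shape as `persona` and `human`. A small helper can assemble the list; `build_memory_blocks` and the `project` block are hypothetical names introduced here for illustration, not part of the SDK.

```python
def build_memory_blocks(persona: str, human: str, **custom: str) -> list[dict[str, str]]:
    """Return a list shaped like the memory_blocks argument used above."""
    blocks = [
        {"label": "persona", "value": persona},
        {"label": "human", "value": human},
    ]
    # Extra keyword arguments become custom always-in-context blocks.
    for label, value in custom.items():
        blocks.append({"label": label, "value": value})
    return blocks


memory_blocks = build_memory_blocks(
    persona="I am a concise, helpful assistant.",
    human="The user's name is Nick.",
    project="Current project: migrating the billing service.",  # hypothetical custom block
)
print([b["label"] for b in memory_blocks])
```

The resulting list can be passed as `memory_blocks=` when creating the agent.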

Messaging & Memory Tools

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Remember I prefer dark mode."}],
)

Tool (agent-invoked)	Action
`core_memory_append`	Add to an always-in-context block
`core_memory_replace`	Update a core memory block
`archival_memory_insert`	Store a fact to archival (disk)
`archival_memory_search`	Retrieve from archival memory
`conversation_search`	Search recall memory
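
To see which memory tools the agent invoked on a turn, the returned messages can be walked and split by type. The sketch below assumes the response exposes `messages`, and that each message carries `message_type`, with `content` on assistant messages and `tool_call.name` on tool calls; verify these attribute names against the installed `letta_client` version. It runs against a hand-built stand-in so no server is needed.

```python
from types import SimpleNamespace


def summarize_response(response) -> tuple[list[str], list[str]]:
    """Return (assistant texts, names of tools the agent called)."""
    replies, tools_called = [], []
    for msg in response.messages:
        kind = getattr(msg, "message_type", None)
        if kind == "assistant_message":
            replies.append(msg.content)
        elif kind == "tool_call_message":
            tools_called.append(msg.tool_call.name)
    return replies, tools_called


# Hand-built stand-in for a server response, so the sketch runs offline.
fake_response = SimpleNamespace(messages=[
    SimpleNamespace(message_type="tool_call_message",
                    tool_call=SimpleNamespace(name="core_memory_append")),
    SimpleNamespace(message_type="assistant_message",
                    content="Got it, dark mode noted."),
])

replies, tools_called = summarize_response(fake_response)
print(replies, tools_called)
```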

Archival Memory

Command	Description
`client.agents.passages.create(agent_id, text=...)`	Insert an archival memory
`client.agents.passages.list(agent_id)`	List stored passages
Agent search	Agent calls `archival_memory_search` automatically when relevant
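
Archival search is similarity-based retrieval over stored passages. The toy below ranks passages by cosine similarity over word counts as a stand-in for a real embedding model; `ToyArchive`, `embed`, and `cosine` are illustrative names and do not reflect how the server stores or indexes passages.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Word-count vector; a real deployment would use an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyArchive:
    def __init__(self) -> None:
        self.passages: list[str] = []

    def insert(self, text: str) -> None:
        self.passages.append(text)

    def search(self, query: str, top_k: int = 1) -> list[str]:
        # Rank every stored passage by similarity to the query.
        q = embed(query)
        ranked = sorted(self.passages, key=lambda p: cosine(q, embed(p)), reverse=True)
        return ranked[:top_k]


archive = ToyArchive()
archive.insert("User prefers dark mode in every editor.")
archive.insert("User's dog is named Biscuit.")
archive.insert("User deploys on Fridays despite warnings.")
print(archive.search("what is the dog named"))
```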

Persistence & State

Feature	Note
Stateful agents	Agent state persists on the server across sessions
Storage	SQLite by default; PostgreSQL for production
Export/import	Serialize agents to move them between deployments
Multi-agent	Run and coordinate several stateful agents
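
What "state persists across sessions" means can be illustrated with the standard-library `sqlite3` module. This is a toy, assuming a single hypothetical `blocks` table; it is not the schema the server uses.

```python
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS blocks (label TEXT PRIMARY KEY, value TEXT)")
    return conn


def save_block(conn: sqlite3.Connection, label: str, value: str) -> None:
    # INSERT OR REPLACE overwrites any previous value for the same label.
    conn.execute("INSERT OR REPLACE INTO blocks (label, value) VALUES (?, ?)",
                 (label, value))
    conn.commit()


def load_blocks(conn: sqlite3.Connection) -> dict[str, str]:
    return dict(conn.execute("SELECT label, value FROM blocks"))


db_path = os.path.join(tempfile.mkdtemp(), "toy_agent.db")

# Session 1: the agent learns something, then the process "exits".
conn = open_store(db_path)
save_block(conn, "human", "Name: Nick. Prefers dark mode.")
conn.close()

# Session 2: a fresh connection sees the same state.
conn = open_store(db_path)
restored = load_blocks(conn)
conn.close()
print(restored)
```

The same pattern explains why a client can reconnect later with only an agent ID: the memory lives in the server's database, not in the client process.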

Common Workflows

# A long-running assistant that remembers across sessions.
# Reuses `client` and `agent` from "Creating an Agent" above.
# 1) Create the agent once with persona/human blocks.
# 2) In each later session, just send messages; Letta manages memory paging.
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What do you remember about me?"}],
)
# The agent can search recall/archival memory and answer with persistent context.

MemGPT/Letta vs Other Memory

Aspect	Letta (MemGPT)	Mem0	Zep
Model	OS-style paging, agent-managed	Multi-tier store	Temporal graph
Statefulness	Server-side agents	Library	Service
Control	Agent decides paging	App decides	Service manages
Best for	Long-running autonomous agents	Personalization	Temporal facts
