Overview
Letta (formerly MemGPT) is a framework for building stateful LLM agents with persistent memory. It implements a virtual context management system: the agent pages information between its limited context window and external storage, so it can work with more data than the window holds. Agents can read and write archival storage, search their conversation history, and edit the memory blocks embedded in their own system prompt at runtime.
The framework provides a server-based architecture where agents run as persistent services with REST API access. Letta agents maintain core memory (always in context), recall memory (searchable conversation history), and archival memory (long-term vector storage). This enables applications like long-running personal assistants, document Q&A systems, and autonomous agents that learn over time.
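To make the idea of virtual context management concrete, here is a toy model of it. This is a sketch, not Letta's implementation: the `ToyContext` class, its word-count tokenizer, and the `token_budget` value are all assumptions made for illustration. It shows core memory staying in the prompt while older messages are evicted to a searchable recall store.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Toy model of virtual context management (hypothetical, not Letta's code)."""
    core_memory: dict            # label -> text, always included in the prompt
    token_budget: int            # assumed limit for in-context messages
    in_context: deque = field(default_factory=deque)
    recall: list = field(default_factory=list)  # evicted but still searchable

    @staticmethod
    def count_tokens(text: str) -> int:
        # Crude stand-in for a tokenizer: one token per whitespace-separated word.
        return len(text.split())

    def _used(self) -> int:
        return sum(self.count_tokens(m) for m in self.in_context)

    def add_message(self, text: str) -> None:
        self.in_context.append(text)
        # Evict the oldest messages until the window fits the budget again.
        while self._used() > self.token_budget and len(self.in_context) > 1:
            self.recall.append(self.in_context.popleft())

    def search_recall(self, keyword: str) -> list:
        return [m for m in self.recall if keyword.lower() in m.lower()]

    def build_prompt(self) -> str:
        blocks = [f"<{label}>{value}</{label}>" for label, value in self.core_memory.items()]
        return "\n".join(blocks + list(self.in_context))


ctx = ToyContext(
    core_memory={"persona": "Helpful assistant.", "human": "Name: Alice."},
    token_budget=8,
)
for msg in ["I work at Acme Corp", "My favourite language is Python", "What is the weather today"]:
    ctx.add_message(msg)

print(ctx.build_prompt())
print(ctx.search_recall("acme"))
```

Only the newest message remains in the prompt, yet the evicted ones can still be found through `search_recall`, which is the behaviour the recall tier is meant to provide.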
Installation
pip Install
pip install letta
letta quickstart --backend openai
# Or use local models
letta quickstart --backend ollama
Docker
docker run -d \
--name letta-server \
-p 8283:8283 \
-v letta_data:/root/.letta \
-e OPENAI_API_KEY=sk-... \
letta/letta:latest
# Access server at http://localhost:8283
# Access ADE (Agent Development Environment) at http://localhost:8283/ade
Docker Compose
version: '3.8'
services:
  letta:
    image: letta/letta:latest
    ports:
      - "8283:8283"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - letta_data:/root/.letta
volumes:
  letta_data:
From Source
git clone https://github.com/letta-ai/letta.git
cd letta
pip install -e ".[dev]"
letta server
Core Concepts
Memory Architecture
| Memory Type | Description | Persistence | Searchable |
|---|---|---|---|
| Core Memory | Always in LLM context (persona + human info) | Yes | No (always visible) |
| Recall Memory | Conversation history | Yes | Yes (search) |
| Archival Memory | Long-term vector storage | Yes | Yes (semantic) |
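The "semantic" search in the last row ranks stored passages by vector similarity to the query. The sketch below illustrates that ranking step only. It assumes a bag-of-words count vector as a stand-in for a real embedding model, and the function names `embed`, `cosine`, and `archival_search` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def archival_search(passages: list, query: str, count: int = 2) -> list:
    """Return the `count` passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(embed(p), q), reverse=True)
    return ranked[:count]


passages = [
    "alice works at acme corp as a data scientist",
    "alice prefers concise answers",
    "the office is closed on fridays",
]
top = archival_search(passages, "where does alice work at", count=1)
print(top)
```

A real embedding model would also match related words such as "work" and "workplace"; the word-count vector here only matches exact tokens.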
Python SDK
from letta import create_client
# Create a client
client = create_client()
# Create an agent
agent_state = client.create_agent(
name="my-assistant",
memory_blocks=[
{"label": "human", "value": "Name: Alice. Preferences: concise answers."},
{"label": "persona", "value": "You are a helpful research assistant."}
],
llm_config={"model": "gpt-4o", "model_endpoint_type": "openai"},
embedding_config={"embedding_endpoint_type": "openai", "embedding_model": "text-embedding-3-small"}
)
# Send a message
response = client.send_message(
agent_id=agent_state.id,
role="user",
message="What do you know about me?"
)
print(response.messages)
# Add to archival memory
client.insert_archival_memory(
agent_id=agent_state.id,
memory="Alice works at Acme Corp as a data scientist."
)
# Search archival memory
results = client.get_archival_memory(
agent_id=agent_state.id,
query="workplace"
)
REST API
# Create an agent
curl -X POST http://localhost:8283/v1/agents \
-H "Content-Type: application/json" \
-d '{
"name": "research-agent",
"memory_blocks": [
{"label": "human", "value": "User is a software engineer."},
{"label": "persona", "value": "You are a coding assistant."}
],
"llm": "gpt-4o",
"embedding": "text-embedding-3-small"
}'
# Send a message
curl -X POST http://localhost:8283/v1/agents/{agent_id}/messages \
-H "Content-Type: application/json" \
-d '{
"role": "user",
"message": "Help me optimize this SQL query"
}'
# Get agent memory
curl -X GET http://localhost:8283/v1/agents/{agent_id}/memory
# Search archival memory
curl -X GET "http://localhost:8283/v1/agents/{agent_id}/archival?query=python&count=5"
# List all agents
curl -X GET http://localhost:8283/v1/agents
# Delete an agent
curl -X DELETE http://localhost:8283/v1/agents/{agent_id}
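The same calls can be prepared from Python with only the standard library. The sketch below builds request objects without sending them, so it runs without a server. The helper name `build_request` and the placeholder agent ID are assumptions; the paths are the ones used in the curl commands above.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:8283/v1"  # default address used in this document


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a JSON request for the server."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


agent_id = "agent-123"  # hypothetical placeholder ID
create = build_request("POST", "/agents", {"name": "research-agent"})
delete = build_request("DELETE", f"/agents/{agent_id}")

print(create.get_method(), create.full_url)
print(delete.get_method(), delete.full_url)
```

To send a request, pass it to `urllib.request.urlopen` while the server is running.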
Built-in Tools
| Tool | Function | Description |
|---|---|---|
| send_message | Sends visible message to user | Primary communication |
| core_memory_append | Adds to core memory block | Remember new facts |
| core_memory_replace | Edits core memory block | Update existing facts |
| archival_memory_insert | Stores in archival memory | Long-term storage |
| archival_memory_search | Searches archival memory | Recall stored info |
| conversation_search | Searches recall memory | Find past messages |
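The memory tools in this table are invoked by the model as function calls. The following toy dispatcher sketches how a JSON tool call could be routed to the two core memory tools. It is an illustration under assumptions: the argument names (`label`, `content`, `old_content`, `new_content`) and the JSON shape are chosen for the example and may differ from what the server actually uses.

```python
import json

core_memory = {
    "human": "Name: Alice.",
    "persona": "You are a helpful research assistant.",
}


def core_memory_append(label: str, content: str) -> None:
    # Add a new fact to the end of a block.
    core_memory[label] = f"{core_memory[label]}\n{content}"


def core_memory_replace(label: str, old_content: str, new_content: str) -> None:
    # Update an existing fact; fail loudly if the old text is not present.
    if old_content not in core_memory[label]:
        raise ValueError(f"{old_content!r} not found in block {label!r}")
    core_memory[label] = core_memory[label].replace(old_content, new_content)


TOOLS = {
    "core_memory_append": core_memory_append,
    "core_memory_replace": core_memory_replace,
}


def dispatch(tool_call_json: str) -> None:
    """Route a model-issued tool call (name plus arguments) to its function."""
    call = json.loads(tool_call_json)
    TOOLS[call["name"]](**call["arguments"])


dispatch('{"name": "core_memory_append", "arguments": '
         '{"label": "human", "content": "Works at Acme Corp."}}')
dispatch('{"name": "core_memory_replace", "arguments": '
         '{"label": "human", "old_content": "Name: Alice.", "new_content": "Name: Alice Smith."}}')

print(core_memory["human"])
```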
Custom Tools
from letta import create_client
client = create_client()
# Define a custom tool
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: The city name to get weather for.

    Returns:
        A string describing the current weather.
    """
    # Import inside the function body so the tool stays self-contained
    import requests
    resp = requests.get(f"https://wttr.in/{city}?format=3", timeout=10)
    return resp.text
# Register the tool
tool = client.create_tool(get_weather)
# Create agent with custom tool
agent = client.create_agent(
name="weather-agent",
tools=[tool.name, "send_message"]
)
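Type hints and the docstring matter because a tool description for the model has to be derived from them. The sketch below shows one way such a description could be built with `inspect`. It is not Letta's schema generator: `describe_tool` and its output format are hypothetical, and the tool body is a placeholder that makes no network call.

```python
import inspect


def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: The city name to get weather for.
    """
    return f"Weather for {city}"  # placeholder body; no network call


def describe_tool(func) -> dict:
    """Derive a simple tool description from a signature and docstring."""
    sig = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    params = {}
    for name, param in sig.parameters.items():
        if param.annotation is inspect.Parameter.empty:
            raise TypeError(f"parameter {name!r} needs a type hint")
        if f"{name}:" not in doc:
            raise ValueError(f"parameter {name!r} is missing from the docstring Args")
        params[name] = getattr(param.annotation, "__name__", str(param.annotation))
    summary = doc.splitlines()[0] if doc else ""
    return {"name": func.__name__, "description": summary, "parameters": params}


schema = describe_tool(get_weather)
print(schema)
```

Running a check like this before registering a tool catches a missing type hint or an undocumented parameter early.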
Configuration
Server Configuration
# Environment variables
export LETTA_PG_URI=postgresql://user:pass@localhost:5432/letta
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Start server with custom config
letta server --port 8283 --host 0.0.0.0
# Configure default model
letta server --default-llm gpt-4o
Model Configuration
from letta import LLMConfig, EmbeddingConfig
llm_config = LLMConfig(
model="gpt-4o",
model_endpoint_type="openai",
model_endpoint="https://api.openai.com/v1",
context_window=128000
)
embedding_config = EmbeddingConfig(
embedding_endpoint_type="openai",
embedding_model="text-embedding-3-small",
embedding_dim=1536,
embedding_chunk_size=300
)
# Use local models via Ollama
local_llm = LLMConfig(
model="llama3.1:70b",
model_endpoint_type="ollama",
model_endpoint="http://localhost:11434",
context_window=8192
)
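The `context_window` setting bounds everything the model sees at once, including core memory. The sketch below estimates how much of the window is left for conversation under each configuration. It rests on assumptions: the four-characters-per-token heuristic is a rough rule of thumb, and `reserved_for_reply` is a hypothetical allowance, not a Letta setting.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about four characters per token.
    return max(1, len(text) // 4)


def remaining_budget(context_window: int, memory_blocks: list, reserved_for_reply: int = 1024) -> int:
    """Tokens left for conversation after core memory and a reply allowance."""
    used = sum(estimate_tokens(block["value"]) for block in memory_blocks)
    return context_window - used - reserved_for_reply


blocks = [
    {"label": "human", "value": "Name: Alice. Preferences: concise answers."},
    {"label": "persona", "value": "You are a helpful research assistant."},
]

for window in (8192, 128000):
    print(window, remaining_budget(window, blocks))
```

With small memory blocks the difference is dominated by the window size itself, which is why a local model with an 8192-token window fills up far sooner than a 128000-token one.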
Advanced Usage
Multi-Agent Systems
from letta import create_client
client = create_client()
# Create specialized agents
researcher = client.create_agent(
name="researcher",
memory_blocks=[
{"label": "persona", "value": "You research topics thoroughly."},
{"label": "human", "value": ""}
]
)
writer = client.create_agent(
name="writer",
memory_blocks=[
{"label": "persona", "value": "You write clear summaries."},
{"label": "human", "value": ""}
]
)
# Orchestrate agents
research_result = client.send_message(
agent_id=researcher.id,
role="user",
message="Research the latest trends in RAG systems"
)
summary = client.send_message(
agent_id=writer.id,
role="user",
message=f"Summarize this research: {research_result.messages}"
)
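The orchestration pattern above, where each agent's output feeds the next agent's prompt, can be factored into a small helper. This sketch injects the send function so it runs offline. `run_pipeline` and `fake_send` are hypothetical names; in practice the injected function would wrap `client.send_message` and extract the reply text.

```python
from typing import Callable, List, Tuple

# send(agent_id, message) -> reply text
SendFn = Callable[[str, str], str]


def run_pipeline(send: SendFn, stages: List[Tuple[str, str]], task: str) -> str:
    """Pass `task` through each (agent_id, instruction) stage in order."""
    text = task
    for agent_id, instruction in stages:
        text = send(agent_id, f"{instruction}: {text}")
    return text


def fake_send(agent_id: str, message: str) -> str:
    # Stand-in for a real agent call so the sketch runs without a server.
    return f"[{agent_id}] {message}"


result = run_pipeline(
    fake_send,
    stages=[("researcher", "Research"), ("writer", "Summarize this research")],
    task="latest trends in RAG systems",
)
print(result)
```

Keeping the transport behind a callable also makes the pipeline easy to test, since a fake can replace the live agents.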
Persistent Data Sources
# Create a data source
source = client.create_source(name="company-docs")
# Load files into source
client.load_file_to_source(
source_id=source.id,
file_path="/path/to/documents/manual.pdf"
)
# Attach source to agent (populates archival memory)
client.attach_source_to_agent(
source_id=source.id,
agent_id=agent.id
)
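Before a document can be searched in archival memory, it has to be split into passages small enough to embed. The sketch below shows a simple fixed-size splitter. It is an assumption-laden illustration: `chunk_text` is a hypothetical helper, and it measures chunk size in words, whereas the unit of `embedding_chunk_size` is not stated in this document.

```python
def chunk_text(text: str, chunk_size: int = 300) -> list:
    """Split text into chunks of at most `chunk_size` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


document = " ".join(f"word{i}" for i in range(700))
chunks = chunk_text(document, chunk_size=300)

print(len(chunks))
print([len(c.split()) for c in chunks])
```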
Streaming Responses
# Stream agent responses
stream = client.send_message(
agent_id=agent.id,
role="user",
message="Write a detailed analysis",
stream=True
)
for chunk in stream:
    if hasattr(chunk, 'content'):
        print(chunk.content, end='', flush=True)
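When the full text is needed after streaming finishes, the chunks can be accumulated instead of printed. The sketch below does this for any iterable of objects and skips chunks without text. The fake stream built from `SimpleNamespace` objects is an assumption standing in for real stream chunks, whose exact shape is not specified here.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable[object]) -> str:
    """Join the text of every chunk that carries a string `content`."""
    parts = []
    for chunk in stream:
        content = getattr(chunk, "content", None)
        if isinstance(content, str):
            parts.append(content)
    return "".join(parts)


fake_stream = [
    SimpleNamespace(content="Detailed "),
    SimpleNamespace(status="thinking"),  # chunk without content is skipped
    SimpleNamespace(content="analysis."),
]
text = collect_stream(fake_stream)
print(text)
```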
Troubleshooting
| Issue | Solution |
|---|---|
| Agent not responding | Check LLM API key and model availability |
| Memory not persisting | Verify database connection (PostgreSQL recommended) |
| Context window exceeded | Reduce core memory size or switch to a model with a larger context window |
| Archival search returns empty | Check embedding model config, ensure data was inserted |
| Slow response times | Use faster model, reduce archival search count |
| Tool execution fails | Check that every parameter has a type hint and is documented in the docstring Args |
| Docker container exits | Check logs: docker logs letta-server |
| Connection refused on 8283 | Ensure server is running: letta server |
# Debug mode
letta server --debug
# Check server health
curl http://localhost:8283/v1/health
# View agent details
curl http://localhost:8283/v1/agents/{agent_id}
# Export agent state
curl http://localhost:8283/v1/agents/{agent_id}/export > agent_backup.json
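In scripts, the health check is often wrapped in a retry loop so that later steps wait until the server is up. A minimal sketch follows, assuming the health endpoint shown above returns HTTP 200 when ready. `wait_for_server` and `http_probe` are hypothetical helpers, and the probe is injectable so the loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8283/v1/health"  # endpoint used in this document


def http_probe() -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_server(probe: Callable[[], bool] = http_probe,
                    attempts: int = 5, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Offline demonstration with a fake probe that succeeds on the third call.
answers = iter([False, False, True])
print(wait_for_server(lambda: next(answers), attempts=5, delay=0))
```

Against a live server, calling `wait_for_server()` with no arguments uses the HTTP probe.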