Letta Cheat Sheet

Overview

Letta (formerly MemGPT) is a framework for building stateful LLM agents with persistent memory. It implements a virtual context management system in which the agent manages its own memory hierarchy, so it can work with more information than fits in the model's context window. Agents can read and write archival storage, search their conversation history, and edit the memory blocks that are compiled into their own prompt at runtime.

The framework provides a server-based architecture where agents run as persistent services with REST API access. Letta agents maintain core memory (always in context), recall memory (searchable conversation history), and archival memory (long-term vector storage). This enables applications like long-running personal assistants, document Q&A systems, and autonomous agents that learn over time.
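
To make the idea of virtual context management concrete, here is a toy sketch in plain Python. It is not Letta's implementation: the `ToyContext` class, its fields, and the message-count budget (standing in for a token budget) are all assumptions made for illustration. It shows core memory staying in the prompt while older messages leave the window but remain searchable in recall.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Toy model of a bounded context window backed by recall storage."""
    max_messages: int                            # stand-in for a token budget
    core: dict = field(default_factory=dict)     # always rendered into the prompt
    window: list = field(default_factory=list)   # messages currently in context
    recall: list = field(default_factory=list)   # full conversation history

    def add_message(self, text: str) -> None:
        # Every message is recorded in recall memory ...
        self.recall.append(text)
        self.window.append(text)
        # ... but only the most recent ones stay in the context window.
        while len(self.window) > self.max_messages:
            self.window.pop(0)

    def search_recall(self, query: str) -> list:
        # Case-insensitive substring match over the full history.
        q = query.lower()
        return [m for m in self.recall if q in m.lower()]

    def render_prompt(self) -> str:
        # Core memory comes first, followed by the in-window messages.
        core = "\n".join(f"[{label}] {value}" for label, value in self.core.items())
        return core + "\n" + "\n".join(self.window)


ctx = ToyContext(max_messages=2, core={"human": "Name: Alice"})
for msg in ["I work at Acme.", "I like tea.", "What's the weather?"]:
    ctx.add_message(msg)

print(ctx.window)                 # only the two most recent messages
print(ctx.search_recall("acme"))  # the evicted message is still findable
```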

Installation

pip Install

pip install letta
letta quickstart --backend openai
# Or use local models
letta quickstart --backend ollama

Docker

docker run -d \
  --name letta-server \
  -p 8283:8283 \
  -v letta_data:/root/.letta \
  -e OPENAI_API_KEY=sk-... \
  letta/letta:latest

# Access server at http://localhost:8283
# Access ADE (Agent Development Environment) at http://localhost:8283/ade

Docker Compose

version: '3.8'
services:
  letta:
    image: letta/letta:latest
    ports:
      - "8283:8283"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - letta_data:/root/.letta

volumes:
  letta_data:

From Source

git clone https://github.com/letta-ai/letta.git
cd letta
pip install -e ".[dev]"
letta server

Core Concepts

Memory Architecture

Memory Type	Description	Persistence	Searchable
Core Memory	Always in LLM context (persona + human info)	Yes	No (always visible)
Recall Memory	Conversation history	Yes	Yes (search)
Archival Memory	Long-term vector storage	Yes	Yes (semantic)
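
The "semantic" search on archival memory ranks stored passages by vector similarity to the query. The sketch below shows the ranking step only. It assumes a hypothetical `embed` function that uses word counts in place of a real embedding model, so it matches on shared words rather than meaning; a real embedding model would also match paraphrases.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag of word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def archival_search(passages: list, query: str, count: int = 1) -> list:
    # Rank every passage by similarity to the query and keep the top `count`.
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(embed(p), q), reverse=True)
    return ranked[:count]


passages = [
    "Alice works at Acme Corp as a data scientist",
    "Alice prefers concise answers",
    "The office is closed on Fridays",
]
print(archival_search(passages, "acme data scientist"))
```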

Python SDK

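from letta import create_client

# Create a client. With no arguments this uses a local, in-process client;
# pass base_url="http://localhost:8283" to talk to a running server.
client = create_client()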

# Create an agent
agent_state = client.create_agent(
    name="my-assistant",
    memory_blocks=[
        {"label": "human", "value": "Name: Alice. Preferences: concise answers."},
        {"label": "persona", "value": "You are a helpful research assistant."}
    ],
    llm_config={"model": "gpt-4o", "model_endpoint_type": "openai"},
    embedding_config={"embedding_endpoint_type": "openai", "embedding_model": "text-embedding-3-small"}
)

# Send a message
response = client.send_message(
    agent_id=agent_state.id,
    role="user",
    message="What do you know about me?"
)
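# response.messages is a list (internal reasoning, tool calls, and the reply)
for message in response.messages:
    print(message)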

# Add to archival memory
client.insert_archival_memory(
    agent_id=agent_state.id,
    memory="Alice works at Acme Corp as a data scientist."
)

# Search archival memory
results = client.get_archival_memory(
    agent_id=agent_state.id,
    query="workplace"
)

REST API

# Create an agent
curl -X POST http://localhost:8283/v1/agents \
  -H "Content-Type: application/json" \
  -d '{
    "name": "research-agent",
    "memory_blocks": [
      {"label": "human", "value": "User is a software engineer."},
      {"label": "persona", "value": "You are a coding assistant."}
    ],
    "model": "openai/gpt-4o",
    "embedding": "openai/text-embedding-3-small"
  }'

# Send a message (replace {agent_id} with the ID returned when the agent was created)
curl -X POST http://localhost:8283/v1/agents/{agent_id}/messages \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Help me optimize this SQL query"}
    ]
  }'

# Get agent memory
curl -X GET http://localhost:8283/v1/agents/{agent_id}/memory

# Search archival memory
curl -X GET "http://localhost:8283/v1/agents/{agent_id}/archival?query=python&count=5"

# List all agents
curl -X GET http://localhost:8283/v1/agents

# Delete an agent
curl -X DELETE http://localhost:8283/v1/agents/{agent_id}
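
The same endpoints can be called from Python without extra dependencies. The sketch below only builds the requests, so it runs without a server. `build_request` and `agent_path` are hypothetical helper names, and `agent-123` is a made-up ID.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8283"  # server address used throughout this sheet


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a JSON request for the server."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def agent_path(agent_id: str, suffix: str = "") -> str:
    # Percent-encode the ID so it always stays a single path segment.
    return f"/v1/agents/{quote(agent_id, safe='')}{suffix}"


list_req = build_request("GET", "/v1/agents")
delete_req = build_request("DELETE", agent_path("agent-123"))  # made-up ID
print(list_req.get_method(), list_req.full_url)
print(delete_req.get_method(), delete_req.full_url)
# To actually send a request: urllib.request.urlopen(list_req)
```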

Agent Tools

Built-in Tools

Tool	Function	Description
`send_message`	Sends visible message to user	Primary communication
`core_memory_append`	Adds to core memory block	Remember new facts
`core_memory_replace`	Edits core memory block	Update existing facts
`archival_memory_insert`	Stores in archival memory	Long-term storage
`archival_memory_search`	Searches archival memory	Recall stored info
`conversation_search`	Searches recall memory	Find past messages

Custom Tools

from letta import create_client

client = create_client()

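# Define a custom tool. The type hints and the Args/Returns docstring sections
# describe the tool to the LLM, so keep them in sync with the parameters.
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: The city name to get weather for.

    Returns:
        A string describing the current weather.
    """
    # Import inside the function so the tool's source is self-contained
    import requests

    resp = requests.get(f"https://wttr.in/{city}?format=3", timeout=10)
    resp.raise_for_status()  # surface HTTP errors instead of returning an error page
    return resp.text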

# Register the tool
tool = client.create_tool(get_weather)

# Create agent with custom tool
agent = client.create_agent(
    name="weather-agent",
    tools=[tool.name, "send_message"]
)
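
Because a tool is described to the model through its signature and docstring, it can help to lint a function before registering it. The sketch below is a hypothetical pre-registration check, not part of Letta. It assumes Google-style docstrings with an `Args:` section, as in the example above, and uses a placeholder tool body so it runs offline.

```python
import inspect
import re


def check_tool(func) -> list:
    """Return a list of problems found in a tool's signature and docstring."""
    problems = []
    doc = inspect.getdoc(func) or ""
    if not doc:
        problems.append("missing docstring")
    # Collect names documented as "name: ..." or "name (type): ..." under "Args:".
    args_section = doc.split("Args:", 1)[1].split("Returns:", 1)[0] if "Args:" in doc else ""
    documented = set(re.findall(r"^\s*(\w+)\s*(?:\([^)]*\))?:", args_section, flags=re.MULTILINE))
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            problems.append(f"parameter '{name}' has no type hint")
        if name not in documented:
            problems.append(f"parameter '{name}' is not described under Args")
    return problems


def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: The city name to get weather for.

    Returns:
        A string describing the current weather.
    """
    return f"(weather for {city})"  # placeholder body for the check


def bad_tool(city, units: str):
    """Look something up.

    Args:
        city: The city name.
    """
    return ""


print(check_tool(get_weather))  # no problems expected
print(check_tool(bad_tool))     # one missing type hint, one undocumented parameter
```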

Configuration

Server Configuration

# Environment variables
export LETTA_PG_URI=postgresql://user:pass@localhost:5432/letta
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Start server with custom config
letta server --port 8283 --host 0.0.0.0

# Configure default model
letta server --default-llm gpt-4o
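
A small pre-flight check can catch missing or malformed environment variables before the server starts. This is a sketch using the variable names above; the `preflight` helper and the rules it applies (at least one provider key, a PostgreSQL-style URI that names a database) are assumptions for illustration and not checks Letta itself performs.

```python
from urllib.parse import urlparse


def preflight(env: dict) -> list:
    """Return human-readable warnings about the server environment."""
    warnings = []
    if not (env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY")):
        warnings.append("no OPENAI_API_KEY or ANTHROPIC_API_KEY set (fine for local models only)")
    pg_uri = env.get("LETTA_PG_URI")
    if pg_uri:
        parsed = urlparse(pg_uri)
        if parsed.scheme not in ("postgresql", "postgres"):
            warnings.append(f"LETTA_PG_URI has unexpected scheme '{parsed.scheme}'")
        if not parsed.path.strip("/"):
            warnings.append("LETTA_PG_URI does not name a database")
    return warnings


example_env = {
    "OPENAI_API_KEY": "sk-test",
    "LETTA_PG_URI": "postgresql://user:pass@localhost:5432/letta",
}
print(preflight(example_env))  # an empty list means nothing to warn about
# Against the real environment: preflight(dict(os.environ)) after `import os`
```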

Model Configuration

from letta import LLMConfig, EmbeddingConfig

llm_config = LLMConfig(
    model="gpt-4o",
    model_endpoint_type="openai",
    model_endpoint="https://api.openai.com/v1",
    context_window=128000
)

embedding_config = EmbeddingConfig(
    embedding_endpoint_type="openai",
    embedding_model="text-embedding-3-small",
    embedding_dim=1536,
    embedding_chunk_size=300
)

# Use local models via Ollama
local_llm = LLMConfig(
    model="llama3.1:70b",
    model_endpoint_type="ollama",
    model_endpoint="http://localhost:11434",
    context_window=8192
)
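
When choosing `context_window`, it helps to estimate how much room remains for conversation once the system prompt and core memory are in place. The sketch below uses the common rule of thumb of about four characters per token for English text. The helper names and the 1024-token reply reserve are assumptions; exact counts need the model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb for English text: about 4 characters per token.
    return max(1, len(text) // 4)


def remaining_budget(context_window: int, system_prompt: str, core_blocks: dict,
                     reserve_for_reply: int = 1024) -> int:
    """Approximate tokens left for messages after fixed prompt content."""
    used = estimate_tokens(system_prompt)
    used += sum(estimate_tokens(value) for value in core_blocks.values())
    return context_window - used - reserve_for_reply


budget = remaining_budget(
    context_window=8192,
    system_prompt="x" * 4000,                        # ~1000 tokens
    core_blocks={"human": "a" * 400, "persona": "b" * 800},  # ~100 + ~200 tokens
)
print(budget)
```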

Advanced Usage

Multi-Agent Systems

from letta import create_client

client = create_client()

# Create specialized agents
researcher = client.create_agent(
    name="researcher",
    memory_blocks=[
        {"label": "persona", "value": "You research topics thoroughly."},
        {"label": "human", "value": ""}
    ]
)

writer = client.create_agent(
    name="writer",
    memory_blocks=[
        {"label": "persona", "value": "You write clear summaries."},
        {"label": "human", "value": ""}
    ]
)

# Orchestrate agents
research_result = client.send_message(
    agent_id=researcher.id,
    role="user",
    message="Research the latest trends in RAG systems"
)

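# research_result.messages is a list of message objects, so the f-string below
# embeds their string representation; extract only the reply text if the
# writer should not see the researcher's reasoning and tool calls.
summary = client.send_message(
    agent_id=writer.id,
    role="user",
    message=f"Summarize this research: {research_result.messages}"
)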
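
The hand-off above generalizes to a pipeline in which each agent's reply becomes the next agent's input. The sketch below separates that pattern from the transport: `run_pipeline` and `fake_send` are hypothetical names, and `fake_send` is a stub that stands in for a real client call so the example runs without a server.

```python
from typing import Callable, List, Tuple


def run_pipeline(send: Callable[[str, str], str],
                 steps: List[Tuple[str, str]],
                 task: str) -> str:
    """Feed `task` through agents in order; each reply becomes the next input.

    `send(agent_id, message)` must return the agent's reply as plain text.
    Each step is (agent_id, prompt_template), where the template contains "{input}".
    """
    current = task
    for agent_id, template in steps:
        current = send(agent_id, template.format(input=current))
    return current


# Stub transport so the sketch runs offline; swap in a function that calls
# your client and extracts the reply text.
def fake_send(agent_id: str, message: str) -> str:
    return f"<{agent_id} reply to: {message}>"


result = run_pipeline(
    fake_send,
    steps=[("researcher", "{input}"), ("writer", "Summarize this research: {input}")],
    task="Research the latest trends in RAG systems",
)
print(result)
```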

Persistent Data Sources

# Create a data source (reuses `client` and `agent` from the Custom Tools example)
source = client.create_source(name="company-docs")

# Load files into source
client.load_file_to_source(
    filename="/path/to/documents/manual.pdf",
    source_id=source.id
)

# Attach source to agent (populates archival memory)
client.attach_source_to_agent(
    source_id=source.id,
    agent_id=agent.id
)
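
Loaded files are split into passages before they are embedded, which is where a setting like `embedding_chunk_size` comes in. The sketch below shows fixed-size chunking on whitespace-separated words. Treating the chunk size as a word count is an assumption for illustration, and `chunk_words` is a hypothetical helper; the real splitter and its units may differ.

```python
def chunk_words(text: str, chunk_size: int = 300) -> list:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


document = " ".join(f"w{i}" for i in range(700))  # synthetic 700-word document
chunks = chunk_words(document, chunk_size=300)
print(len(chunks))                          # number of passages produced
print([len(c.split()) for c in chunks])     # words per passage
```

Smaller chunks give more precise matches at the cost of more passages to embed and search.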

Streaming Responses

# Stream agent responses (reuses `client` and `agent` from the Custom Tools example)
stream = client.send_message(
    agent_id=agent.id,
    role="user",
    message="Write a detailed analysis",
    stream=True
)

for chunk in stream:
    if hasattr(chunk, 'content'):
        print(chunk.content, end='', flush=True)
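
If the full text is needed after streaming finishes, the chunks can be accumulated while they arrive. The sketch below uses stub chunk objects so it runs without a server. The `content` attribute follows the loop above, while `collect_stream` and the shape of the non-text chunk are assumptions.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Concatenate the text of every chunk that carries string content."""
    parts = []
    for chunk in stream:
        text = getattr(chunk, "content", None)
        if isinstance(text, str):  # skip chunks without text (e.g. tool calls)
            parts.append(text)
    return "".join(parts)


fake_stream = iter([
    SimpleNamespace(content="Hello, "),
    SimpleNamespace(tool_call="..."),   # a chunk without text content
    SimpleNamespace(content="world."),
])
print(collect_stream(fake_stream))
```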

Troubleshooting

Issue	Solution
Agent not responding	Check LLM API key and model availability
Memory not persisting	Verify database connection (PostgreSQL recommended)
Context window exceeded	Reduce core memory size or switch to larger context model
Archival search returns empty	Check embedding model config, ensure data was inserted
Slow response times	Use faster model, reduce archival search count
Tool execution fails	Check tool function signature matches docstring
Docker container exits	Check logs: `docker logs letta-server`
Connection refused on 8283	Ensure server is running: `letta server`

# Debug mode
letta server --debug

# Check server health
curl http://localhost:8283/v1/health

# View agent details
curl http://localhost:8283/v1/agents/{agent_id}

# Export agent state
curl http://localhost:8283/v1/agents/{agent_id}/export > agent_backup.json
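
In scripts, it is often useful to wait for the server to come up before sending requests. The sketch below polls the health endpoint shown above with a bounded number of retries. `server_is_healthy` and `wait_until` are hypothetical helpers, and treating any HTTP 200 response as healthy is an assumption.

```python
import time
import urllib.error
import urllib.request


def server_is_healthy(url: str = "http://localhost:8283/v1/health",
                      timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until(probe, attempts: int = 5, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Offline demonstration with a fake probe that succeeds on the third try.
outcomes = iter([False, False, True])
print(wait_until(lambda: next(outcomes), delay=0))
# Against a real server: wait_until(server_is_healthy)
```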