GPT Engineer

Installation

# Install via pip (recent releases support Python 3.10–3.12)
pip install gpt-engineer

# Or install with the benchmark extra (quoted so zsh does not expand the brackets)
pip install "gpt-engineer[benchmark]"

# Install from source (latest development)
git clone https://github.com/gpt-engineer-org/gpt-engineer
cd gpt-engineer
pip install -e .

# Verify installation
gpte --version
gpt-engineer --version     # alias

# Set OpenAI API key
export OPENAI_API_KEY=sk-your-key-here

# Or use Anthropic
export ANTHROPIC_API_KEY=sk-ant-your-key-here

Configuration

# API keys
export OPENAI_API_KEY=sk-your-key-here
export ANTHROPIC_API_KEY=sk-ant-your-key-here

# Optional: Azure OpenAI
export AZURE_OPENAI_API_KEY=your-azure-key
export OPENAI_API_BASE=https://your-resource.openai.azure.com/
export OPENAI_API_TYPE=azure
export OPENAI_API_VERSION=2023-05-15

# Model selection via CLI flag (see Core Commands)
# Default model: gpt-4o

# GPTE config environment variables
export GPTE_MAX_TOKENS=4096
export GPTE_MODEL=gpt-4o

# .gpte/config.toml (project-level config, optional)
model = "gpt-4o"
temperature = 0.1
max_tokens = 4096

# Project structure gpte creates:
my_project/
├── .gpteng/              # gpte workspace (git ignored)
│   └── memory/           # conversation memory files
├── prompt               # YOUR main prompt file
└── [generated files]    # all created source files
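
The layout above can also be prepared from Python before the first run. This is a minimal sketch, assuming the only file gpte needs up front is `prompt`; the helper name `scaffold_project` is hypothetical and not part of gpt-engineer.

```python
from pathlib import Path


def scaffold_project(root: str, prompt_text: str) -> Path:
    """Create the project directory and write the main prompt file.

    Hypothetical helper: gpte creates .gpteng/ itself, so only the
    directory and the `prompt` file are prepared here.
    """
    project = Path(root)
    project.mkdir(parents=True, exist_ok=True)
    prompt_file = project / "prompt"
    # Normalize surrounding whitespace and end the file with one newline.
    prompt_file.write_text(prompt_text.strip() + "\n", encoding="utf-8")
    return prompt_file
```

Calling `scaffold_project("my_project", "Build a CLI tool.")` leaves a directory that `gpte my_project` can be pointed at.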

Core Commands / API

Command	Description
`gpte PROJECT_DIR`	Generate codebase from prompt in directory
`gpte PROJECT_DIR -i`	Improve existing code (short form of --improve)
`gpte PROJECT_DIR --improve`	Improve existing code (long form)
`gpte PROJECT_DIR -m MODEL`	Use specific model
`gpte PROJECT_DIR --model MODEL`	Use specific model (long form)
`gpte PROJECT_DIR -t TEMP`	Set temperature (0.0–2.0, default 0.1)
`gpte PROJECT_DIR --lite`	Lite mode: generate from the main prompt only
`gpte PROJECT_DIR --azure ENDPOINT`	Use an Azure OpenAI endpoint (pass the resource URL)
`gpte PROJECT_DIR -h`	Show all options
`gpte benchmark`	Run benchmark suite
`gpte benchmark --task TASK`	Run specific benchmark task

Flag	Description
`-m / --model`	LLM model name (gpt-4o, claude-3-5-sonnet, etc.)
`-t / --temperature`	Sampling temperature (default: 0.1)
`-i / --improve`	Improve existing codebase mode
`--lite`	Generate from the main prompt only, with minimal extra instructions
`--azure`	Azure OpenAI endpoint URL
`--verbose`	Verbose output
`--prompt_file`	Specify custom prompt file path
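
When gpte is driven from a script, the flags in the table can be assembled programmatically. The sketch below only uses flags listed above; the function name `build_gpte_command` is hypothetical, and it returns the argument list rather than running anything.

```python
from typing import List, Optional


def build_gpte_command(
    project_dir: str,
    model: Optional[str] = None,
    temperature: Optional[float] = None,
    improve: bool = False,
    lite: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Assemble a gpte argument list from the flags in the table above."""
    cmd = ["gpte", project_dir]
    if model is not None:
        cmd += ["--model", model]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if improve:
        cmd.append("--improve")
    if lite:
        cmd.append("--lite")
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to execute it.
    print(" ".join(build_gpte_command(".", model="gpt-4o-mini", lite=True)))
```

Returning a list instead of a shell string avoids quoting problems when a project path contains spaces.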

Advanced Usage

Writing Effective Prompts

# Create a project directory and write a prompt file
mkdir my_web_app
cd my_web_app

cat > prompt << 'EOF'
Build a REST API using FastAPI and Python with the following features:

1. User management
   - POST /users - Create a new user (name, email, password)
   - GET /users/{id} - Get user by ID
   - PUT /users/{id} - Update user
   - DELETE /users/{id} - Delete user

2. Authentication
   - POST /auth/login - JWT-based login
   - POST /auth/logout - Invalidate token

3. Requirements
   - Use SQLite with SQLAlchemy ORM
   - Include Pydantic models for request/response validation
   - Add proper error handling (404, 422, 500)
   - Include a requirements.txt
   - Add basic tests using pytest

The app should run with: uvicorn main:app --reload
EOF

# Generate the codebase
gpte .

Improve Mode

# Improve an existing codebase
cd my_existing_project

# Create or edit prompt with what to change
cat > prompt << 'EOF'
Add the following improvements to the existing FastAPI application:

1. Add rate limiting (max 100 requests per minute per IP)
2. Add request logging middleware that logs method, path, and response time
3. Add input validation for email fields using regex
4. Add a health check endpoint at GET /health
EOF

# Run improve mode
gpte . --improve

# Or use the short form (-i is the same flag as --improve)
gpte . -i

File Selection in Improve Mode

# When running improve mode, gpte asks which files to edit
# You can pre-specify files to avoid interactive prompts

# gpte will show a diff and ask for confirmation before applying
# Press ENTER to accept, or modify the diff

# Naming exact files in the prompt limits which files the model edits.
# It does not by itself skip the file selection step (see gpte -h for options):
cat > prompt << 'EOF'
Modify only these files:
- src/auth/middleware.py: Add JWT validation
- src/models/user.py: Add email validation field
- tests/test_auth.py: Add tests for new middleware

Do not modify any other files.
EOF

gpte . --improve

Model Selection

# Use GPT-4o (default, best results)
gpte my_project -m gpt-4o

# Use an Anthropic Claude model (requires ANTHROPIC_API_KEY)
gpte my_project -m claude-3-5-sonnet-20241022

# Use cheaper models for simple tasks
gpte my_project -m gpt-4o-mini

# Use local models via LiteLLM
pip install litellm
OPENAI_API_BASE=http://localhost:11434/v1 OPENAI_API_KEY=ollama \
  gpte my_project -m ollama/codellama
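
Because the model name determines which API key is read, a quick pre-flight check can catch a missing variable before any tokens are spent. This sketch assumes the mapping used in this guide (Claude models use `ANTHROPIC_API_KEY`, everything else uses `OPENAI_API_KEY`); both function names are hypothetical.

```python
import os
from typing import Mapping, Optional


def required_key_for(model: str) -> str:
    """Map a model name to the API key variable this guide sets for it."""
    return "ANTHROPIC_API_KEY" if model.startswith("claude") else "OPENAI_API_KEY"


def missing_key(model: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset variable, or None if it is present."""
    env = os.environ if env is None else env
    name = required_key_for(model)
    # An empty string counts as unset.
    return None if env.get(name) else name


if __name__ == "__main__":
    problem = missing_key("claude-3-5-sonnet-20241022")
    print("ready" if problem is None else f"set {problem} first")
```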

Customizing Preprompts

# Preprompts directory (system instructions for the AI)
# Default location: ~/.local/share/gpt-engineer/preprompts/
# Or: /path/to/gpt-engineer/gpt_engineer/preprompts/

# List default preprompts
ls $(python -c "import gpt_engineer; import os; print(os.path.dirname(gpt_engineer.__file__))")/preprompts/

# Key preprompt files (names can differ between releases; check the listing above):
# clarify      how gpte asks clarifying questions
# generate     main code generation instructions
# improve      improve mode instructions
# entrypoint   how to create the main run script
# use_qa       Q&A format instructions (older releases)

# Override preprompts for a project
mkdir -p my_project/preprompts

# Custom generate preprompt (the file name must match a default preprompt)
cat > my_project/preprompts/generate << 'EOF'
You are an expert TypeScript developer who follows these principles:
- Always use strict TypeScript (no `any` types)
- Follow functional programming patterns where possible
- Write comprehensive JSDoc comments
- Use async/await instead of Promises directly
- All functions must have explicit return types
- Follow Google TypeScript style guide

When generating code:
1. Start with clear file structure overview
2. Implement each file completely
3. Include proper error handling
4. Add unit tests for all business logic functions
EOF

# Run with custom preprompts (reads my_project/preprompts/)
gpte my_project --use-custom-preprompts

Python API

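import os

from gpt_engineer.core.ai import AI
from gpt_engineer.core.default.disk_memory import DiskMemory
from gpt_engineer.core.default.paths import PREPROMPTS_PATH, memory_path
from gpt_engineer.core.default.steps import gen_code
from gpt_engineer.core.preprompts_holder import PrepromptsHolder
from gpt_engineer.core.prompt import Prompt

# Module paths and signatures follow recent gpt-engineer releases;
# verify them against your installed version.

# Initialize AI (the API key is read from the environment)
ai = AI(
    model_name="gpt-4o",
    temperature=0.1,
    azure_endpoint=None,
)

# Prepare the project directory, memory store, and default preprompts
project_path = "./my_project"
os.makedirs(project_path, exist_ok=True)
memory = DiskMemory(memory_path(project_path))
preprompts_holder = PrepromptsHolder(PREPROMPTS_PATH)

prompt = Prompt(
    text="Create a Python CLI tool that converts Markdown to HTML using mistune.",
)

# Run generation; the result is a dict-like mapping of file path -> content
files_dict = gen_code(ai, prompt, memory, preprompts_holder)
print("Generated files:", list(files_dict.keys()))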

Token Optimization

# Use lite mode to send only the main prompt (fewer input tokens per run)
gpte my_project --lite

# Write precise prompts to reduce back-and-forth
# BAD (vague):
echo "Build a web app" > prompt

# GOOD (precise):
cat > prompt << 'EOF'
Build a minimal Flask web app with:
- Single endpoint: GET / returns "Hello, World!" as JSON
- No database, no auth, no templates
- Single file: app.py
- requirements.txt with only flask
EOF

# Limit scope per generation session
# Generate in chunks: core first, then features

# Use GPT-4o-mini for early iterations when testing prompts
gpte my_project -m gpt-4o-mini    # cheap prototype
gpte my_project -m gpt-4o         # final quality generation

Benchmarking

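# Run the built-in benchmark suite
# (some releases ship benchmarking as a separate `bench` entry point;
# if the command below is not recognized, check `bench --help`)
gpte benchmark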

# Benchmark on specific tasks
gpte benchmark --task "hello_world"
gpte benchmark --task "fibonacci"

# Custom evaluation (requires benchmark extra)
pip install gpt-engineer[benchmark]

# Run all benchmarks and get scores
gpte benchmark --eval

Common Workflows

Greenfield Project Generation

# 1. Create project directory
mkdir saas_dashboard && cd saas_dashboard

# 2. Write detailed prompt
cat > prompt << 'EOF'
Build a React + TypeScript dashboard application:

TECH STACK:
- React 18 with TypeScript
- Vite as build tool
- Tailwind CSS for styling
- React Query for data fetching
- React Router for navigation
- Recharts for data visualization

FEATURES:
1. Sidebar navigation with: Dashboard, Analytics, Users, Settings
2. Dashboard page with 4 KPI cards (users, revenue, orders, growth)
3. Line chart showing last 30 days of revenue
4. Users table with pagination (mock data okay)
5. Responsive layout (works on mobile)

STRUCTURE:
- src/components/ — reusable UI components
- src/pages/      — page-level components
- src/hooks/      — custom React hooks
- src/types/      — TypeScript interfaces

Start with: npm run dev
EOF

# 3. Generate
gpte . --lite

# 4. Install and run
npm install && npm run dev

Incremental Feature Addition

# Start with core
cat > prompt << 'EOF'
Create a Python FastAPI app with:
- SQLite database
- User model: id, email, created_at
- CRUD endpoints for users
- Alembic migrations
EOF
gpte . --lite

# Add authentication
cat > prompt << 'EOF'
Add JWT authentication to the existing FastAPI app:
- POST /auth/register - create user with hashed password
- POST /auth/login - return JWT token
- Protected route decorator
- 1 hour token expiration
EOF
gpte . --improve

# Add tests
cat > prompt << 'EOF'
Add pytest tests covering:
- User CRUD operations
- Auth register/login flows
- Invalid input validation
Use TestClient and an in-memory SQLite database.
EOF
gpte . --improve
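
The three passes above can be scripted so each prompt is written and run in order. This is a sketch under stated assumptions: `run_steps` is a hypothetical helper, the `runner` argument exists only so the sequence can be dry-run or tested, and gpte may still ask for interactive input in the terminal the script runs in.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, List[str]]  # (prompt text, extra gpte flags)


def _run(cmd: List[str]) -> None:
    """Default runner: execute gpte and raise if it exits non-zero."""
    subprocess.run(cmd, check=True)


def run_steps(
    project_dir: str,
    steps: Sequence[Step],
    runner: Callable[[List[str]], object] = _run,
) -> List[List[str]]:
    """Write each prompt to <project_dir>/prompt and invoke gpte in order."""
    executed: List[List[str]] = []
    for prompt_text, flags in steps:
        Path(project_dir, "prompt").write_text(prompt_text, encoding="utf-8")
        cmd = ["gpte", project_dir, *flags]
        runner(cmd)  # an exception here stops the remaining steps
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    # Dry run: print the commands instead of calling gpte.
    plan = [
        ("Create a FastAPI app with CRUD endpoints for users.\n", ["--lite"]),
        ("Add JWT authentication to the existing app.\n", ["--improve"]),
    ]
    run_steps(".", plan, runner=lambda cmd: print(" ".join(cmd)))
```

Stopping at the first failing step keeps a later improve pass from running against a codebase that was never generated.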

Code Refactoring

cd legacy_project

cat > prompt << 'EOF'
Refactor this Python codebase to:
1. Replace all `print()` statements with proper logging (use `logging` module)
2. Add type hints to all function signatures
3. Split the monolithic main.py into modules:
   - config.py — configuration loading
   - utils.py  — helper functions
   - models.py — data classes
   - main.py   — entrypoint only
4. Keep all existing functionality intact
EOF

gpte . --improve

Tips and Best Practices

Prompt Engineering for Code Generation

Principle	Example
Specify tech stack exactly	"Flask 3.0, SQLAlchemy 2.0, PostgreSQL 15"
List all files needed	"Create: app.py, models.py, requirements.txt, README.md"
Define the run command	"App starts with: python main.py"
Include constraints	"No external APIs, no Docker, pure Python stdlib where possible"
Specify testing	"Include pytest tests covering the happy path"
Define interfaces	Include API contracts or data schemas in prompt
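
The principles in the table can be turned into a small template so every prompt states the stack, files, constraints, and run command. A minimal sketch; `build_prompt` and its section labels are illustrative choices, not a format gpte requires.

```python
from typing import List


def build_prompt(
    goal: str,
    tech_stack: List[str],
    files: List[str],
    run_command: str,
    constraints: List[str],
) -> str:
    """Render a prompt that applies the principles in the table above."""
    lines = [goal, "", "TECH STACK:"]
    lines += [f"- {item}" for item in tech_stack]
    lines += ["", "Create these files: " + ", ".join(files)]
    if constraints:
        lines += ["", "CONSTRAINTS:"]
        lines += [f"- {item}" for item in constraints]
    lines += ["", f"App starts with: {run_command}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        build_prompt(
            goal="Build a minimal Flask web app.",
            tech_stack=["Flask 3.0"],
            files=["app.py", "requirements.txt"],
            run_command="python app.py",
            constraints=["No database", "No Docker"],
        )
    )
```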

Workflow Recommendations

# Start simple, iterate
# Pass 1: Core structure (cheap model, lite mode)
gpte . -m gpt-4o-mini --lite

# Pass 2: Verify and improve
gpte . --improve   # fix issues found in pass 1

# Pass 3: Quality pass (best model)
gpte . --improve -m gpt-4o

# Always review generated code before running
# Generated code may have:
# - Placeholder values that need real credentials
# - Hardcoded paths that need adjustment
# - Missing environment variable configuration
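
Part of that review can be automated with a scan for obvious placeholders and hardcoded user paths. The patterns and file suffixes below are illustrative assumptions, and `find_placeholders` is a hypothetical helper; a clean result does not replace reading the code.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Illustrative patterns only; extend them for your own stack.
PLACEHOLDER_PATTERNS = [
    re.compile(r"your[-_][\w-]*key", re.IGNORECASE),  # e.g. sk-your-key-here
    re.compile(r"changeme|TODO", re.IGNORECASE),
    re.compile(r"/(home|Users)/\w+/"),  # hardcoded user paths
]


def find_placeholders(
    root: str,
    suffixes: Tuple[str, ...] = (".py", ".toml", ".json", ".yaml"),
) -> Dict[str, List[int]]:
    """Return {relative path: [line numbers]} for lines matching any pattern."""
    hits: Dict[str, List[int]] = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if ".gpteng" in path.parts:
            continue  # skip the gpte workspace
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in PLACEHOLDER_PATTERNS):
                hits.setdefault(str(path.relative_to(root)), []).append(number)
    return hits


if __name__ == "__main__":
    for name, line_numbers in find_placeholders(".").items():
        print(f"{name}: lines {line_numbers}")
```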

Cost Estimation

Task Size	Tokens (approx)	GPT-4o Cost	GPT-4o-mini Cost
Simple script	2,000–5,000	~$0.02–0.05	~$0.001
Small app (5–10 files)	10,000–20,000	~$0.10–0.20	~$0.01
Medium app (20+ files)	30,000–60,000	~$0.30–0.60	~$0.02
Complex system	60,000–150,000	~$0.60–1.50	~$0.05

# Reduce costs:
# 1. Use --lite flag (main prompt only, so fewer tokens per run)
# 2. Use gpt-4o-mini for prototyping, gpt-4o for final output
# 3. Write precise prompts (less back-and-forth)
# 4. Generate incrementally (smaller scoped prompts per session)
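
The GPT-4o column in the table is consistent with a flat rate of about $10 per million tokens (2,000 tokens for roughly $0.02). The sketch below applies that arithmetic; the rate is an assumption read off the table, not a published price, so substitute your provider's current pricing.

```python
def estimate_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Return the estimated cost in USD for a given token count."""
    if tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("tokens and price must be non-negative")
    return tokens / 1_000_000 * usd_per_million_tokens


# Rate implied by the GPT-4o column above; an assumption for illustration.
ASSUMED_GPT4O_RATE = 10.0

if __name__ == "__main__":
    for tokens in (5_000, 20_000, 60_000):
        cost = estimate_cost(tokens, ASSUMED_GPT4O_RATE)
        print(f"{tokens:>7} tokens: ${cost:.2f}")
```

Real bills price input and output tokens separately, so treat a single blended rate as a rough upper-level estimate only.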

Troubleshooting

# Rate limit errors
# → Reduce request frequency, or use an Azure OpenAI deployment (quotas are set per deployment and may be higher)

# Context length exceeded
# → Break prompt into smaller features
# → Use --lite to send less instruction context with each request
# → Use gpt-4o (128k context) over gpt-3.5 (16k)

# Generated code doesn't run
# → Check generated requirements.txt for version conflicts
# → Look for placeholder values (API keys, DB URLs, etc.)
# → Run in improve mode to fix specific errors:
echo "Fix: ModuleNotFoundError: No module named 'httpx'" > prompt
gpte . --improve

# Files not being generated
# → Be explicit in prompt: "Create these files: main.py, utils.py, ..."
# → Check .gpteng/memory/ for conversation history

# Inconsistent output across runs
# → Lower temperature: gpte . -t 0.0 (more repeatable, though output can still vary between runs)