Installation
# Install the Modal client
pip install modal
# Authenticate (creates token in ~/.modal.toml)
modal token new
# Or set environment variable
export MODAL_TOKEN_ID=...
export MODAL_TOKEN_SECRET=...
# Verify
modal --version
modal profile current
Configuration
~/.modal.toml (auto-generated by modal token new)
[default]
token_id = "ak-xxxxxxxxxxxx"
token_secret = "as-xxxxxxxxxxxx"
App Definition
# app.py: every Modal project starts with an App
import modal

app = modal.App("my-ml-app")

# Custom container image
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        "torch==2.3.0",
        "transformers==4.40.0",
        "accelerate",
        "datasets",
        "pillow",
    )
    .run_commands(
        # apt needs a fresh package index before it can install anything
        "apt-get update && apt-get install -y libgl1",
    )
)
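The image above pins exact versions of torch and transformers. You may want to confirm inside the container (for example from modal shell) that the installed versions really match those pins. The sketch below uses only the standard library. The check_pins helper is hypothetical and is not part of Modal.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from typing import Callable


def check_pins(
    pins: dict[str, str],
    lookup: Callable[[str], str] = version,
) -> dict[str, str | None]:
    """Return {package: installed_version} for every pin that does not match.

    A value of None means the package is not installed at all.
    """
    mismatches: dict[str, str | None] = {}
    for package, expected in pins.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    # Same pins as the image definition; an empty dict means everything matches
    print(check_pins({"torch": "2.3.0", "transformers": "4.40.0"}))
```

The lookup parameter defaults to importlib.metadata.version. It exists so the check can be exercised without the real packages installed.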
Core Commands
CLI
| Command | Description |
|---|---|
| modal run app.py | Run the app's local entrypoint from your machine; functions execute on Modal |
| modal run app.py::my_function | Run a specific function |
| modal deploy app.py | Deploy the app (persistent) |
| modal serve app.py | Serve with hot reload (dev mode) |
| modal shell app.py::my_image | Open an interactive shell in a container |
| modal shell --cmd bash | Open bash in an image |
| modal app list | List deployed apps |
| modal app stop my-app | Stop a deployed app |
| modal app logs my-app | Stream app logs |
| modal container list | List running containers |
| modal container exec <id> bash | Exec into a running container |
| modal volume create my-vol | Create a volume |
| modal volume list | List volumes |
| modal volume put my-vol ./data /data | Upload files to a volume |
| modal volume get my-vol /data ./local | Download files from a volume |
| modal volume ls my-vol / | List volume contents |
| modal secret create MY_SECRET key=value | Create a secret |
| modal secret list | List secrets |
| modal profile list | List auth profiles |
| modal profile activate myprofile | Switch profile |
| modal token new | Generate a new token |
| modal nfs create my-nfs | Create a network file system |
Advanced Usage
Functions and GPU Scheduling
import modal

app = modal.App("gpu-training")
image = modal.Image.debian_slim().pip_install("torch", "torchvision")

@app.function(
    image=image,
    gpu="A10G",            # A10G | A100 | A100-80GB | H100 | T4 | L4 | any
    cpu=4,                 # CPU cores
    memory=32768,          # Memory in MB (32 GB)
    timeout=3600,          # Seconds (default 300, max 86400)
    retries=3,             # Auto-retry on failure
    concurrency_limit=10,  # Max parallel instances
)
def train_model(config: dict) -> dict:
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # get_device_name() raises when no GPU is present, so guard it
    name = torch.cuda.get_device_name(0) if device.type == "cuda" else "CPU"
    print(f"Using: {device} ({name})")
    # ... training code ...
    return {"loss": 0.05, "accuracy": 0.97}  # placeholder metrics

# Run from your machine; the function body executes on Modal's cloud
if __name__ == "__main__":
    with app.run():
        result = train_model.remote({"lr": 0.001, "epochs": 10})
        print(result)
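A gpu value is usually chosen from a back-of-the-envelope memory estimate: weight memory is roughly parameter count × bytes per parameter. The rough sketch below makes a few assumptions. The VRAM figures are the standard sizes of each card, and A100 also ships in a 40 GB variant, which the sketch assumes gpu="A100" refers to. The 1.2 overhead factor for activations and CUDA context is a heuristic, not a Modal rule. Training needs considerably more memory for gradients and optimizer state.

```python
from __future__ import annotations

# Approximate on-card memory in GB for the gpu strings listed above.
# Assumption: "A100" means the 40 GB variant.
GPU_VRAM_GB = {
    "T4": 16,
    "A10G": 24,
    "L4": 24,
    "A100": 40,
    "A100-80GB": 80,
    "H100": 80,
}

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(n_params: float, dtype: str = "fp16") -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def smallest_gpu(n_params: float, dtype: str = "fp16", overhead: float = 1.2) -> str | None:
    """Smallest listed GPU whose memory fits weights * overhead, or None.

    overhead=1.2 is a rough inference-time allowance, not a guarantee.
    """
    needed = weights_gb(n_params, dtype) * overhead
    # sorted() is stable, so cards with equal memory keep their dict order
    for name, vram in sorted(GPU_VRAM_GB.items(), key=lambda kv: kv[1]):
        if vram >= needed:
            return name
    return None
```

For example, an 8B-parameter model in fp16 needs about 16 GB of weights. With the overhead factor that is 19.2 GB, so the sketch picks A10G. The same model quantized to 4-bit fits on a T4.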
Classes (Warm Containers)
import modal

app = modal.App("inference-service")
image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")

@app.cls(
    image=image,
    gpu="A10G",
    container_idle_timeout=300,  # Keep warm for 5 min after the last request
    allow_concurrent_inputs=10,  # Up to 10 concurrent requests per container
)
class TextClassifier:
    @modal.enter()  # Runs once on container start
    def load_model(self):
        from transformers import pipeline

        self.classifier = pipeline(
            "text-classification",
            model="distilbert-base-uncased-finetuned-sst-2-english",
            device=0,
        )

    @modal.method()
    def classify(self, text: str) -> dict:
        return self.classifier(text)[0]

    @modal.exit()  # Runs on container shutdown
    def cleanup(self):
        del self.classifier

# Call the class
@app.local_entrypoint()
def main():
    classifier = TextClassifier()
    result = classifier.classify.remote("Modal is incredible!")
    print(result)
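The point of @modal.enter() is that expensive setup runs once per container rather than once per call. The sketch below shows the same idea locally in plain Python; it is not Modal's implementation. WarmModel is a hypothetical class that runs its loader on first use and then reuses the result. A lock makes sure the loader still runs only once when several threads call it at the same time, which is the situation allow_concurrent_inputs creates.

```python
from __future__ import annotations

import threading
from typing import Callable


class WarmModel:
    """Load an expensive resource once, then reuse it for every call."""

    def __init__(self, loader: Callable[[], Callable[[str], dict]]):
        self._loader = loader
        self._model: Callable[[str], dict] | None = None
        self._lock = threading.Lock()
        self.load_count = 0

    def _ensure_loaded(self) -> Callable[[str], dict]:
        if self._model is None:
            with self._lock:
                if self._model is None:  # double-checked locking
                    self._model = self._loader()
                    self.load_count += 1
        return self._model

    def predict(self, text: str) -> dict:
        return self._ensure_loaded()(text)
```

Here the loader stands in for the pipeline(...) call in load_model. In the Modal class, the platform calls the enter hook for you, so no lock is needed there.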
Volumes (Persistent Storage)
import modal

app = modal.App("training-with-storage")

# Create or reference a volume
volume = modal.Volume.from_name("training-data", create_if_missing=True)
image = modal.Image.debian_slim().pip_install("torch", "datasets", "transformers")

@app.function(
    image=image,
    gpu="A10G",
    volumes={"/data": volume},  # Mount at /data inside the container
)
def download_and_train():
    import os
    from datasets import load_dataset, load_from_disk

    # save_to_disk writes a directory of Arrow files, not a single JSON file
    dataset_dir = "/data/imdb-train"
    # Skip the download if a previous run already persisted the data
    if not os.path.exists(dataset_dir):
        ds = load_dataset("imdb", split="train")
        ds.save_to_disk(dataset_dir)
        volume.commit()  # Persist the writes so other containers can see them
    # Load from the volume
    ds = load_from_disk(dataset_dir)
    model = train(ds)  # hypothetical helper: your training loop, returning a transformers model
    # Save a model checkpoint to the volume
    model.save_pretrained("/data/checkpoints/epoch_1")
    volume.commit()
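The check-then-build pattern in download_and_train generalizes to any expensive artifact. The sketch below is a minimal version under a few assumptions. A local directory stands in for the volume mount. The commit callback is where you would pass volume.commit. The write-to-temp-then-rename step is a standard POSIX technique that is atomic on a local filesystem; whether a Modal volume gives the same guarantee is not assumed. cached_json is a hypothetical helper name.

```python
import json
import os
from typing import Callable


def cached_json(
    path: str,
    build: Callable[[], dict],
    commit: Callable[[], None] = lambda: None,
) -> dict:
    """Return the JSON stored at path, building and saving it only if missing."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = build()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f)
    # Rename into place so readers never see a half-written file (local POSIX)
    os.replace(tmp, path)
    commit()  # e.g. volume.commit inside a Modal function
    return data
```

On the second call the file already exists, so neither build nor commit runs again.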
Secrets
import modal

# Create secrets via CLI:
# modal secret create openai-secret OPENAI_API_KEY=sk-...
# modal secret create aws-creds AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=...
openai_secret = modal.Secret.from_name("openai-secret")
aws_secret = modal.Secret.from_name("aws-creds")

# Secrets built from local values (handy during development)
local_secret = modal.Secret.from_dict({"MY_KEY": "my-value"})
env_secret = modal.Secret.from_dotenv(filename=".env.production")

app = modal.App("secret-demo")
image = modal.Image.debian_slim().pip_install("openai")

@app.function(image=image, secrets=[openai_secret, aws_secret])
def call_openai():
    import os
    import openai

    # Each key in an attached secret arrives as an environment variable
    client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # ...
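Secret keys show up as environment variables, so a missing or misspelled key otherwise surfaces later as a bare KeyError. Checking all required keys up front gives one clear error instead. The sketch below uses only the standard library; require_env is a hypothetical helper, not a Modal API.

```python
import os
from collections.abc import Mapping


def require_env(*names: str, environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the requested variables, or raise listing every missing/empty one.

    A missing key usually means the Secret was not attached to the function
    or the key name inside the Secret differs.
    """
    missing = [n for n in names if not environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: environ[n] for n in names}
```

Inside call_openai, require_env("OPENAI_API_KEY", "AWS_ACCESS_KEY_ID") would fail fast if either secret were not attached.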
Web Endpoints
import modal
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = modal.App("web-api")
# Web endpoints need FastAPI installed inside the container image
web_image = modal.Image.debian_slim().pip_install("fastapi[standard]", "torch", "transformers")

# Simple web endpoint: the JSON POST body is passed in as `item`
@app.function(image=web_image, gpu="T4")
@modal.web_endpoint(method="POST")
def generate(item: dict) -> dict:
    text = item["text"]
    result = run_model(text)  # hypothetical inference helper
    return {"result": result}

# Full FastAPI app
web_app = FastAPI()

@web_app.get("/health")
async def health():
    return {"status": "ok"}

@web_app.post("/classify")
async def classify(request: Request):
    body = await request.json()
    result = run_inference(body["text"])  # hypothetical inference helper
    return JSONResponse({"label": result})

@app.function(
    image=web_image,
    gpu="A10G",
    container_idle_timeout=300,
)
@modal.asgi_app()
def fastapi_app():
    return web_app
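Once served or deployed, the POST endpoint accepts a JSON body with a "text" field, which is the key generate reads. The minimal client below uses the standard library. The URL is a placeholder: Modal prints the real one when you run modal serve or modal deploy.

```python
import json
import urllib.request

# Placeholder: replace with the URL printed by modal serve / modal deploy
ENDPOINT_URL = "https://example.com/generate"


def build_request(url: str, text: str) -> urllib.request.Request:
    """Build a JSON POST request matching the generate endpoint's body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(url: str, text: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(url, text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload format testable without network access. The generous default timeout allows for a cold start on the GPU container.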
Parallel Map
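import modal

app = modal.App("parallel-processing")

@app.function(cpu=2, memory=4096, timeout=600)
def process_item(item: dict) -> dict:
    # CPU-bound processing (heavy_computation is a hypothetical helper)
    return {"id": item["id"], "result": heavy_computation(item)}

@app.function(cpu=2, memory=4096, timeout=600)
def process_with_config(item: dict, config: dict) -> dict:
    # Two-argument variant, needed for starmap
    return {"id": item["id"], "result": heavy_computation(item, **config)}

@app.local_entrypoint()
def main():
    items = load_large_dataset()  # hypothetical loader, e.g. 10,000 items
    config = {"threshold": 0.5}   # illustrative settings

    # map: returns results in input order
    results = list(process_item.map(items))

    # starmap: unpacks each tuple into positional arguments
    pairs = [(item, config) for item in items]
    results = list(process_with_config.starmap(pairs))

    # for_each: runs every input for its side effects and discards the results
    process_item.for_each(items)

    # map with error handling
    for result in process_item.map(items, return_exceptions=True):
        if isinstance(result, Exception):
            print(f"Error: {result}")
        else:
            handle(result)  # hypothetical result handler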
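You can reason about what .map(..., return_exceptions=True) returns without a Modal account by using a local stand-in built on concurrent.futures. Results come back in input order. When return_exceptions is set, a failing item yields its exception object in place of a result instead of aborting the loop. This sketch models that behavior only and is not Modal's implementation. local_map is a hypothetical name.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def local_map(
    fn: Callable[[T], R],
    items: Iterable[T],
    return_exceptions: bool = False,
    max_workers: int = 8,
) -> Iterator[R | Exception]:
    """Ordered parallel map; optionally yield exceptions instead of raising."""

    def call(item: T):
        try:
            return fn(item)
        except Exception as exc:
            if not return_exceptions:
                raise
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, like Function.map
        yield from pool.map(call, items)
```

Without return_exceptions, the first failure propagates out of the loop, and the results for later items are never seen.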
Cron Jobs and Schedules
import modal
from datetime import datetime

app = modal.App("scheduled-jobs")

@app.function(schedule=modal.Period(hours=6))  # Every 6 hours
def refresh_cache():
    print(f"Refreshing at {datetime.now()}")
    fetch_and_store()

@app.function(schedule=modal.Cron("0 9 * * MON-FRI"))  # 9am weekdays UTC
def send_daily_report():
    report = generate_report()
    send_email(report)

@app.function(schedule=modal.Period(days=1))
def daily_retraining():
    train_model()
    deploy_model()
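To sanity-check the schedules, you can compute the next few fire times yourself. The stdlib sketch below covers the two patterns above. It assumes, as the comment states, that the cron expression is evaluated in UTC. period_runs takes its start time as a parameter because the sketch does not model when Modal starts counting a Period. Both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_weekday_at(after: datetime, hour: int = 9, minute: int = 0) -> datetime:
    """Next UTC time strictly after `after` matching cron "minute hour * * MON-FRI".

    `after` must be timezone-aware; it is converted to UTC first.
    """
    after = after.astimezone(timezone.utc)
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:  # weekday(): 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def period_runs(start: datetime, every: timedelta, count: int) -> list[datetime]:
    """The next `count` fire times of a fixed interval counted from `start`."""
    return [start + every * i for i in range(1, count + 1)]
```

For example, from Friday 3 May 2024 at 10:00 UTC, next_weekday_at skips the weekend and returns Monday 6 May 2024 at 09:00 UTC.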
Custom Container Images
import modal

# Build from Dockerfile
dockerfile_image = modal.Image.from_dockerfile("./Dockerfile")

# Debian slim (most common)
image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("libgl1", "libglib2.0-0")
    .pip_install("opencv-python-headless", "torch", "ultralytics")
    .copy_local_file("./config.yaml", "/app/config.yaml")
    .env({"PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:128"})
    .run_commands("python -c 'import torch; print(torch.__version__)'")
)

# From a public Docker image
custom_image = modal.Image.from_registry("nvcr.io/nvidia/pytorch:24.03-py3")

# Micromamba (conda-compatible)
conda_image = (
    modal.Image.micromamba(python_version="3.11")
    .micromamba_install("cudatoolkit=11.8", channels=["conda-forge", "nvidia"])
    .pip_install("torch", "transformers")
)
Fine-Tuning on Demand
import modal

app = modal.App("llm-finetuning")
volume = modal.Volume.from_name("finetuning-outputs", create_if_missing=True)
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("transformers", "peft", "trl", "bitsandbytes", "accelerate", "datasets")
)

@app.function(
    image=image,
    gpu="A100-80GB",
    timeout=7200,  # 2 hours
    volumes={"/outputs": volume},
    secrets=[modal.Secret.from_name("huggingface-token")],
)
def finetune_llm(
    base_model: str = "meta-llama/Meta-Llama-3-8B-Instruct",
    dataset_name: str = "my-org/my-dataset",
    num_epochs: int = 3,
):
    import os
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
    from peft import LoraConfig, get_peft_model
    from trl import SFTTrainer

    # The key name must match the one stored in the "huggingface-token" secret
    token = os.environ["HUGGING_FACE_HUB_TOKEN"]
    dataset = load_dataset(dataset_name, split="train", token=token)

    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        token=token,
        # Current transformers expects 4-bit loading via a BitsAndBytesConfig
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",
    )
    lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora_config)

    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        args=TrainingArguments(
            output_dir="/outputs/checkpoints",
            num_train_epochs=num_epochs,
        ),
    )
    trainer.train()
    model.save_pretrained("/outputs/final-model")  # Saves the LoRA adapter weights
    volume.commit()
    return {"status": "complete", "output": "/outputs/final-model"}
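LoRA freezes the base weights and trains two low-rank matrices per adapted projection. For a projection of shape d_in × d_out, that adds r × (d_in + d_out) trainable parameters. The small calculator below shows how few parameters r=16 on q_proj and v_proj actually trains. The Llama 3 8B shapes in the usage note are assumptions taken from that model's published config, not from this document: hidden size 4096, 32 layers, and a v_proj output of 1024 because of grouped-query attention. The calculator also assumes LoRA leaves biases untrained.

```python
def lora_param_count(r: int, shapes: list[tuple[int, int]], num_layers: int) -> int:
    """Trainable LoRA parameters: per projection, A is r x d_in and B is d_out x r."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers
```

With those assumed shapes, lora_param_count(16, [(4096, 4096), (4096, 1024)], 32) returns 6,815,744. That is about 6.8M trainable parameters, under 0.1% of the 8B base model.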
Network File System (NFS)
import modal

app = modal.App("nfs-demo")
# For shared read-write access across concurrent functions
nfs = modal.NetworkFileSystem.from_name("shared-nfs", create_if_missing=True)

@app.function(network_file_systems={"/shared": nfs})
def write_results(data: str):
    # Newline-terminate each record so appended records stay separable
    with open("/shared/results.jsonl", "a") as f:
        f.write(data + "\n")
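Appending to a shared results file works best when each record is a complete, self-delimiting line. This format, one JSON object per line, is known as JSON Lines. The sketch below runs on a local file. Whether appends from concurrent containers on a network filesystem can interleave is not something it can guarantee, so writing one file per worker remains the safer option. The helper names are hypothetical.

```python
import json
from typing import Any


def append_jsonl(path: str, record: dict[str, Any]) -> None:
    """Append one JSON object as a single newline-terminated line."""
    line = json.dumps(record, separators=(",", ":")) + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)  # one write() per record, appended at end of file


def read_jsonl(path: str) -> list[dict[str, Any]]:
    """Read every non-empty line back as a dict."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```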
Common Workflows
Local Development → Cloud Deployment
# Dev: run with hot reload
modal serve app.py
# Test a single function
modal run app.py::train_model
# Interactive debugging
modal shell app.py::image
# Deploy to production
modal deploy app.py
# Check deployment
modal app list
modal app logs my-ml-app
Batch Inference Pipeline
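import modal

app = modal.App("batch-inference")

@app.function(gpu="T4", timeout=300)
def run_inference(batch: list[str]) -> list[dict]:
    # load_model is a hypothetical helper. It runs on every call here, so for
    # many batches an @app.cls with @modal.enter() avoids reloading the model.
    model = load_model()
    return [model.predict(text) for text in batch]

@app.local_entrypoint()
def main():
    texts = load_texts()  # hypothetical loader, e.g. 100,000 texts
    batch_size = 64
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    all_results = []
    # map yields each batch's results in input order
    for batch_result in run_inference.map(batches):
        all_results.extend(batch_result)
    save_results(all_results)  # hypothetical writer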
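The slicing comprehension in main is easy to get subtly wrong, so a small generator can make the batching explicit and testable. In the sketch below, the final batch is shorter whenever the input length is not a multiple of batch_size. On Python 3.12+, itertools.batched does the same job but yields tuples. batched is a hypothetical local helper.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

With 100,000 texts and batch_size=64, this produces 1,563 batches: 1,562 full ones and a final batch of 32.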
Reproducible Experiments
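import modal

app = modal.App("experiments")
image = modal.Image.debian_slim().pip_install("wandb")  # add your training dependencies
data_volume = modal.Volume.from_name("training-data", create_if_missing=True)
output_volume = modal.Volume.from_name("experiment-outputs", create_if_missing=True)

@app.function(
    image=image,
    gpu="A100",
    volumes={"/data": data_volume, "/outputs": output_volume},
    secrets=[modal.Secret.from_name("wandb-secret")],  # should contain WANDB_API_KEY
)
def experiment(config: dict):
    import wandb

    wandb.init(project="my-project", config=config)
    results = train(**config)  # hypothetical helper returning metrics plus a "model" entry
    model = results.pop("model")
    wandb.log(results)  # log only the scalar metrics
    # Save the checkpoint under the W&B run ID
    save_checkpoint(model, f"/outputs/{wandb.run.id}")  # hypothetical writer
    output_volume.commit()
    wandb.finish()
    return results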
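Checkpoints keyed by wandb.run.id get a fresh folder on every run. A deterministic ID derived from the config itself makes reruns of the same configuration land in the same output folder. In the sketch below, hashing the JSON with sorted keys assumes the config holds only JSON-serializable values. config_id is a hypothetical helper.

```python
import hashlib
import json
from typing import Any


def config_id(config: dict[str, Any], length: int = 12) -> str:
    """Stable short hash of a config; key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

Inside experiment, f"/outputs/{config_id(config)}" would then identify the configuration, while the W&B run ID still identifies the individual run.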
Tips and Best Practices
- Define images at module level. Modal caches image layers and rebuilds only when a layer's definition (for example the pip_install arguments or run_commands) changes, so a stable definition avoids unnecessary builds.
- Use @app.cls with container_idle_timeout for inference services. Keeping containers warm between requests avoids cold-start latency for real-time APIs.
- Call volume.commit() explicitly after writes you need to keep. Writes are not guaranteed to be visible to other containers until they are committed, so an explicit commit is the safest way to persist data such as checkpoints.
- GPU selection matters for cost. T4 is the cheapest option for inference, A10G is a good balance for training, and A100-80GB suits large models; use gpu="any" when any available GPU will do.
- timeout defaults to 300 s. Always set a longer timeout for training jobs; the maximum is 86400 s (24 hours).
- Secrets are injected as environment variables at runtime and are not stored in the image. Never bake secrets into the image with env(), and avoid printing them, since anything printed ends up in the logs.
- modal shell app.py::image opens an interactive container with your image, which is invaluable for debugging dependency issues.
- return_exceptions=True in .map() prevents one failed item from killing the entire batch, so you can handle errors item by item.
- Use modal serve for web endpoint development. It hot-reloads on file changes so you can iterate without redeploying.
- Volumes are regional. Keep a volume and the functions that use it in the same region to avoid cross-region transfer costs and latency.
- allow_concurrent_inputs on @app.cls lets a single container handle multiple requests at once, which is critical for GPU cost efficiency on inference workloads.
- Prefer @app.local_entrypoint() over if __name__ == "__main__" for Modal apps. It integrates with the Modal CLI (modal run) and runs inside the app context automatically.