
Modal

Installation

# Install the Modal client
pip install modal

# Authenticate (creates token in ~/.modal.toml)
modal token new

# Or set environment variables
export MODAL_TOKEN_ID=...
export MODAL_TOKEN_SECRET=...

# Verify
modal --version
modal profile current

Configuration

~/.modal.toml (auto-generated by modal token new)

[default]
token_id = "ak-xxxxxxxxxxxx"
token_secret = "as-xxxxxxxxxxxx"

App Definition

# app.py — every Modal project starts with an App
import modal

app = modal.App("my-ml-app")

# Custom container image
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        "torch==2.3.0",
        "transformers==4.40.0",
        "accelerate",
        "datasets",
        "pillow",
    )
    .apt_install("libgl1")         # system packages via apt
)

Core Commands

CLI

Command                                   Description
modal run app.py                          Run the app's local entrypoint (functions execute on Modal)
modal run app.py::my_function             Run a specific function
modal deploy app.py                       Deploy the app (persistent)
modal serve app.py                        Serve with hot-reload (dev mode)
modal shell app.py::my_image              Open an interactive shell in a container
modal shell --cmd bash                    Open bash in the image
modal app list                            List deployed apps
modal app stop my-app                     Stop a deployed app
modal app logs my-app                     Stream app logs
modal container list                      List running containers
modal container exec <id> bash            Exec into a running container
modal volume create my-vol                Create a volume
modal volume list                         List volumes
modal volume put my-vol ./data /data      Upload files to a volume
modal volume get my-vol /data ./local     Download files from a volume
modal volume ls my-vol /                  List volume contents
modal secret create MY_SECRET key=value   Create a secret
modal secret list                         List secrets
modal profile list                        List auth profiles
modal profile activate myprofile          Switch profile
modal token new                           Generate a new token
modal nfs create my-nfs                   Create a network filesystem

Advanced Usage

Functions and GPU Scheduling

import modal

app = modal.App("gpu-training")

image = modal.Image.debian_slim().pip_install("torch", "torchvision")

@app.function(
    image=image,
    gpu="A10G",                    # A10G | A100 | A100-80GB | H100 | T4 | L4 | any
    cpu=4,                         # CPU cores
    memory=32768,                  # Memory in MB (32GB)
    timeout=3600,                  # Seconds (default 300, max 86400)
    retries=3,                     # Auto-retry on failure
    concurrency_limit=10,          # Max parallel instances
)
def train_model(config: dict) -> dict:
    import torch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no GPU"
    print(f"Using: {device} ({gpu_name})")
    # ... training code ...
    return {"loss": 0.05, "accuracy": 0.97}


# Run locally (executes on Modal's cloud)
if __name__ == "__main__":
    with app.run():
        result = train_model.remote({"lr": 0.001, "epochs": 10})
        print(result)
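Because `memory` is given in MB and `timeout` in seconds, a small helper can convert human-friendly units and enforce the 86400 s ceiling before the values reach the decorator. `function_resources` is a hypothetical helper, not part of the Modal SDK; its output is meant to be unpacked into the decorator.

```python
MAX_TIMEOUT_S = 86_400  # 24 h, the maximum quoted above


def function_resources(gpu: str, cpu: float, memory_gb: float, timeout_h: float) -> dict:
    """Hypothetical helper: convert GB/hours into the MB/seconds that @app.function expects."""
    timeout_s = int(timeout_h * 3600)
    if not 0 < timeout_s <= MAX_TIMEOUT_S:
        raise ValueError(f"timeout must be in (0, {MAX_TIMEOUT_S}] seconds, got {timeout_s}")
    return {
        "gpu": gpu,
        "cpu": cpu,
        "memory": int(memory_gb * 1024),  # MB
        "timeout": timeout_s,             # seconds
    }


print(function_resources("A10G", cpu=4, memory_gb=32, timeout_h=1))
# {'gpu': 'A10G', 'cpu': 4, 'memory': 32768, 'timeout': 3600}
```

Usage: `@app.function(image=image, retries=3, **function_resources("A10G", cpu=4, memory_gb=32, timeout_h=1))` yields the same settings as the example above.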

Classes (Warm Containers)

import modal

app = modal.App("inference-service")

image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")

@app.cls(
    image=image,
    gpu="A10G",
    container_idle_timeout=300,    # Keep warm for 5 min after last request
    allow_concurrent_inputs=10,    # Handle 10 requests per container
)
class TextClassifier:
    @modal.enter()                 # Runs once on container start
    def load_model(self):
        from transformers import pipeline
        self.classifier = pipeline(
            "text-classification",
            model="distilbert-base-uncased-finetuned-sst-2-english",
            device=0
        )

    @modal.method()
    def classify(self, text: str) -> dict:
        return self.classifier(text)[0]

    @modal.exit()                  # Runs on container shutdown
    def cleanup(self):
        del self.classifier

# Call the class
@app.local_entrypoint()
def main():
    classifier = TextClassifier()
    result = classifier.classify.remote("Modal is incredible!")
    print(result)
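The payoff of `@modal.enter()` is that expensive setup runs once per container, while `@modal.method()` runs once per request. The plain-Python simulation below involves no Modal at all; `WarmContainer` and `FakeClassifier` are hypothetical stand-ins that count model loads to make the lifecycle visible.

```python
class WarmContainer:
    """Hypothetical local stand-in for one warm container running a service class."""

    def __init__(self, service_cls, enter_hook: str, exit_hook: str):
        self.service = service_cls()
        self._exit_hook = exit_hook
        getattr(self.service, enter_hook)()  # like @modal.enter(): once per container

    def call(self, method: str, *args):
        return getattr(self.service, method)(*args)  # like @modal.method(): once per request

    def shutdown(self):
        getattr(self.service, self._exit_hook)()  # like @modal.exit(): once at shutdown


class FakeClassifier:
    """Hypothetical model wrapper; the 'model' is a trivial rule instead of a pipeline."""

    loads = 0

    def load_model(self):
        FakeClassifier.loads += 1  # count the expensive setup step
        self.classifier = lambda text: [{"label": "POSITIVE" if "!" in text else "NEUTRAL"}]

    def classify(self, text: str) -> dict:
        return self.classifier(text)[0]

    def cleanup(self):
        del self.classifier


container = WarmContainer(FakeClassifier, enter_hook="load_model", exit_hook="cleanup")
results = [container.call("classify", t) for t in ["Modal is incredible!", "ok", "wow!"]]
container.shutdown()
print(FakeClassifier.loads)  # 1
```

Three requests, one load: that is the saving `container_idle_timeout` preserves between bursts of traffic.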

Volumes (Persistent Storage)

import modal

app = modal.App("training-with-storage")

# Create or reference a volume
volume = modal.Volume.from_name("training-data", create_if_missing=True)

image = modal.Image.debian_slim().pip_install("torch", "datasets")

@app.function(
    image=image,
    gpu="A10G",
    volumes={"/data": volume},     # Mount at /data inside container
)
def download_and_train():
    import os
    # Check if data already exists (persisted from a previous run)
    if not os.path.exists("/data/imdb"):
        from datasets import load_dataset
        ds = load_dataset("imdb", split="train")
        ds.save_to_disk("/data/imdb")  # save_to_disk writes a directory, not a single file
        volume.commit()            # Persist writes so other containers can see them

    # Load from volume
    from datasets import load_from_disk
    ds = load_from_disk("/data/imdb")
    model = train(ds)              # placeholder for your own training code

    # Save model checkpoint to volume
    model.save_pretrained("/data/checkpoints/epoch_1")
    volume.commit()
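The check, produce, commit sequence above generalizes to any expensive artifact. A minimal sketch, assuming a hypothetical `ensure_cached` helper: `produce` writes the data and `commit` would be `volume.commit` inside Modal, while the demo uses a temporary directory and a list so it runs locally.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(path: Path, produce: Callable[[Path], None], commit: Callable[[], None]) -> bool:
    """Create `path` with `produce` only if it is missing, then call `commit`.

    Returns True when the data had to be produced, False on a cache hit.
    """
    if path.exists():
        return False
    produce(path)
    commit()
    return True


calls = []
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "imdb"
    first = ensure_cached(target, lambda p: p.mkdir(), lambda: calls.append("commit"))
    second = ensure_cached(target, lambda p: p.mkdir(), lambda: calls.append("commit"))

print(first, second, calls)  # True False ['commit']
```

Inside the Modal function this might read `ensure_cached(Path("/data/imdb"), lambda p: load_dataset("imdb", split="train").save_to_disk(str(p)), volume.commit)`.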

Secrets

import modal

# Create secrets via CLI:
# modal secret create openai-secret OPENAI_API_KEY=sk-...
# modal secret create aws-creds AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=...

openai_secret = modal.Secret.from_name("openai-secret")
aws_secret = modal.Secret.from_name("aws-creds")

# Secrets built from local values instead of the Modal dashboard
local_secret = modal.Secret.from_dict({"MY_KEY": "my-value"})
env_secret = modal.Secret.from_dotenv(filename=".env.production")  # reads a local dotenv file

app = modal.App("secret-demo")

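@app.function(
    image=modal.Image.debian_slim().pip_install("openai"),
    secrets=[openai_secret, aws_secret],
)
def call_openai():
    import os
    import openai
    # Each key in the attached secrets is exposed as an environment variable
    client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # ...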
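Since attached secrets surface as ordinary environment variables, a missing or misnamed secret tends to show up as a `KeyError` deep inside your code. A sketch of failing early with a readable message follows; `require_env` is a hypothetical helper, and the demo sets a fake key locally in place of an injected secret.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, or fail with one clear error."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + " (is the secret listed in secrets=[...]?)"
        )
    return {name: os.environ[name] for name in names}


# Local demonstration: a fake value stands in for an injected secret
os.environ["OPENAI_API_KEY"] = "sk-test"
os.environ.pop("MODAL_DEMO_UNSET", None)

creds = require_env("OPENAI_API_KEY")
try:
    require_env("OPENAI_API_KEY", "MODAL_DEMO_UNSET")
    error_message = ""
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```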

Web Endpoints

import modal
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = modal.App("web-api")

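# Simple web endpoint (web endpoints need FastAPI installed in the image)
@app.function(gpu="T4", image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.web_endpoint(method="POST")
def generate(item: dict) -> dict:
    text = item["text"]
    result = run_model(text)       # placeholder for your own model call
    return {"result": result}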

# Full FastAPI app
web_app = FastAPI()

@app.function(
    image=modal.Image.debian_slim().pip_install("fastapi", "torch", "transformers"),
    gpu="A10G",
    container_idle_timeout=300,
)
@modal.asgi_app()
def fastapi_app():
    return web_app

@web_app.get("/health")
async def health():
    return {"status": "ok"}

@web_app.post("/classify")
async def classify(request: Request):
    body = await request.json()
    result = run_inference(body["text"])  # placeholder for your own model call
    return JSONResponse({"label": result})
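Keeping request validation in a plain function, separate from the FastAPI route, makes it testable without deploying. In this sketch `handle_classify` is hypothetical and `predict` stands in for the placeholder `run_inference`; the route would call it and forward the status with `JSONResponse(payload, status_code=status)`.

```python
from typing import Callable


def handle_classify(body: dict, predict: Callable[[str], str]) -> tuple[int, dict]:
    """Validate a /classify payload and return (status_code, json_body)."""
    text = body.get("text")
    if not isinstance(text, str) or not text.strip():
        return 400, {"error": "field 'text' must be a non-empty string"}
    return 200, {"label": predict(text)}


def fake_predict(text: str) -> str:
    # Stand-in model: any text with "!" counts as positive
    return "POSITIVE" if "!" in text else "NEGATIVE"


print(handle_classify({"text": "Modal is incredible!"}, fake_predict))
# (200, {'label': 'POSITIVE'})
```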

Parallel Map

import modal

app = modal.App("parallel-processing")

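@app.function(cpu=2, memory=4096, timeout=600)
def process_item(item: dict) -> dict:
    # CPU-bound processing; heavy_computation is a placeholder
    return {"id": item["id"], "result": heavy_computation(item)}


@app.function(cpu=2, memory=4096, timeout=600)
def process_with_config(item: dict, config: dict) -> dict:
    # Two-argument variant used with starmap
    return {"id": item["id"], "result": heavy_computation(item, config)}


@app.local_entrypoint()
def main():
    items = load_large_dataset()   # placeholder: e.g. 10,000 dicts with an "id" key
    config = {"threshold": 0.5}    # example config passed alongside every item

    # map: one input per call, results returned in input order
    results = list(process_item.map(items))

    # starmap: each tuple is unpacked into positional arguments
    pairs = [(item, config) for item in items]
    results = list(process_with_config.starmap(pairs))

    # for_each: run every input for its side effects; return values are discarded
    process_item.for_each(items)

    # return_exceptions=True yields exceptions as results instead of raising
    for result in process_item.map(items, return_exceptions=True):
        if isinstance(result, Exception):
            print(f"Error: {result}")
        else:
            handle(result)         # placeholder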
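To reason about ordering and error handling locally, the standard library's `ThreadPoolExecutor.map` behaves similarly: results come back in input order. The `local_map` helper below is a hypothetical local analogue of `.map(..., return_exceptions=True)`, not a description of how Modal schedules containers.

```python
from concurrent.futures import ThreadPoolExecutor


def local_map(fn, items, return_exceptions: bool = False, max_workers: int = 8) -> list:
    """Ordered parallel map; with return_exceptions=True, failures come back as values."""

    def call(item):
        try:
            return fn(item)
        except Exception as exc:
            if return_exceptions:
                return exc  # keep the failure in its slot instead of aborting the batch
            raise

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call, items))


def invert(x: float) -> float:
    return 1 / x


results = local_map(invert, [1, 2, 0, 4], return_exceptions=True)
for r in results:
    print("Error:" if isinstance(r, Exception) else "OK:", r)
```

Without `return_exceptions=True`, the `ZeroDivisionError` for the third item would propagate and the other results would be lost.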

Cron Jobs and Schedules

import modal
from datetime import datetime

app = modal.App("scheduled-jobs")
# Schedules fire only after the app is deployed with `modal deploy`

@app.function(schedule=modal.Period(hours=6))     # Every 6 hours
def refresh_cache():
    print(f"Refreshing at {datetime.now()}")
    fetch_and_store()

@app.function(schedule=modal.Cron("0 9 * * 1-5"))  # 09:00 UTC, Monday to Friday
def send_daily_report():
    report = generate_report()
    send_email(report)

@app.function(schedule=modal.Period(days=1))
def daily_retraining():
    train_model()
    deploy_model()
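The weekday schedule above fires at 09:00 UTC, which is easy to misread as local time. A minimal sketch that computes the next fire time of `"0 9 * * 1-5"` by hand; `next_weekday_run` is a hypothetical helper that handles only this one pattern and is not a general cron parser.

```python
from datetime import datetime, timedelta, timezone


def next_weekday_run(now: datetime, hour: int = 9) -> datetime:
    """Next time matching cron '0 <hour> * * 1-5', evaluated in UTC."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has already passed
    while candidate.weekday() >= 5:     # Python: 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


friday_late = datetime(2024, 5, 10, 10, 0, tzinfo=timezone.utc)  # Friday 10:00 UTC
print(next_weekday_run(friday_late))  # 2024-05-13 09:00:00+00:00 (Monday)
```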

Custom Container Images

import modal

# Build from Dockerfile
dockerfile_image = modal.Image.from_dockerfile("./Dockerfile")

# Debian slim — most common
image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("libgl1", "libglib2.0-0")
    .pip_install("opencv-python-headless", "torch", "ultralytics")
    .copy_local_file("./config.yaml", "/app/config.yaml")
    .env({"PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:128"})
    .run_commands("python -c 'import torch; print(torch.__version__)'")
)

# From a public Docker image
custom_image = modal.Image.from_registry("nvcr.io/nvidia/pytorch:24.03-py3")

# Micromamba (conda-compatible)
conda_image = (
    modal.Image.micromamba(python_version="3.11")
    .micromamba_install("cudatoolkit=11.8", channels=["conda-forge", "nvidia"])
    .pip_install("torch", "transformers")
)
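Roughly speaking, each chained method adds a layer whose cache key depends on its own definition and on every step before it, which is why rarely changing steps belong early in the chain. The sketch below models that idea with chained SHA-256 hashes; it illustrates why step order matters and is not Modal's actual hashing scheme. `layer_keys` is hypothetical.

```python
import hashlib


def layer_keys(steps: list[str]) -> list[str]:
    """Chain hashes so each layer's key depends on every step before it."""
    keys, parent = [], ""
    for step in steps:
        parent = hashlib.sha256((parent + "\n" + step).encode()).hexdigest()[:12]
        keys.append(parent)
    return keys


base = ["debian_slim(3.12)", "apt_install(libgl1)", "pip_install(torch)", "copy config.yaml"]
changed_last = base[:-1] + ["copy config.yaml v2"]    # edit the final step
changed_first = ["debian_slim(3.11)"] + base[1:]      # edit the first step

a, b, c = layer_keys(base), layer_keys(changed_last), layer_keys(changed_first)
reused_last = sum(x == y for x, y in zip(a, b))    # 3 layers reused
reused_first = sum(x == y for x, y in zip(a, c))   # 0 layers reused
print(reused_last, reused_first)
```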

Fine-Tuning on Demand

import modal

app = modal.App("llm-finetuning")

volume = modal.Volume.from_name("finetuning-outputs", create_if_missing=True)

image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("transformers", "peft", "trl", "bitsandbytes", "accelerate", "datasets")
)

@app.function(
    image=image,
    gpu="A100-80GB",
    timeout=7200,                  # 2 hours
    volumes={"/outputs": volume},
    secrets=[modal.Secret.from_name("huggingface-token")],
)
def finetune_llm(
    base_model: str = "meta-llama/Meta-Llama-3-8B-Instruct",
    dataset_name: str = "my-org/my-dataset",
    num_epochs: int = 3,
):
    import os
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from trl import SFTTrainer

    # The "huggingface-token" secret is expected to define HUGGING_FACE_HUB_TOKEN
    hf_token = os.environ["HUGGING_FACE_HUB_TOKEN"]

    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        token=hf_token,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
    )
    model = get_peft_model(model, lora_config)

    dataset = load_dataset(dataset_name, split="train", token=hf_token)

    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        args=TrainingArguments(
            output_dir="/outputs/checkpoints",
            num_train_epochs=num_epochs,
        ),
    )
    trainer.train()
    model.save_pretrained("/outputs/final-model")   # saves the LoRA adapter weights
    volume.commit()

    return {"status": "complete", "output": "/outputs/final-model"}
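With `r=16` on `q_proj` and `v_proj`, LoRA trains only the low-rank `A` and `B` matrices, which add r * (d_in + d_out) parameters per adapted weight. The arithmetic below assumes Llama-3-8B's published shape (hidden size 4096, 32 layers, 8 key/value heads of dimension 128, so `v_proj` maps 4096 to 1024); `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(r: int, shapes: dict[str, tuple[int, int]], num_layers: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted matrix: r * (d_in + d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Assumed Llama-3-8B projection shapes as (d_in, d_out)
llama3_8b = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}
print(lora_trainable_params(16, llama3_8b, num_layers=32))  # 6815744
```

About 6.8M trainable parameters against roughly 8B frozen ones is what lets this job fit on a single A100-80GB alongside the 4-bit base model.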

Network File System (NFS)

import modal

app = modal.App("nfs-demo")

# For shared read-write access across concurrent functions
nfs = modal.NetworkFileSystem.from_name("shared-nfs", create_if_missing=True)

@app.function(network_file_systems={"/shared": nfs})
def write_results(data: str):
    # Append one JSON line per call; data must be a string
    with open("/shared/results.jsonl", "a") as f:
        f.write(data + "\n")

Common Workflows

Local Development → Cloud Deployment

# Dev: run with hot reload
modal serve app.py

# Test a single function
modal run app.py::train_model

# Interactive debugging
modal shell app.py::image

# Deploy to production
modal deploy app.py

# Check deployment
modal app list
modal app logs my-ml-app

Batch Inference Pipeline

@app.function(gpu="T4", timeout=300)
def run_inference(batch: list[str]) -> list[dict]:
    model = load_model()           # placeholder; reloads on every call (an @app.cls with @modal.enter() loads once per container)
    return [model.predict(text) for text in batch]

@app.local_entrypoint()
def main():
    texts = load_texts()                    # 100,000 texts
    batch_size = 64
    batches = [texts[i:i+batch_size] for i in range(0, len(texts), batch_size)]

    all_results = []
    for batch_result in run_inference.map(batches):
        all_results.extend(batch_result)

    save_results(all_results)
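Slicing requires the whole list in memory. A generator-based `chunked` helper (hypothetical; Python 3.12's `itertools.batched` is similar but yields tuples) produces the same batches lazily, and since `.map` takes an iterable it should be able to consume the generator directly.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


batches = list(chunked(range(100_000), 64))
print(len(batches), len(batches[-1]))  # 1563 32
```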

Reproducible Experiments

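data_volume = modal.Volume.from_name("experiment-data", create_if_missing=True)
output_volume = modal.Volume.from_name("experiment-outputs", create_if_missing=True)

@app.function(
    image=image,                   # must include wandb
    gpu="A100",
    volumes={"/data": data_volume, "/outputs": output_volume},
    secrets=[modal.Secret.from_name("wandb-secret")],  # should define WANDB_API_KEY
)
def experiment(config: dict) -> dict:
    import wandb

    run = wandb.init(project="my-project", config=config)
    results = train(**config)      # placeholder: returns a dict with "model" plus metrics
    metrics = {k: v for k, v in results.items() if k != "model"}
    wandb.log(metrics)             # log metrics only; the model object is not loggable

    # Save checkpoint under the W&B run ID
    save_checkpoint(results["model"], f"/outputs/{run.id}")  # placeholder helper
    output_volume.commit()
    wandb.finish()

    return metrics                 # return serializable metrics, not the model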
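For reproducibility it also helps to derive a stable identifier from the config itself, so identical configs map to the same checkpoint folder regardless of key order. A sketch, assuming JSON-serializable configs; `config_id` is a hypothetical helper that could be used alongside or instead of the W&B run ID.

```python
import hashlib
import json


def config_id(config: dict) -> str:
    """Short, deterministic ID: same keys and values give the same ID in any order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:10]


a = config_id({"lr": 0.001, "epochs": 10})
b = config_id({"epochs": 10, "lr": 0.001})  # same config, different key order
c = config_id({"lr": 0.01, "epochs": 10})   # different learning rate
print(a == b, a == c)  # True False
```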

Tips and Best Practices

  • Define images at module level: Modal caches image layers and rebuilds only a changed step and the steps after it, so keep rarely changing steps (apt_install, pinned pip_install) early in the chain to avoid unnecessary builds.
  • Use @app.cls with container_idle_timeout for inference services: it keeps containers warm between requests, which avoids most cold-start latency for real-time APIs.
  • Call volume.commit() after important writes: it makes changes visible to other containers right away (they call volume.reload() to pick them up). Modal also commits in the background and at container shutdown, but an explicit commit is the safe choice before another function reads the data.
  • GPU selection matters for cost: T4 is the cheapest option for inference, A10G is a good balance for training, and A100-80GB suits large models; use gpu="any" for flexibility.
  • timeout defaults to 300s: always set a longer timeout for training jobs; the maximum is 86400s (24 hours).
  • Secrets are injected as environment variables at runtime, so they are not stored in the image; never bake secret values into the image with env(), and never print them, since anything printed appears in the logs.
  • modal shell app.py::image opens an interactive container with your image, which is invaluable for debugging dependency issues.
  • return_exceptions=True in .map() returns failures as exception objects instead of raising, so one failed item does not abort the whole batch; handle errors item by item.
  • Use modal serve for web endpoint development: it hot-reloads on file changes so you can iterate without redeploying.
  • Volumes are regional: keep your volume and functions in the same region to avoid cross-region transfer costs and latency.
  • allow_concurrent_inputs on @app.cls lets a single container handle multiple requests, which is critical for GPU cost efficiency on inference workloads.
  • Prefer @app.local_entrypoint() over if __name__ == "__main__" for Modal apps: it integrates with modal run and runs inside the app context automatically, with no need for app.run().