Ollama

Ollama is a tool for running large language models locally on your machine, providing privacy, control, and offline access to AI models such as Llama, Mistral, and CodeLlama.

Installation & Setup

| Command | Description |
| --- | --- |
| `curl -fsSL https://ollama.com/install.sh \| sh` | Install Ollama on Linux/macOS |
| `brew install ollama` | Install via Homebrew (macOS) |
| `ollama --version` | Check installed version |
| `ollama serve` | Start the Ollama server |
| `ollama ps` | List running models |
| `ollama list` | List installed models |

Model Management

| Command | Description |
| --- | --- |
| `ollama pull llama3.1` | Download the Llama 3.1 model |
| `ollama pull mistral` | Download the Mistral model |
| `ollama pull codellama` | Download the CodeLlama model |
| `ollama pull llama3.1:70b` | Download a specific model size |
| `ollama show llama3.1` | Show model information |
| `ollama rm llama3.1` | Remove a model |

General Purpose Models

| Command | Description |
| --- | --- |
| `ollama run llama3.1:8b` | Llama 3.1, 8B parameters |
| `ollama run llama3.1:70b` | Llama 3.1, 70B parameters |
| `ollama run mistral:7b` | Mistral 7B |
| `ollama run mixtral:8x7b` | Mixtral 8x7B mixture of experts |
| `ollama run gemma:7b` | Google Gemma 7B |
| `ollama run phi3:mini` | Microsoft Phi-3 Mini |

Code-Specialized Models

| Command | Description |
| --- | --- |
| `ollama run codellama:7b` | CodeLlama 7B for coding |
| `ollama run codellama:13b` | CodeLlama 13B for coding |
| `ollama run codegemma` | CodeGemma for code generation |
| `ollama run deepseek-coder` | DeepSeek Coder model |
| `ollama run starcoder2` | StarCoder2 for code |

Specialized Models

| Command | Description |
| --- | --- |
| `ollama run llava` | LLaVA multimodal (vision) model |
| `ollama pull nomic-embed-text` | Text embedding model |
| `ollama pull all-minilm` | Sentence embedding model |
| `ollama pull mxbai-embed-large` | Large embedding model |

Running Models

| Command | Description |
| --- | --- |
| `ollama run llama3.1` | Start an interactive chat session |
| `ollama run llama3.1 "Your prompt here"` | One-shot prompt; print the response and exit |
| `ollama run llama3.1 --verbose` | Show timing and token statistics |
| `ollama stop llama3.1` | Unload a running model |

Chat Interface

| Command | Description |
| --- | --- |
| `/?` | List available session commands |
| `/show info` | Show details for the current model |
| `/set parameter <name> <value>` | Change a parameter for the session |
| `/save mymodel` | Save the current session as a model |
| `/load mymodel` | Load a saved model into the session |
| `/clear` | Clear the session context |
| `/bye` | Exit the chat session |

API Usage

REST API

| Command | Description |
| --- | --- |
| `curl http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Hello"}'` | Generate a completion |
| `curl http://localhost:11434/api/chat -d '{"model":"llama3.1","messages":[{"role":"user","content":"Hello"}]}'` | Chat completion |
| `curl http://localhost:11434/api/tags` | List installed models |
| `curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"Hello"}'` | Generate embeddings |

Streaming Responses

| Command | Description |
| --- | --- |
| `curl http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Hello","stream":true}'` | Stream a generation response |
| `curl http://localhost:11434/api/chat -d '{"model":"llama3.1","messages":[{"role":"user","content":"Hello"}],"stream":true}'` | Stream a chat response |
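
A streamed response arrives as one JSON object per line (NDJSON), each carrying a `response` fragment until `done` is true. A minimal Python sketch of a consumer, reusing the local endpoint and example model name from above:

import json
import requests

def stream_ollama(prompt, model="llama3.1"):
    url = "http://localhost:11434/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": True}
    with requests.post(url, json=payload, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # Each line is a JSON object holding a partial "response"
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break

# Usage
stream_ollama("Explain Docker containers in one paragraph")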

Model Configuration

Temperature and Parameters

| Command | Description |
| --- | --- |
| `/set parameter temperature 0.7` | Set temperature |
| `/set parameter top_p 0.9` | Set top-p sampling |
| `/set parameter top_k 40` | Set top-k sampling |
| `/set parameter repeat_penalty 1.1` | Set repeat penalty |
| `/set parameter seed 42` | Set random seed |

Context and Memory

| Command | Description |
| --- | --- |
| `/set parameter num_ctx 4096` | Set context window size |
| `/set parameter num_batch 512` | Set batch size |
| `/set parameter num_thread 8` | Set number of threads |
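
The same parameters can also be sent per request through the REST API's `options` field; a short Python sketch with illustrative values:

import requests

# Per-request options use the same names as /set parameter (values are illustrative)
data = {
    "model": "llama3.1",
    "prompt": "Summarize the CAP theorem",
    "stream": False,
    "options": {"temperature": 0.2, "top_p": 0.9, "num_ctx": 4096},
}
response = requests.post("http://localhost:11434/api/generate", json=data)
print(response.json()["response"])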

Custom Models

Creating Modelfiles

| Command | Description |
| --- | --- |
| `ollama create mymodel -f Modelfile` | Create a custom model |
| `ollama create mymodel -f Modelfile --quantize q4_K_M` | Create with quantization |

Modelfile Examples

# Basic Modelfile
FROM llama3.1
PARAMETER temperature 0.8
PARAMETER top_p 0.9
SYSTEM "You are a helpful coding assistant."

# Advanced Modelfile
FROM codellama:7b
PARAMETER temperature 0.2
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
SYSTEM """You are an expert programmer. Always provide:
1. Clean, well-commented code
2. Explanation of the solution
3. Best practices and optimizations"""
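
After saving either example as `Modelfile`, build and run it with `ollama create coding-assistant -f Modelfile` followed by `ollama run coding-assistant` (the model name here is just an example).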

Integration Examples

Python Integration

import requests

def chat_with_ollama(prompt, model="llama3.1"):
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    response.raise_for_status()  # surface HTTP errors early
    return response.json()["response"]

# Usage
result = chat_with_ollama("Explain quantum computing")
print(result)

JavaScript Integration

async function chatWithOllama(prompt, model = "llama3.1") {
    const response = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: model,
            prompt: prompt,
            stream: false
        })
    });
    const data = await response.json();
    return data.response;
}

// Usage
chatWithOllama("Write a JavaScript function").then(console.log);

Bash Integration

#!/bin/bash
# Requires curl and jq
ollama_chat() {
    local prompt="$1"
    local model="${2:-llama3.1}"
    curl -s http://localhost:11434/api/generate \
        -d "{\"model\":\"$model\",\"prompt\":\"$prompt\",\"stream\":false}" \
        | jq -r '.response'
}

# Usage
ollama_chat "Explain Docker containers"

Performance Optimization

| Command | Description |
| --- | --- |
| `/set parameter num_gpu 999` | Offload all model layers to the GPU |
| `OLLAMA_MAX_LOADED_MODELS=1` | Cap memory use by limiting loaded models |
| `/set parameter num_thread 8` | Set CPU threads |
| `/set parameter num_batch 512` | Tune batch size |

Environment Variables

| Variable | Description |
| --- | --- |
| `OLLAMA_HOST` | Server bind address (default: 127.0.0.1:11434) |
| `OLLAMA_MODELS` | Models directory |
| `OLLAMA_NUM_PARALLEL` | Number of parallel requests |
| `OLLAMA_MAX_LOADED_MODELS` | Max models kept in memory |
| `OLLAMA_FLASH_ATTENTION` | Enable flash attention |
| `OLLAMA_GPU_OVERHEAD` | Reserved GPU memory overhead |
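
For example, `OLLAMA_HOST=0.0.0.0:11434 ollama serve` makes the server listen on all interfaces instead of localhost only.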

Docker Usage

| Command | Description |
| --- | --- |
| `docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama` | Run Ollama in Docker |
| `docker exec -it ollama ollama run llama3.1` | Run a model in the container |
| `docker exec -it ollama ollama pull mistral` | Pull a model in the container |

Docker Compose

version: '3.8'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
volumes:
  ollama:
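
Start the stack with `docker compose up -d`, then pull models with `docker compose exec ollama ollama pull llama3.1`; they persist in the `ollama` volume across restarts.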

Monitoring & Debugging

| Command | Description |
| --- | --- |
| `ollama ps` | Show loaded models with memory and GPU usage |
| `OLLAMA_DEBUG=1 ollama serve` | Start the server with debug logging |
| `journalctl -u ollama -f` | Follow server logs (Linux with systemd) |
| `curl http://localhost:11434/api/version` | Check server version and reachability |

Model Quantization

| Command | Description |
| --- | --- |
| `ollama create mymodel -f Modelfile --quantize q4_K_M` | Create a 4-bit quantized model (good size/quality balance) |
| `ollama pull llama3.1:8b-instruct-q4_0` | Pull a pre-quantized 4-bit tag (smallest, some quality loss) |
| `ollama pull llama3.1:8b-instruct-q8_0` | Pull an 8-bit tag (near-original quality, largest) |

Embedding Models

| Command | Description |
| --- | --- |
| `ollama pull nomic-embed-text` | Pull a text embedding model |
| `curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"Your text"}'` | Generate embeddings |
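
A small Python sketch that turns the endpoint above into a similarity check; the cosine helper is plain Python, and the model name matches the table:

import requests

def embed(text, model="nomic-embed-text"):
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
    )
    response.raise_for_status()
    return response.json()["embedding"]

def cosine(a, b):
    # Cosine similarity without external dependencies
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# Usage: semantically close texts score near 1.0
print(cosine(embed("cats"), embed("kittens")))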

Troubleshooting

| Command | Description |
| --- | --- |
| `ollama --help` | Show help information |
| `ollama serve --help` | Show server options |
| `curl http://localhost:11434` | Check if Ollama is running (replies "Ollama is running") |
| `lsof -i :11434` | Check port usage |
| `ollama list \| awk 'NR>1 {print $1}' \| xargs -n1 ollama rm` | Remove all models |

Best Practices

  • Choose model size based on available RAM (7B ≈ 4 GB, 13B ≈ 8 GB, 70B ≈ 40 GB)
  • Use GPU acceleration when available for better performance
  • Implement proper error handling in API integrations (see the sketch after this list)
  • Monitor memory usage when running multiple models
  • Use quantized models in resource-constrained environments
  • Keep frequently used models stored locally
  • Set context sizes appropriate to your use case
  • Use streaming for long responses to improve the user experience
  • Implement rate limiting for production API usage
  • Update models regularly for improved performance and capabilities
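
A minimal sketch of the error-handling point above, reusing the local endpoint and model name from the integration examples; the retry and backoff policy is illustrative:

import time
import requests

def robust_generate(prompt, model="llama3.1", retries=3):
    url = "http://localhost:11434/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": False}
    for attempt in range(retries):
        try:
            response = requests.post(url, json=payload, timeout=120)
            response.raise_for_status()
            return response.json()["response"]
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(2 ** attempt)  # back off: 1s, 2s, ...

# Usage
print(robust_generate("Explain quantum computing in two sentences"))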

Common Use Cases

Code Generation

ollama run codellama "Create a REST API in Python using FastAPI"

Text Analysis

ollama run llama3.1 "Analyze the sentiment of this text: 'I love this product!'"

Creative Writing

ollama run mistral "Write a short story about time travel"

Data Processing

ollama run llama3.1 "Convert this JSON to CSV format: {...}"