
Ollama


Ollama is a tool for running large language models locally on your machine, providing privacy, control, and offline access to AI models such as Llama, Mistral, and CodeLlama.

Installation and Setup

| Command | Description |
| --- | --- |
| curl -fsSL https://ollama.ai/install.sh \| sh | Install Ollama on Linux/macOS |
| brew install ollama | Install via Homebrew (macOS) |
| ollama --version | Check installed version |
| ollama serve | Start Ollama server |
| ollama ps | List running models |
| ollama list | List installed models |

Model Management

| Command | Description |
| --- | --- |
| ollama pull llama3.1 | Download Llama 3.1 model |
| ollama pull mistral | Download Mistral model |
| ollama pull codellama | Download CodeLlama model |
| ollama pull gemma:7b | Download specific model size |
| ollama show llama3.1 | Show model information |
| ollama rm mistral | Remove model |

Popular Models

General Purpose Models

| Command | Description |
| --- | --- |
| ollama pull llama3.1:8b | Llama 3.1 8B parameters |
| ollama pull llama3.1:70b | Llama 3.1 70B parameters |
| ollama pull mistral:7b | Mistral 7B model |
| ollama pull mixtral:8x7b | Mixtral 8x7B mixture of experts |
| ollama pull gemma:7b | Google Gemma 7B |
| ollama pull phi3:mini | Microsoft Phi-3 Mini |

Code-Specialized Models

| Command | Description |
| --- | --- |
| ollama pull codellama:7b | CodeLlama 7B for coding |
| ollama pull codellama:13b | CodeLlama 13B for coding |
| ollama pull codegemma:7b | CodeGemma for code generation |
| ollama pull deepseek-coder:6.7b | DeepSeek Coder model |
| ollama pull starcoder2:7b | StarCoder2 for code |

Specialized Models

| Command | Description |
| --- | --- |
| ollama pull llava:7b | LLaVA multimodal model |
| ollama pull nomic-embed-text | Text embedding model |
| ollama pull all-minilm | Sentence embedding model |
| ollama pull mxbai-embed-large | Large embedding model |

Running Models

| Command | Description |
| --- | --- |
| ollama run llama3.1 | Start interactive chat with Llama 3.1 |
| ollama run mistral "Hello, how are you?" | Single prompt to Mistral |
| ollama run codellama "Write a Python function" | Code generation with CodeLlama |
| ollama run llava "Describe this image" --image photo.jpg | Multimodal prompt with an image |

Chat Interface

| Command | Description |
| --- | --- |
| ollama run llama3.1 | Start interactive chat |
| /bye | Exit chat session |
| /clear | Clear chat history |
| /save chat.txt | Save chat to file |
| /load chat.txt | Load chat from file |
| /multiline | Enable multiline input |

API Usage

REST API

| Command | Description |
| --- | --- |
| curl http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Hello"}' | Generate text via API |
| curl http://localhost:11434/api/chat -d '{"model":"llama3.1","messages":[{"role":"user","content":"Hello"}]}' | Chat via API |
| curl http://localhost:11434/api/tags | List models via API |
| curl http://localhost:11434/api/show -d '{"name":"llama3.1"}' | Show model info via API |

Streaming Responses

| Command | Description |
| --- | --- |
| curl http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Hello","stream":true}' | Stream response |
| curl http://localhost:11434/api/chat -d '{"model":"llama3.1","messages":[{"role":"user","content":"Hello"}],"stream":true}' | Stream chat |
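
With `"stream": true` the server sends one JSON object per line instead of a single response. A minimal Python sketch for consuming that stream, assuming the server is running at the default `localhost:11434` and `llama3.1` has been pulled:

```python
import json
import requests

def stream_generate(prompt, model="llama3.1"):
    """Yield response chunks as they arrive from /api/generate."""
    url = "http://localhost:11434/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": True}
    with requests.post(url, json=payload, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)  # each line is a standalone JSON object
            if chunk.get("done"):
                break
            yield chunk.get("response", "")

# Usage: print tokens as they stream in
for part in stream_generate("Explain Docker containers"):
    print(part, end="", flush=True)
print()
```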

Model Configuration

Temperature and Parameters

| Command | Description |
| --- | --- |
| ollama run llama3.1 --temperature 0.7 | Set temperature |
| ollama run llama3.1 --top-p 0.9 | Set top-p sampling |
| ollama run llama3.1 --top-k 40 | Set top-k sampling |
| ollama run llama3.1 --repeat-penalty 1.1 | Set repeat penalty |
| ollama run llama3.1 --seed 42 | Set random seed |

Context and Memory

| Command | Description |
| --- | --- |
| ollama run llama3.1 --ctx-size 4096 | Set context window size |
| ollama run llama3.1 --batch-size 512 | Set batch size |
| ollama run llama3.1 --threads 8 | Set number of threads |
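
The same configuration can also be passed per request through the REST API's `options` object rather than CLI flags. A sketch, assuming option names such as `temperature`, `top_p`, `top_k`, `repeat_penalty`, `seed`, and `num_ctx` are accepted by your Ollama version:

```python
import requests

def generate_with_options(prompt, model="llama3.1"):
    """Call /api/generate with explicit sampling and context options."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": 0.7,     # sampling temperature
            "top_p": 0.9,           # nucleus sampling
            "top_k": 40,            # top-k sampling
            "repeat_penalty": 1.1,  # discourage repetition
            "seed": 42,             # reproducible output
            "num_ctx": 4096,        # context window size
        },
    }
    response = requests.post("http://localhost:11434/api/generate", json=payload)
    response.raise_for_status()
    return response.json()["response"]

print(generate_with_options("Summarize what a context window is."))
```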

Custom Models

Creating Modelfiles

| Command | Description |
| --- | --- |
| ollama create mymodel -f Modelfile | Create custom model |
| ollama create mymodel -f Modelfile --quantize q4_0 | Create with quantization |

Modelfile Examples

```dockerfile
# Basic Modelfile
FROM llama3.1
PARAMETER temperature 0.8
PARAMETER top_p 0.9
SYSTEM "You are a helpful coding assistant."
```

```dockerfile
# Advanced Modelfile
FROM codellama:7b
PARAMETER temperature 0.2
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
SYSTEM """You are an expert programmer. Always provide:
1. Clean, well-commented code
2. Explanation of the solution
3. Best practices and optimizations"""
```
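
To build and try out a model from one of these Modelfiles you can shell out to the CLI. A minimal sketch, assuming the advanced Modelfile above is saved as `Modelfile` in the working directory and `code-helper` is a free model name of your choosing:

```python
import subprocess

# Build a custom model from the Modelfile in the current directory
subprocess.run(["ollama", "create", "code-helper", "-f", "Modelfile"], check=True)

# Ask the new model a question and capture the answer
result = subprocess.run(
    ["ollama", "run", "code-helper", "Write a function that reverses a string"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```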

Integration Examples

Python Integration

```python
import requests

def chat_with_ollama(prompt, model="llama3.1"):
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    return response.json()["response"]

# Usage
result = chat_with_ollama("Explain quantum computing")
print(result)
```
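
For multi-turn conversations, the `/api/chat` endpoint shown in the API tables above takes a list of messages instead of a bare prompt, so earlier turns can be carried forward. A sketch along the same lines as the function above:

```python
import requests

def chat(messages, model="llama3.1"):
    """Send a conversation history to /api/chat and return the reply text."""
    payload = {"model": model, "messages": messages, "stream": False}
    response = requests.post("http://localhost:11434/api/chat", json=payload)
    response.raise_for_status()
    return response.json()["message"]["content"]

# Usage: keep the history so the model sees earlier turns
history = [{"role": "user", "content": "Explain quantum computing in one paragraph."}]
reply = chat(history)
print(reply)

history.append({"role": "assistant", "content": reply})
history.append({"role": "user", "content": "Now explain it to a five-year-old."})
print(chat(history))
```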

JavaScript Integration

```javascript
async function chatWithOllama(prompt, model = "llama3.1") {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: model, prompt: prompt, stream: false })
  });
  const data = await response.json();
  return data.response;
}

// Usage
chatWithOllama("Write a JavaScript function").then(console.log);
```

Bash Integration

```bash
#!/bin/bash

ollama_chat() {
  local prompt="$1"
  local model="${2:-llama3.1}"
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\":\"$model\",\"prompt\":\"$prompt\",\"stream\":false}" \
    | jq -r '.response'
}

# Usage
ollama_chat "Explain Docker containers"
```

Performance Optimization

| Command | Description |
| --- | --- |
| ollama run llama3.1 --gpu-layers 32 | Use GPU acceleration |
| ollama run llama3.1 --memory-limit 8GB | Set memory limit |
| ollama run llama3.1 --cpu-threads 8 | Set CPU threads |
| ollama run llama3.1 --batch-size 1024 | Optimize batch size |

Environment Variables

| Variable | Description |
| --- | --- |
| OLLAMA_HOST | Set server host (default: 127.0.0.1:11434) |
| OLLAMA_MODELS | Set models directory |
| OLLAMA_NUM_PARALLEL | Number of parallel requests |
| OLLAMA_MAX_LOADED_MODELS | Max models in memory |
| OLLAMA_FLASH_ATTENTION | Enable flash attention |
| OLLAMA_GPU_OVERHEAD | GPU memory overhead |
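
Client code can honor `OLLAMA_HOST` as well, so the same script works against a local or remote server. A small sketch of that pattern; the fallback default of `127.0.0.1:11434` matches the table above:

```python
import os
import requests

# Respect OLLAMA_HOST if set, otherwise fall back to the default address
host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
base_url = host if host.startswith("http") else f"http://{host}"

# List the models available on that server
response = requests.get(f"{base_url}/api/tags")
response.raise_for_status()
for model in response.json().get("models", []):
    print(model["name"])
```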

Docker Usage

| Command | Description |
| --- | --- |
| docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama | Run Ollama in Docker |
| docker exec -it ollama ollama run llama3.1 | Run model in container |
| docker exec -it ollama ollama pull mistral | Pull model in container |

Docker Compose

```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434

volumes:
  ollama:
```

Monitoring and Debugging

| Command | Description |
| --- | --- |
| ollama logs | View Ollama logs |
| ollama ps | Show running models and memory usage |
| curl http://localhost:11434/api/version | Check API version |
| curl http://localhost:11434/api/tags | List available models |
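
The version and tags endpoints in the table make for a simple liveness check before sending real work. A hedged sketch of such a check:

```python
import requests

def ollama_healthy(base_url="http://localhost:11434", timeout=3):
    """Return True if the Ollama server answers, and report what it serves."""
    try:
        version = requests.get(f"{base_url}/api/version", timeout=timeout).json()
        tags = requests.get(f"{base_url}/api/tags", timeout=timeout).json()
    except requests.RequestException as exc:
        print(f"Ollama unreachable: {exc}")
        return False
    names = [m["name"] for m in tags.get("models", [])]
    print(f"Ollama {version.get('version')} is up, models: {', '.join(names) or 'none'}")
    return True

ollama_healthy()
```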

Model Quantization

| Command | Description |
| --- | --- |
| ollama create mymodel -f Modelfile --quantize q4_0 | 4-bit quantization |
| ollama create mymodel -f Modelfile --quantize q5_0 | 5-bit quantization |
| ollama create mymodel -f Modelfile --quantize q8_0 | 8-bit quantization |
| ollama create mymodel -f Modelfile --quantize f16 | 16-bit float |

Embedding Models

| Command | Description |
| --- | --- |
| ollama pull nomic-embed-text | Pull text embedding model |
| curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"Hello world"}' | Generate embeddings |
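
A typical use of the embeddings endpoint is comparing two texts by cosine similarity. A minimal Python sketch, assuming `nomic-embed-text` has already been pulled:

```python
import math
import requests

def embed(text, model="nomic-embed-text"):
    """Get an embedding vector from /api/embeddings."""
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
    )
    response.raise_for_status()
    return response.json()["embedding"]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Usage: related sentences should score higher than unrelated ones
v1 = embed("Ollama runs language models locally.")
v2 = embed("Local LLM inference with Ollama.")
print(round(cosine_similarity(v1, v2), 3))
```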

Troubleshooting

| Command | Description |
| --- | --- |
| ollama --help | Show help information |
| ollama serve --help | Show server options |
| ps aux \| grep ollama | Check if Ollama is running |
| lsof -i :11434 | Check port usage |
| ollama rm --all | Remove all models |

Best Practices

  • Choose model size based on available RAM (7B ≈ 4 GB, 13B ≈ 8 GB, 70B ≈ 40 GB)
  • Use GPU acceleration when available for better performance
  • Implement proper error handling in API integrations (see the sketch after this list)
  • Monitor memory usage when running multiple models
  • Use quantized models for resource-constrained environments
  • Cache frequently used models locally
  • Set appropriate context sizes for your use case
  • Use streaming for long responses to improve the user experience
  • Implement rate limiting for production API usage
  • Update models regularly for improved performance and capabilities
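
As a concrete example of the error-handling point above, a sketch of a request wrapper with a timeout and simple retry; the retry count and delay are arbitrary choices:

```python
import time
import requests

def generate_with_retry(prompt, model="llama3.1", retries=3, delay=2):
    """Call /api/generate, retrying on connection errors and timeouts."""
    url = "http://localhost:11434/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": False}
    for attempt in range(1, retries + 1):
        try:
            response = requests.post(url, json=payload, timeout=120)
            response.raise_for_status()
            return response.json()["response"]
        except (requests.ConnectionError, requests.Timeout) as exc:
            if attempt == retries:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)

print(generate_with_retry("Give one tip for writing clean Python."))
```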

Common Use Cases

Code Generation

```bash
ollama run codellama "Create a REST API in Python using FastAPI"
```

Text Analysis

```bash
ollama run llama3.1 "Analyze the sentiment of this text: 'I love this product!'"
```

Creative Writing

```bash
ollama run mistral "Write a short story about time travel"
```

Data Processing

```bash
ollama run llama3.1 "Convert this JSON to CSV format: {...}"
```