
Ollama


Ollama is a tool for running large language models locally on your machine, providing privacy, control, and offline access to AI models such as Llama, Mistral, and CodeLlama.

Installation and Setup

| Command | Description |
|---------|-------------|
| INLINE_CODE_10 | Install Ollama on Linux/macOS |
| INLINE_CODE_11 | Install via Homebrew (macOS) |
| INLINE_CODE_12 | Check installed version |
| INLINE_CODE_13 | Start Ollama server |
| INLINE_CODE_14 | List running models |
| INLINE_CODE_15 | List installed models |
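
A quick first session after installing might look like the following sketch; the install script URL is the one documented by Ollama, but verify it before piping it to a shell.

```bash
# Install Ollama on Linux/macOS via the official install script
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the installation and start the server
ollama --version
ollama serve &

# Show installed models and models currently loaded in memory
ollama list
ollama ps
```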

Model Management

| Command | Description |
|---------|-------------|
| INLINE_CODE_16 | Download Llama 3.1 model |
| INLINE_CODE_17 | Download Mistral model |
| INLINE_CODE_18 | Download CodeLlama model |
| INLINE_CODE_19 | Download specific model size |
| INLINE_CODE_20 | Show model information |
| INLINE_CODE_21 | Remove model |
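
For example, a typical pull/inspect/remove cycle (model tags are illustrative):

```bash
ollama pull llama3.1        # download the default Llama 3.1 tag
ollama show llama3.1        # print parameters, template, and license details
ollama pull llama3.1:70b    # pull a specific size variant
ollama rm llama3.1:70b      # remove a model to free disk space
```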

Popular Models

General Purpose Models

| Command | Description |
|---------|-------------|
| INLINE_CODE_22 | Llama 3.1 8B parameters |
| INLINE_CODE_23 | Llama 3.1 70B parameters |
| INLINE_CODE_24 | Mistral 7B model |
| INLINE_CODE_25 | Mixtral 8x7B mixture of experts |
| INLINE_CODE_26 | Google Gemma 7B |
| INLINE_CODE_27 | Microsoft Phi-3 Mini |

Code-Specialized Models

| Command | Description |
|---------|-------------|
| INLINE_CODE_28 | CodeLlama 7B for coding |
| INLINE_CODE_29 | CodeLlama 13B for coding |
| INLINE_CODE_30 | CodeGemma for code generation |
| INLINE_CODE_31 | DeepSeek Coder model |
| INLINE_CODE_32 | StarCoder2 for code |

Specialized Models

| Command | Description |
|---------|-------------|
| INLINE_CODE_33 | LLaVA multimodal model |
| INLINE_CODE_34 | Text embedding model |
| INLINE_CODE_35 | Sentence embedding model |
| INLINE_CODE_36 | Large embedding model |

Running Models

| Command | Description |
|---------|-------------|
| INLINE_CODE_37 | Start interactive chat with Llama 3.1 |
| INLINE_CODE_38 | Single prompt to Mistral |
| INLINE_CODE_39 | Code generation with CodeLlama |
| INLINE_CODE_40 | Multimodal with image |
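
For example (prompts are illustrative):

```bash
# Interactive chat session (exit with /bye)
ollama run llama3.1

# One-off prompt: the model answers and the command exits
ollama run mistral "Summarize the benefits of local LLM inference"

# Code generation with a code-specialized model
ollama run codellama "Write a Python function that reverses a string"
```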

Interactive Chat Interface

| Command | Description |
|---------|-------------|
| INLINE_CODE_41 | Start interactive chat |
| INLINE_CODE_42 | Exit chat session |
| INLINE_CODE_43 | Clear chat history |
| INLINE_CODE_44 | Save chat to file |
| INLINE_CODE_45 | Load chat from file |
| INLINE_CODE_46 | Enable multiline input |
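
Inside an `ollama run` session, slash commands control the chat; a short sketch (the exact command set can vary between Ollama versions):

```bash
ollama run llama3.1
# Inside the session:
#   /clear          clear the conversation context
#   /save mychat    save the current session under a name
#   /load mychat    restore a previously saved session
#   /bye            exit the chat
# Multiline input is wrapped in triple quotes:
#   """
#   first line
#   second line
#   """
```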

API Usage

REST API

| Command | Description |
|---------|-------------|
| INLINE_CODE_47 | Generate text via API |
| INLINE_CODE_48 | Chat via API |
| INLINE_CODE_49 | List models via API |
| INLINE_CODE_50 | Show model info via API |
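
Minimal curl calls against the local server (default port 11434), following the public Ollama REST API:

```bash
# Single-turn generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

# Multi-turn chat
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'

# List local models and show details for one of them
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/show -d '{"name": "llama3.1"}'
```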

Streaming Responses

| Command | Description |
|---------|-------------|
| INLINE_CODE_51 | Stream response |
| INLINE_CODE_52 | Stream chat |
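
With `"stream": true` (the default), the server returns one JSON object per line; a sketch that prints tokens as they arrive, assuming `jq` is installed:

```bash
# Stream a generation and print each partial response fragment
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a haiku about autumn"
}' | jq -rj '.response'

# Stream a chat response the same way
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Tell me a short joke"}]
}' | jq -rj '.message.content // empty'
```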

Model Configuration

Temperature and Parameters

| Command | Description |
|---------|-------------|
| INLINE_CODE_53 | Set temperature |
| INLINE_CODE_54 | Set top-p sampling |
| INLINE_CODE_55 | Set top-k sampling |
| INLINE_CODE_56 | Set repeat penalty |
| INLINE_CODE_57 | Set random seed |
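
These sampling parameters can be passed per request through the API's `options` field (or set interactively with `/set parameter` inside a chat session); the values below are illustrative:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Name three uses for a paperclip",
  "stream": false,
  "options": {
    "temperature": 0.8,
    "top_p": 0.9,
    "top_k": 40,
    "repeat_penalty": 1.1,
    "seed": 42
  }
}'
```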

Context and Memory

| Command | Description |
|---------|-------------|
| INLINE_CODE_58 | Set context window size |
| INLINE_CODE_59 | Set batch size |
| INLINE_CODE_60 | Set number of threads |
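
Context and resource settings use the same `options` mechanism; `num_ctx`, `num_batch`, and `num_thread` are the option names Ollama exposes, with defaults depending on the model:

```bash
# Request a larger context window and tune batching/threading (illustrative values)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Summarize the following document: ...",
  "stream": false,
  "options": {
    "num_ctx": 8192,
    "num_batch": 512,
    "num_thread": 8
  }
}'
```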

Custom Models

Creating Modelfiles

| Command | Description |
|---------|-------------|
| INLINE_CODE_61 | Create custom model |
| INLINE_CODE_62 | Create with quantization |

Modelfile Examples

```dockerfile
# Basic Modelfile
FROM llama3.1
PARAMETER temperature 0.8
PARAMETER top_p 0.9
SYSTEM "You are a helpful coding assistant."
```

```dockerfile
# Advanced Modelfile
FROM codellama:7b
PARAMETER temperature 0.2
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
SYSTEM """You are an expert programmer. Always provide:
1. Clean, well-commented code
2. Explanation of the solution
3. Best practices and optimizations"""
```
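
To build and run a model from such a Modelfile, something like the following works; `coding-assistant` is an illustrative name:

```bash
# Build a custom model from a Modelfile in the current directory
ollama create coding-assistant -f ./Modelfile

# Run the customized model like any other
ollama run coding-assistant "Review this function for edge cases"
```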

Integration Examples

Python Integration

```python
import requests

def chat_with_ollama(prompt, model="llama3.1"):
    """Send a single prompt to the local Ollama server and return the response text."""
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    return response.json()["response"]

# Usage
result = chat_with_ollama("Explain quantum computing")
print(result)
```

JavaScript Integration

```javascript
// Query the local Ollama server and return the generated text
async function chatWithOllama(prompt, model = "llama3.1") {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: model, prompt: prompt, stream: false })
  });
  const data = await response.json();
  return data.response;
}

// Usage
chatWithOllama("Write a JavaScript function").then(console.log);
```

Bash Integration

```bash
#!/bin/bash

# Send a prompt to the local Ollama server and print the response text.
# The second argument selects the model; llama3.1 is an assumed default.
ollama_chat() {
    local prompt="$1"
    local model="${2:-llama3.1}"
    curl -s http://localhost:11434/api/generate \
        -d "{\"model\":\"$model\",\"prompt\":\"$prompt\",\"stream\":false}" \
        | jq -r '.response'
}

# Usage
ollama_chat "Explain Docker containers"
```

Performance Optimization

| Command | Description |
|---------|-------------|
| INLINE_CODE_63 | Use GPU acceleration |
| INLINE_CODE_64 | Set memory limit |
| INLINE_CODE_65 | Set CPU threads |
| INLINE_CODE_66 | Optimize batch size |

Environment Variables

| Variable | Description |
|----------|-------------|
| INLINE_CODE_67 | Set server host (default: 127.0.0.1:11434) |
| INLINE_CODE_68 | Set models directory |
| INLINE_CODE_69 | Number of parallel requests |
| INLINE_CODE_70 | Max models in memory |
| INLINE_CODE_71 | Enable flash attention |
| INLINE_CODE_72 | GPU memory overhead |
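
Environment variables are read by the server process, so they must be set where `ollama serve` runs (or in the systemd unit on Linux); the variable names below follow the Ollama server documentation:

```bash
# Listen on all interfaces, keep two models resident, allow four parallel requests
OLLAMA_HOST=0.0.0.0:11434 \
OLLAMA_MAX_LOADED_MODELS=2 \
OLLAMA_NUM_PARALLEL=4 \
ollama serve
```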

Docker Usage

| Command | Description |
|---------|-------------|
| INLINE_CODE_73 | Run Ollama in Docker |
| INLINE_CODE_74 | Run model in container |
| INLINE_CODE_75 | Pull model in container |
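
A typical CPU-only container setup; for NVIDIA GPUs, the `--gpus=all` flag and the NVIDIA container toolkit are additionally required:

```bash
# Start the Ollama server in Docker with a persistent model volume
docker run -d --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Pull and run a model inside the running container
docker exec -it ollama ollama pull llama3.1
docker exec -it ollama ollama run llama3.1
```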

Docker Compose

```yaml
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
volumes:
  ollama:
```

Monitoring and Debugging

| Command | Description |
|---------|-------------|
| INLINE_CODE_76 | View Ollama logs |
| INLINE_CODE_77 | Show running models and memory usage |
| INLINE_CODE_78 | Check API version |
| INLINE_CODE_79 | List available models |
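
On a systemd-based Linux install the server logs are available via journalctl; on macOS they live under ~/.ollama/logs. A few quick checks:

```bash
# Follow server logs (the official Linux installer creates an "ollama" systemd unit)
journalctl -u ollama -f

# Show which models are loaded and how much memory they use
ollama ps

# Check that the API is reachable and which version is running
curl http://localhost:11434/api/version
curl http://localhost:11434/api/tags
```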

Model Quantization

| Command | Description |
|---------|-------------|
| INLINE_CODE_80 | 4-bit quantization |
| INLINE_CODE_81 | 5-bit quantization |
| INLINE_CODE_82 | 8-bit quantization |
| INLINE_CODE_83 | 16-bit float |
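
Quantization levels are normally selected through the model tag when pulling; exact tag names differ per model, so the tags below are illustrative and should be checked against the model's page in the Ollama library:

```bash
ollama pull llama3.1:8b-instruct-q4_0    # 4-bit quantization
ollama pull llama3.1:8b-instruct-q8_0    # 8-bit quantization
ollama pull llama3.1:8b-instruct-fp16    # 16-bit float
```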

Embedding Models

| Command | Description |
|---------|-------------|
| INLINE_CODE_84 | Pull text embedding model |
| INLINE_CODE_85 | Generate embeddings |
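
Embeddings have their own endpoint; a minimal sketch with `nomic-embed-text`, following the Ollama embeddings API:

```bash
# Pull an embedding model and request a vector for a piece of text
ollama pull nomic-embed-text

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox jumps over the lazy dog"
}'
```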

Troubleshooting

| Command | Description |
|---------|-------------|
| INLINE_CODE_86 | Show help information |
| INLINE_CODE_87 | Show server options |
| INLINE_CODE_88 | Check if Ollama is running |
| INLINE_CODE_89 | Check port usage |
| INLINE_CODE_90 | Remove all models |
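
The usual first checks are whether the server process is up and whether the default port is free; the `pgrep`/`lsof` invocations assume a Unix-like system:

```bash
# Built-in help
ollama --help
ollama serve --help

# Is the server process running and answering?
pgrep -fl ollama
curl -s http://localhost:11434/    # replies with a short status message when up

# Is something else occupying the default port?
lsof -i :11434
```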

Best Practices
  • Choose model size based on available RAM (7B ≈ 4 GB, 13B ≈ 8 GB, 70B ≈ 40 GB)
  • Use GPU acceleration when available for better performance
  • Implement proper error handling in API integrations
  • Monitor memory usage when running multiple models
  • Use quantized models in resource-constrained environments
  • Cache frequently used models locally
  • Set context sizes appropriate to your use case
  • Use streaming for long responses to improve the user experience
  • Implement rate limiting for production API usage
  • Update models regularly for improved performance and capabilities

Common Use Cases

Code Generation

```bash
ollama run codellama "Create a REST API in Python using FastAPI"
```

Text Analysis

```bash
ollama run llama3.1 "Analyze the sentiment of this text: 'I love this product!'"
```

Creative Writing

```bash
ollama run mistral "Write a short story about time travel"
```

Data Processing

```bash
ollama run llama3.1 "Convert this JSON to CSV format: {...}"
```