# Ollama
Ollama is a tool for running large language models locally on your machine, providing privacy, control, and offline access to AI models such as Llama, Mistral, and CodeLlama.
## Installation & Setup

| Command | Description |
|---------|-------------|
| `curl -fsSL https://ollama.com/install.sh \| sh` | Install Ollama on Linux/macOS |
| `brew install ollama` | Install via Homebrew (macOS) |
| `ollama --version` | Check installed version |
| `ollama serve` | Start Ollama server |
| `ollama ps` | List running models |
| `ollama list` | List installed models |
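A minimal quick-start sketch that chains the commands above; the model name is an example:

```bash
# Install, start the server, then pull and run a model
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &                # start the API server in the background
ollama pull llama3.1          # download the model weights
ollama run llama3.1 "Hello!"  # one-shot prompt from the CLI
```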
## Model Management

| Command | Description |
|---------|-------------|
| `ollama pull llama3.1` | Download Llama 3.1 model |
| `ollama pull mistral` | Download Mistral model |
| `ollama pull codellama` | Download CodeLlama model |
| `ollama pull llama3.1:70b` | Download specific model size |
| `ollama show llama3.1` | Show model information |
| `ollama rm llama3.1` | Remove model |
## Popular Models

### General Purpose Models

| Command | Description |
|---------|-------------|
| `ollama pull llama3.1:8b` | Llama 3.1 8B parameters |
| `ollama pull llama3.1:70b` | Llama 3.1 70B parameters |
| `ollama pull mistral:7b` | Mistral 7B model |
| `ollama pull mixtral:8x7b` | Mixtral 8x7B mixture of experts |
| `ollama pull gemma:7b` | Google Gemma 7B |
| `ollama pull phi3:mini` | Microsoft Phi-3 Mini |
### Code-Specialized Models

| Command | Description |
|---------|-------------|
| `ollama pull codellama:7b` | CodeLlama 7B for coding |
| `ollama pull codellama:13b` | CodeLlama 13B for coding |
| `ollama pull codegemma` | CodeGemma for code generation |
| `ollama pull deepseek-coder` | DeepSeek Coder model |
| `ollama pull starcoder2` | StarCoder2 for code |
### Specialized Models

| Command | Description |
|---------|-------------|
| `ollama pull llava` | LLaVA multimodal model |
| `ollama pull nomic-embed-text` | Text embedding model |
| `ollama pull all-minilm` | Sentence embedding model |
| `ollama pull mxbai-embed-large` | Large embedding model |
## Running Models
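Typical ways to launch a model, sketched below; model names and prompts are examples, and `ollama stop` requires a recent release:

```bash
ollama run llama3.1                          # start an interactive chat session
ollama run llama3.1 "Summarize this repo"    # one-shot prompt, prints the answer and exits
echo "What is Rust?" | ollama run llama3.1   # read the prompt from stdin
ollama stop llama3.1                         # unload the model from memory
```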
## Chat Interface
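Inside an interactive `ollama run` session, slash commands control the session; a sketch of the common ones:

```bash
ollama run llama3.1        # open the interactive REPL, then use slash commands:
#   /set parameter temperature 0.7   adjust sampling for this session
#   /set system "You are terse."     change the system prompt
#   /show info                       print model details
#   /save my-session                 save the current session as a model
#   /load my-session                 load it again later
#   /clear                           clear the conversation context
#   /bye                             exit the session
```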
## API Usage
### REST API
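A sketch of the main REST endpoints on the default port 11434; payload values are illustrative:

```bash
# Generate a completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

# Chat-style request with message history
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'

# List installed models
curl http://localhost:11434/api/tags
```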
### Streaming Responses

| Command | Description |
|---------|-------------|
| `curl -N http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Tell me a story","stream":true}'` | Stream response |
| `curl -N http://localhost:11434/api/chat -d '{"model":"llama3.1","messages":[{"role":"user","content":"Hi"}],"stream":true}'` | Stream chat |
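Streamed responses arrive as one JSON object per line; a sketch that pipes the token fragments through `jq` (assumes `jq` is installed):

```bash
curl -sN http://localhost:11434/api/generate \
  -d '{"model":"llama3.1","prompt":"Explain TCP in one paragraph","stream":true}' \
  | jq -rj '.response'    # print each token fragment as it arrives, without newlines
echo                      # final newline after the stream ends
```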
## Model Configuration

### Temperature and Parameters

These parameters can be set from inside an interactive `ollama run` session, or as `PARAMETER` lines in a Modelfile:

| Command | Description |
|---------|-------------|
| `/set parameter temperature 0.7` | Set temperature |
| `/set parameter top_p 0.9` | Set top-p sampling |
| `/set parameter top_k 40` | Set top-k sampling |
| `/set parameter repeat_penalty 1.1` | Set repeat penalty |
| `/set parameter seed 42` | Set random seed |
### Context and Memory

| Command | Description |
|---------|-------------|
| `/set parameter num_ctx 4096` | Set context window size |
| `/set parameter num_batch 512` | Set batch size |
| `/set parameter num_thread 8` | Set number of threads |
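The same parameters can also be passed per request through the REST API's `options` field; a sketch with example values:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a haiku about the sea",
  "stream": false,
  "options": {
    "temperature": 0.2,
    "top_p": 0.9,
    "num_ctx": 4096,
    "seed": 42
  }
}'
```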
## Custom Models

### Creating Modelfiles

| Command | Description |
|---------|-------------|
| `ollama create mymodel -f Modelfile` | Create custom model |
| `ollama create mymodel -f Modelfile --quantize q4_K_M` | Create with quantization |
### Modelfile Examples

```
# Basic Modelfile
FROM llama3.1
PARAMETER temperature 0.8
PARAMETER top_p 0.9
SYSTEM "You are a helpful coding assistant."
```

```
# Advanced Modelfile
FROM codellama:7b
PARAMETER temperature 0.2
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
SYSTEM """You are an expert programmer. Always provide:
1. Clean, well-commented code
2. Explanation of the solution
3. Best practices and optimizations"""
```
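A sketch of building and using a custom model from a Modelfile; the model name `code-helper` is illustrative:

```bash
# Build a custom model from a Modelfile in the current directory
ollama create code-helper -f ./Modelfile
ollama list                      # the new model appears alongside pulled ones
ollama run code-helper "Refactor this function for readability"
ollama rm code-helper            # remove it when no longer needed
```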
## Integration Examples

### Python Integration

```python
import requests

def chat_with_ollama(prompt, model="llama3.1"):
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    return response.json()["response"]

# Usage
result = chat_with_ollama("Explain quantum computing")
print(result)
```
### JavaScript Integration

```javascript
async function chatWithOllama(prompt, model = "llama3.1") {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: model,
      prompt: prompt,
      stream: false
    })
  });
  const data = await response.json();
  return data.response;
}

// Usage
chatWithOllama("Write a JavaScript function").then(console.log);
```
### Bash Integration

```bash
#!/bin/bash
ollama_chat() {
  local prompt="$1"
  local model="${2:-llama3.1}"
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\":\"$model\",\"prompt\":\"$prompt\",\"stream\":false}" \
    | jq -r '.response'
}

# Usage
ollama_chat "Explain Docker containers"
```
## Performance Optimization

| Command | Description |
|---------|-------------|
| `/set parameter num_gpu 50` | Use GPU acceleration (layers offloaded to GPU) |
| `OLLAMA_MAX_LOADED_MODELS=1 ollama serve` | Limit memory by capping loaded models |
| `/set parameter num_thread 8` | Set CPU threads |
| `/set parameter num_batch 512` | Optimize batch size |
## Environment Variables

| Variable | Description |
|----------|-------------|
| `OLLAMA_HOST` | Set server host (default: 127.0.0.1:11434) |
| `OLLAMA_MODELS` | Set models directory |
| `OLLAMA_NUM_PARALLEL` | Number of parallel requests |
| `OLLAMA_MAX_LOADED_MODELS` | Max models in memory |
| `OLLAMA_FLASH_ATTENTION` | Enable flash attention |
| `OLLAMA_GPU_OVERHEAD` | GPU memory overhead |
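A sketch of launching the server with several of these variables set; the paths and values are examples:

```bash
# Expose the API on all interfaces and move model storage to a larger disk
OLLAMA_HOST=0.0.0.0:11434 \
OLLAMA_MODELS=/data/ollama/models \
OLLAMA_NUM_PARALLEL=4 \
OLLAMA_MAX_LOADED_MODELS=2 \
ollama serve
```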
## Docker Usage

| Command | Description |
|---------|-------------|
| `docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama` | Run Ollama in Docker |
| `docker exec -it ollama ollama run llama3.1` | Run model in container |
| `docker exec -it ollama ollama pull mistral` | Pull model in container |
### Docker Compose

```yaml
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
volumes:
  ollama:
```
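A sketch of typical commands against the compose stack above (`docker compose` v2 syntax; model names are examples):

```bash
docker compose up -d                                 # start the Ollama service in the background
docker compose exec ollama ollama pull llama3.1      # download a model inside the container
docker compose exec ollama ollama run llama3.1 "Hello from Docker"
docker compose down                                  # stop and remove the containers
```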
## Monitoring & Debugging
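A few ways to inspect a running instance, sketched here; the `journalctl` unit name assumes the default systemd install on Linux:

```bash
ollama ps                                   # show which models are loaded and their memory use
curl -s http://localhost:11434/api/version  # confirm the server responds
journalctl -u ollama -f                     # follow server logs on a systemd-based install
OLLAMA_DEBUG=1 ollama serve                 # restart the server with verbose debug logging
```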
## Model Quantization
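Quantized variants trade a little quality for much lower memory use. A sketch using library tags and the `--quantize` build flag; the exact tag names vary by model and are examples here:

```bash
ollama pull llama3.1:8b-instruct-q4_K_M    # 4-bit quantized variant, roughly half the RAM of q8
ollama pull llama3.1:8b-instruct-q8_0      # 8-bit variant, closer to full quality
ollama create mymodel -f Modelfile --quantize q4_K_M   # quantize while building a custom model
```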
## Embedding Models

| Command | Description |
|---------|-------------|
| `ollama pull nomic-embed-text` | Pull text embedding model |
| `curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"Hello world"}'` | Generate embeddings |
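A sketch of requesting an embedding and checking the vector size with `jq` (assumes `jq` is installed):

```bash
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "The quick brown fox"}' \
  | jq '.embedding | length'    # prints the vector dimension (768 for nomic-embed-text)
```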
## Troubleshooting

| Command | Description |
|---------|-------------|
| `ollama --help` | Show help information |
| `ollama serve --help` | Show server options |
| `curl http://localhost:11434` | Check if Ollama is running |
| `lsof -i :11434` | Check port usage |
| `ollama list \| awk 'NR>1 {print $1}' \| xargs -n1 ollama rm` | Remove all models |
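When the server misbehaves, restarting the service often helps. A sketch assuming the default systemd install on Linux; on macOS, quit and relaunch the Ollama app instead:

```bash
sudo systemctl status ollama                 # check whether the service is active
sudo systemctl restart ollama                # restart it
journalctl -u ollama --since "10 min ago"    # review recent logs for errors
```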
## Best Practices

- Choose model size based on available RAM (7B ≈ 4GB, 13B ≈ 8GB, 70B ≈ 40GB)
- Use GPU acceleration when available for better performance
- Implement proper error handling in API integrations
- Monitor memory usage when running multiple models
- Use quantized models in resource-constrained environments
- Keep frequently used models stored locally
- Set context sizes appropriate to your use case
- Use streaming for long responses to improve user experience
- Implement rate limiting for production API usage
- Update models regularly for improved performance and capabilities
## Common Use Cases

### Code Generation

```bash
ollama run codellama "Create a REST API in Python using FastAPI"
```

### Text Analysis

```bash
ollama run llama3.1 "Analyze the sentiment of this text: 'I love this product!'"
```

### Creative Writing

```bash
ollama run mistral "Write a short story about time travel"
```

### Data Processing

```bash
ollama run llama3.1 "Convert this JSON to CSV format: {...}"
```