Overview
Seldon Core is an open-source platform for deploying, scaling, and monitoring machine learning models on Kubernetes. It packages ML models as production-ready microservices with REST and gRPC APIs and supports canary deployments, A/B testing, multi-armed bandit routing, inference graphs, model explainability, and outlier/drift detection.
Seldon Core uses a Kubernetes custom resource, SeldonDeployment, to define model serving configurations declaratively. It supports models from any framework (scikit-learn, TensorFlow, PyTorch, XGBoost, or custom code) through pre-packaged model servers or custom Docker containers. The platform integrates with Istio or Ambassador for traffic management, Prometheus for monitoring, and Grafana for dashboards. Seldon Core V2 introduces a new architecture with an inference scheduler, multi-model serving, and pipeline capabilities.
Installation
Install with Helm
# Add the Seldon Helm chart repo
helm repo add seldon https://storage.googleapis.com/seldon-charts
helm repo update

# Install the Seldon Core operator (with Istio)
kubectl create namespace seldon-system
helm install seldon-core seldon/seldon-core-operator \
  --namespace seldon-system \
  --set usageMetrics.enabled=true \
  --set istio.enabled=true \
  --set certManager.enabled=false

# Verify installation
kubectl get pods -n seldon-system

# Alternative: install with Ambassador instead of Istio
# (run this in place of the Istio install above; both use the release name seldon-core)
helm install seldon-core seldon/seldon-core-operator \
  --namespace seldon-system \
  --set ambassador.enabled=true \
  --set istio.enabled=false
Install Seldon Core V2
# Install V2 with Helm
helm install seldon-core-v2 seldon/seldon-core-v2-setup \
  --namespace seldon-mesh \
  --create-namespace
helm install seldon-v2 seldon/seldon-core-v2-runtime \
  --namespace seldon-mesh
Prerequisites
# Install Istio (if using Istio ingress)
istioctl install --set profile=default
kubectl label namespace default istio-injection=enabled
# Install cert-manager (optional, for TLS)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
Core Deployments
Simple Model Deployment
# sklearn-deployment.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: default
spec:
  name: iris
  predictors:
    - name: default
      replicas: 2
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://seldon-models/v1.19.0-dev/sklearn/iris
        envSecretRefName: seldon-init-container-secret
      componentSpecs:
        - spec:
            containers:
              - name: classifier
                resources:
                  requests:
                    cpu: "0.5"
                    memory: "1Gi"
                  limits:
                    cpu: "1"
                    memory: "2Gi"
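When many models share the same shape of manifest, it can help to generate the resource instead of copying YAML by hand. The following is a minimal sketch that builds the same kind of SeldonDeployment as a plain dictionary and serializes it to JSON. The helper name `seldon_deployment` and its parameters are hypothetical, chosen for illustration; only the field names come from the manifest above.

```python
import json


def seldon_deployment(name, model_uri, implementation="SKLEARN_SERVER",
                      replicas=1, namespace="default"):
    """Return a SeldonDeployment manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictors": [
                {
                    "name": "default",
                    "replicas": replicas,
                    "graph": {
                        "name": "classifier",
                        "implementation": implementation,
                        "modelUri": model_uri,
                    },
                }
            ]
        },
    }


manifest = seldon_deployment(
    "iris-model",
    "gs://seldon-models/v1.19.0-dev/sklearn/iris",
    replicas=2,
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Because kubectl accepts JSON manifests as well as YAML, the printed output could be saved to a file and applied with kubectl apply -f.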
XGBoost Model
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost-model
spec:
  predictors:
    - name: default
      graph:
        name: model
        implementation: XGBOOST_SERVER
        modelUri: s3://models/xgboost/model
        parameters:
          - name: method
            type: STRING
            value: predict_proba
Custom Model Server
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: custom-model
spec:
  predictors:
    - name: default
      graph:
        name: model
        type: MODEL
      componentSpecs:
        - spec:
            containers:
              - name: model
                image: my-registry/my-model:v1.0
                ports:
                  - containerPort: 9000
                    name: http
                env:
                  - name: MODEL_PATH
                    value: /models/model.pkl
Pre-Packaged Model Servers
| Server | Implementation | Supported Formats |
|---|---|---|
| SKLearn | SKLEARN_SERVER | Pickle, Joblib |
| XGBoost | XGBOOST_SERVER | BST, JSON, UBJ |
| TensorFlow | TENSORFLOW_SERVER | SavedModel |
| MLflow | MLFLOW_SERVER | MLflow model artifacts |
| Triton | TRITON_SERVER | ONNX, TensorRT, TF, PyTorch |
| Tempo | TEMPO_SERVER | Custom Python |
| Custom | N/A (use Docker image) | Any |
kubectl Commands
| Command | Description |
|---|---|
| kubectl get seldondeployments | List all Seldon deployments |
| kubectl get sdep | Short alias for above |
| kubectl describe sdep <name> | Show deployment details |
| kubectl apply -f deployment.yaml | Create/update deployment |
| kubectl delete sdep <name> | Delete deployment |
| kubectl get pods -l seldon-deployment-id=<name> | List pods for deployment |
| kubectl logs -l seldon-deployment-id=<name> -c <container> | View model logs |
| kubectl get svc -l seldon-deployment-id=<name> | List services |
Making Predictions
# REST prediction (via Istio gateway)
curl -X POST http://<istio-gateway>/seldon/default/iris-model/api/v1.0/predictions \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "ndarray": [[5.1, 3.5, 1.4, 0.2]]
    }
  }'

# REST prediction (direct service, from inside the cluster)
curl -X POST http://iris-model-default.default.svc.cluster.local:8000/api/v1.0/predictions \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}'

# Feedback endpoint
curl -X POST http://<gateway>/seldon/default/iris-model/api/v1.0/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "request": {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}},
    "reward": 1.0,
    "truth": {"data": {"ndarray": [0]}}
  }'

# gRPC prediction (gRPC port 5001, e.g. forwarded to localhost)
grpcurl -d '{"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}' \
  -plaintext localhost:5001 seldon.protos.Seldon/Predict
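The REST call above can also be made from Python. The sketch below uses only the standard library to build the same POST request. The function name `build_predict_request` and the gateway address `localhost:8080` are assumptions for illustration; the URL pattern and payload shape are taken from the curl examples.

```python
import json
import urllib.request


def build_predict_request(gateway, namespace, deployment, rows):
    """Build a POST request for the Seldon REST predictions endpoint."""
    url = f"http://{gateway}/seldon/{namespace}/{deployment}/api/v1.0/predictions"
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "localhost:8080" is a placeholder; substitute your gateway host and port.
req = build_predict_request(
    "localhost:8080", "default", "iris-model", [[5.1, 3.5, 1.4, 0.2]]
)
print(req.full_url)

# To actually send it against a running deployment:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the URL and payload easy to inspect or unit test without a cluster.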
Advanced Deployments
Canary Deployment
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: canary-example
spec:
  predictors:
    - name: stable
      traffic: 80
      graph:
        name: model
        implementation: SKLEARN_SERVER
        modelUri: gs://models/v1
    - name: canary
      traffic: 20
      graph:
        name: model
        implementation: SKLEARN_SERVER
        modelUri: gs://models/v2
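A common mistake with split traffic is a set of weights that does not add up to 100. The following sketch checks a list of predictors before the manifest is applied. The function `check_traffic` is a hypothetical helper, not part of Seldon Core; the predictor names and weights mirror the canary example.

```python
def check_traffic(predictors):
    """Raise ValueError unless the predictor traffic weights sum to 100.

    Returns each predictor's share of requests as a fraction.
    """
    total = sum(p.get("traffic", 0) for p in predictors)
    if total != 100:
        raise ValueError(f"traffic weights sum to {total}, expected 100")
    return {p["name"]: p.get("traffic", 0) / 100 for p in predictors}


shares = check_traffic([
    {"name": "stable", "traffic": 80},
    {"name": "canary", "traffic": 20},
])
print(shares)  # {'stable': 0.8, 'canary': 0.2}
```

A check like this could run in CI against the parsed manifest so that a bad split is caught before it reaches the cluster.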
A/B Test
# Fixed 50/50 split between two predictors. Multi-armed bandit routing
# would additionally need a router component in the graph, which is not shown here.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: ab-test
spec:
  predictors:
    - name: model-a
      traffic: 50
      graph:
        name: model
        implementation: SKLEARN_SERVER
        modelUri: gs://models/model-a
    - name: model-b
      traffic: 50
      graph:
        name: model
        implementation: SKLEARN_SERVER
        modelUri: gs://models/model-b
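The overview mentions multi-armed bandit routing, and the feedback endpoint carries a reward value. To illustrate how the two fit together, here is a minimal epsilon-greedy sketch in plain Python. It is not Seldon's router code: the class `EpsilonGreedyRouter` and its methods are hypothetical and only show the general technique of routing to the arm with the best average reward while occasionally exploring.

```python
import random


class EpsilonGreedyRouter:
    """Illustrative epsilon-greedy bandit over a fixed set of model names."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward
        self._rng = random.Random(seed)

    def route(self):
        # Explore with probability epsilon, otherwise pick the best arm so far.
        if self._rng.random() < self.epsilon:
            return self._rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def send_feedback(self, arm, reward):
        # Incrementally update the running mean reward for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# epsilon=0.0 disables exploration so the outcome below is deterministic.
router = EpsilonGreedyRouter(["model-a", "model-b"], epsilon=0.0)
router.send_feedback("model-a", 0.0)
router.send_feedback("model-b", 1.0)
print(router.route())  # model-b
```

Unlike a fixed traffic split, the share of requests each model receives here shifts over time as rewards arrive.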
Inference Graph (Preprocessing + Model)
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: pipeline
spec:
  predictors:
    - name: default
      graph:
        # The transformer is the root node so requests are preprocessed
        # before they reach the model, which is its child.
        name: preprocessor
        type: TRANSFORMER
        implementation: SKLEARN_SERVER
        modelUri: gs://models/transformer
        children:
          - name: model
            type: MODEL
            implementation: SKLEARN_SERVER
            modelUri: gs://models/classifier
      componentSpecs:
        - spec:
            containers:
              - name: model
                resources:
                  requests: {cpu: "1", memory: "2Gi"}
              - name: preprocessor
                resources:
                  requests: {cpu: "0.5", memory: "1Gi"}
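As a rough mental model, a request enters an inference graph at the root node and each node's output is handed on to its children. The sketch below walks a simple linear graph in that order. It is a simplification under stated assumptions: it ignores routers, combiners, and output transformers, and the function `run_graph` plus the two toy handlers are hypothetical stand-ins for real components.

```python
def run_graph(node, payload, handlers):
    """Apply this node's handler, then pass the result down to each child."""
    result = handlers[node["name"]](payload)
    for child in node.get("children", []):
        result = run_graph(child, result, handlers)
    return result


# Toy graph: a preprocessing step at the root feeding a model.
graph = {"name": "preprocessor", "children": [{"name": "model"}]}

handlers = {
    # Stand-in transformer: double every feature.
    "preprocessor": lambda rows: [[x * 2 for x in row] for row in rows],
    # Stand-in model: "predict" the sum of each row.
    "model": lambda rows: [sum(row) for row in rows],
}

output = run_graph(graph, [[1, 2, 3]], handlers)
print(output)  # [12]
```

The ordering matters: swapping the two nodes would run the model on raw input and apply the transformation to its predictions instead.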
Explainer (Model Interpretability)
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: model-with-explainer
spec:
  predictors:
    - name: default
      graph:
        name: model
        implementation: SKLEARN_SERVER
        modelUri: gs://models/classifier
      explainer:
        type: AnchorTabular
        modelUri: gs://models/explainer
        containerSpec:
          resources:
            requests: {cpu: "0.5", memory: "1Gi"}

# Request explanation (the explainer is served under <deployment>-explainer/<predictor>)
curl -X POST http://<gateway>/seldon/default/model-with-explainer-explainer/default/api/v1.0/explain \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}'
Configuration
Custom Python Model
# MyModel.py
import joblib


class MyModel:
    def __init__(self):
        # The model is loaded in load(), not at construction time
        self.model = None

    def load(self):
        # Load the serialized model from the path available inside the container
        self.model = joblib.load("/models/model.pkl")

    def predict(self, X, features_names=None):
        return self.model.predict(X)

    def predict_proba(self, X, features_names=None):
        return self.model.predict_proba(X)

    def tags(self):
        # Static metadata attached to responses
        return {"version": "1.0", "framework": "sklearn"}

    def metrics(self):
        # Custom metrics reported alongside predictions
        return [
            {"type": "COUNTER", "key": "predictions_total", "value": 1},
        ]
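Before building a container image, it can be useful to exercise the class locally in the order it is meant to be used: construct, load, then predict. The sketch below does that with a stand-in estimator so it runs without a model file. `StubEstimator` and its threshold rule are hypothetical; a real model would be loaded from /models/model.pkl as in the class above.

```python
class StubEstimator:
    """Hypothetical stand-in for the object joblib.load would return."""

    def predict(self, X):
        # Toy rule: class 1 when the feature sum exceeds 12, else class 0.
        return [int(sum(row) > 12) for row in X]


class MyModel:
    def __init__(self):
        self.model = None

    def load(self):
        # The real class loads /models/model.pkl here; the stub keeps this runnable.
        self.model = StubEstimator()

    def predict(self, X, features_names=None):
        return self.model.predict(X)


model = MyModel()
model.load()
preds = model.predict([[5.1, 3.5, 1.4, 0.2], [6.0, 3.0, 4.8, 1.8]])
print(preds)  # [0, 1]
```

Calling predict before load would fail because self.model is still None, which is a quick way to confirm the expected call order.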
Monitoring with Prometheus
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: seldon-metrics
spec:
  selector:
    matchLabels:
      seldon-deployment-id: iris-model
  endpoints:
    - port: http
      path: /prometheus
      interval: 15s
Troubleshooting
| Issue | Solution |
|---|---|
| Pod stuck in Init | Check init container logs for model download errors. Verify modelUri and credentials |
| CrashLoopBackOff | Check container logs. Verify model file compatibility with server version |
| 503 Service Unavailable | Check pod readiness. Verify Istio/Ambassador gateway configuration |
| Model download fails from S3 | Create secret seldon-init-container-secret with AWS credentials |
| Prediction returns 500 | Check model server logs. Verify input format matches model expectations |
| Canary not routing correctly | Verify traffic percentages sum to 100. Check Istio VirtualService |
| Explainer timeout | Increase explainer resources. Reduce explanation sample size |
| Scaling not working | Verify HPA is configured. Check Seldon deployment annotations for autoscaling |
| gRPC connection refused | Ensure gRPC port (5001) is exposed. Check service port mapping |
| Metrics not appearing | Verify Prometheus ServiceMonitor is created. Check /prometheus endpoint |