Anubis
Overview
Anubis is an open-source Web AI Firewall and anti-scraping reverse proxy that protects upstream resources from AI crawlers, scraper bots, and automated threats. It implements proof-of-work (SHA-256) challenges delivered via JavaScript to verify that requests come from legitimate browsers rather than AI crawlers or bot networks.
Created by Xe Iaso after experiencing significant resource exhaustion when Amazon crawlers overloaded their Git server, Anubis provides a lightweight, efficient protection layer written in Go. It sits between user traffic and your application, transparently filtering malicious automated access.
GitHub: TecharoHQ/anubis
License: MIT
Built With: Go, JavaScript
Installation
Prerequisites
- Go 1.19+ or Docker
- Upstream server to protect
- TLS certificates (for HTTPS protection)
Build from Source
# Clone repository
git clone https://github.com/TecharoHQ/anubis.git
cd anubis
# Build binary
go build -o anubis ./cmd/anubis
# Verify installation
./anubis --version
Docker Installation
# Pull Docker image
docker pull techarohq/anubis:latest
# Run container
docker run -d \
  -p 8080:8080 \
  -e UPSTREAM_URL=http://backend:3000 \
  techarohq/anubis:latest
Docker Compose Setup
version: '3.8'
services:
  anubis:
    image: techarohq/anubis:latest
    ports:
      - "8080:8080"
      - "8443:8443"
    environment:
      UPSTREAM_URL: http://backend:3000
      ENABLE_HTTPS: "true"
      CHALLENGE_DIFFICULTY: "medium"
      LOG_LEVEL: "info"
    volumes:
      - ./certs:/etc/anubis/certs
    restart: unless-stopped
  backend:
    image: myapp:latest
    expose:
      - "3000"
System Service (Linux)
# Copy binary to system location
sudo cp anubis /usr/local/bin/
# Create systemd service
sudo tee /etc/systemd/system/anubis.service > /dev/null << 'EOF'
[Unit]
Description=Anubis Web AI Firewall
After=network.target
[Service]
Type=simple
User=anubis
ExecStart=/usr/local/bin/anubis -config /etc/anubis/config.yaml
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable anubis
sudo systemctl start anubis
Configuration
Basic Configuration
# config.yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 30s
  idle_timeout: 60s
upstream:
  url: "http://localhost:3000"
  timeout: 30s
  max_idle_conns: 100
challenge:
  enabled: true
  difficulty: "medium"
  timeout: 300s
  cache_size: 10000
logging:
  level: "info"
  format: "json"
  output: "stdout"
Environment Variables
# Core settings
UPSTREAM_URL=http://localhost:3000
LISTEN_ADDR=:8080
LOG_LEVEL=info
# Challenge settings
CHALLENGE_ENABLED=true
CHALLENGE_DIFFICULTY=medium
CHALLENGE_TIMEOUT=300
# Performance settings
MAX_IDLE_CONNS=100
REQUEST_TIMEOUT=30
CACHE_SIZE=10000
Core Commands
| Command | Purpose | Example |
|---|---|---|
| anubis | Start with default config | anubis |
| anubis -config | Start with custom config | anubis -config /etc/anubis/config.yaml |
| anubis -upstream | Set upstream URL | anubis -upstream http://app:3000 |
| anubis -listen | Set listen address | anubis -listen :8443 |
| anubis -help | Show help | anubis -help |
| anubis -version | Show version | anubis -version |
Proof-of-Work Challenge System
How Challenges Work
Anubis challenges requests with a SHA-256 proof-of-work puzzle:
- The browser receives the challenge as HTML/JavaScript
- Client-side JavaScript computes SHA-256 hashes over candidate nonces
- Once a valid nonce is found (its hash meets the difficulty target), the request continues
- The server validates the proof-of-work before proxying to the upstream
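The steps above can be sketched in a few lines of Python. This is an illustrative model only, not Anubis's actual code: the function names, the way the challenge string and nonce are concatenated, and the "leading hex zeros" difficulty rule are all assumptions made for the example.

```python
import hashlib
from itertools import count


def hash_hex(challenge: str, nonce: int) -> str:
    """SHA-256 hex digest of the challenge string followed by the nonce (assumed layout)."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, zeros: int) -> int:
    """Client side: try nonces in order until the digest starts with `zeros` hex zeros."""
    prefix = "0" * zeros
    for nonce in count():
        if hash_hex(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, zeros: int) -> bool:
    """Server side: one hash is enough to check a submitted nonce."""
    return hash_hex(challenge, nonce).startswith("0" * zeros)


if __name__ == "__main__":
    nonce = solve("example-challenge", zeros=3)
    print(nonce, verify("example-challenge", nonce, zeros=3))
```

The asymmetry is the point of the design: the client performs many hashes to find a nonce, while the server checks the result with a single hash.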
Challenge Difficulty Levels
challenge:
  difficulty: easy    # ~0.5 seconds CPU (home connections)
  # OR
  difficulty: medium  # ~2 seconds CPU (default, bots filtered)
  # OR
  difficulty: hard    # ~10 seconds CPU (heavy protection)
Challenge Response Example
// Browser receives challenge
{
  "challenge": "find_nonce_for_this_hash",
  "target": "00001234abcd...",
  "difficulty": "medium"
}
// Browser solves and returns
{
  "challenge": "...",
  "nonce": "12345",
  "proof": "valid_sha256_hash"
}
Request Flow
Normal Browser Request
┌─────────────┐
│ Browser │
└──────┬──────┘
│ HTTP GET /page
▼
┌──────────────────┐
│ Anubis Firewall │ ◄─── JavaScript Challenge
├──────────────────┤ (SHA-256 PoW)
│ Challenge System │
│ Cache │
└──────┬───────────┘
│ HTTP Request (with PoW token)
▼
┌──────────────────┐
│ Upstream Server │
└──────────────────┘
Blocked AI Crawler Request
┌─────────────┐
│ AI Crawler │
└──────┬──────┘
│ HTTP GET /page
▼
┌──────────────────┐
│ Anubis Firewall │
├──────────────────┤
│ JavaScript │
│ Not Executed ✗ │
└──────────────────┘
▼
403 Forbidden (PoW Required)
Advanced Configuration
Rate Limiting Integration
rate_limit:
  enabled: true
  requests_per_second: 100
  burst: 10
  per_ip: true
challenge:
  difficulty: medium
  # Higher difficulty for repeated failures
  escalate_on_failure: true
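To make the meaning of requests_per_second, burst, and per_ip concrete, here is a minimal token-bucket sketch. It assumes a token-bucket model, which the configuration above does not confirm. The class names are hypothetical and say nothing about how Anubis implements rate limiting internally.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Holds up to `burst` tokens and refills at `rate` tokens per second."""

    def __init__(self, rate: float, burst: int, now: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the burst size.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerIPLimiter:
    """One bucket per client IP, mirroring `per_ip: true` (hypothetical helper)."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

With rate=100 and burst=10, a client may send 10 requests instantly and then sustain 100 per second; each IP is tracked independently.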
Custom Challenge Difficulty
challenge:
  difficulty: "custom"
  custom_difficulty_bits: 18  # Adjust PoW difficulty in bits
  timeout: 600
  # Difficulty scaling based on time of day
  schedules:
    - time: "08:00-18:00"
      difficulty: "easy"
    - time: "18:00-08:00"
      difficulty: "hard"
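A rough way to reason about a bit-based difficulty: if the setting means "the hash must start with that many zero bits" (an assumption here, not something this page confirms), each attempt succeeds with probability 2^-bits, so a client needs about 2^bits hashes on average. The helper names and the hash rate below are hypothetical.

```python
def expected_hashes(bits: int) -> int:
    """Average number of SHA-256 attempts to find a digest with `bits` leading zero bits."""
    return 2 ** bits


def expected_seconds(bits: int, hashes_per_second: float) -> float:
    """Average solve time for a client with the given (assumed) hash rate."""
    return expected_hashes(bits) / hashes_per_second


if __name__ == "__main__":
    # 18 bits corresponds to 262144 attempts on average.
    print(expected_hashes(18))
    # The hash rate is an illustrative figure, not a measurement.
    print(expected_seconds(18, hashes_per_second=131072))
```

Each extra bit doubles the expected work, so small changes to the bit count have a large effect on slow devices.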
Whitelist & Blacklist
acl:
  whitelist:
    - "203.0.113.0/24"        # Trusted networks
    - "user-agent:GoogleBot"  # Legitimate crawlers
  blacklist:
    - "1.2.3.4"               # Known bad IPs
    - "user-agent:BadBot"     # Known malicious bots
  # Whitelist never challenges
  # Blacklist always blocked
  challenges_required_for_others: true
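The decision logic implied by these lists can be sketched as follows. This is a simplified model: checking the blacklist before the whitelist, and matching user-agent entries as case-insensitive substrings, are assumptions for illustration, and the function names are hypothetical.

```python
import ipaddress

WHITELIST = ["203.0.113.0/24", "user-agent:GoogleBot"]
BLACKLIST = ["1.2.3.4", "user-agent:BadBot"]


def matches(entry: str, ip: str, user_agent: str) -> bool:
    """True if an ACL entry (CIDR, single IP, or user-agent rule) matches the request."""
    if entry.startswith("user-agent:"):
        return entry[len("user-agent:"):].lower() in user_agent.lower()
    # strict=False lets a bare address such as "1.2.3.4" parse as a /32 network.
    return ipaddress.ip_address(ip) in ipaddress.ip_network(entry, strict=False)


def decide(ip: str, user_agent: str) -> str:
    """Return "block", "allow", or "challenge" for a request."""
    if any(matches(e, ip, user_agent) for e in BLACKLIST):
        return "block"
    if any(matches(e, ip, user_agent) for e in WHITELIST):
        return "allow"
    return "challenge"
```

User-Agent strings are trivially spoofable, so a user-agent whitelist entry is best paired with an IP range check for the crawler in question.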
HTTPS/TLS Configuration
server:
  listen: ":8443"
  use_tls: true
  tls:
    cert_file: "/etc/anubis/cert.pem"
    key_file: "/etc/anubis/key.pem"
    # Auto-renew with Let's Encrypt
    auto_renew: true
    acme_email: "admin@example.com"
Deployment Examples
Nginx Reverse Proxy + Anubis
upstream anubis {
    server localhost:8080;
}
server {
    listen 80;
    server_name example.com;
    location / {
        proxy_pass http://anubis;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anubis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: anubis
  template:
    metadata:
      labels:
        app: anubis
    spec:
      containers:
        - name: anubis
          image: techarohq/anubis:latest
          ports:
            - containerPort: 8080
          env:
            - name: UPSTREAM_URL
              value: "http://backend-service:3000"
            - name: CHALLENGE_DIFFICULTY
              value: "medium"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
Docker Swarm Deployment
# Create Anubis service
docker service create \
  --name anubis \
  --publish 8080:8080 \
  --replicas 3 \
  --env UPSTREAM_URL=http://backend:3000 \
  --env CHALLENGE_DIFFICULTY=medium \
  techarohq/anubis:latest
# Scale service
docker service scale anubis=5
Monitoring & Observability
Health Check Endpoint
# Check Anubis status
curl http://localhost:8080/health
# Response
{
  "status": "healthy",
  "upstream_healthy": true,
  "challenges_served": 1542,
  "challenges_solved": 1389,
  "uptime_seconds": 86400
}
Metrics Endpoint
# Prometheus-compatible metrics
curl http://localhost:8080/metrics
# Output includes:
# anubis_requests_total
# anubis_challenges_issued
# anubis_challenges_solved
# anubis_upstream_latency_ms
# anubis_bot_requests_blocked
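A small script can turn these counters into a solve ratio for alerting. The sketch below parses the Prometheus text format in its simplest form (one "name value" pair per line, no timestamps). The sample values are illustrative, and the metric names are taken from the list above rather than verified against a running instance.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse simple Prometheus text exposition lines into a name -> value mapping."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


def solve_ratio(metrics: Dict[str, float]) -> float:
    """Fraction of issued challenges that were solved (0.0 when none were issued)."""
    issued = metrics.get("anubis_challenges_issued", 0.0)
    solved = metrics.get("anubis_challenges_solved", 0.0)
    return solved / issued if issued else 0.0


if __name__ == "__main__":
    sample = """# TYPE anubis_challenges_issued counter
anubis_challenges_issued 1542
anubis_challenges_solved 1389
"""
    print(round(solve_ratio(parse_metrics(sample)), 3))
```

A sudden drop in this ratio suggests a wave of clients that cannot or will not solve the challenge.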
Logging Configuration
logging:
  level: "info"
  format: "json"
  # Log to file
  file:
    enabled: true
    path: "/var/log/anubis/anubis.log"
    max_size_mb: 100
    max_backups: 10
  # Structured logging
  fields:
    request_id: true
    user_agent: true
    remote_ip: true
    response_time: true
    upstream_latency: true
Performance Tuning
Connection Pooling
upstream:
  max_idle_conns: 200  # Increased from default
  max_conns_per_host: 100
  idle_conn_timeout: 90s
Challenge Caching
challenge:
  cache_size: 50000       # More cache for frequent users
  cache_ttl: 3600s
  cache_backend: "redis"  # Optional: use Redis for distributed deployments
Request Optimization
server:
  read_timeout: 20s
  write_timeout: 20s
  idle_timeout: 30s
  # Gzip compression
  gzip:
    enabled: true
    level: 6
    min_size: 1024
Bot Detection & Blocking
User-Agent Based Rules
bot_detection:
  block_headless_browsers: true
  block_curl_wget: true
  block_agents:
    - "python-requests"
    - "scrapy"
    - "selenium"
    - "beautifulsoup"
    - "mechanize"
  allow_agents:
    - "googlebot"
    - "bingbot"
    - "applebot"
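As a rough illustration of how such lists could be evaluated, the sketch below classifies a request by case-insensitive substring match on the User-Agent header. The matching rule, the precedence of allow over block, and the function name are assumptions for the example, not documented behavior.

```python
BLOCK_AGENTS = ["python-requests", "scrapy", "selenium", "beautifulsoup", "mechanize"]
ALLOW_AGENTS = ["googlebot", "bingbot", "applebot"]


def classify(user_agent: str) -> str:
    """Return "allow", "block", or "challenge" based on User-Agent substrings."""
    ua = user_agent.lower()
    if any(token in ua for token in ALLOW_AGENTS):
        return "allow"
    if any(token in ua for token in BLOCK_AGENTS):
        return "block"
    # Anything unrecognized falls through to the proof-of-work challenge.
    return "challenge"


if __name__ == "__main__":
    for ua in ("python-requests/2.31.0", "Mozilla/5.0 (X11; Linux x86_64)"):
        print(ua, "->", classify(ua))
```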
Behavioral Analysis
bot_detection:
  # Detect non-human-like behavior
  require_js_execution: true
  detect_headless: true
  # Patterns that trigger challenges
  patterns:
    rapid_requests: "10/second"
    sequential_urls: true
    missing_referer: true
    suspicious_headers: true
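A pattern such as rapid_requests: "10/second" can be modeled with a sliding window per client. The sketch below is one possible interpretation. The class name is hypothetical, and the rule "flag when more than 10 requests fall inside any 1-second window" is an assumption about what the setting means.

```python
from collections import deque
from typing import Deque, Dict


class RapidRequestDetector:
    """Flags a client once it exceeds `limit` requests within `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 1.0) -> None:
        self.limit = limit
        self.window = window
        self.hits: Dict[str, Deque[float]] = {}

    def record(self, ip: str, now: float) -> bool:
        """Record one request at time `now`; return True if the client should be challenged."""
        q = self.hits.setdefault(ip, deque())
        q.append(now)
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        return len(q) > self.limit
```

Unlike a fixed per-second counter, the sliding window does not reset at second boundaries, so a burst straddling two seconds is still detected.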
IP Reputation Integration
threat_intelligence:
  enabled: true
  # External threat feeds
  sources:
    - "abuseipdb"
    - "maxmind"
    - "custom_internal_feed"
  # Action on known bad IPs
  actions:
    reputation_score_above: 50
    action: "block"
Integration with Applications
Passing Challenge Info to Backend
headers_to_upstream:
  x-anubis-challenge-solved: true
  x-anubis-solved-timestamp: "2024-01-15T10:30:00Z"
  x-anubis-client-ip: true
  x-anubis-difficulty-level: true
Custom Headers in JavaScript
// Browser JavaScript can access challenge info
const challengeInfo = {
  solved: true,
  difficulty: "medium",
  duration_ms: 1234
};
// Send with next request
fetch(url, {
  headers: {
    'X-Challenge-Duration': challengeInfo.duration_ms
  }
});
Troubleshooting
High CPU Usage
# Reduce challenge difficulty
challenge:
  difficulty: "easy"
# Or increase cache size
challenge:
  cache_size: 100000
# Check upstream performance
upstream:
  timeout: 60s  # Increase if backend slow
Challenge Failures
# Enable debug logging
LOG_LEVEL=debug anubis
# Check JavaScript delivery
curl -v http://localhost:8080/
# Verify challenge endpoint
curl http://localhost:8080/challenge/verify
Backend Timeout Issues
upstream:
  timeout: 60s
  keepalive_timeout: 120s
server:
  read_timeout: 30s
  write_timeout: 30s
Redis Cache Issues (if enabled)
# Check Redis connection
redis-cli PING
# Monitor cache
redis-cli MONITOR
# Clear cache if needed
redis-cli FLUSHDB
Best Practices
Security
- Always use HTTPS for production deployments
- Whitelist legitimate crawlers if needed (e.g., Google, Bing)
- Monitor challenge metrics for anomalies
- Rotate TLS certificates regularly
- Keep the upstream address private; don't expose it in logs or error pages
Performance
- Use Redis for distributed deployments
- Implement proper health checks in load balancers
- Cache aggressively - challenges are stateless
- Monitor upstream latency - Anubis is lightweight
- Scale horizontally - stateless design supports it
Operations
# Monitor challenge rate
curl http://localhost:8080/metrics | grep challenges
# Check error rates
curl http://localhost:8080/metrics | grep errors
# Validate config before deployment
anubis -config config.yaml -validate
Q: Does Anubis block legitimate users?
A: Generally no. Modern browsers solve the JavaScript challenge automatically; clients that do not execute JavaScript, such as CLI tools and simple scrapers, cannot pass it.
Q: What about accessibility?
A: Implement fallback mechanisms for users who can't complete challenges (for example, an optional contact form).
Q: Can it block specific content?
A: Anubis only protects with PoW challenges. Use WAF/firewall rules for content blocking.
Q: Performance impact?
A: Minimal (~1-2ms latency). Challenge computation happens client-side.
Resources
- GitHub: https://github.com/TecharoHQ/anubis
- Documentation: https://anubis.techaro.lol
- Issue Tracker: https://github.com/TecharoHQ/anubis/issues
- Discussions: https://github.com/TecharoHQ/anubis/discussions
Related Tools
- Cloudflare Challenges (proprietary)
- AWS WAF (managed service)
- Nginx ModSecurity (open-source WAF)
- Datadome (bot management)