Anubis

Anubis is an open-source Web AI Firewall and anti-scraping reverse proxy that protects upstream resources from AI crawlers, scraper bots, and automated threats. It implements proof-of-work (SHA-256) challenges delivered via JavaScript to verify that requests come from legitimate browsers rather than AI crawlers or bot networks.

Created by Xe Iaso after experiencing significant resource exhaustion when Amazon crawlers overloaded their Git server, Anubis provides a lightweight, efficient protection layer written in Go. It sits between user traffic and your application, transparently filtering malicious automated access.

GitHub: TecharoHQ/anubis
License: MIT/Apache 2.0
Built With: Go, JavaScript

  • Go 1.19+ or Docker
  • Upstream server to protect
  • TLS certificates (for HTTPS protection)
# Clone repository
git clone https://github.com/TecharoHQ/anubis.git
cd anubis

# Build binary
go build -o anubis ./cmd/anubis

# Verify installation
./anubis --version
# Pull Docker image
docker pull techarohq/anubis:latest

# Run container
docker run -d \
  -p 8080:8080 \
  -e UPSTREAM_URL=http://backend:3000 \
  techarohq/anubis:latest
version: '3.8'

services:
  anubis:
    image: techarohq/anubis:latest
    ports:
      - "8080:8080"
      - "8443:8443"
    environment:
      UPSTREAM_URL: http://backend:3000
      ENABLE_HTTPS: "true"
      CHALLENGE_DIFFICULTY: "medium"
      LOG_LEVEL: "info"
    volumes:
      - ./certs:/etc/anubis/certs
    restart: unless-stopped

  backend:
    image: myapp:latest
    expose:
      - "3000"
# Copy binary to system location
sudo cp anubis /usr/local/bin/

# Create systemd service
sudo tee /etc/systemd/system/anubis.service > /dev/null << 'EOF'
[Unit]
Description=Anubis Web AI Firewall
After=network.target

[Service]
Type=simple
User=anubis
ExecStart=/usr/local/bin/anubis -config /etc/anubis/config.yaml
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable anubis
sudo systemctl start anubis
# config.yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 30s
  idle_timeout: 60s

upstream:
  url: "http://localhost:3000"
  timeout: 30s
  max_idle_conns: 100

challenge:
  enabled: true
  difficulty: "medium"
  timeout: 300s
  cache_size: 10000

logging:
  level: "info"
  format: "json"
  output: "stdout"
# Core settings
UPSTREAM_URL=http://localhost:3000
LISTEN_ADDR=:8080
LOG_LEVEL=info

# Challenge settings
CHALLENGE_ENABLED=true
CHALLENGE_DIFFICULTY=medium
CHALLENGE_TIMEOUT=300

# Performance settings
MAX_IDLE_CONNS=100
REQUEST_TIMEOUT=30
CACHE_SIZE=10000
Command          | Purpose                    | Example
anubis           | Start with default config  | anubis
anubis -config   | Start with custom config   | anubis -config /etc/anubis/config.yaml
anubis -upstream | Set upstream URL           | anubis -upstream http://app:3000
anubis -listen   | Set listen address         | anubis -listen :8443
anubis -help     | Show help                  | anubis -help
anubis -version  | Show version               | anubis -version

Anubis challenges requests with a SHA-256 proof-of-work puzzle:

  1. Browser receives challenge HTML/JavaScript
  2. Client-side JavaScript computes SHA-256 hashes
  3. Once a valid nonce is found (one whose hash meets the difficulty target), the request continues
  4. Server validates proof-of-work before proxying
challenge:
  difficulty: easy      # ~0.5 seconds CPU (gentle on low-end devices)
  # OR
  difficulty: medium    # ~2 seconds CPU (default; filters most bots)
  # OR
  difficulty: hard      # ~10 seconds CPU (aggressive protection)
// Browser receives challenge
{
  "challenge": "find_nonce_for_this_hash",
  "target": "00001234abcd...",
  "difficulty": "medium"
}

// Browser solves and returns
{
  "challenge": "...",
  "nonce": "12345",
  "proof": "valid_sha256_hash"
}
┌─────────────┐
│  Browser    │
└──────┬──────┘
       │ HTTP GET /page
       ▼
┌──────────────────┐
│  Anubis Firewall │  ◄─── JavaScript Challenge
├──────────────────┤       (SHA-256 PoW)
│ Challenge System │
│ Cache            │
└──────┬───────────┘
       │ HTTP Request (with PoW token)
       ▼
┌──────────────────┐
│  Upstream Server │
└──────────────────┘
┌─────────────┐
│  AI Crawler │
└──────┬──────┘
       │ HTTP GET /page
       ▼
┌──────────────────┐
│  Anubis Firewall │
├──────────────────┤
│ JavaScript       │
│ Not Executed ✗   │
└──────────────────┘

    403 Forbidden (PoW Required)
rate_limit:
  enabled: true
  requests_per_second: 100
  burst: 10
  per_ip: true

challenge:
  difficulty: medium
  # Higher difficulty for repeated failures
  escalate_on_failure: true
challenge:
  difficulty: "custom"
  custom_difficulty_bits: 18  # Adjust PoW difficulty in bits
  timeout: 600
  
  # Difficulty scaling based on time of day
  schedules:
    - time: "08:00-18:00"
      difficulty: "easy"
    - time: "18:00-08:00"
      difficulty: "hard"
acl:
  whitelist:
    - "203.0.113.0/24"        # Trusted networks
    - "user-agent:GoogleBot"  # Legitimate crawlers
    
  blacklist:
    - "1.2.3.4"               # Known bad IPs
    - "user-agent:BadBot"     # Known malicious bots
    
  # Whitelist never challenges
  # Blacklist always blocked
  challenges_required_for_others: true
server:
  listen: ":8443"
  use_tls: true
  
tls:
  cert_file: "/etc/anubis/cert.pem"
  key_file: "/etc/anubis/key.pem"
  # Auto-renew with Let's Encrypt
  auto_renew: true
  acme_email: "admin@example.com"
upstream anubis {
    server localhost:8080;
}

server {
    listen 80;
    server_name example.com;
    
    location / {
        proxy_pass http://anubis;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anubis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: anubis
  template:
    metadata:
      labels:
        app: anubis
    spec:
      containers:
      - name: anubis
        image: techarohq/anubis:latest
        ports:
        - containerPort: 8080
        env:
        - name: UPSTREAM_URL
          value: "http://backend-service:3000"
        - name: CHALLENGE_DIFFICULTY
          value: "medium"
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
# Create Anubis service
docker service create \
  --name anubis \
  --publish 8080:8080 \
  --replicas 3 \
  --env UPSTREAM_URL=http://backend:3000 \
  --env CHALLENGE_DIFFICULTY=medium \
  techarohq/anubis:latest

# Scale service
docker service scale anubis=5
# Check Anubis status
curl http://localhost:8080/health

# Response
{
  "status": "healthy",
  "upstream_healthy": true,
  "challenges_served": 1542,
  "challenges_solved": 1389,
  "uptime_seconds": 86400
}
# Prometheus-compatible metrics
curl http://localhost:8080/metrics

# Output includes:
# anubis_requests_total
# anubis_challenges_issued
# anubis_challenges_solved
# anubis_upstream_latency_ms
# anubis_bot_requests_blocked
logging:
  level: "info"
  format: "json"
  
  # Log to file
  file:
    enabled: true
    path: "/var/log/anubis/anubis.log"
    max_size_mb: 100
    max_backups: 10
    
  # Structured logging
  fields:
    request_id: true
    user_agent: true
    remote_ip: true
    response_time: true
    upstream_latency: true
upstream:
  max_idle_conns: 200          # Increased from default
  max_conns_per_host: 100
  idle_conn_timeout: 90s
challenge:
  cache_size: 50000            # More cache for frequent users
  cache_ttl: 3600s
  cache_backend: "redis"       # Optional: use Redis for distributed
server:
  read_timeout: 20s
  write_timeout: 20s
  idle_timeout: 30s
  
  # Gzip compression
  gzip:
    enabled: true
    level: 6
    min_size: 1024
bot_detection:
  block_headless_browsers: true
  block_curl_wget: true
  
  block_agents:
    - "python-requests"
    - "scrapy"
    - "selenium"
    - "beautifulsoup"
    - "mechanize"
  
  allow_agents:
    - "googlebot"
    - "bingbot"
    - "applebot"
bot_detection:
  # Detect non-human-like behavior
  require_js_execution: true
  detect_headless: true
  
  # Patterns that trigger challenges
  patterns:
    rapid_requests: "10/second"
    sequential_urls: true
    missing_referer: true
    suspicious_headers: true
threat_intelligence:
  enabled: true
  
  # External threat feeds
  sources:
    - "abuseipdb"
    - "maxmind"
    - "custom_internal_feed"
  
  # Action on known bad IPs
  actions:
    reputation_score_above: 50
    action: "block"
headers_to_upstream:
  x-anubis-challenge-solved: true
  x-anubis-solved-timestamp: "2024-01-15T10:30:00Z"
  x-anubis-client-ip: true
  x-anubis-difficulty-level: true
// Browser JavaScript can access challenge info
const challengeInfo = {
  solved: true,
  difficulty: "medium",
  duration_ms: 1234
};

// Send with next request
fetch(url, {
  headers: {
    'X-Challenge-Duration': challengeInfo.duration_ms
  }
});
# Reduce challenge difficulty
challenge:
  difficulty: "easy"
  
# Or increase cache size
challenge:
  cache_size: 100000

# Check upstream performance
upstream:
  timeout: 60s  # Increase if backend slow
# Enable debug logging
LOG_LEVEL=debug anubis

# Check JavaScript delivery
curl -v http://localhost:8080/

# Verify challenge endpoint
curl http://localhost:8080/challenge/verify
upstream:
  timeout: 60s
  keepalive_timeout: 120s
  
server:
  read_timeout: 30s
  write_timeout: 30s
# Check Redis connection
redis-cli PING

# Monitor cache
redis-cli MONITOR

# Clear cache if needed
redis-cli FLUSHDB
  1. Always use HTTPS for production deployments
  2. Whitelist legitimate crawlers if needed (e.g., Google, Bing)
  3. Monitor challenge metrics for anomalies
  4. Rotate TLS certificates regularly
  5. Keep the upstream address private; don't expose it in logs or error pages

  1. Use Redis for distributed deployments
  2. Implement proper health checks in load balancers
  3. Cache aggressively; challenge validation is stateless, so solved challenges cache well
  4. Monitor upstream latency; Anubis itself adds little, so slowdowns usually come from the backend
  5. Scale horizontally; the stateless design supports it
# Monitor challenge rate
curl http://localhost:8080/metrics | grep challenges

# Check error rates
curl http://localhost:8080/metrics | grep errors

# Validate config before deployment
anubis -config config.yaml -validate

Q: Does Anubis block legitimate users?
A: No. Modern browsers execute the JavaScript challenge transparently, so human visitors see at most a brief delay on first visit. Clients that cannot run JavaScript, such as headless scrapers and CLI tools, fail the challenge and are blocked.

Q: What about accessibility?
A: Implement fallback mechanisms for users who can't complete challenges (for example, an optional contact form).

Q: Can it block specific content?
A: Anubis only protects with PoW challenges. Use WAF/firewall rules for content blocking.

Q: Performance impact?
A: Minimal (~1-2ms latency). Challenge computation happens client-side.

  • Cloudflare Challenges (proprietary)
  • AWS WAF (managed service)
  • Nginx ModSecurity (open-source WAF)
  • DataDome (bot management)