
SpiderFoot Cheat Sheet

"Clase de la hoja" id="copy-btn" class="copy-btn" onclick="copyAllCommands()" Copiar todos los comandos id="pdf-btn" class="pdf-btn" onclick="generatePDF()" Generar PDF seleccionado/button ■/div titulada

Overview

SpiderFoot is an open-source intelligence (OSINT) automation tool that performs reconnaissance and information gathering against targets such as IP addresses, domain names, email addresses, and people's names. It integrates with more than 200 data sources to collect intelligence and identify security risks, making it an essential tool for penetration testers, security researchers, and threat intelligence analysts.

Note: Open-source tool with a commercial HX edition available. Always ensure you have proper authorization before scanning targets.

Installation and Setup

Installation Methods

# Method 1: Install from PyPI
pip3 install spiderfoot

# Method 2: Install from GitHub (latest development version)
git clone https://github.com/smicallef/spiderfoot.git
cd spiderfoot
pip3 install -r requirements.txt

# Method 3: Docker installation
docker pull spiderfoot/spiderfoot
docker run -p 5001:5001 spiderfoot/spiderfoot

# Method 4: Using package managers
# Ubuntu/Debian
sudo apt update && sudo apt install spiderfoot

# Arch Linux
yay -S spiderfoot

# macOS with Homebrew
brew install spiderfoot

Initial Configuration

# Create configuration directory
mkdir -p ~/.spiderfoot
cd ~/.spiderfoot

# Generate default configuration
spiderfoot -C

# Edit configuration file
nano spiderfoot.conf

# Key configuration options:
# - __webaddr: Web interface bind address (default: 127.0.0.1)
# - __webport: Web interface port (default: 5001)
# - __database: Database file location
# - __logfile: Log file location
# - __modules: Module directory path
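
To sanity-check these settings from a script, here is a minimal sketch using Python's configparser. It assumes the INI-style spiderfoot.conf layout shown above (version differences are possible) and wraps the file in a dummy section so configparser accepts top-level keys.

# check_config.py - minimal configuration sanity check (sketch)
import configparser
from pathlib import Path

conf_path = Path.home() / ".spiderfoot" / "spiderfoot.conf"

# SpiderFoot config keys may sit above any [section] header; prepend a
# dummy section so configparser can parse the file.
parser = configparser.ConfigParser()
parser.read_string("[spiderfoot]\n" + conf_path.read_text())

for key in ("__webaddr", "__webport", "__database", "__logfile"):
    print(f"{key} = {parser['spiderfoot'].get(key, '<not set>')}")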

API Key Configuration

# Configure API keys for enhanced data sources
# Edit ~/.spiderfoot/spiderfoot.conf

# Common API configurations:
[api_keys]
# VirusTotal API key
virustotal_api_key = your_virustotal_api_key

# Shodan API key
shodan_api_key = your_shodan_api_key

# Have I Been Pwned API key
hibp_api_key = your_hibp_api_key

# SecurityTrails API key
securitytrails_api_key = your_securitytrails_api_key

# PassiveTotal API key
passivetotal_api_key = your_passivetotal_api_key
passivetotal_username = your_passivetotal_username

# AlienVault OTX API key
otx_api_key = your_otx_api_key

# Censys API credentials
censys_api_id = your_censys_api_id
censys_api_secret = your_censys_api_secret

# Hunter.io API key
hunter_api_key = your_hunter_api_key

# Clearbit API key
clearbit_api_key = your_clearbit_api_key

Command-Line Usage

Basic Scan Operations

# Basic domain scan
spiderfoot -s example.com -t DOMAIN

# IP address scan
spiderfoot -s 192.168.1.1 -t IP_ADDRESS

# Email address investigation
spiderfoot -s user@example.com -t EMAILADDR

# Human name investigation
spiderfoot -s "John Smith" -t HUMAN_NAME

# Multiple targets scan
spiderfoot -s "example.com,192.168.1.1" -t DOMAIN,IP_ADDRESS

# Scan with specific modules only
spiderfoot -s example.com -t DOMAIN -m sfp_dnsresolve,sfp_whois,sfp_virustotal

# Exclude specific modules
spiderfoot -s example.com -t DOMAIN -x sfp_social,sfp_pgp

Advanced Scan Options

# Passive scan only (no active probing)
spiderfoot -s example.com -t DOMAIN -p

# Scan with custom user agent
spiderfoot -s example.com -t DOMAIN -u "Mozilla/5.0 Custom Agent"

# Scan with proxy
spiderfoot -s example.com -t DOMAIN -y http://proxy.example.com:8080

# Scan with custom timeout
spiderfoot -s example.com -t DOMAIN -w 30

# Scan with maximum threads
spiderfoot -s example.com -t DOMAIN -T 10

# Scan with output to file
spiderfoot -s example.com -t DOMAIN -o json > scan_results.json

# Scan with specific data types
spiderfoot -s example.com -t DOMAIN -f SUBDOMAIN,IP_ADDRESS,EMAILADDR

Scan Management

# List available modules
spiderfoot -M

# Get module information
spiderfoot -M sfp_virustotal

# List available data types
spiderfoot -F

# List running scans
spiderfoot -l

# Stop a running scan
spiderfoot -q scan_id

# Delete scan data
spiderfoot -d scan_id

# Export scan results
spiderfoot -e scan_id -o json > results.json
spiderfoot -e scan_id -o csv > results.csv
spiderfoot -e scan_id -o xml > results.xml
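
Exported JSON can be post-processed directly. The sketch below tallies events by type; it assumes each exported event carries a 'type' field, matching the result structure used by the API examples later in this guide.

# summarize_export.py - tally exported events by type (sketch)
import json
from collections import Counter

with open("results.json") as f:
    events = json.load(f)

# 'type' is an assumed field name; verify it against your actual export
counts = Counter(e.get("type", "UNKNOWN") for e in events)
for event_type, count in counts.most_common():
    print(f"{event_type}: {count}")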

Web Interface Usage

Starting the Web Interface

# Start web interface (default: http://127.0.0.1:5001)
spiderfoot -l 127.0.0.1:5001

# Start web interface on all interfaces
spiderfoot -l 0.0.0.0:5001

# Start with custom configuration
spiderfoot -c /path/to/custom.conf -l 127.0.0.1:5001

# Start in background
nohup spiderfoot -l 127.0.0.1:5001 &

# Start with Docker
docker run -p 5001:5001 -d spiderfoot/spiderfoot
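
When scripting against a freshly started instance, it helps to wait until the UI actually answers. A small hedged sketch:

# wait_for_ui.py - poll until the web interface responds (sketch)
import time
import requests

URL = "http://127.0.0.1:5001"

for attempt in range(30):
    try:
        if requests.get(URL, timeout=3).status_code == 200:
            print(f"Web interface is up at {URL}")
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)
else:
    raise SystemExit("Web interface did not come up in time")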

Web Interface Navigation

# Access web interface
# Navigate to http://localhost:5001

# Main sections:
# - New Scan: Create new scans
# - Scans: View and manage existing scans
# - Browse: Browse scan results by data type
# - Search: Search across all scan data
# - Settings: Configure modules and API keys
# - About: System information and statistics

# Scan creation workflow:
# 1. Enter target (domain, IP, email, etc.)
# 2. Select scan modules or use presets
# 3. Configure scan options
# 4. Start scan and monitor progress
# 5. Review results and export data
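
The same workflow can be driven over HTTP. The sketch below posts to the /startscan endpoint, the same endpoint the Python API wrapper later in this guide relies on; the form field names are assumptions carried over from that wrapper.

# start_scan_http.py - start a scan through the web API (sketch)
import requests

scan_request = {
    'scanname': 'Quick API scan',
    'scantarget': 'example.com',
    'targettype': 'DOMAIN',
    'modulelist': 'sfp_dnsresolve,sfp_whois',  # assumed field names
    'typelist': 'all',
}

resp = requests.post("http://127.0.0.1:5001/startscan", data=scan_request)
resp.raise_for_status()
print(resp.json())  # expected to contain the new scan's id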

Scan Configuration Presets

{
  "scan_presets": {
    "passive_recon": {
      "description": "Passive reconnaissance without active probing",
      "modules": [
        "sfp_dnsresolve",
        "sfp_whois",
        "sfp_virustotal",
        "sfp_shodan",
        "sfp_censys",
        "sfp_passivetotal",
        "sfp_securitytrails",
        "sfp_threatcrowd",
        "sfp_otx"
      ],
      "passive_only": true
    },
    "comprehensive_domain": {
      "description": "Comprehensive domain investigation",
      "modules": [
        "sfp_dnsresolve",
        "sfp_dnsbrute",
        "sfp_whois",
        "sfp_virustotal",
        "sfp_shodan",
        "sfp_censys",
        "sfp_subdomain_enum",
        "sfp_ssl_analyze",
        "sfp_port_scan",
        "sfp_web_crawl"
      ]
    },
    "email_investigation": {
      "description": "Email address and person investigation",
      "modules": [
        "sfp_hunter",
        "sfp_clearbit",
        "sfp_hibp",
        "sfp_social",
        "sfp_pgp",
        "sfp_gravatar",
        "sfp_fullcontact",
        "sfp_pipl"
      ]
    },
    "threat_intelligence": {
      "description": "Threat intelligence gathering",
      "modules": [
        "sfp_virustotal",
        "sfp_otx",
        "sfp_threatcrowd",
        "sfp_malwaredomains",
        "sfp_reputation",
        "sfp_blacklist",
        "sfp_phishing",
        "sfp_malware"
      ]
    }
  }
}
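
A preset file like the one above can be turned into a CLI invocation mechanically. This sketch assumes the JSON has been saved as scan_presets.json (a hypothetical filename) and reuses the -s/-t/-m/-p flags from the command-line section.

# preset_to_cli.py - build a spiderfoot command from a preset (sketch)
import json

with open("scan_presets.json") as f:  # hypothetical filename
    presets = json.load(f)["scan_presets"]

def build_command(preset_name: str, target: str, target_type: str = "DOMAIN") -> str:
    preset = presets[preset_name]
    cmd = f"spiderfoot -s {target} -t {target_type} -m {','.join(preset['modules'])}"
    if preset.get("passive_only"):
        cmd += " -p"  # passive-only flag, as used earlier in this guide
    return cmd

print(build_command("passive_recon", "example.com"))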

Module Configuration and Customization

Core Modules Overview

# DNS and Domain Modules
sfp_dnsresolve      # DNS resolution and record enumeration
sfp_dnsbrute        # DNS subdomain brute forcing
sfp_whois           # WHOIS information gathering
sfp_subdomain_enum  # Subdomain enumeration from various sources

# Network and Infrastructure
sfp_port_scan       # Port scanning and service detection
sfp_ssl_analyze     # SSL/TLS certificate analysis
sfp_banner_grab     # Service banner grabbing
sfp_traceroute      # Network path tracing

# Threat Intelligence
sfp_virustotal      # VirusTotal integration
sfp_otx             # AlienVault OTX integration
sfp_threatcrowd     # ThreatCrowd integration
sfp_reputation      # Reputation checking across sources

# Search Engines and OSINT
sfp_google          # Google search integration
sfp_bing            # Bing search integration
sfp_shodan          # Shodan integration
sfp_censys          # Censys integration

# Social Media and People
sfp_social          # Social media profile discovery
sfp_hunter          # Email address discovery
sfp_clearbit        # Company and person enrichment
sfp_hibp            # Have I Been Pwned integration

# Web Application
sfp_web_crawl       # Web application crawling
sfp_web_analyze     # Web technology identification
sfp_robots          # robots.txt analysis
sfp_sitemap         # Sitemap discovery and analysis
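
Rather than memorizing this list, module names can be pulled from a running instance. The sketch below queries the /modules endpoint used by the API wrapper later in this guide; the exact response shape is an assumption, so adjust the parsing to what your instance returns.

# list_modules.py - fetch module names from a running instance (sketch)
import requests

resp = requests.get("http://127.0.0.1:5001/modules")
resp.raise_for_status()
modules = resp.json()

# Response shape is assumed: handle either a list of dicts or plain names
for mod in modules:
    print(mod.get("name") if isinstance(mod, dict) else mod)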

Custom Module Development

# Example custom module: sfp_custom_example.py
import re
import requests
from spiderfoot import SpiderFootEvent, SpiderFootPlugin

class sfp_custom_example(SpiderFootPlugin):
    """Custom SpiderFoot module example"""

    meta = {
        'name': "Custom Example Module",
        'summary': "Example custom module for demonstration",
        'flags': [""],
        'useCases': ["Investigate", "Passive"],
        'categories': ["Search Engines"],
        'dataSource': {
            'website': "https://example.com",
            'model': "FREE_NOAUTH_UNLIMITED",
            'references': ["https://example.com/api"],
            'favIcon': "https://example.com/favicon.ico",
            'logo': "https://example.com/logo.png",
            'description': "Example data source for custom module"
        }
    }

    # Default options
    opts = {
        'api_key': '',
        'max_results': 100,
        'timeout': 30
    }

    # Option descriptions
    optdescs = {
        'api_key': "API key for the service",
        'max_results': "Maximum number of results to return",
        'timeout': "Request timeout in seconds"
    }

    # What events this module accepts for input
    events = {
        'DOMAIN_NAME': ['SUBDOMAIN', 'EMAILADDR'],
        'IP_ADDRESS': ['GEOINFO', 'NETBLOCK_OWNER']
    }

    def setup(self, sfc, userOpts=dict()):
        self.sf = sfc
        self.results = self.tempStorage()

        # Override default options with user settings
        for opt in list(userOpts.keys()):
            self.opts[opt] = userOpts[opt]

    def watchedEvents(self):
        """Events this module will accept as input"""
        return list(self.events.keys())

    def producedEvents(self):
        """Events this module will produce"""
        evts = []
        for eventType in self.events:
            evts.extend(self.events[eventType])
        return evts

    def handleEvent(self, event):
        """Handle incoming events"""
        eventName = event.eventType
        srcModuleName = event.module
        eventData = event.data

        # Don't process events from ourselves
        if srcModuleName == "sfp_custom_example":
            return

        # Check if we've already processed this data
        if eventData in self.results:
            return

        self.results[eventData] = True

        self.sf.debug(f"Received event, {eventName}, from {srcModuleName}")

        # Process different event types
        if eventName == 'DOMAIN_NAME':
            self.processDomain(eventData, event)
        elif eventName == 'IP_ADDRESS':
            self.processIP(eventData, event)

    def processDomain(self, domain, parentEvent):
        """Process domain name events"""
        try:
            # Example: Query custom API for domain information
            url = f"https://api.example.com/domain/{domain}"
            headers = {'Authorization': f'Bearer {self.opts["api_key"]}'}

            response = requests.get(
                url, 
                headers=headers, 
                timeout=self.opts['timeout']
            )

            if response.status_code == 200:
                data = response.json()

                # Extract subdomains
                if 'subdomains' in data:
                    for subdomain in data['subdomains'][:self.opts['max_results']]:
                        evt = SpiderFootEvent(
                            'SUBDOMAIN', 
                            subdomain, 
                            self.__name__, 
                            parentEvent
                        )
                        self.notifyListeners(evt)

                # Extract email addresses
                if 'emails' in data:
                    for email in data['emails'][:self.opts['max_results']]:
                        evt = SpiderFootEvent(
                            'EMAILADDR', 
                            email, 
                            self.__name__, 
                            parentEvent
                        )
                        self.notifyListeners(evt)

        except Exception as e:
            self.sf.error(f"Error processing domain {domain}: {str(e)}")

    def processIP(self, ip, parentEvent):
        """Process IP address events"""
        try:
            # Example: Query custom API for IP information
            url = f"https://api.example.com/ip/{ip}"
            headers = {'Authorization': f'Bearer {self.opts["api_key"]}'}

            response = requests.get(
                url, 
                headers=headers, 
                timeout=self.opts['timeout']
            )

            if response.status_code == 200:
                data = response.json()

                # Extract geolocation info
                if 'location' in data:
                    location = f"{data['location']['city']}, {data['location']['country']}"
                    evt = SpiderFootEvent(
                        'GEOINFO', 
                        location, 
                        self.__name__, 
                        parentEvent
                    )
                    self.notifyListeners(evt)

        except Exception as e:
            self.sf.error(f"Error processing IP {ip}: {str(e)}")

# Install custom module:
# 1. Save as sfp_custom_example.py in modules/ directory
# 2. Restart SpiderFoot
# 3. Module will appear in available modules list

Module Configuration Files

# ~/.spiderfoot/modules.conf
# Module-specific configuration

[sfp_virustotal]
api_key = your_virustotal_api_key
max_results = 100
timeout = 30

[sfp_shodan]
api_key = your_shodan_api_key
max_results = 50
timeout = 20

[sfp_hunter]
api_key = your_hunter_api_key
max_results = 25
verify_emails = true

[sfp_censys]
api_id = your_censys_api_id
api_secret = your_censys_api_secret
max_results = 100

[sfp_passivetotal]
api_key = your_passivetotal_api_key
username = your_passivetotal_username
max_results = 200

[sfp_hibp]
api_key = your_hibp_api_key
check_pastes = true
truncate_response = false
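
Before launching large scans, it is worth confirming these keys were actually filled in. A minimal configparser sketch, assuming the INI layout shown above and treating the placeholder your_... values as unset:

# check_module_keys.py - flag unconfigured API keys in modules.conf (sketch)
import configparser
from pathlib import Path

parser = configparser.ConfigParser()
parser.read(Path.home() / ".spiderfoot" / "modules.conf")

for section in parser.sections():
    for key, value in parser[section].items():
        # Placeholder values like "your_virustotal_api_key" mean the key is unset
        if key.endswith(("api_key", "api_id", "api_secret")) and value.startswith("your_"):
            print(f"[!] {section}: {key} looks unconfigured")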

Advanced OSINT Techniques

Comprehensive Domain Investigation

# Multi-stage domain investigation
# Stage 1: Passive reconnaissance
spiderfoot -s target.com -t DOMAIN -m sfp_dnsresolve,sfp_whois,sfp_virustotal,sfp_passivetotal -p

# Stage 2: Subdomain enumeration
spiderfoot -s target.com -t DOMAIN -m sfp_dnsbrute,sfp_subdomain_enum,sfp_crt,sfp_google

# Stage 3: Infrastructure analysis
spiderfoot -s target.com -t DOMAIN -m sfp_port_scan,sfp_ssl_analyze,sfp_banner_grab,sfp_web_analyze

# Stage 4: Threat intelligence
spiderfoot -s target.com -t DOMAIN -m sfp_reputation,sfp_blacklist,sfp_malware,sfp_phishing

# Combined comprehensive scan
spiderfoot -s target.com -t DOMAIN \
  -m sfp_dnsresolve,sfp_dnsbrute,sfp_whois,sfp_virustotal,sfp_shodan,sfp_censys,sfp_passivetotal,sfp_subdomain_enum,sfp_ssl_analyze,sfp_port_scan,sfp_web_crawl,sfp_reputation \
  -f SUBDOMAIN,IP_ADDRESS,EMAILADDR,SSL_CERTIFICATE,WEBSERVER_TECHNOLOGY,VULNERABILITY
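
The four stages above lend themselves to a small driver script. The sketch below shells out to the spiderfoot CLI with the same flags used in this section; it assumes each invocation blocks until its scan completes, which should be verified for your SpiderFoot version.

# staged_recon.py - run the staged domain scans sequentially (sketch)
import subprocess

TARGET = "target.com"
STAGES = [
    ("passive recon", "sfp_dnsresolve,sfp_whois,sfp_virustotal,sfp_passivetotal", ["-p"]),
    ("subdomain enumeration", "sfp_dnsbrute,sfp_subdomain_enum,sfp_crt,sfp_google", []),
    ("infrastructure analysis", "sfp_port_scan,sfp_ssl_analyze,sfp_banner_grab,sfp_web_analyze", []),
    ("threat intelligence", "sfp_reputation,sfp_blacklist,sfp_malware,sfp_phishing", []),
]

for name, modules, extra_flags in STAGES:
    print(f"[*] Stage: {name}")
    cmd = ["spiderfoot", "-s", TARGET, "-t", "DOMAIN", "-m", modules, *extra_flags]
    subprocess.run(cmd, check=True)  # assumes the CLI blocks until the scan finishes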

Email and Person Investigation

# Email address investigation
spiderfoot -s user@target.com -t EMAILADDR \
  -m sfp_hunter,sfp_clearbit,sfp_hibp,sfp_social,sfp_pgp,sfp_gravatar,sfp_fullcontact \
  -f EMAILADDR,SOCIAL_MEDIA,PHONE_NUMBER,PHYSICAL_ADDRESS,BREACH_DATA

# Person name investigation
spiderfoot -s "John Smith" -t HUMAN_NAME \
  -m sfp_social,sfp_pipl,sfp_fullcontact,sfp_clearbit,sfp_google,sfp_bing \
  -f SOCIAL_MEDIA,EMAILADDR,PHONE_NUMBER,PHYSICAL_ADDRESS,COMPANY_NAME

# Company investigation
spiderfoot -s "Example Corp" -t COMPANY_NAME \
  -m sfp_clearbit,sfp_hunter,sfp_google,sfp_bing,sfp_social \
  -f EMAILADDR,DOMAIN_NAME,PHYSICAL_ADDRESS,PHONE_NUMBER,SOCIAL_MEDIA

IP Address and Network Analysis

# IP address investigation
spiderfoot -s 192.168.1.1 -t IP_ADDRESS \
  -m sfp_shodan,sfp_censys,sfp_virustotal,sfp_reputation,sfp_geoip,sfp_port_scan \
  -f GEOINFO,NETBLOCK_OWNER,OPEN_TCP_PORT,WEBSERVER_TECHNOLOGY,VULNERABILITY

# Network range investigation
spiderfoot -s 192.168.1.0/24 -t NETBLOCK_OWNER \
  -m sfp_shodan,sfp_censys,sfp_port_scan,sfp_banner_grab \
  -f IP_ADDRESS,OPEN_TCP_PORT,WEBSERVER_TECHNOLOGY,OPERATING_SYSTEM

# ASN investigation
spiderfoot -s AS12345 -t BGP_AS_OWNER \
  -m sfp_bgpview,sfp_shodan,sfp_censys \
  -f NETBLOCK_OWNER,IP_ADDRESS,DOMAIN_NAME

Social Media and Dark Web Investigation

# Social media investigation
spiderfoot -s target.com -t DOMAIN \
  -m sfp_social,sfp_twitter,sfp_linkedin,sfp_facebook,sfp_instagram \
  -f SOCIAL_MEDIA,EMAILADDR,HUMAN_NAME,PHONE_NUMBER

# Dark web monitoring
spiderfoot -s target.com -t DOMAIN \
  -m sfp_darkweb,sfp_onion,sfp_pastebin,sfp_leakdb \
  -f DARKWEB_MENTION,BREACH_DATA,LEAKED_DOCUMENT,PASTE_SITE

# Cryptocurrency investigation
spiderfoot -s 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa -t BITCOIN_ADDRESS \
  -m sfp_blockchain,sfp_bitcoinabuse,sfp_cryptocurrency \
  -f BITCOIN_ADDRESS,CRYPTOCURRENCY_ADDRESS,MALICIOUS_CRYPTOCURRENCY

API Integration and Automation

Python API Usage

# SpiderFoot Python API integration
import requests
import json
import time
from typing import Dict, List, Optional

class SpiderFootAPI:
    def __init__(self, base_url: str = "http://localhost:5001"):
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()

    def start_scan(self, target: str, target_type: str, modules: List[str] = None) -> str:
        """Start a new scan and return scan ID"""

        scan_data = {
            'scanname': f"API Scan - {target}",
            'scantarget': target,
            'targettype': target_type,
            'modulelist': ','.join(modules) if modules else '',
            'typelist': 'all'
        }

        response = self.session.post(
            f"{self.base_url}/startscan",
            data=scan_data
        )

        if response.status_code == 200:
            result = response.json()
            return result.get('id')
        else:
            raise Exception(f"Failed to start scan: {response.text}")

    def get_scan_status(self, scan_id: str) -> Dict:
        """Get scan status and progress"""
        response = self.session.get(f"{self.base_url}/scanstatus?id={scan_id}")

        if response.status_code == 200:
            return response.json()
        else:
            raise Exception(f"Failed to get scan status: {response.text}")

    def wait_for_scan_completion(self, scan_id: str, timeout: int = 3600) -> Dict:
        """Wait for scan to complete with timeout"""
        start_time = time.time()

        while time.time() - start_time < timeout:
            status = self.get_scan_status(scan_id)

            if status.get('status') in ['FINISHED', 'ABORTED', 'FAILED']:
                return status

            print(f"Scan progress: {status.get('status')} - {status.get('progress', 0)}%")
            time.sleep(30)

        raise TimeoutError("Scan did not complete within timeout")

    def get_scan_results(self, scan_id: str, data_type: str = None) -> List[Dict]:
        """Get scan results, optionally filtered by data type"""
        params = {'id': scan_id}
        if data_type:
            params['type'] = data_type

        response = self.session.get(f"{self.base_url}/scaneventresults", params=params)

        if response.status_code == 200:
            return response.json()
        else:
            raise Exception(f"Failed to get scan results: {response.text}")

    def export_scan_results(self, scan_id: str, format: str = 'json') -> str:
        """Export scan results in specified format"""
        params = {
            'id': scan_id,
            'type': format
        }

        response = self.session.get(f"{self.base_url}/scaneventresultexport", params=params)

        if response.status_code == 200:
            return response.text
        else:
            raise Exception(f"Failed to export scan results: {response.text}")

    def delete_scan(self, scan_id: str) -> bool:
        """Delete scan and its data"""
        response = self.session.get(f"{self.base_url}/scandelete?id={scan_id}")

        return response.status_code == 200

    def get_available_modules(self) -> Dict:
        """Get list of available modules"""
        response = self.session.get(f"{self.base_url}/modules")

        if response.status_code == 200:
            return response.json()
        else:
            raise Exception(f"Failed to get modules: {response.text}")

    def bulk_scan(self, targets: List[Dict], wait_for_completion: bool = True) -> List[Dict]:
        """Perform bulk scanning of multiple targets"""
        scan_results = []

        for target_info in targets:
            target = target_info['target']
            target_type = target_info['type']
            modules = target_info.get('modules', [])

            print(f"Starting scan for {target}")

            try:
                scan_id = self.start_scan(target, target_type, modules)

                scan_info = {
                    'target': target,
                    'scan_id': scan_id,
                    'status': 'started'
                }

                if wait_for_completion:
                    final_status = self.wait_for_scan_completion(scan_id)
                    scan_info['status'] = final_status.get('status')
                    scan_info['results'] = self.get_scan_results(scan_id)

                scan_results.append(scan_info)

            except Exception as e:
                print(f"Failed to scan {target}: {str(e)}")
                scan_results.append({
                    'target': target,
                    'status': 'failed',
                    'error': str(e)
                })

        return scan_results

# Usage example
api = SpiderFootAPI("http://localhost:5001")

# Single target scan
scan_id = api.start_scan("example.com", "DOMAIN", ["sfp_dnsresolve", "sfp_whois", "sfp_virustotal"])
print(f"Started scan: {scan_id}")

# Wait for completion
final_status = api.wait_for_scan_completion(scan_id)
print(f"Scan completed: {final_status['status']}")

# Get results
results = api.get_scan_results(scan_id)
print(f"Found {len(results)} events")

# Export results
json_results = api.export_scan_results(scan_id, 'json')
with open(f"scan_{scan_id}_results.json", 'w') as f:
    f.write(json_results)

# Bulk scanning
targets = [
    {'target': 'example.com', 'type': 'DOMAIN', 'modules': ['sfp_dnsresolve', 'sfp_whois']},
    {'target': 'test.com', 'type': 'DOMAIN', 'modules': ['sfp_dnsresolve', 'sfp_virustotal']},
    {'target': '192.168.1.1', 'type': 'IP_ADDRESS', 'modules': ['sfp_shodan', 'sfp_geoip']}
]

bulk_results = api.bulk_scan(targets)
print(f"Completed {len(bulk_results)} scans")

Automated Reporting and Analysis

# Automated SpiderFoot reporting and analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict, Counter
import re
from typing import Dict, List

class SpiderFootAnalyzer:
    def __init__(self, api: SpiderFootAPI):
        self.api = api

    def analyze_scan_results(self, scan_id: str) -> Dict:
        """Comprehensive analysis of scan results"""

        # Get all scan results
        results = self.api.get_scan_results(scan_id)

        # Convert to DataFrame for analysis
        df = pd.DataFrame(results)

        # Basic statistics
        stats = {
            'total_events': len(results),
            'unique_data_types': df['type'].nunique(),
            'data_type_distribution': df['type'].value_counts().to_dict(),
            'module_distribution': df['module'].value_counts().to_dict(),
            'confidence_distribution': df['confidence'].value_counts().to_dict()
        }

        # Risk analysis
        risk_analysis = self.analyze_risk_indicators(results)

        # Network analysis
        network_analysis = self.analyze_network_data(results)

        # Email and person analysis
        person_analysis = self.analyze_person_data(results)

        # Threat intelligence analysis
        threat_analysis = self.analyze_threat_intelligence(results)

        return {
            'statistics': stats,
            'risk_analysis': risk_analysis,
            'network_analysis': network_analysis,
            'person_analysis': person_analysis,
            'threat_analysis': threat_analysis
        }

    def analyze_risk_indicators(self, results: List[Dict]) -> Dict:
        """Analyze risk indicators from scan results"""

        risk_indicators = {
            'high_risk': [],
            'medium_risk': [],
            'low_risk': [],
            'informational': []
        }

        # Define risk patterns
        high_risk_patterns = [
            'VULNERABILITY',
            'MALWARE',
            'BLACKLISTED',
            'BREACH_DATA',
            'DARKWEB_MENTION'
        ]

        medium_risk_patterns = [
            'OPEN_TCP_PORT',
            'SSL_CERTIFICATE_EXPIRED',
            'WEBSERVER_TECHNOLOGY',
            'OPERATING_SYSTEM'
        ]

        for result in results:
            data_type = result.get('type', '')
            data_value = result.get('data', '')

            if any(pattern in data_type for pattern in high_risk_patterns):
                risk_indicators['high_risk'].append(result)
            elif any(pattern in data_type for pattern in medium_risk_patterns):
                risk_indicators['medium_risk'].append(result)
            elif data_type in ['SUBDOMAIN', 'IP_ADDRESS', 'EMAILADDR']:
                risk_indicators['low_risk'].append(result)
            else:
                risk_indicators['informational'].append(result)

        # Calculate risk score
        risk_score = (
            len(risk_indicators['high_risk']) * 10 +
            len(risk_indicators['medium_risk']) * 5 +
            len(risk_indicators['low_risk']) * 1
        )

        return {
            'risk_score': risk_score,
            'risk_indicators': risk_indicators,
            'risk_summary': {
                'high_risk_count': len(risk_indicators['high_risk']),
                'medium_risk_count': len(risk_indicators['medium_risk']),
                'low_risk_count': len(risk_indicators['low_risk']),
                'informational_count': len(risk_indicators['informational'])
            }
        }

    def analyze_network_data(self, results: List[Dict]) -> Dict:
        """Analyze network-related data"""

        network_data = {
            'subdomains': [],
            'ip_addresses': [],
            'open_ports': [],
            'ssl_certificates': [],
            'web_technologies': []
        }

        for result in results:
            data_type = result.get('type', '')
            data_value = result.get('data', '')

            if data_type == 'SUBDOMAIN':
                network_data['subdomains'].append(data_value)
            elif data_type == 'IP_ADDRESS':
                network_data['ip_addresses'].append(data_value)
            elif data_type == 'OPEN_TCP_PORT':
                network_data['open_ports'].append(data_value)
            elif data_type == 'SSL_CERTIFICATE':
                network_data['ssl_certificates'].append(data_value)
            elif data_type == 'WEBSERVER_TECHNOLOGY':
                network_data['web_technologies'].append(data_value)

        # Analysis
        analysis = {
            'subdomain_count': len(set(network_data['subdomains'])),
            'ip_count': len(set(network_data['ip_addresses'])),
            'unique_ports': list(set([port.split(':')[-1] for port in network_data['open_ports']])),
            'technology_stack': Counter(network_data['web_technologies']),
            'attack_surface': {
                'external_subdomains': len(set(network_data['subdomains'])),
                'exposed_services': len(network_data['open_ports']),
                'ssl_endpoints': len(network_data['ssl_certificates'])
            }
        }

        return analysis

    def analyze_person_data(self, results: List[Dict]) -> Dict:
        """Analyze person and email related data"""

        person_data = {
            'email_addresses': [],
            'social_media': [],
            'phone_numbers': [],
            'physical_addresses': [],
            'breach_data': []
        }

        for result in results:
            data_type = result.get('type', '')
            data_value = result.get('data', '')

            if data_type == 'EMAILADDR':
                person_data['email_addresses'].append(data_value)
            elif data_type == 'SOCIAL_MEDIA':
                person_data['social_media'].append(data_value)
            elif data_type == 'PHONE_NUMBER':
                person_data['phone_numbers'].append(data_value)
            elif data_type == 'PHYSICAL_ADDRESS':
                person_data['physical_addresses'].append(data_value)
            elif data_type == 'BREACH_DATA':
                person_data['breach_data'].append(data_value)

        # Email domain analysis
        email_domains = [email.split('@')[1] for email in person_data['email_addresses'] if '@' in email]

        analysis = {
            'email_count': len(set(person_data['email_addresses'])),
            'email_domains': Counter(email_domains),
            'social_media_count': len(person_data['social_media']),
            'breach_exposure': len(person_data['breach_data']),
            'contact_info_exposure': {
                'emails': len(person_data['email_addresses']),
                'phones': len(person_data['phone_numbers']),
                'addresses': len(person_data['physical_addresses'])
            }
        }

        return analysis

    def analyze_threat_intelligence(self, results: List[Dict]) -> Dict:
        """Minimal threat-intelligence summary (defines the method called
        from analyze_scan_results above); extend with source-specific logic"""
        threat_types = ['MALWARE', 'BLACKLISTED', 'VULNERABILITY',
                        'DARKWEB_MENTION', 'BREACH_DATA']
        findings = [r for r in results
                    if any(t in r.get('type', '') for t in threat_types)]

        return {
            'finding_count': len(findings),
            'by_type': dict(Counter(r.get('type', '') for r in findings)),
            'findings': findings
        }

    def generate_report(self, scan_id: str, output_file: str = None) -> str:
        """Generate comprehensive HTML report"""

        analysis = self.analyze_scan_results(scan_id)

        html_report = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <title>SpiderFoot Scan Analysis Report</title>
            <style>
                body {{ font-family: Arial, sans-serif; margin: 40px; }}
                .header {{ background-color: #2c3e50; color: white; padding: 20px; }}
                .section {{ margin: 20px 0; padding: 15px; border-left: 4px solid #3498db; }}
                .risk-high {{ border-left-color: #e74c3c; }}
                .risk-medium {{ border-left-color: #f39c12; }}
                .risk-low {{ border-left-color: #27ae60; }}
                table {{ border-collapse: collapse; width: 100%; }}
                th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
                th {{ background-color: #f2f2f2; }}
            </style>
        </head>
        <body>
            <div class="header">
                <h1>SpiderFoot Scan Analysis Report</h1>
                <p>Scan ID: {scan_id}</p>
                <p>Generated: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
            </div>

            <div class="section">
                <h2>Executive Summary</h2>
                <p><strong>Risk Score:</strong> {analysis['risk_analysis']['risk_score']}</p>
                <p><strong>Total Events:</strong> {analysis['statistics']['total_events']}</p>
                <p><strong>High Risk Findings:</strong> {analysis['risk_analysis']['risk_summary']['high_risk_count']}</p>
                <p><strong>Attack Surface:</strong> {analysis['network_analysis']['attack_surface']['external_subdomains']} subdomains, {analysis['network_analysis']['attack_surface']['exposed_services']} exposed services</p>
            </div>

            <div class="section risk-high">
                <h2>High Risk Findings</h2>
                <p>Found {analysis['risk_analysis']['risk_summary']['high_risk_count']} high-risk indicators requiring immediate attention.</p>
            </div>

            <div class="section">
                <h2>Network Analysis</h2>
                <table>
                    <tr><th>Metric</th><th>Count</th></tr>
                    <tr><td>Unique Subdomains</td><td>{analysis['network_analysis']['subdomain_count']}</td></tr>
                    <tr><td>IP Addresses</td><td>{analysis['network_analysis']['ip_count']}</td></tr>
                    <tr><td>Exposed Services</td><td>{analysis['network_analysis']['attack_surface']['exposed_services']}</td></tr>
                    <tr><td>SSL Endpoints</td><td>{analysis['network_analysis']['attack_surface']['ssl_endpoints']}</td></tr>
                </table>
            </div>

            <div class="section">
                <h2>Data Exposure Analysis</h2>
                <table>
                    <tr><th>Data Type</th><th>Count</th></tr>
                    <tr><td>Email Addresses</td><td>{analysis['person_analysis']['email_count']}</td></tr>
                    <tr><td>Social Media Profiles</td><td>{analysis['person_analysis']['social_media_count']}</td></tr>
                    <tr><td>Breach Exposures</td><td>{analysis['person_analysis']['breach_exposure']}</td></tr>
                </table>
            </div>

            <div class="section">
                <h2>Recommendations</h2>
                <ul>
                    <li>Review and remediate high-risk findings immediately</li>
                    <li>Implement subdomain monitoring for {analysis['network_analysis']['subdomain_count']} discovered subdomains</li>
                    <li>Secure exposed services and unnecessary open ports</li>
                    <li>Monitor for data breaches affecting discovered email addresses</li>
                    <li>Implement security awareness training for exposed personnel</li>
                </ul>
            </div>
        </body>
        </html>
        """

        if output_file:
            with open(output_file, 'w') as f:
                f.write(html_report)
            return output_file
        else:
            return html_report

# Usage example
api = SpiderFootAPI()
analyzer = SpiderFootAnalyzer(api)

# Analyze scan results
analysis = analyzer.analyze_scan_results("scan_123")
print(f"Risk Score: {analysis['risk_analysis']['risk_score']}")
print(f"High Risk Findings: {analysis['risk_analysis']['risk_summary']['high_risk_count']}")

# Generate report
report_file = analyzer.generate_report("scan_123", "spiderfoot_analysis_report.html")
print(f"Report generated: {report_file}")

Integration with Other Tools

Metasploit Integration

# Export SpiderFoot results for Metasploit
spiderfoot -e scan_id -o csv | grep "IP_ADDRESS\|OPEN_TCP_PORT" > targets.csv

# Convert to Metasploit workspace format
python3 << 'EOF'
import csv
import xml.etree.ElementTree as ET

# Create Metasploit XML import format
root = ET.Element("nmaprun")
hosts = {}

with open('targets.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['type'] == 'IP_ADDRESS':
            ip = row['data']
            if ip not in hosts:
                hosts[ip] = {'ports': []}
        elif row['type'] == 'OPEN_TCP_PORT':
            ip, port = row['data'].split(':')
            if ip in hosts:
                hosts[ip]['ports'].append(port)

for ip, data in hosts.items():
    host = ET.SubElement(root, "host")
    address = ET.SubElement(host, "address")
    address.set("addr", ip)
    address.set("addrtype", "ipv4")

    ports_elem = ET.SubElement(host, "ports")
    for port in data['ports']:
        port_elem = ET.SubElement(ports_elem, "port")
        port_elem.set("portid", port)
        port_elem.set("protocol", "tcp")

        state = ET.SubElement(port_elem, "state")
        state.set("state", "open")

tree = ET.ElementTree(root)
tree.write("spiderfoot_targets.xml")
print("Metasploit import file created: spiderfoot_targets.xml")
EOF

# Import into Metasploit
msfconsole -q -x "
workspace -a spiderfoot_scan;
db_import spiderfoot_targets.xml;
hosts;
services;
exit"

Nmap Integration

# Extract IP addresses and ports for Nmap scanning
spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="IP_ADDRESS") | .data' | sort -u > ips.txt
spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="OPEN_TCP_PORT") | .data | split(":")[1]' | sort -u > ports.txt

# Perform targeted Nmap scan
nmap -iL ips.txt -p $(cat ports.txt | tr '\n' ',' | sed 's/,$//') -sV -sC -oA spiderfoot_nmap

# Combine results
| echo "SpiderFoot discovered $(cat ips.txt | wc -l) IP addresses and $(cat ports.txt | wc -l) unique ports" |
echo "Nmap scan results saved to spiderfoot_nmap.*"

TheHarvester Integration

# Use SpiderFoot domains with TheHarvester
spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="DOMAIN_NAME") | .data' | sort -u > domains.txt

# Run TheHarvester on discovered domains
while read domain; do
    echo "Harvesting $domain..."
    theHarvester -d "$domain" -b all -f "${domain}_harvest"
done < domains.txt

# Combine email results
cat *_harvest.json | jq -r '.emails[]?' | sort -u > combined_emails.txt
echo "Found $(cat combined_emails.txt | wc -l) unique email addresses"

Amass Integration

# Use SpiderFoot results to seed Amass enumeration
spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="DOMAIN_NAME") | .data' | sort -u > seed_domains.txt

# Run Amass with SpiderFoot seeds
amass enum -df seed_domains.txt -active -brute -o amass_results.txt

# Compare results
| echo "SpiderFoot found $(spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="SUBDOMAIN") | .data' | sort -u | wc -l) subdomains" |
echo "Amass found $(cat amass_results.txt | wc -l) subdomains"

# Find new subdomains discovered by Amass
comm -13 <(spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="SUBDOMAIN") | .data' | sort -u) <(sort -u amass_results.txt) > new_subdomains.txt
echo "Amass discovered $(cat new_subdomains.txt | wc -l) additional subdomains"

Performance Optimization and Troubleshooting

Performance Tuning

# Optimize SpiderFoot performance
# Edit ~/.spiderfoot/spiderfoot.conf

# Increase thread count for faster scanning
__threads = 20

# Adjust request delays to avoid rate limiting
__delay = 1

# Increase timeout for slow responses
__timeout = 30

# Optimize database settings
__database = /tmp/spiderfoot.db  # Use faster storage
__dbpragmas = journal_mode=WAL,synchronous=NORMAL,cache_size=10000

# Memory optimization
__maxmemory = 2048  # MB

# Network optimization
__useragent = Mozilla/5.0 (compatible; SpiderFoot)
__proxy = http://proxy.example.com:8080  # Use proxy for better performance

Monitoring and Logging

# Enable detailed logging
spiderfoot -l 127.0.0.1:5001 -d

# Monitor scan progress
tail -f ~/.spiderfoot/spiderfoot.log

# Check system resources
ps aux | grep spiderfoot
netstat -tulpn | grep 5001

# Database optimization
sqlite3 ~/.spiderfoot/spiderfoot.db "VACUUM;"
sqlite3 ~/.spiderfoot/spiderfoot.db "ANALYZE;"

# Clean old scan data
spiderfoot -D  # Delete all scan data
# Or delete specific scans via web interface

Common Issues and Solutions

# Issue: API rate limiting
# Solution: Configure delays and use API keys
echo "api_delay = 2" >> ~/.spiderfoot/spiderfoot.conf

# Issue: Memory usage too high
# Solution: Limit concurrent modules and use passive scanning
spiderfoot -s target.com -t DOMAIN -T 5 -p

# Issue: Slow database performance
# Solution: Use WAL mode and optimize database
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA journal_mode=WAL;"
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA synchronous=NORMAL;"

# Issue: Module errors
# Solution: Check module configuration and API keys
spiderfoot -M sfp_virustotal  # Check specific module
grep "ERROR" ~/.spiderfoot/spiderfoot.log | tail -20

# Issue: Web interface not accessible
# Solution: Check binding and firewall
netstat -tulpn | grep 5001
sudo ufw allow 5001/tcp  # If using UFW firewall

Custom Deployment and Scaling

# Docker deployment with custom configuration
docker run -d \
  --name spiderfoot \
  -p 5001:5001 \
  -v /path/to/config:/home/spiderfoot/.spiderfoot \
  -v /path/to/data:/home/spiderfoot/data \
  -e SF_THREADS=20 \
  -e SF_DELAY=1 \
  spiderfoot/spiderfoot

# Docker Compose for production deployment
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  spiderfoot:
    image: spiderfoot/spiderfoot:latest
    ports:
      - "5001:5001"
    volumes:
      - ./config:/home/spiderfoot/.spiderfoot
      - ./data:/home/spiderfoot/data
    environment:
      - SF_THREADS=20
      - SF_DELAY=1
      - SF_TIMEOUT=30
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - spiderfoot
    restart: unless-stopped
EOF

# Kubernetes deployment
cat > spiderfoot-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spiderfoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spiderfoot
  template:
    metadata:
      labels:
        app: spiderfoot
    spec:
      containers:
      - name: spiderfoot
        image: spiderfoot/spiderfoot:latest
        ports:
        - containerPort: 5001
        env:
        - name: SF_THREADS
          value: "20"
        - name: SF_DELAY
          value: "1"
        volumeMounts:
        - name: config
          mountPath: /home/spiderfoot/.spiderfoot
        - name: data
          mountPath: /home/spiderfoot/data
      volumes:
      - name: config
        configMap:
          name: spiderfoot-config
      - name: data
        persistentVolumeClaim:
          claimName: spiderfoot-data
---
apiVersion: v1
kind: Service
metadata:
  name: spiderfoot-service
spec:
  selector:
    app: spiderfoot
  ports:
  - port: 80
    targetPort: 5001
  type: LoadBalancer
EOF

kubectl apply -f spiderfoot-deployment.yaml

Resources

Documentation and Community

Training and Tutorials

Related Tools and Resources