# Photon

## Overview

Photon is a fast, lightweight web crawler and OSINT tool designed for reconnaissance and information gathering. It rapidly extracts valuable data from web pages, including URLs, email addresses, and metadata. Photon is well suited for authorized penetration testers and security researchers performing web reconnaissance.
**Key Features:**

- Ultra-fast web crawling with a minimal memory footprint
- Email and URL extraction from target websites
- Metadata and file-type discovery
- JavaScript execution support
- Multi-threaded operation for speed
- Data export in multiple formats
## Installation

### Using pip

```shell
pip3 install photon-crawler
```

### From Source

```shell
git clone https://github.com/s0md3v/photon.git
cd photon
pip3 install -r requirements.txt
```

### Using Docker

```shell
docker run -it s0md3v/photon:latest
```
### Verification

```shell
# If installed via pip
photon --version

# If running from the cloned source
python3 photon.py --help
```
## Basic Usage

### Simple Crawl

```shell
photon -u https://example.com
```

### Crawl with Output Directory

```shell
photon -u https://example.com -o results
```

### Limit Crawl Depth

```shell
photon -u https://example.com --level 2
```

### Specify Thread Count

```shell
photon -u https://example.com -t 10
```
## Core Commands

| Command | Description |
|---|---|
| `-u`, `--url` | Target URL to crawl |
| `-o`, `--output` | Output directory name |
| `-l`, `--level` | Crawl depth level (default: 2) |
| `-t`, `--threads` | Number of threads (default: 5) |
| `--timeout` | Request timeout in seconds |
| `--headers` | Custom HTTP headers |
| `--cookies` | HTTP cookies to use |
| `--user-agent` | Custom user agent string |
| `--proxy` | HTTP/SOCKS proxy URL |
| `--retries` | Retry failed requests |
| `--verbose` | Verbose output |
| `--quiet` | Suppress output |
## Advanced Options

### Crawling Control

```shell
# Follow redirects
photon -u https://example.com --follow-redirects

# Crawl the target domain only
photon -u https://example.com --domain-only

# Exclude patterns
photon -u https://example.com --exclude-patterns "*.pdf|*.zip"

# Include file types
photon -u https://example.com --include-formats "html|pdf|doc"
```
### Authentication

```shell
# Basic authentication
photon -u https://example.com --auth-user admin --auth-pass password

# Custom headers
photon -u https://example.com --headers "Authorization: Bearer token"

# Session cookies
photon -u https://example.com --cookies "session=abc123"
```
### Data Extraction

```shell
# Extract emails only
photon -u https://example.com -o results --extract emails

# Extract URLs and files
photon -u https://example.com -o results --extract urls,files

# Extract metadata
photon -u https://example.com -o results --extract metadata
```
## Use Cases

### Reconnaissance and Mapping

```shell
# Map the complete web structure
photon -u https://target.com --level 3 -t 20 -o target_map

# Find hidden directories
photon -u https://target.com --extract urls --verbose
```
### Email Harvesting

```shell
# Extract all emails from a domain
photon -u https://target.com -o emails_found

# Review the discovered emails
cat emails_found/emails.txt
```
### File and Metadata Discovery

```shell
# Find document files
photon -u https://target.com --include-formats "pdf|doc|docx"

# Extract metadata from found files
photon -u https://target.com --extract metadata
```
### Subdomain Discovery

```shell
# Discover subdomains through crawling
photon -u https://example.com --level 2 --verbose

# Grep results for subdomains
grep "http" output/urls.txt | cut -d'/' -f3 | sort -u
```
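To see what the grep pipeline produces, it can be run against a small sample `urls.txt`; the file contents below are made up for illustration (Photon's real output lives in the crawl's output directory):

```shell
# Create a sample urls.txt resembling Photon's output (hypothetical data)
printf '%s\n' \
  'https://blog.example.com/post/1' \
  'https://example.com/about' \
  'https://shop.example.com/cart' \
  'https://blog.example.com/post/2' > urls.txt

# Field 3 of a '/'-split URL is the hostname; sort -u deduplicates
grep "http" urls.txt | cut -d'/' -f3 | sort -u > subdomains.txt
cat subdomains.txt
# blog.example.com
# example.com
# shop.example.com
```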
## Output Formats

### JSON Export

```shell
photon -u https://example.com -o results --format json
```

### CSV Export

```shell
photon -u https://example.com -o results --format csv
```

### Reading Output Files

```shell
# List all output files
ls output/

# View discovered URLs
cat output/urls.txt

# View discovered emails
cat output/emails.txt

# View external URLs
cat output/external.txt
```
## Performance Tuning

### Optimize Speed

```shell
# High thread count for fast crawling
photon -u https://target.com -t 50 --timeout 5

# Disable SSL verification (risky)
photon -u https://target.com --no-verify-ssl

# Increase timeout for slow servers
photon -u https://target.com --timeout 30
```
### Memory Management

```shell
# Limit crawl depth to reduce memory use
photon -u https://target.com --level 1

# Stream results to avoid buffering
photon -u https://target.com --stream
```
### Rate Limiting

```shell
# Slow down requests (seconds between requests)
photon -u https://target.com --delay 1

# Randomize user agents
photon -u https://target.com --random-agent
```
## Proxy and Privacy

### Using Proxies

```shell
# HTTP proxy
photon -u https://target.com --proxy http://127.0.0.1:8080

# SOCKS5 proxy
photon -u https://target.com --proxy socks5://127.0.0.1:9050

# Multiple proxies (rotation)
photon -u https://target.com --proxy-list proxies.txt
```
### User Agent Rotation

```shell
# Random user agent
photon -u https://target.com --random-agent

# Custom user agent
photon -u https://target.com --user-agent "Mozilla/5.0..."
```
## JavaScript Execution

### Enable JavaScript Rendering

```shell
# Render JavaScript content
photon -u https://target.com --javascript

# Wait for AJAX content to load
photon -u https://target.com --javascript --wait 5
```
## Combining Results

### Merge Multiple Crawls

```shell
# Crawl multiple related domains
photon -u https://example1.com -o results1
photon -u https://example2.com -o results2

# Combine and deduplicate results
cat results1/urls.txt results2/urls.txt | sort -u > combined.txt
```
### Post-Processing Results

```shell
# Find interesting patterns
grep -E "admin|config|backup|test" output/urls.txt

# Extract domains from URLs
cut -d'/' -f3 output/urls.txt | sort -u

# Filter by extension
grep -E "\.pdf|\.xlsx|\.doc" output/urls.txt
```
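Note that an unanchored extension filter also matches substrings such as `.pdfx`; anchoring the pattern on end-of-line or a query string avoids that. A sketch against sample data (file contents are hypothetical):

```shell
# Sample urls.txt (hypothetical data)
printf '%s\n' \
  'https://target.com/files/report.pdf' \
  'https://target.com/assets/app.pdfx' \
  'https://target.com/data.xlsx?dl=1' \
  'https://target.com/admin/login' > urls.txt

# Anchor on end-of-line or '?' so ".pdfx" is not matched
grep -E '\.(pdf|xlsx|docx?)($|\?)' urls.txt > documents.txt
cat documents.txt
# https://target.com/files/report.pdf
# https://target.com/data.xlsx?dl=1
```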
## Practical Examples

### Complete OSINT Crawl

```shell
# Full reconnaissance with verbose output
photon -u https://target.com \
  -o target_osint \
  --level 2 \
  -t 20 \
  --extract urls,emails,metadata \
  --verbose
```
### Stealth Crawling

```shell
# Slow, stealthy approach with rotating agents
photon -u https://target.com \
  -o stealthy_results \
  --level 1 \
  -t 3 \
  --random-agent \
  --delay 2 \
  --timeout 10
```
### Deep Reconnaissance

```shell
# Deep crawl with advanced extraction
photon -u https://target.com \
  -o deep_recon \
  --level 3 \
  -t 25 \
  --proxy socks5://127.0.0.1:9050 \
  --javascript \
  --extract urls,emails,metadata,comments
```
## Best Practices

### Ethical Usage

- Always obtain authorization before crawling any website
- Respect robots.txt and crawl-delay directives
- Use appropriate rate limiting to avoid overloading the target
- Identify yourself with a custom User-Agent header
### Effective Reconnaissance

- Start with level-1 crawls and increase depth as needed
- Use a thread count appropriate for target stability
- Combine with other OSINT tools for better results
- Save output for historical comparison
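The historical-comparison point can be sketched with `comm`, which reports URLs that appeared since a previous crawl; file names and contents here are hypothetical:

```shell
# URL lists saved from two crawls of the same target (hypothetical data)
printf '%s\n' 'https://target.com/a' 'https://target.com/b' > crawl_old.txt
printf '%s\n' 'https://target.com/b' 'https://target.com/c' > crawl_new.txt

# comm requires sorted input; -13 keeps lines unique to the second file
sort crawl_old.txt > old.sorted
sort crawl_new.txt > new.sorted
comm -13 old.sorted new.sorted > new_urls.txt
cat new_urls.txt
# https://target.com/c
```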
### Performance

- Adjust thread count based on target response time
- Use appropriate timeout values for slow servers
- Disable unnecessary features to increase speed
- Monitor resource usage on long crawls
## Troubleshooting

### SSL Certificate Errors

```shell
# Disable SSL verification (for testing only)
photon -u https://target.com --no-verify-ssl
```

### Timeout Issues

```shell
# Increase timeout values
photon -u https://target.com --timeout 30 --retries 3
```

### Memory Usage

```shell
# Reduce memory footprint
photon -u https://target.com --level 1 -t 5
```

### Getting Blocked

```shell
# Slow down and rotate user agents
photon -u https://target.com --delay 2 --random-agent --proxy-list proxies.txt
```
## Common Workflows

### Subdomain Enumeration

```shell
# Extract all subdomains via crawling
photon -u https://example.com --level 2 --verbose 2>&1 | \
  grep -oE 'https?://[^/]+' | cut -d'/' -f3 | sort -u
```
### Sensitive File Discovery

```shell
# Find backup and config files
photon -u https://target.com -o results
grep -E "\.bak|\.conf|\.config|\.sql" results/urls.txt
```
### Email Extraction for Phishing Assessments

```shell
# Harvest all emails from the target
photon -u https://target.com -o results
sort -u results/emails.txt > harvested_emails.txt
wc -l harvested_emails.txt
```
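Crawled email lists often contain false positives (strings with `@` that are not addresses), so a quick sanity filter before use is worthwhile. Sample data below is hypothetical, and the regex is a rough sketch, not RFC 5322:

```shell
# Hypothetical harvested list, including a non-address and a duplicate
printf '%s\n' \
  'alice@target.com' \
  'image@2x' \
  'bob@mail.target.com' \
  'alice@target.com' > harvested_emails.txt

# Keep plausible addresses only, deduplicated
grep -E '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' harvested_emails.txt \
  | sort -u > clean_emails.txt
cat clean_emails.txt
# alice@target.com
# bob@mail.target.com
```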
## Integration with Other Tools

### Feed to Nmap

```shell
# Extract hostnames from crawled URLs for scanning
photon -u https://target.com -o results
cut -d'/' -f3 results/urls.txt | cut -d':' -f1 | sort -u | tee hosts.txt
nmap -iL hosts.txt -p 80,443 -sV
```

### Feed to Nuclei

```shell
# Scan discovered URLs with Nuclei
photon -u https://target.com -o results
nuclei -l results/urls.txt -t /path/to/templates
```
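Ports embedded in crawled URLs (e.g. `https://host:8443/...`) are worth preserving for the scanner rather than stripping. A sketch with awk on hypothetical sample data; the port-defaulting logic is an assumption, not a Photon feature:

```shell
# Sample urls.txt with and without explicit ports (hypothetical data)
printf '%s\n' \
  'https://target.com/login' \
  'https://target.com:8443/admin' \
  'http://dev.target.com:8080/' > urls.txt

# Keep host:port pairs; default the port by scheme when absent
awk -F/ '{split($3, hp, ":");
          port = (hp[2] != "") ? hp[2] : ($1 == "https:" ? 443 : 80);
          print hp[1] ":" port}' urls.txt | sort -u > hostports.txt
cat hostports.txt
# dev.target.com:8080
# target.com:443
# target.com:8443
```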
## Version and Support

Check the current version and install updates:

```shell
photon --version
pip3 install --upgrade photon-crawler
```
## Legal and Ethical Considerations

**Important:** Only use Photon on systems and networks where you have explicit authorization. Unauthorized web crawling and reconnaissance are illegal in many jurisdictions. Always comply with applicable laws and obtain written permission before conducting security testing.