Photon

Overview

Photon is a fast, lightweight web crawler built for OSINT reconnaissance. It extracts URLs, email addresses, and metadata from the pages it visits, which makes it useful to authorized penetration testers and security researchers mapping a web target.

Key Features:

  • Ultra-fast web crawling with minimal memory footprint
  • Email and URL extraction from target websites
  • Metadata and file type discovery
  • JavaScript execution support
  • Multi-threaded operation for speed
  • Export data in multiple formats

Installation

Using pip

pip3 install photon-crawler

From Source

git clone https://github.com/s0md3v/photon.git
cd photon
pip3 install -r requirements.txt

Docker

docker run -it s0md3v/photon:latest

Verification

# Installed via pip
photon --version

# Running from a source checkout
python3 photon.py --help

Basic Usage

Simple Crawl

photon -u https://example.com

Crawl with Output Directory

photon -u https://example.com -o results

Limit Crawl Depth

photon -u https://example.com --level 2

Specify Thread Count

photon -u https://example.com -t 10

Core Commands

Command            Description
-u, --url          Target URL to crawl
-o, --output       Output directory name
-l, --level        Crawl depth level (default: 2)
-t, --threads      Number of threads (default: 5)
--timeout          Request timeout in seconds
--headers          Custom HTTP headers
--cookies          HTTP cookies to use
--user-agent       Custom user agent string
--proxy            HTTP/SOCKS proxy URL
--retries          Retry failed requests
--verbose          Verbose output
--quiet            Suppress output

Advanced Options

Crawling Control

# Follow redirects
photon -u https://example.com --follow-redirects

# Crawl specific domain only
photon -u https://example.com --domain-only

# Exclude patterns
photon -u https://example.com --exclude-patterns "*.pdf|*.zip"

# Include file types
photon -u https://example.com --include-formats "html|pdf|doc"

Authentication

# Basic authentication
photon -u https://example.com --auth-user admin --auth-pass password

# Custom headers
photon -u https://example.com --headers "Authorization: Bearer token"

# Session cookies
photon -u https://example.com --cookies "session=abc123"

Data Extraction

# Extract emails only
photon -u https://example.com -o results --extract emails

# Extract URLs and files
photon -u https://example.com -o results --extract urls,files

# Extract metadata
photon -u https://example.com -o results --extract metadata

Use Cases

Reconnaissance and Mapping

# Map complete web structure
photon -u https://target.com --level 3 -t 20 -o target_map

# Find hidden directories
photon -u https://target.com --extract urls --verbose

Email Harvesting

# Extract all emails from domain
photon -u https://target.com -o emails_found

# View the discovered emails in the output directory
cat emails_found/emails.txt
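
Harvested address lists usually contain case variants and stray non-address lines. The following is a minimal sketch of a cleanup step, not part of Photon itself: clean_emails is a hypothetical helper name, and the pattern is a deliberately loose shape check rather than a full address validator.

```shell
# Hypothetical helper: normalize an email list read from stdin.
# Lowercases every line, keeps only entries shaped like user@domain.tld,
# and prints the unique survivors in sorted order.
clean_emails() {
  tr -d '\r' |
    tr '[:upper:]' '[:lower:]' |
    grep -E '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$' |
    LC_ALL=C sort -u
}
```

Usage, assuming the email list is in a file named emails.txt: clean_emails < emails.txt > emails_clean.txt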

File and Metadata Discovery

# Find document files
photon -u https://target.com --include-formats "pdf|doc|docx"

# Extract metadata from found files
photon -u https://target.com --extract metadata

Subdomain Discovery

# Discover subdomains through crawling
photon -u https://example.com --level 2 --verbose

# Grep results for subdomains
grep "http" output/urls.txt | cut -d'/' -f3 | sort -u
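
The pipeline above lists every host that appears in the URL file, including third-party ones. A sketch of a stricter filter follows, assuming only that the input has one URL per line; subdomains_of is a hypothetical helper name, not a Photon feature.

```shell
# Hypothetical helper: print unique hostnames from a URL list (stdin)
# that equal the given parent domain or end in ".<parent domain>".
subdomains_of() {
  parent=$1
  awk -F/ -v parent="$parent" '
    /^https?:\/\// {
      host = tolower($3)
      sub(/[?#].*$/, "", host)       # drop a query or fragment glued to the host
      sub(/^.*@/, "", host)          # drop userinfo, if any
      sub(/:[0-9]+$/, "", host)      # drop an explicit port
      suffix = "." parent
      if (host == parent ||
          (length(host) > length(suffix) &&
           substr(host, length(host) - length(suffix) + 1) == suffix))
        print host
    }' | LC_ALL=C sort -u
}
```

Usage: subdomains_of example.com < output/urls.txt. Matching on the dot-prefixed suffix keeps look-alike hosts such as notexample.com out of the results.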

Output Formats

JSON Export

photon -u https://example.com -o results --format json

CSV Export

photon -u https://example.com -o results --format csv

Reading Output Files

# List all output files
ls output/

# View discovered URLs
cat output/urls.txt

# View discovered emails
cat output/emails.txt

# View external URLs
cat output/external.txt
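
Before opening individual files it can help to see how many entries each one holds. This is a small sketch under the assumption that results are plain .txt files with one entry per line; summarize_output is a hypothetical helper, and the default directory name output simply mirrors the paths used above.

```shell
# Hypothetical helper: print "<count> <file>" for every .txt file in a
# crawl output directory, so empty or unexpectedly large result sets
# stand out at a glance.
summarize_output() {
  dir=${1:-output}
  for f in "$dir"/*.txt; do
    [ -f "$f" ] || continue          # directory empty or glob unmatched
    printf '%s %s\n' "$(grep -c '' "$f")" "${f##*/}"
  done
}
```

Usage: summarize_output results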

Performance Tuning

Optimize Speed

# High thread count for fast crawling
photon -u https://target.com -t 50 --timeout 5

# Disable SSL verification (risky)
photon -u https://target.com --no-verify-ssl

# Increase timeout for slow servers
photon -u https://target.com --timeout 30

Memory Management

# Limit crawl depth to reduce memory
photon -u https://target.com --level 1

# Stream results to avoid buffering
photon -u https://target.com --stream

Rate Limiting

# Slow down requests (seconds between requests)
photon -u https://target.com --delay 1

# Randomize user agents
photon -u https://target.com --random-agent

Proxy and Privacy

Using Proxies

# HTTP proxy
photon -u https://target.com --proxy http://127.0.0.1:8080

# SOCKS5 proxy
photon -u https://target.com --proxy socks5://127.0.0.1:9050

# Multiple proxies (rotation)
photon -u https://target.com --proxy-list proxies.txt

User Agent Rotation

# Random user agent
photon -u https://target.com --random-agent

# Custom user agent
photon -u https://target.com --user-agent "Mozilla/5.0..."

JavaScript Execution

Enable JavaScript Rendering

# Render JavaScript content
photon -u https://target.com --javascript

# Wait for AJAX content
photon -u https://target.com --javascript --wait 5

Combining Results

Merge Multiple Crawls

# Crawl multiple related domains
photon -u https://example1.com -o results1
photon -u https://example2.com -o results2

# Combine results
cat results1/urls.txt results2/urls.txt | sort -u > combined.txt
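
A plain sort -u treats https://example.com/a and https://example.com/a/ as different entries. The sketch below normalizes before deduplicating. It assumes that fragments and trailing slashes are not significant for the target, which is not true of every site; merge_urls is a hypothetical helper name.

```shell
# Hypothetical helper: merge URL lists from any number of files,
# stripping fragments and trailing slashes before deduplicating.
merge_urls() {
  cat "$@" |
    tr -d '\r' |
    sed -e 's/#.*$//' -e 's:/*$::' |
    grep -v '^$' |
    LC_ALL=C sort -u
}
```

Usage: merge_urls results1/urls.txt results2/urls.txt > combined.txt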

Post-Processing Results

# Find interesting patterns
grep -E "admin|config|backup|test" output/urls.txt

# Extract domains from URLs
cut -d'/' -f3 output/urls.txt | sort -u

# Filter by extension
grep -E "\.pdf|\.xlsx|\.doc" output/urls.txt
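
When it is not yet clear which extensions are worth filtering on, a frequency table is a useful first look. The following sketch assumes one URL per line on stdin; count_extensions is a hypothetical helper. It only inspects the last path segment, so the top-level domain of a bare host is not mistaken for a file extension.

```shell
# Hypothetical helper: count file extensions found in the path of each
# URL read from stdin, most frequent first.
count_extensions() {
  awk -F/ '
    { sub(/[?#].*$/, "") }           # drop query and fragment, re-split fields
    /^https?:\/\// && NF > 3 {       # only URLs that have a path component
      name = $NF
      if (match(name, /\.[A-Za-z0-9]+$/)) {
        ext = tolower(substr(name, RSTART + 1))
        count[ext]++
      }
    }
    END { for (e in count) print count[e], e }' |
    LC_ALL=C sort -k1,1nr -k2,2
}
```

Usage: count_extensions < output/urls.txt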

Practical Examples

Complete OSINT Crawl

# Full reconnaissance with verbose output
photon -u https://target.com \
  -o target_osint \
  --level 2 \
  -t 20 \
  --extract urls,emails,metadata \
  --verbose

Stealth Crawling

# Slow, stealthy approach with rotating agents
photon -u https://target.com \
  -o stealthy_results \
  --level 1 \
  -t 3 \
  --random-agent \
  --delay 2 \
  --timeout 10

Deep Reconnaissance

# Deep crawl with advanced extraction
photon -u https://target.com \
  -o deep_recon \
  --level 3 \
  -t 25 \
  --proxy socks5://127.0.0.1:9050 \
  --javascript \
  --extract urls,emails,metadata,comments

Best Practices

Ethical Usage

  • Always have authorization before crawling any website
  • Respect robots.txt and crawl-delay directives
  • Use appropriate rate limiting to avoid DoS
  • Identify yourself with custom User-Agent headers

Effective Reconnaissance

  • Start with level 1 crawls, increase depth as needed
  • Use thread count appropriate for target stability
  • Combine with other OSINT tools for better results
  • Save output for historical comparison
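
As one way to act on the last point, two saved URL lists can be compared to show what appeared since the previous crawl. This is a sketch using standard tools only; new_urls is a hypothetical helper, and the file paths passed to it are whatever the two crawls produced.

```shell
# Hypothetical helper: print URLs present in the newer list ($2)
# but absent from the older list ($1).
new_urls() {
  old=$(mktemp) && new=$(mktemp) || return 1
  LC_ALL=C sort -u "$1" > "$old"
  LC_ALL=C sort -u "$2" > "$new"
  LC_ALL=C comm -13 "$old" "$new"    # column 3 suppressed, column 1 suppressed
  rm -f "$old" "$new"
}
```

Usage: new_urls crawl_january/urls.txt crawl_february/urls.txt (both directory names are placeholders).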

Performance

  • Adjust thread count based on target response time
  • Use appropriate timeout values for slow servers
  • Disable unnecessary features to increase speed
  • Monitor resource usage on long crawls

Troubleshooting

SSL Certificate Errors

# Disable SSL verification (for testing only)
photon -u https://target.com --no-verify-ssl

Timeout Issues

# Increase timeout values
photon -u https://target.com --timeout 30 --retries 3

Memory Usage

# Reduce memory footprint
photon -u https://target.com --level 1 -t 5

Getting Blocked

# Slow down and rotate user agents
photon -u https://target.com --delay 2 --random-agent --proxy-list proxies.txt

Common Workflows

Subdomain Enumeration

# Extract all subdomains via crawling
photon -u https://example.com --level 2 --verbose 2>&1 | \
  grep -oE 'https?://[^/]+' | cut -d'/' -f3 | sort -u

Sensitive File Discovery

# Find backup and config files
photon -u https://target.com -o results
grep -E "\.bak|\.conf|\.config|\.sql" results/urls.txt
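
The grep pattern above is unanchored, so it also matches URLs such as notes.sqlite or news.conference.html. A tighter variant, sketched here with the same four extensions, requires the extension to end the URL or be followed by a query string or fragment. find_sensitive is a hypothetical helper name.

```shell
# Hypothetical helper: keep only URLs (stdin) whose path ends in one of
# the listed extensions, optionally followed by ?query or #fragment.
find_sensitive() {
  grep -iE '\.(bak|conf|config|sql)([?#]|$)'
}
```

Usage: find_sensitive < results/urls.txt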

Email Extraction for Phishing Simulations

# Harvest all emails from target
photon -u https://target.com -o results
sort -u results/emails.txt > harvested_emails.txt
wc -l harvested_emails.txt

Integration with Other Tools

Feed to nmap

# Extract hostnames from crawled URLs for scanning
photon -u https://target.com -o results
cut -d'/' -f3 results/urls.txt | cut -d':' -f1 | sort -u | tee hosts.txt
nmap -iL hosts.txt -p 80,443 -sV
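
The pipeline above discards any port that appears in a URL and always scans 80 and 443. If the crawl found services on other ports, the sketch below keeps that information by printing one "host port" pair per line, falling back to the scheme default when no port is given. host_ports is a hypothetical helper, not a Photon or nmap feature.

```shell
# Hypothetical helper: print unique "host port" pairs from a URL list
# (stdin), using 443 for https and 80 for http when no port is explicit.
host_ports() {
  awk -F/ '
    /^https?:\/\// {
      scheme = $1                    # "http:" or "https:"
      host = tolower($3)
      sub(/[?#].*$/, "", host)       # drop a query or fragment glued to the host
      sub(/^.*@/, "", host)          # drop userinfo, if any
      port = (scheme == "https:") ? 443 : 80
      if (match(host, /:[0-9]+$/)) {
        port = substr(host, RSTART + 1)
        host = substr(host, 1, RSTART - 1)
      }
      if (host != "") print host, port
    }' | LC_ALL=C sort -u
}
```

Usage: host_ports < results/urls.txt. The second column can then be collected into a comma-separated list for nmap's -p option.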

Feed to Nuclei

# Scan discovered URLs with Nuclei
photon -u https://target.com -o results
nuclei -l results/urls.txt -t /path/to/templates

Version and Support

Check current version and updates:

photon --version
pip3 install --upgrade photon-crawler

Important: Only use Photon on systems and networks where you have explicit authorization. Unauthorized web crawling and reconnaissance are illegal in many jurisdictions. Always comply with applicable laws and obtain written permission before conducting security testing.