waybackurls
Overview
waybackurls is a command-line tool by Tom Hudson (@tomnomnom) that fetches every URL the Internet Archive's Wayback Machine has recorded for a target domain. It is a reconnaissance staple: archived URLs frequently expose old endpoints, parameters, and functionality that may still be reachable and vulnerable, which makes the tool popular with bug bounty hunters and penetration testers.
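Under the hood it wraps the Wayback Machine's CDX API. A roughly equivalent raw query (the exact parameters sent by waybackurls may differ between versions) looks like this:
# Ask the CDX API for original URLs, deduplicated by URL key
curl -s "http://web.archive.org/cdx/search/cdx?url=*.example.com/*&fl=original&collapse=urlkey"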
Key Features
- Fetch all archived URLs for a domain from the Wayback Machine
- Includes subdomains by default; exclude them with -no-subs
- Reads one or more domains from stdin and fetches them concurrently (see the example below)
- Can prepend fetch dates (-dates) and list archived snapshot URLs (-get-versions)
- Integrates seamlessly with other tools in reconnaissance pipelines
- Cross-platform (Linux, macOS, Windows)
- No authentication required
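Because domains are read from stdin, batch lookups need no wrapper loop:
# Look up several domains in one invocation
cat domains.txt | waybackurls > urls.txt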
Installation
Linux / Kali Linux
# Install via apt (if available)
sudo apt-get install waybackurls
# Or using Go (recommended)
go install github.com/tomnomnom/waybackurls@latest
# Verify installation
waybackurls -h
macOS
# Via Homebrew
brew install waybackurls
# Or using Go
go install github.com/tomnomnom/waybackurls@latest
# Verify installation
waybackurls -h
Windows
# Using Go
go install github.com/tomnomnom/waybackurls@latest
# Or download binary from GitHub releases
# https://github.com/tomnomnom/waybackurls/releases
# Add to PATH if needed
$env:PATH += ";$env:USERPROFILE\go\bin"
Manual Installation from Source
# Clone repository
git clone https://github.com/tomnomnom/waybackurls.git
cd waybackurls
# Build binary
go build -o waybackurls
# Make executable (Linux/macOS)
chmod +x waybackurls
# Test (with no arguments the tool blocks waiting for domains on stdin)
./waybackurls example.com
Docker Installation
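The upstream repository may not ship a Dockerfile, so write a minimal one first (a sketch; the image tags are illustrative):
# Create a minimal multi-stage Dockerfile
cat > Dockerfile <<'EOF'
FROM golang:alpine AS build
ENV CGO_ENABLED=0
RUN go install github.com/tomnomnom/waybackurls@latest

FROM alpine:latest
RUN apk add --no-cache ca-certificates
COPY --from=build /go/bin/waybackurls /usr/local/bin/waybackurls
ENTRYPOINT ["waybackurls"]
EOF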
# Build Docker image
docker build -t waybackurls .
# Run in container
docker run -it waybackurls example.com
Basic Usage
Simple Domain Lookup
# Fetch all URLs for a domain
waybackurls example.com
# Fetch URLs for subdomain
waybackurls subdomain.example.com
Output to File
# Save results to file
waybackurls example.com > urls.txt
# View results
cat urls.txt
Filter by File Extension
# Extract JavaScript files
waybackurls example.com | grep -E "\.js$"
# Extract PHP files
waybackurls example.com | grep -E "\.php$"
# Extract API endpoints
waybackurls example.com | grep -E "api|/v[0-9]/"
Count URLs
# Count total URLs found
waybackurls example.com | wc -l
# Count unique URLs
waybackurls example.com | sort -u | wc -l
Command Options
waybackurls takes its target domains as arguments or on stdin, and exposes only a handful of flags:
| Option | Description |
|---|---|
| -dates | Show the date each URL was fetched by the archive in the first column |
| -no-subs | Exclude subdomains of the target domain |
| -get-versions | List URLs of archived snapshots for the input URL(s) |
| -h | Display help information |
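For example:
# Apex-domain URLs only, with the fetch date in the first column
waybackurls -dates -no-subs example.com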
Advanced Filtering Techniques
Filter by Status Code
# Get URLs that returned 200 OK
waybackurls example.com | while read url; do
  status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  [ "$status" -eq 200 ] && echo "$url"
done
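Probing one URL at a time with curl is slow on large result sets; if httpx (covered later in this guide) is available, its -mc (match status code) flag does the same check concurrently:
# Keep only URLs that currently return 200
waybackurls example.com | httpx -silent -mc 200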
Extract Specific Parameters
# Find URLs with ID parameters
waybackurls example.com | grep -E "[?&]id="
# Find search parameters
waybackurls example.com | grep -E "[?&]search=|[?&]q="
# Find user parameters
waybackurls example.com | grep -E "[?&]user=|[?&]username="
Extract Unique Endpoints
# Get unique paths only
waybackurls example.com | sed 's/[?#].*//' | sort -u
# Get top 20 most common endpoints
waybackurls example.com | sed 's/[?#].*//' | sort | uniq -c | sort -rn | head -20
Filter by Sensitive Patterns
# Find potential admin panels
waybackurls example.com | grep -iE "admin|panel|dashboard|control"
# Find API endpoints
waybackurls example.com | grep -iE "/api|/v[0-9]"
# Find upload endpoints
waybackurls example.com | grep -iE "upload|file|attachment"
# Find configuration files
waybackurls example.com | grep -iE "\.conf|\.config|\.json|\.xml|\.yml"
Real-World Examples
Example 1: Full Reconnaissance
# Get all URLs
waybackurls example.com > all_urls.txt
# Extract unique endpoints
cat all_urls.txt | sed 's/[?#].*//' | sort -u > endpoints.txt
# Count endpoints
wc -l endpoints.txt
# Find JavaScript files
grep -E "\.js$" all_urls.txt > javascript_files.txt
Example 2: Parameter Discovery
# Extract all query parameters
waybackurls example.com | grep -o "[?&][^=]*=[^&]*" | sort -u > parameters.txt
# Find interesting parameters
grep -E "id=|user=|pass=|token=|key=" parameters.txt
Example 3: API Endpoint Enumeration
# Get all API endpoints
waybackurls example.com | grep -iE "/api|/rest|/v[0-9]" | sort -u > api_endpoints.txt
# Check which endpoints are still accessible
# waybackurls emits full URLs, so request them as-is rather than re-prefixing a host
while read url; do
  response=$(curl -s -o /dev/null -w "%{http_code}" "$url" 2>/dev/null)
  echo "$url - $response"
done < api_endpoints.txt
Example 4: Integration with Other Tools
# Pipe to gf (by tomnomnom) for pattern matching
waybackurls example.com | gf xss
# Pipe to qsreplace for quick parameter testing
waybackurls example.com | qsreplace 'PAYLOAD'
# Pipe to httpx for status checking
waybackurls example.com | httpx -status-code -title
Integration with Reconnaissance Pipelines
Combined with Subfinder
# Find subdomains, then get URLs for each
subfinder -d example.com | while read subdomain; do
  echo "[*] Processing $subdomain"
  waybackurls "$subdomain" >> all_urls.txt
done
# Get unique URLs
sort -u all_urls.txt > unique_urls.txt
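Because waybackurls reads domains from stdin, the loop above also collapses into a single pipe (at the cost of the per-subdomain progress messages):
# Same result as the loop, in one pipeline
subfinder -d example.com -silent | waybackurls | sort -u > unique_urls.txt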
Combined with HTTPX
# Get URLs and check which are still alive
waybackurls example.com | httpx \
-status-code \
-title \
-content-type \
-web-server \
-o live_urls.txt
Combined with Nuclei
# Get URLs and scan with Nuclei templates
waybackurls example.com | nuclei \
-t ~/nuclei-templates/ \
-severity high \
-o nuclei_results.txt
Combined with Burp Suite
# Save URLs to file for importing into Burp
waybackurls example.com > burp_import.txt
# Burp has no native URL-list import; load the file with a site-map
# import extension from the BApp Store, or paste the URLs into your scope
Practical Automation Scripts
Bash Script: Full Reconnaissance
Create file recon.sh:
#!/bin/bash
DOMAIN=$1
if [ -z "$DOMAIN" ]; then
  echo "Usage: ./recon.sh example.com"
  exit 1
fi
echo "[*] Starting reconnaissance for $DOMAIN"
# Create output directory
mkdir -p "$DOMAIN"
cd "$DOMAIN"
# Get all URLs from Wayback Machine
echo "[+] Fetching URLs from Wayback Machine..."
waybackurls "$DOMAIN" > all_urls.txt
# Extract unique endpoints
echo "[+] Extracting unique endpoints..."
cat all_urls.txt | sed 's/[?#].*//' | sort -u > endpoints.txt
# Extract JavaScript files
echo "[+] Extracting JavaScript files..."
grep -E "\.js$" all_urls.txt > javascript_files.txt
# Extract API endpoints
echo "[+] Extracting API endpoints..."
grep -iE "/api|/rest|/v[0-9]" all_urls.txt | sort -u > api_endpoints.txt
# Extract parameters
echo "[+] Extracting parameters..."
grep -o "[?&][^=]*" all_urls.txt | sort -u > parameters.txt
# Check live URLs
echo "[+] Checking live URLs..."
if command -v httpx &> /dev/null; then
  httpx -l all_urls.txt -status-code -title -o live_urls.txt
else
  echo "[!] httpx not installed, skipping live URL check"
fi
# Summary
echo ""
echo "[*] Reconnaissance complete!"
echo "[*] Results stored in ./$DOMAIN/"
echo "[*] Total URLs: $(wc -l < all_urls.txt)"
echo "[*] Unique endpoints: $(wc -l < endpoints.txt)"
echo "[*] JavaScript files: $(wc -l < javascript_files.txt)"
echo "[*] API endpoints: $(wc -l < api_endpoints.txt)"
echo "[*] Parameters: $(wc -l < parameters.txt)"
Run script:
chmod +x recon.sh
./recon.sh example.com
Python Script: Advanced Filtering
Create file wayback_filter.py:
#!/usr/bin/env python3
import sys
import subprocess
from urllib.parse import urlparse, parse_qs

def get_wayback_urls(domain):
    """Fetch URLs by shelling out to waybackurls"""
    try:
        result = subprocess.run(
            ['waybackurls', domain],
            capture_output=True,
            text=True
        )
        # Drop empty lines so an empty result yields an empty list
        return [line for line in result.stdout.splitlines() if line]
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return []

def filter_urls(urls, pattern):
    """Case-insensitive substring filter"""
    return [url for url in urls if pattern.lower() in url.lower()]

def extract_parameters(urls):
    """Extract all query-parameter names from URLs"""
    params = set()
    for url in urls:
        parsed = urlparse(url)
        if parsed.query:
            qs = parse_qs(parsed.query)
            params.update(qs.keys())
    return sorted(params)

def main():
    if len(sys.argv) < 2:
        print("Usage: python3 wayback_filter.py <domain> [pattern]")
        sys.exit(1)
    domain = sys.argv[1]
    pattern = sys.argv[2] if len(sys.argv) > 2 else None

    print(f"[*] Fetching URLs for {domain}...", file=sys.stderr)
    urls = get_wayback_urls(domain)
    if not urls:
        print("[!] No URLs found")
        sys.exit(1)
    if pattern:
        urls = filter_urls(urls, pattern)

    # Print results to stdout so they can be piped onward
    for url in urls:
        print(url)

    # Print summary to stderr
    print(f"\n[*] Total URLs: {len(urls)}", file=sys.stderr)

    # Extract and show parameters
    params = extract_parameters(urls)
    print(f"[*] Unique parameters: {len(params)}", file=sys.stderr)
    if params:
        print("[*] Parameters found:", file=sys.stderr)
        for param in params[:10]:  # Show first 10
            print(f"  - {param}", file=sys.stderr)

if __name__ == "__main__":
    main()
Run script:
chmod +x wayback_filter.py
python3 wayback_filter.py example.com
python3 wayback_filter.py example.com "api" # Filter for API
Performance Optimization
Speed Up Multiple Domains
# Process multiple domains in parallel
domains="example.com example.org example.net"
for domain in $domains; do
  waybackurls "$domain" > "${domain}_urls.txt" &
done
wait  # Wait for all background jobs
# Combine results
cat *_urls.txt > all_domains_urls.txt
sort -u all_domains_urls.txt > unique_urls.txt
Using GNU Parallel
# Install GNU Parallel if needed
sudo apt-get install parallel
# Process domains in parallel
cat domains.txt | parallel "waybackurls {} > {}.txt"
# Combine results
cat *.txt | sort -u > all_urls.txt
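GNU Parallel can also cap concurrency so the archive is not hammered:
# Run at most four lookups at a time
cat domains.txt | parallel -j4 "waybackurls {} > {}.txt"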
Output Analysis
Generate Report
# Create summary report
{
  echo "=== Wayback Machine URL Report ==="
  echo "Domain: example.com"
  echo "Date: $(date)"
  echo ""
  URLS=$(waybackurls example.com)
  TOTAL=$(echo "$URLS" | wc -l)
  UNIQUE=$(echo "$URLS" | sort -u | wc -l)
  echo "Total URLs: $TOTAL"
  echo "Unique URLs: $UNIQUE"
  echo ""
  echo "=== File Types ==="
  echo "$URLS" | sed 's/[?#].*//' | grep -oiE '\.[a-z0-9]+$' | sort | uniq -c | sort -rn | head -10
  echo ""
  echo "=== Top Endpoints ==="
  echo "$URLS" | sed 's/[?#].*//' | sed 's#.*/##' | sort | uniq -c | sort -rn | head -10
} > report.txt
cat report.txt
Troubleshooting
Issue: No URLs Returned
# Check if domain is in Wayback Machine
curl -s "http://archive.org/wayback/available?url=example.com&output=json" | jq
# Try without www prefix
waybackurls example.com
Issue: Slow Queries
# Wayback Machine queries can take a long time; bound them with a timeout
timeout 300 waybackurls example.com > urls.txt
# waybackurls itself exposes no timeout flag, so the shell-level
# timeout above is the practical control
Issue: Rate Limiting
# waybackurls makes its own archive requests, so sleeping while reading
# its output does not throttle them; pause between whole-domain queries instead
while read domain; do
  waybackurls "$domain"
  sleep 5
done < domains.txt
Best Practices
Effective Reconnaissance
- Start broad: Get all URLs for the main domain
- Expand scope: Include all known subdomains
- Filter results: Focus on interesting endpoints
- Identify patterns: Look for parameter names, API versions
- Check accessibility: Verify which URLs are still active
- Document findings: Save all results for analysis
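A condensed pipeline covering the first five steps, assuming subfinder and httpx are installed:
# Collect broadly across subdomains, dedupe, then verify what is still live
subfinder -d example.com -silent | waybackurls | sort -u | tee all_urls.txt | \
  httpx -silent -status-code -o live_urls.txt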
Security Considerations
- Respect scope: Only test domains you have authorization for
- Rate limiting: Don’t overwhelm the Wayback Machine API
- Data privacy: Be careful with sensitive information in URLs
- Archive responsibility: Don’t exploit archived data for malicious purposes
Resources
- Official GitHub: https://github.com/tomnomnom/waybackurls
- Wayback Machine API: https://archive.org/help/wayback_api.php
- Tom Hudson’s Tools: https://github.com/tomnomnom
- Reconnaissance Guide: https://owasp.org/www-project-web-security-testing-guide/v41/4-Web_Application_Security_Testing/01-Initial_Reconnaissance.html
- Internet Archive: https://archive.org/
Related Tools
Similar or complementary reconnaissance tools:
| Tool | Purpose |
|---|---|
| Subfinder | Subdomain enumeration |
| HTTPX | Fast HTTP toolkit for probing large URL lists |
| Nuclei | Vulnerability scanning |
| gf | Pattern matching for URLs |
| qsreplace | Query string manipulation |
| Feroxbuster | Directory and endpoint discovery |
| OWASP Amass | Attack-surface mapping and subdomain enumeration |
Legal Disclaimer
waybackurls is designed for authorized security testing and reconnaissance only. Authorized testing means:
- Testing systems you own or have explicit written permission to test
- Following responsible disclosure practices
- Respecting the Wayback Machine’s terms of service
- Not using discovered information for malicious purposes
- Complying with all applicable laws and regulations
Always obtain written authorization before conducting security assessments on any system you do not own.