
waybackurls

Overview

waybackurls is a command-line tool by Tom Hudson (@tomnomnom) that retrieves known URLs for a target domain from the Internet Archive's Wayback Machine. It is a staple of reconnaissance: archived URLs often reveal previously exposed endpoints, parameters, and functionality that may still contain vulnerabilities, which makes the tool popular with bug bounty hunters and penetration testers.

Key Features

  • Fetches archived URLs for a domain (and, by default, its subdomains) from the Wayback Machine
  • Reads domains as arguments or from stdin, so lists of domains can be piped in
  • Options to exclude subdomains (-no-subs), show fetch dates (-dates), and list archived versions (-get-versions)
  • Plain-text output that integrates seamlessly with other tools in reconnaissance pipelines
  • Cross-platform (Linux, macOS, Windows)
  • No authentication required

Installation

Linux / Kali Linux

# Install via apt (if available)
sudo apt-get install waybackurls

# Or using Go (recommended)
go install github.com/tomnomnom/waybackurls@latest

# Verify installation
waybackurls -h

macOS

# Via Homebrew
brew install waybackurls

# Or using Go
go install github.com/tomnomnom/waybackurls@latest

# Verify installation
waybackurls -h

Windows

# Using Go
go install github.com/tomnomnom/waybackurls@latest

# Or download binary from GitHub releases
# https://github.com/tomnomnom/waybackurls/releases

# Add to PATH if needed
$env:PATH += ";$env:USERPROFILE\go\bin"

Manual Installation from Source

# Clone repository
git clone https://github.com/tomnomnom/waybackurls.git
cd waybackurls

# Build binary
go build -o waybackurls

# Make executable (Linux/macOS)
chmod +x waybackurls

# Test (domains can be passed as an argument or piped on stdin)
./waybackurls example.com

Docker Installation

# If the repository you cloned has no Dockerfile, create a minimal one such as:
#   FROM golang:alpine
#   RUN go install github.com/tomnomnom/waybackurls@latest
#   ENTRYPOINT ["/go/bin/waybackurls"]

# Build Docker image
docker build -t waybackurls .

# Run in container
docker run -i waybackurls example.com

Basic Usage

Simple Domain Lookup

# Fetch all URLs for a domain
waybackurls example.com

# Fetch URLs for subdomain
waybackurls subdomain.example.com

Output to File

# Save results to file
waybackurls example.com > urls.txt

# View results
cat urls.txt

Filter by File Extension

# Extract JavaScript files
waybackurls example.com | grep -E "\.js$"

# Extract PHP files
waybackurls example.com | grep -E "\.php$"

# Extract API endpoints
waybackurls example.com | grep -E "api|/v[0-9]/"

Count URLs

# Count total URLs found
waybackurls example.com | wc -l

# Count unique URLs
waybackurls example.com | sort -u | wc -l

Command Options

Option          Description
-dates          Show the date each URL was fetched, in the first column
-no-subs        Don't include subdomains of the target domain
-get-versions   List archived versions (snapshot URLs) of the input URLs
-h              Display help information

Domains are passed as positional arguments or piped on stdin. waybackurls has no built-in filtering, status-code matching, or timeout flags; use shell tools such as grep, sort, and timeout for that, as in the examples below.

Advanced Filtering Techniques

Filter by Status Code

# Get URLs that returned 200 OK
waybackurls example.com | while read -r url; do
  status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  [ "$status" -eq 200 ] && echo "$url"
done

Extract Specific Parameters

# Find URLs with ID parameters
waybackurls example.com | grep -E "[?&]id="

# Find search parameters
waybackurls example.com | grep -E "[?&]search=|[?&]q="

# Find user parameters
waybackurls example.com | grep -E "[?&]user=|[?&]username="

Extract Unique Endpoints

# Get unique paths only
waybackurls example.com | sed 's/[?#].*//' | sort -u

# Get top 20 most common endpoints
waybackurls example.com | sed 's/[?#].*//' | sort | uniq -c | sort -rn | head -20
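For trickier URLs than the sed one-liners handle cleanly, the same endpoint extraction can be sketched with Python's standard library; the sample URLs below are illustrative stand-ins for waybackurls output:

```python
from collections import Counter
from urllib.parse import urlparse

def endpoint(url):
    """Strip the query string and fragment, keeping scheme://host/path."""
    p = urlparse(url)
    return f"{p.scheme}://{p.netloc}{p.path}"

# Illustrative sample of waybackurls-style output
urls = [
    "https://example.com/login?next=/home",
    "https://example.com/login?next=/admin",
    "https://example.com/api/v1/users?id=1",
    "https://example.com/api/v1/users?id=2#top",
]

unique = sorted({endpoint(u) for u in urls})               # like sed | sort -u
counts = Counter(endpoint(u) for u in urls).most_common()  # like uniq -c | sort -rn

for path in unique:
    print(path)
```

urlparse splits on the first "?" and "#" for you, so URLs with awkward characters in the query string are handled without regex surprises.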

Filter by Sensitive Patterns

# Find potential admin panels
waybackurls example.com | grep -iE "admin|panel|dashboard|control"

# Find API endpoints
waybackurls example.com | grep -iE "/api|/v[0-9]"

# Find upload endpoints
waybackurls example.com | grep -iE "upload|file|attachment"

# Find configuration files
waybackurls example.com | grep -iE "\.conf|\.config|\.json|\.xml|\.yml"
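All four grep filters can also run as a single pass over the output; a minimal Python sketch, where the pattern buckets and sample URLs are illustrative rather than exhaustive:

```python
import re

# Pattern buckets mirroring the grep filters above (illustrative, not exhaustive)
PATTERNS = {
    "admin":  re.compile(r"admin|panel|dashboard|control", re.I),
    "api":    re.compile(r"/api|/v[0-9]", re.I),
    "upload": re.compile(r"upload|file|attachment", re.I),
    "config": re.compile(r"\.(conf|config|json|xml|yml)", re.I),
}

def categorize(url):
    """Return the names of every bucket whose pattern matches the URL."""
    return [name for name, pat in PATTERNS.items() if pat.search(url)]

# Illustrative sample of waybackurls-style output
for u in [
    "https://example.com/admin/panel",
    "https://example.com/api/v2/upload",
    "https://example.com/settings.json",
]:
    print(u, categorize(u))
```

A URL can land in several buckets at once, which a chain of separate grep commands would only show as duplicated lines across output files.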

Real-World Examples

Example 1: Full Reconnaissance

# Get all URLs
waybackurls example.com > all_urls.txt

# Extract unique endpoints
cat all_urls.txt | sed 's/[?#].*//' | sort -u > endpoints.txt

# Count endpoints
wc -l endpoints.txt

# Find JavaScript files
grep -E "\.js$" all_urls.txt > javascript_files.txt

Example 2: Parameter Discovery

# Extract all query parameters
waybackurls example.com | grep -o "[?&][^=]*=[^&]*" | sort -u > parameters.txt

# Find interesting parameters
grep -E "id=|user=|pass=|token=|key=" parameters.txt

Example 3: API Endpoint Enumeration

# Get all API endpoints
waybackurls example.com | grep -iE "/api|/rest|/v[0-9]" | sort -u > api_endpoints.txt

# Check which endpoints are still accessible
# (waybackurls outputs full URLs, so pass them to curl directly)
while read -r url; do
  response=$(curl -s -o /dev/null -w "%{http_code}" "$url" 2>/dev/null)
  echo "$url - $response"
done < api_endpoints.txt

Example 4: Integration with Other Tools

# Pipe to gf (by tomnomnom) for pattern matching
waybackurls example.com | gf xss

# Pipe to qsreplace for quick parameter testing
waybackurls example.com | qsreplace 'PAYLOAD'

# Pipe to httpx for status checking
waybackurls example.com | httpx -status-code -title

Integration with Reconnaissance Pipelines

Combined with Subfinder

# Find subdomains, then get URLs for each
subfinder -d example.com | while read -r subdomain; do
  echo "[*] Processing $subdomain"
  waybackurls "$subdomain" >> all_urls.txt
done

# Get unique URLs
sort -u all_urls.txt > unique_urls.txt

Combined with HTTPX

# Get URLs and check which are still alive
waybackurls example.com | httpx \
  -status-code \
  -title \
  -content-type \
  -web-server \
  -o live_urls.txt

Combined with Nuclei

# Get URLs and scan with Nuclei templates
waybackurls example.com | nuclei \
  -t ~/nuclei-templates/ \
  -severity high \
  -o nuclei_results.txt

Combined with Burp Suite

# Save URLs to file for importing into Burp
waybackurls example.com > burp_import.txt

# Burp has no native URL-list import; load the list with a BApp Store
# extension that imports URLs into the site map, or paste URLs into scope

Practical Automation Scripts

Bash Script: Full Reconnaissance

Create file recon.sh:

#!/bin/bash

DOMAIN=$1

if [ -z "$DOMAIN" ]; then
  echo "Usage: ./recon.sh example.com"
  exit 1
fi

echo "[*] Starting reconnaissance for $DOMAIN"

# Create output directory
mkdir -p "$DOMAIN"
cd "$DOMAIN" || exit 1

# Get all URLs from Wayback Machine
echo "[+] Fetching URLs from Wayback Machine..."
waybackurls "$DOMAIN" > all_urls.txt

# Extract unique endpoints
echo "[+] Extracting unique endpoints..."
cat all_urls.txt | sed 's/[?#].*//' | sort -u > endpoints.txt

# Extract JavaScript files
echo "[+] Extracting JavaScript files..."
grep -E "\.js$" all_urls.txt > javascript_files.txt

# Extract API endpoints
echo "[+] Extracting API endpoints..."
grep -iE "/api|/rest|/v[0-9]" all_urls.txt | sort -u > api_endpoints.txt

# Extract parameters
echo "[+] Extracting parameters..."
grep -o "[?&][^=]*" all_urls.txt | sort -u > parameters.txt

# Check live URLs
echo "[+] Checking live URLs..."
if command -v httpx &> /dev/null; then
  httpx -l all_urls.txt -status-code -title -o live_urls.txt
else
  echo "[!] httpx not installed, skipping live URL check"
fi

# Summary
echo ""
echo "[*] Reconnaissance complete!"
echo "[*] Results stored in ./$DOMAIN/"
echo "[*] Total URLs: $(wc -l < all_urls.txt)"
echo "[*] Unique endpoints: $(wc -l < endpoints.txt)"
echo "[*] JavaScript files: $(wc -l < javascript_files.txt)"
echo "[*] API endpoints: $(wc -l < api_endpoints.txt)"
echo "[*] Parameters: $(wc -l < parameters.txt)"

Run script:

chmod +x recon.sh
./recon.sh example.com

Python Script: Advanced Filtering

Create file wayback_filter.py:

#!/usr/bin/env python3

import sys
import subprocess
from urllib.parse import urlparse, parse_qs

def get_wayback_urls(domain):
    """Fetch URLs from waybackurls"""
    try:
        result = subprocess.run(
            ['waybackurls', domain],
            capture_output=True,
            text=True
        )
        # splitlines + filter avoids returning [''] when there is no output
        return [line for line in result.stdout.splitlines() if line]
    except Exception as e:
        print(f"Error: {e}")
        return []

def filter_urls(urls, pattern):
    """Filter URLs by pattern"""
    return [url for url in urls if pattern.lower() in url.lower()]

def extract_parameters(urls):
    """Extract all parameters from URLs"""
    params = set()
    for url in urls:
        parsed = urlparse(url)
        if parsed.query:
            qs = parse_qs(parsed.query)
            params.update(qs.keys())
    return sorted(params)

def main():
    if len(sys.argv) < 2:
        print("Usage: python3 wayback_filter.py <domain> [pattern]")
        sys.exit(1)
    
    domain = sys.argv[1]
    pattern = sys.argv[2] if len(sys.argv) > 2 else None
    
    print(f"[*] Fetching URLs for {domain}...")
    urls = get_wayback_urls(domain)
    
    if not urls:
        print("[!] No URLs found")
        sys.exit(1)
    
    if pattern:
        urls = filter_urls(urls, pattern)
    
    # Print results
    for url in urls:
        print(url)
    
    # Print summary
    print(f"\n[*] Total URLs: {len(urls)}", file=sys.stderr)
    
    # Extract and show parameters
    params = extract_parameters(urls)
    print(f"[*] Unique parameters: {len(params)}", file=sys.stderr)
    if params:
        print("[*] Parameters found:", file=sys.stderr)
        for param in params[:10]:  # Show first 10
            print(f"    - {param}", file=sys.stderr)

if __name__ == "__main__":
    main()

Run script:

chmod +x wayback_filter.py
python3 wayback_filter.py example.com
python3 wayback_filter.py example.com "api"  # Filter for API

Performance Optimization

Speed Up Multiple Domains

# Process multiple domains in parallel
domains="example.com example.org example.net"

for domain in $domains; do
  waybackurls "$domain" > "${domain}_urls.txt" &
done

wait  # Wait for all background jobs

# Combine results
cat *_urls.txt > all_domains_urls.txt
sort -u all_domains_urls.txt > unique_urls.txt

Using GNU Parallel

# Install GNU Parallel if needed
sudo apt-get install parallel

# Process domains in parallel
cat domains.txt | parallel "waybackurls {} > {}.txt"

# Combine results
cat *.txt | sort -u > all_urls.txt

Output Analysis

Generate Report

# Create summary report
{
  echo "=== Wayback Machine URL Report ==="
  echo "Domain: example.com"
  echo "Date: $(date)"
  echo ""
  
  URLS=$(waybackurls example.com)
  TOTAL=$(echo "$URLS" | wc -l)
  UNIQUE=$(echo "$URLS" | sort -u | wc -l)
  
  echo "Total URLs: $TOTAL"
  echo "Unique URLs: $UNIQUE"
  echo ""
  
  echo "=== File Types ==="
  echo "$URLS" | sed 's/.*\.//' | sort | uniq -c | sort -rn | head -10
  echo ""
  
  echo "=== Top Endpoints ==="
  echo "$URLS" | sed 's/[?#].*//' | sed 's#.*/##' | sort | uniq -c | sort -rn | head -10
  
} > report.txt

cat report.txt

Troubleshooting

Issue: No URLs Returned

# Check if domain is in Wayback Machine
curl -s "http://archive.org/wayback/available?url=example.com&output=json" | jq

# Try with and without the www prefix
waybackurls example.com
waybackurls www.example.com

Issue: Slow Queries

# Wayback Machine can be slow, add patience
timeout 300 waybackurls example.com > urls.txt

# waybackurls itself has no timeout flag, so the shell timeout wrapper
# above is the practical workaround

Issue: Rate Limiting

# Pace follow-up requests when probing the archived URLs
waybackurls example.com | while read -r url; do
  curl -s -o /dev/null -w "%{http_code} %{url_effective}\n" "$url"
  sleep 1
done

Best Practices

Effective Reconnaissance

  1. Start broad: Get all URLs for the main domain
  2. Expand scope: Include all known subdomains
  3. Filter results: Focus on interesting endpoints
  4. Identify patterns: Look for parameter names, API versions
  5. Check accessibility: Verify which URLs are still active
  6. Document findings: Save all results for analysis
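Steps 3 and 4 of the workflow above can be sketched in a few lines of Python; the in-memory URL list here is an illustrative stand-in for real waybackurls output:

```python
from urllib.parse import parse_qs, urlparse

# Illustrative archived-URL sample standing in for waybackurls output
archived = [
    "https://example.com/index.html",
    "https://example.com/api/v1/users?id=7",
    "https://shop.example.com/cart?item=3&qty=2",
    "https://example.com/api/v1/users?id=9",
]

# Step 3: filter results down to interesting endpoints (API paths here)
api_urls = [u for u in archived if "/api/" in urlparse(u).path]

# Step 4: identify patterns -- collect every parameter name seen
params = sorted({k for u in archived for k in parse_qs(urlparse(u).query)})

print(f"API URLs: {len(api_urls)}")
print(f"Parameters: {params}")
```

The remaining steps (checking accessibility, documenting findings) are covered by the httpx integration and the recon.sh script earlier in this guide.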

Security Considerations

  • Respect scope: Only test domains you have authorization for
  • Rate limiting: Don’t overwhelm the Wayback Machine API
  • Data privacy: Be careful with sensitive information in URLs
  • Archive responsibility: Don’t exploit archived data for malicious purposes

Resources


Similar or complementary reconnaissance tools:

Tool          Purpose
Subfinder     Subdomain enumeration
httpx         Probing URLs for liveness, status codes, and titles
Nuclei        Template-based vulnerability scanning
gf            Pattern matching for URLs
qsreplace     Query-string manipulation
Feroxbuster   Directory and endpoint discovery
OWASP Amass   Attack surface mapping and subdomain enumeration

waybackurls is designed for authorized security testing and reconnaissance only. Authorized testing means:

  • Testing systems you own or have explicit written permission to test
  • Following responsible disclosure practices
  • Respecting the Wayback Machine’s terms of service
  • Not using discovered information for malicious purposes
  • Complying with all applicable laws and regulations

Always obtain written authorization before conducting security assessments on any system you do not own.