getallurls (gau)
Overview
getallurls (gau) is an OSINT command-line tool that fetches all known URLs for a given domain from multiple historical sources, including AlienVault OTX, the Wayback Machine, and Common Crawl. It aggregates URL intelligence to build comprehensive attack surface maps and discover hidden endpoints.
getallurls is valuable for:
- Domain reconnaissance and endpoint discovery
- Identifying legacy or forgotten endpoints
- Finding parameter patterns and API endpoints
- Vulnerability assessment and bug hunting
- Web application penetration testing
- OSINT and threat intelligence gathering
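A typical first pass simply pipes gau's output through standard Unix tools, a pattern used throughout this page (this assumes gau is already installed; see Installation below):
# Pull historical URLs and skim the unique ones
gau example.com | sort -u | head -50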
Installation
Prerequisites
Abschnitt betitelt „Prerequisites“- Go 1.14+ (for compilation)
- Linux/macOS/Windows
- Internet connectivity
- API keys (optional for rate limit increases)
Install via Go
# Install with the Go toolchain
go install github.com/lc/gau/v2/cmd/gau@latest
# Verify installation
gau -version
# Expected output: gau version 2.x.x
Install from Source
# Clone and compile
git clone https://github.com/lc/gau.git
cd gau/cmd/gau
go build -o gau
# Move to PATH
sudo mv gau /usr/local/bin/
gau -version
Docker Installation
# gau is not published under the projectdiscovery namespace; build the
# image locally from the Dockerfile in the official repository
git clone https://github.com/lc/gau.git
cd gau
docker build -t gau .
docker run --rm gau -h
# Create alias for convenience (-i allows piping domains via stdin)
alias gau='docker run --rm -i gau'
Verify Installation
# Check version and help
gau -h
gau -version
# Test basic functionality
gau example.com
Core Commands and Options
Basic Usage
| Command | Purpose | Example |
|---|---|---|
| gau <domain> | Fetch all URLs for a domain | gau example.com |
| gau --providers <list> | Restrict to specific data sources | gau --providers wayback,otx example.com |
| gau -h | Show help and options | gau -h |
| gau --version | Display version information | gau --version |
Provider Options
| Provider | Source | Description |
|---|---|---|
| otx | AlienVault OTX | Open Threat Exchange historical URLs |
| wayback | Wayback Machine | Internet Archive snapshots |
| commoncrawl | Common Crawl | Web crawl database |
| urlscan | URLScan.io | Scan-result database (gau v2) |
Filter Options
| Option | Purpose | Example |
|---|---|---|
| --blacklist | Skip extensions (comma-separated list) | gau --blacklist css,png,jpg example.com |
| --fp | Collapse endpoints that differ only in parameter values | gau --fp example.com |
| --o | Output file | gau --o urls.txt example.com |
| --timeout | HTTP client timeout in seconds | gau --timeout 10 example.com |
Note: gau itself has no regex include flag (as of v2); the pattern filters in the examples below pipe output through grep instead.
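A common pattern combines gau's extension blacklist with a grep include filter; a minimal sketch (flag names as in the gau v2 README):
# Skip static assets at the source, then keep only parameterized URLs
gau --blacklist css,png,jpg,gif,svg,woff example.com | grep "?" | sort -u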
Basic Usage Examples
Fetch All URLs for Domain
# Get all known URLs
gau example.com
# Output shows URLs from all providers:
# https://example.com/path/to/page
# https://example.com/api/endpoint
# https://example.com/admin/panel
# ...
Output to File
# Save results to a file (flags go before the domain)
gau --o urls.txt example.com
# Check results
wc -l urls.txt # Count URLs
head -20 urls.txt # View first 20
Select Providers
# Restrict the query to specific data sources
gau --providers wayback,otx example.com
# Use a single provider
gau --providers commoncrawl example.com
# Supported providers in gau v2: wayback, commoncrawl, otx, urlscan
Advanced Filtering
Filter by File Extension
# Find only JavaScript files
gau example.com | grep -E "\.js$"
# Find API endpoints
gau example.com | grep -E "api/v[0-9]"
# Find admin panels
gau example.com | grep -E "admin|control|dashboard"
Filter by Parameters
# Find URLs with specific parameters
gau example.com | grep -E "id=|user=|email="
# Find common vulnerability parameters
gau example.com | grep -E "file=|path=|url=|input="
Blacklist Unwanted Content
# Exclude extensions at the source (gau's blacklist takes a comma-separated list)
gau --blacklist css,png,jpg,gif example.com
# Exclude metrics and analytics via grep
gau example.com | grep -vE "analytics|metrics|tracking"
# Exclude CDN and static-asset hosts
gau example.com | grep -vE "cdn\.|static\.|resources\."
Processing Large Result Sets
Chain with Other Tools
# Sort and deduplicate
gau example.com | sort -u > urls.txt
# Find unique endpoints
gau example.com | cut -d'?' -f1 | sort -u
# Count URLs
gau example.com | wc -l
Extract Parameters
# Get all URLs with query parameters
gau example.com | grep "?"
# Extract parameter names (match names following ? or &)
gau example.com | grep -oE "[?&][a-zA-Z0-9_]+=" | tr -d '?&=' | sort -u
# Find potential injection points
gau example.com | grep -E "id=|search=|q=|query="
Identify Hidden Paths
# Find interesting paths
gau example.com | grep -E "/admin|/api|/config|/test|/backup"
# Look for backup files
gau example.com | grep -E "\.bak|\.old|\.backup|\.sql"
# Find source maps
gau example.com | grep "\.map"
Domain Reconnaissance Workflow
Comprehensive Domain Analysis
# 1. Fetch all URLs
gau --o example_urls.txt example.com
# 2. Analyze results
echo "Total URLs: $(wc -l < example_urls.txt)"
echo "Unique hosts: $(cut -d'/' -f3 example_urls.txt | sort -u | wc -l)"
# 3. Extract endpoints only
cut -d'?' -f1 example_urls.txt | sort -u > endpoints.txt
# 4. Find JavaScript files
grep "\.js$" example_urls.txt > javascript.txt
# 5. Find API endpoints
grep "api" example_urls.txt > api_endpoints.txt
Multi-Domain Reconnaissance
# Process multiple domains
for domain in example.com other.com third.com; do
  gau --o "${domain}_urls.txt" "$domain"
done
# Combine results
cat *_urls.txt | sort -u > all_urls.txt
# Analyze combined data
echo "Total unique URLs: $(wc -l < all_urls.txt)"
Vulnerability Discovery Techniques
Find Potential Parameter Injection
# Search for vulnerable parameters
gau example.com | grep -iE "id=|file=|path=|url=|input=|cmd=" > injection_targets.txt
# Count parameter names by frequency
grep -oE "[?&][a-zA-Z0-9_]+=" example_urls.txt | tr -d '?&=' | sort | uniq -c | sort -rn
Identify API Endpoints
# Find API patterns
gau example.com | grep -iE "api/v[0-9]|rest|json|graphql" > apis.txt
# Extract API routes
grep "api" example_urls.txt | cut -d'?' -f1 | sort -u
# Look for REST patterns
grep -E "/get|/post|/put|/delete|/list|/create" example_urls.txt
Locate Configuration Files
# Find config file patterns
gau example.com | grep -iE "config|settings|\.env|\.conf|\.ini" > configs.txt
# Look for common config files
gau example.com | grep -iE "web\.config|app\.config|nginx\.conf"
JavaScript Endpoint Discovery
Extract Endpoints from JavaScript
# Fetch JavaScript file URLs
gau example.com | grep -E "\.js$" > javascript.txt
# Pull each file and extract URL- and path-like strings
while read -r js_url; do
  echo "Analyzing: $js_url"
  curl -s "$js_url" | grep -oE "(https?://[^\"' ]+|/[a-zA-Z0-9/_-]+)" | sort -u
done < javascript.txt
Source Map Analysis
# Find source maps
gau example.com | grep -E "\.js\.map$"
# Analyze source maps for endpoints
curl -s "https://example.com/path/to/bundle.js.map" | jq '.sources[]'
Subdomain Enumeration
Extract Subdomains from URLs
# Get all subdomains (--subs tells gau to include subdomains of the target)
gau --subs example.com | cut -d'/' -f3 | grep "\.example\.com$" | sort -u
# Count subdomains
gau --subs example.com | cut -d'/' -f3 | grep "example\.com" | sort -u | wc -l
# Save subdomains
gau --subs example.com | cut -d'/' -f3 | grep "example\.com" | sort -u > subdomains.txt
Performance Optimization
Timeout Configuration
# Set a custom HTTP timeout (seconds)
gau --timeout 5 example.com
# Quick scan with a short timeout
gau --timeout 3 example.com
# Extended timeout for large sites
gau --timeout 30 example.com
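The worker count can be raised as well; a short sketch assuming the --threads flag from the gau v2 README:
# Spawn more workers to fetch from providers in parallel
gau --threads 5 example.com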
Limit Results
# Take first N results
gau example.com | head -1000 > sample.txt
# Random sampling
gau example.com | shuf | head -500
Integration with Other Tools
Chain with httpx for Live Testing
# Find live URLs
gau example.com | httpx -status-code -o live_urls.txt
# Get status codes
gau example.com | httpx -title -status-code
Use with Nuclei for Scanning
# Generate template input
gau example.com > endpoints.txt
# Run Nuclei scan
nuclei -l endpoints.txt -templates cves/
Combine with Aquatone for Visualization
# Get URLs and take screenshots
gau example.com | aquatone
# View results
open aquatone_report.html
Process with Waybackurls Alternative
# If gau is unavailable, use waybackurls
echo "example.com" | waybackurls > urls.txt
# Compare sources (URLs found by gau but not by waybackurls)
comm -23 <(gau example.com | sort) <(echo "example.com" | waybackurls | sort)
Output Processing Techniques
Clean and Normalize Output
# Remove duplicates and sort
gau example.com | sort -u > clean_urls.txt
# Remove query strings
gau example.com | cut -d'?' -f1 | sort -u
# Extract domains from URLs
gau example.com | cut -d'/' -f3 | sort -u
Convert to Different Formats
# URLs to newline-separated list
gau example.com > urls.txt
# CSV format with URL and status
gau example.com | while read url; do
status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
echo "$url,$status"
done > urls.csv
# JSON format
gau example.com | jq -R '{url: .}' | jq -s '.' > urls.json
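Recent gau versions can also emit JSON directly (assuming the --json flag from the gau v2 README), avoiding the jq post-processing above:
# One JSON object per line, e.g. {"url":"https://example.com/..."}
gau --json example.com > urls.jsonl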
Automation Scripts
Batch Domain Processing
#!/bin/bash
# Process multiple domains efficiently
DOMAINS=("example.com" "other.com" "target.com")
OUTPUT_DIR="reconnaissance"
mkdir -p "$OUTPUT_DIR"
for domain in "${DOMAINS[@]}"; do
echo "Processing $domain..."
gau "$domain" | sort -u > "$OUTPUT_DIR/${domain}_urls.txt"
# Extract statistics
total=$(wc -l < "$OUTPUT_DIR/${domain}_urls.txt")
echo "$domain: $total URLs"
done
# Combine all results
cat "$OUTPUT_DIR"/*_urls.txt | sort -u > "$OUTPUT_DIR/all_urls.txt"
Daily Reconnaissance Update
#!/bin/bash
# Schedule daily URL discovery
TARGET_DOMAIN="example.com"
OUTPUT_DIR="reconnaissance"
DATE=$(date +%Y%m%d)
mkdir -p "$OUTPUT_DIR"
# Fetch URLs
gau "$TARGET_DOMAIN" | sort -u > "$OUTPUT_DIR/${DATE}_urls.txt"
# Compare with previous
if [ -f "$OUTPUT_DIR/latest_urls.txt" ]; then
NEW_URLS=$(comm -13 <(sort "$OUTPUT_DIR/latest_urls.txt") <(sort "$OUTPUT_DIR/${DATE}_urls.txt"))
echo "New URLs found:"
echo "$NEW_URLS"
fi
# Update latest
cp "$OUTPUT_DIR/${DATE}_urls.txt" "$OUTPUT_DIR/latest_urls.txt"
Tips and Best Practices
- Use multiple providers: Leverage all data sources for comprehensive coverage
- Filter aggressively: Reduce noise by filtering irrelevant file types early
- Archive results: Keep historical URL datasets for comparison
- Combine with active scanning: Use discovered URLs with vulnerability scanners
- Process systematically: Organize URLs by type (API, admin, static, etc.); a sorting sketch follows this list
- Monitor changes: Track new URLs over time for emerging attack surfaces
- Respect rate limits: Use appropriate timeouts and intervals
- Verify findings: Test discovered URLs before reporting
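A minimal sketch of that sorting pass, bucketing a combined URL list into rough categories (file names are illustrative):
#!/bin/bash
# Split a URL list into rough categories for systematic review
grep -E "/api/|/v[0-9]+/" all_urls.txt > bucket_api.txt
grep -E "admin|dashboard|login" all_urls.txt > bucket_admin.txt
grep -E "\.js(\?|$)" all_urls.txt > bucket_js.txt
grep -vE "/api/|admin|\.js" all_urls.txt > bucket_other.txt
wc -l bucket_*.txt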
Common Workflows
Quick Reconnaissance
# 1-minute overview of a domain
gau example.com | grep -E "api|admin|config|backup" | head -20
Comprehensive Assessment
# Full domain analysis
gau --o example_urls.txt example.com
grep "\.js$" example_urls.txt > javascript.txt
grep "api" example_urls.txt > apis.txt
cut -d'/' -f3 example_urls.txt | sort -u > subdomains.txt
Vulnerability Research
# Find specific vulnerability indicators
gau example.com | grep -iE "cms|framework|version" > tech_indicators.txt
gau example.com | grep -E "password|secret|key|token" > sensitive.txt
Troubleshooting
| Issue | Solution |
|---|---|
| No results | Verify domain exists; check network connectivity |
| Timeout errors | Increase the timeout with the --timeout flag |
| Rate limiting | Use appropriate delays between requests |
| Memory issues | Process in chunks or use filters (see the sketch below) |
| Old data | Results reflect historical snapshots; verify URLs are still live |
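For very large result sets, a chunked pass keeps memory use flat; a minimal sketch using standard split(1):
# Write results to disk, then process in 10,000-line chunks
gau --o all_urls.txt example.com
split -l 10000 all_urls.txt chunk_
for f in chunk_*; do
  grep -E "api|admin" "$f" >> interesting.txt
done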
Rate Limiting and Ethics
Responsible Usage
# Respect rate limits
gau --timeout 10 example.com  # Generous timeout
# Add delays between requests
for domain in $(cat domains.txt); do
gau "$domain"
sleep 5
done
# Only scan authorized targets
Resources
- GitHub: https://github.com/lc/gau
- Wayback Machine: https://web.archive.org/
- Common Crawl: https://commoncrawl.org/
- AlienVault OTX: https://otx.alienvault.com/
Summary
getallurls (gau) aggregates historical URL data from multiple authoritative sources:
- AlienVault OTX - Threat intelligence platform
- Wayback Machine - Internet Archive snapshots
- Common Crawl - Large-scale web crawl index
- URLScan.io - Scan-result database (gau v2)
Key capabilities include:
- Comprehensive endpoint discovery
- Multi-source data aggregation
- Flexible filtering and processing
- Integration with security tools
- Automated reconnaissance workflows
Use gau as a foundation for reconnaissance, vulnerability assessment, and security testing activities.