
getallurls (gau)

getallurls (gau) is an OSINT command-line tool that fetches known URLs for a given domain from historical sources such as AlienVault OTX, the Wayback Machine, Common Crawl, and urlscan.io. Aggregating these historical URLs helps map a target's attack surface and surface forgotten or hidden endpoints.

getallurls is valuable for:

  • Domain reconnaissance and endpoint discovery
  • Identifying legacy or forgotten endpoints
  • Finding parameter patterns and API endpoints
  • Vulnerability assessment and bug hunting
  • Web application penetration testing
  • OSINT and threat intelligence gathering

Requirements:

  • Go 1.14+ (for compilation)
  • Linux/macOS/Windows
  • Internet connectivity
  • API keys (optional for rate limit increases)
# Install with Go package manager
go install github.com/lc/gau/v2/cmd/gau@latest

# Verify installation
gau -version

# Expected output: gau version 2.x.x
# Clone and compile
git clone https://github.com/lc/gau.git
cd gau/cmd/gau
go build -o gau

# Move to PATH
sudo mv gau /usr/local/bin/
gau -version
# Run in a Docker container (build a local image; the repository ships a Dockerfile)
git clone https://github.com/lc/gau.git
cd gau
docker build -t gau .
docker run --rm gau -h   # assumes the image's entrypoint is the gau binary

# Create alias for convenience
alias gau='docker run --rm gau'
# Check version and help
gau -h
gau -version

# Test basic functionality
gau example.com
Command                         | Purpose                            | Example
gau <domain>                    | Fetch all known URLs for a domain  | gau example.com
gau -providers <list> <domain>  | Query only selected data sources   | gau -providers wayback,otx example.com
gau -h                          | Show help and options              | gau -h
gau -version                    | Display version information        | gau -version
Provider     | Source           | Description
otx          | AlienVault OTX   | Open Threat Exchange historical URLs
wayback      | Wayback Machine  | Internet Archive snapshots
commoncrawl  | Common Crawl     | Web crawl database
urlscan      | urlscan.io       | Public URL scan results

All providers are queried by default; pass -providers with a comma-separated list to restrict them.
Option      | Purpose                                 | Example
-blacklist  | Skip file extensions (comma-separated)  | gau -blacklist css,png,gif example.com
-subs       | Include subdomains of the target        | gau -subs example.com
-o          | Write results to a file                 | gau -o urls.txt example.com
-timeout    | HTTP client timeout in seconds          | gau -timeout 10 example.com

Note: gau has no built-in regex filter; pipe its output through grep (or grep -v) for pattern matching, as the examples below do.
# Get all known URLs
gau example.com

# Output shows URLs from all providers:
# https://example.com/path/to/page
# https://example.com/api/endpoint
# https://example.com/admin/panel
# ...
# Save results to file
gau -o urls.txt example.com

# Check results
wc -l urls.txt           # Count URLs
head -20 urls.txt        # View first 20
# Query only specific data sources (comma-separated)
gau -providers wayback,otx example.com

# Use a single provider
gau -providers commoncrawl example.com
# Find only JavaScript files
gau example.com | grep "\.js$"

# Find API endpoints
gau example.com | grep -E "api/v[0-9]"

# Find admin panels
gau example.com | grep -E "admin|control|dashboard"
# Find URLs with specific parameters
gau example.com | grep -E "id=|user=|email="

# Find common vulnerability parameters
gau example.com | grep -E "file=|path=|url=|input="
# Exclude CSS and images (extension blacklist)
gau -blacklist css,png,jpg,gif example.com

# Exclude metrics and analytics
gau example.com | grep -vE "analytics|metrics|tracking"

# Exclude CDN and external resources
gau example.com | grep -vE "cdn\.|static\.|resources\."
# Sort and deduplicate
gau example.com | sort -u > urls.txt

# Find unique endpoints
gau example.com | cut -d'?' -f1 | sort -u

# Count URLs
gau example.com | wc -l
# Get all URLs with query parameters
gau example.com | grep "?"

# Extract parameter names
gau example.com | grep "?" | grep -o "[a-zA-Z_]*=" | sort -u

# Find potential injection points
gau example.com | grep -E "id=|search=|q=|query="
# Find interesting paths
gau example.com | grep -E "/admin|/api|/config|/test|/backup"

# Look for backup files
gau example.com | grep -E "\.bak|\.old|\.backup|\.sql"

# Find source maps
gau example.com | grep "\.map"
# 1. Fetch all URLs
gau -o example_urls.txt example.com

# 2. Analyze results
echo "Total URLs: $(wc -l < example_urls.txt)"
echo "Unique hosts: $(cut -d'/' -f3 example_urls.txt | sort -u | wc -l)"

# 3. Extract endpoints only
cut -d'?' -f1 example_urls.txt | sort -u > endpoints.txt

# 4. Find JavaScript files
grep "\.js$" example_urls.txt > javascript.txt

# 5. Find API endpoints
grep "api" example_urls.txt > api_endpoints.txt
# Process multiple domains
for domain in example.com other.com third.com; do
  gau -o "${domain}_urls.txt" "$domain"
done

# Combine results
cat *_urls.txt | sort -u > all_urls.txt

# Analyze combined data
echo "Total unique URLs: $(wc -l < all_urls.txt)"
# Search for vulnerable parameters
gau example.com | grep -iE "id=|file=|path=|url=|input=|cmd=" > injection_targets.txt

# Analyze parameter names and how often they appear
grep -oE "[?&][a-zA-Z0-9_]+=" example_urls.txt | tr -d '?&=' | sort | uniq -c | sort -rn
# Find API patterns
gau example.com | grep -iE "api/v[0-9]|rest|json|graphql" > apis.txt

# Extract API routes
grep "api" example_urls.txt | cut -d'?' -f1 | sort -u

# Look for REST patterns
grep -E "/get|/post|/put|/delete|/list|/create" example_urls.txt
# Find config file patterns
gau example.com | grep -iE "config|settings|\.env|\.conf|\.ini" > configs.txt

# Look for common config files
gau example.com | grep -iE "web\.config|app\.config|nginx\.conf"
# Fetch JavaScript files
gau example.com | grep "\.js$" > javascript.txt

# Extract URLs and paths referenced by each JavaScript file
while read -r js_url; do
  echo "Analyzing: $js_url"
  curl -s "$js_url" | grep -oE "(https?://[^\"' ]+|/[a-zA-Z0-9/_-]+)" | sort -u
done < javascript.txt
# Find source maps
gau example.com | grep "\.js\.map$"

# Analyze source maps for original source file paths
curl -s "https://example.com/path/to/bundle.js.map" | jq -r '.sources[]'
# Get all subdomains (-subs makes gau return subdomain URLs)
gau -subs example.com | cut -d'/' -f3 | grep "\.example\.com$" | sort -u

# Count subdomains
gau -subs example.com | cut -d'/' -f3 | grep "example\.com" | sort -u | wc -l

# Save subdomains
gau -subs example.com | cut -d'/' -f3 | grep "example\.com" | sort -u > subdomains.txt
# Set custom timeout (seconds)
gau -timeout 5 example.com

# Quick scan with short timeout
gau -timeout 3 example.com

# Extended timeout for large sites
gau -timeout 30 example.com
# Take first N results
gau example.com | head -1000 > sample.txt

# Random sampling
gau example.com | shuf | head -500
# Find live URLs
gau example.com | httpx -status-code -o live_urls.txt

# Get status codes
gau example.com | httpx -title -status-code
# Generate template input
gau example.com > endpoints.txt

# Run Nuclei scan
nuclei -l endpoints.txt -templates cves/
# Get URLs and take screenshots
gau example.com | aquatone

# View results (macOS; use xdg-open on Linux)
open aquatone_report.html
# If gau is unavailable, use waybackurls
echo "example.com" | waybackurls > urls.txt

# URLs found by gau but not by waybackurls
comm -23 <(gau example.com | sort) <(echo "example.com" | waybackurls | sort)
# Remove duplicates and sort
gau example.com | sort -u > clean_urls.txt

# Remove query strings
gau example.com | cut -d'?' -f1 | sort -u

# Extract domains from URLs
gau example.com | cut -d'/' -f3 | sort -u
# URLs to newline-separated list
gau example.com > urls.txt

# CSV format with URL and status
gau example.com | while read -r url; do
  status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  echo "$url,$status"
done > urls.csv

# JSON format
gau example.com | jq -R '{url: .}' | jq -s '.' > urls.json
#!/bin/bash
# Process multiple domains efficiently

DOMAINS=("example.com" "other.com" "target.com")
OUTPUT_DIR="reconnaissance"

mkdir -p "$OUTPUT_DIR"

for domain in "${DOMAINS[@]}"; do
  echo "Processing $domain..."
  gau "$domain" | sort -u > "$OUTPUT_DIR/${domain}_urls.txt"
  
  # Extract statistics
  total=$(wc -l < "$OUTPUT_DIR/${domain}_urls.txt")
  echo "$domain: $total URLs"
done

# Combine all results
cat "$OUTPUT_DIR"/*_urls.txt | sort -u > "$OUTPUT_DIR/all_urls.txt"
#!/bin/bash
# Schedule daily URL discovery

TARGET_DOMAIN="example.com"
OUTPUT_DIR="reconnaissance"
DATE=$(date +%Y%m%d)

mkdir -p "$OUTPUT_DIR"

# Fetch URLs
gau "$TARGET_DOMAIN" | sort -u > "$OUTPUT_DIR/${DATE}_urls.txt"

# Compare with previous
if [ -f "$OUTPUT_DIR/latest_urls.txt" ]; then
  NEW_URLS=$(comm -13 <(sort "$OUTPUT_DIR/latest_urls.txt") <(sort "$OUTPUT_DIR/${DATE}_urls.txt"))
  echo "New URLs found:"
  echo "$NEW_URLS"
fi

# Update latest
cp "$OUTPUT_DIR/${DATE}_urls.txt" "$OUTPUT_DIR/latest_urls.txt"
  • Use multiple providers: Leverage all data sources for comprehensive coverage
  • Filter aggressively: Reduce noise by filtering irrelevant file types early
  • Archive results: Keep historical URL datasets for comparison
  • Combine with active scanning: Use discovered URLs with vulnerability scanners
  • Process systematically: Organize URLs by type (API, admin, static, etc.); see the sketch after this list
  • Monitor changes: Track new URLs over time for emerging attack surfaces
  • Respect rate limits: Use appropriate timeouts and intervals
  • Verify findings: Test discovered URLs before reporting
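To make the "process systematically" practice concrete, here is a minimal sketch that splits a previously collected URL list into per-category files; the file names and grep patterns are illustrative assumptions, not gau conventions.

#!/bin/bash
# Sketch: organize a combined URL list by type (API, admin, JavaScript, static, parameterized).
# Assumes all_urls.txt already exists, e.g. from: gau example.com | sort -u > all_urls.txt

INPUT="all_urls.txt"
OUT="categorized"
mkdir -p "$OUT"

grep -E  "api/v[0-9]|/api/"            "$INPUT" > "$OUT/api.txt"
grep -iE "admin|dashboard|login"       "$INPUT" > "$OUT/admin.txt"
grep -E  "\.js$"                       "$INPUT" > "$OUT/javascript.txt"
grep -E  "\.(css|png|jpe?g|gif|svg)$"  "$INPUT" > "$OUT/static.txt"
grep -F  "?"                           "$INPUT" > "$OUT/parameterized.txt"

# Quick summary of what landed in each category
wc -l "$OUT"/*.txt
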
# 1-minute overview of domain
gau example.com | grep -E "api|admin|config|backup" | head -20
# Full domain analysis
gau -subs -o example_urls.txt example.com
grep "\.js$" example_urls.txt > javascript.txt
grep "api" example_urls.txt > apis.txt
cut -d'/' -f3 example_urls.txt | sort -u > subdomains.txt
# Find specific vulnerability indicators
gau example.com | grep -iE "cms|framework|version" > tech_indicators.txt
gau example.com | grep -E "password|secret|key|token" > sensitive.txt
Issue           | Solution
No results      | Verify the domain exists; check network connectivity
Timeout errors  | Increase the timeout with the -timeout flag
Rate limiting   | Add delays between requests and batches
Memory issues   | Process results in chunks or filter early (see the sketch below)
Old data        | Results reflect historical snapshots, not the live site
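As a rough illustration of chunked processing for the memory row above, the commands below split a large result set into fixed-size pieces before probing them; the 50,000-line chunk size and the use of httpx are arbitrary choices, not gau requirements.

# Sketch: handle a very large URL set in chunks so no single step holds everything in memory
gau example.com | sort -u > all_urls.txt
split -l 50000 all_urls.txt chunk_

# Probe each chunk separately
for f in chunk_*; do
  httpx -l "$f" -o "${f}_live.txt"
done
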
# Respect rate limits
gau -timeout 10 example.com        # Generous timeout

# Add delays between requests
while read -r domain; do
  gau "$domain"
  sleep 5
done < domains.txt

# Only scan authorized targets

getallurls (gau) aggregates historical URL data from multiple authoritative sources:

  1. AlienVault OTX - Threat intelligence platform
  2. Wayback Machine - Internet Archive snapshots
  3. Common Crawl - Large-scale web crawl index
  4. urlscan.io - Public URL scan results

Key capabilities include:

  • Comprehensive endpoint discovery
  • Multi-source data aggregation
  • Flexible filtering and processing
  • Integration with security tools
  • Automated reconnaissance workflows

Use gau as foundation for reconnaissance, vulnerability assessment, and security testing activities.
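As one possible end-to-end workflow tying these capabilities together (a sketch assuming httpx and nuclei are installed and example.com is an authorized target; file names are illustrative):

# Discovery -> liveness check -> vulnerability scan
gau -subs example.com | sort -u > all_urls.txt              # aggregate historical URLs
httpx -l all_urls.txt -silent -o live_urls.txt              # keep only URLs that still respond
nuclei -l live_urls.txt -severity medium,high,critical      # scan the live endpoints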