HTTrack

HTTrack Website Copier is a free, portable utility that downloads entire websites to your computer, creating a complete offline mirror. It’s invaluable during security assessments for analyzing web applications, discovering hidden directories and files, identifying server configurations, and understanding application architecture. HTTrack is available in Kali Linux and supports multiple platforms.

The tool handles cookies and HTTP authentication, parses HTML and JavaScript for links, and can filter content by MIME type, URL pattern, and file extension. It’s particularly useful for analyzing large web applications offline and spotting security misconfigurations. Note that HTTrack parses JavaScript for links but does not execute it.

# Kali Linux (pre-installed)
httrack --version

# Debian/Ubuntu
sudo apt-get install httrack webhttrack

# macOS
brew install httrack

# From source
git clone https://github.com/xroche/httrack
cd httrack
./configure && make
sudo make install
httrack <URLs> [-option] [+<URL_FILTER>] [-<URL_FILTER>] [-O <path>]
| Command | Description |
| --- | --- |
| `httrack http://example.com` | Mirror entire website locally |
| `httrack https://example.com -O ./mirror` | Save to custom directory |
| `httrack http://example.com/path` | Mirror specific path only |
| `httrack --help` | Display help information |
| `httrack --version` | Show version information |
# Mirror a simple website
httrack http://example.com

# Mirror with custom output directory
httrack http://example.com -O ./website_mirror

# Mirror HTTPS website
httrack https://example.com -O ./secure_mirror

# Mirror multiple URLs
httrack http://example.com http://subdomain.example.com -O ./multi_mirror
| Option | Description | Example |
| --- | --- | --- |
| `-rN` | Set recursion (mirror) depth | `httrack -r5 http://example.com` |
| `-mN` | Maximum size of non-HTML files, in bytes | `httrack -m50000 http://example.com` |
| `-cN` | Number of simultaneous connections | `httrack -c8 http://example.com` |
| `-%P` | Extended parsing (find links inside JavaScript) | `httrack -%P http://example.com` |
| Rule/Option | Description | Example |
| --- | --- | --- |
| `+pattern` | Accept URLs matching a scan rule | `httrack http://example.com "+*.html" "+*.css"` |
| `-pattern` | Reject URLs matching a scan rule | `httrack http://example.com "-*.exe" "-*.zip"` |
| `+mime:type` | Accept content by MIME type | `httrack http://example.com "+mime:text/html"` |
| `--spider` | Spider mode (test links, no download) | `httrack --spider http://example.com` |
| Option | Description | Example |
| --- | --- | --- |
| `--update` | Update an existing mirror in place | `httrack --update http://example.com -O ./mirror` |
| `-n` | Also get non-HTML files "near" an HTML file | `httrack -n http://example.com` |
| `-TN` | Connection timeout (seconds) | `httrack -T60 http://example.com` |
| `-F "agent"` | Identify as a specific browser (User-Agent) | `httrack -F "Mozilla/5.0" http://example.com` |
# Provide HTTP Basic credentials in the URL
httrack http://user:password@example.com -O ./authenticated

# Accept and store cookies (saved to cookies.txt)
httrack http://example.com -b1 -O ./with_cookies

# Custom User-Agent
httrack http://example.com -F "Mozilla/5.0" -O ./custom_ua
# Include only specific paths (scan rules are quoted to avoid shell globbing)
httrack http://example.com "+example.com/app/*" "+example.com/api/*" -O ./filtered

# Download only images (reject everything, then re-accept images)
httrack http://example.com "-*" "+*.jpg" "+*.png" -O ./images_only

# Mirror external links one level deep
httrack http://example.com -%e1 -O ./external

# Mirror subdomains via a scan rule
httrack http://example.com "+*.example.com/*" -O ./subdomains
# Deep recursive mirror (depth 10)
httrack -r10 http://example.com -O ./deep_mirror

# Allow large non-HTML files (-m takes a byte limit; 10 MB here)
httrack -m10000000 http://example.com -O ./large_files

# Multiple connections (faster, but harder on the server)
httrack -c16 http://example.com -O ./fast

# Restrict the mirror to common web file types
httrack http://example.com "-*" "+*.html" "+*.htm" "+*.css" "+*.js" "+*.jpg" "+*.gif" "+*.png"
# Mirror target application
httrack https://target-app.com -r8 -O ./target_mirror

# Analyze directory structure
find ./target_mirror -type d | head -20

# Identify file types
find ./target_mirror -type f | sed 's/.*\.//' | sort | uniq -c

# Extract all URLs
grep -roP 'href="[^"]*"' ./target_mirror | cut -d'"' -f2 | sort -u
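Beyond internal URLs, the same mirrored HTML reveals which external hosts the target depends on (CDNs, partners, analytics). A small sketch, using a throwaway sample directory and hypothetical hosts in place of `./target_mirror`:

```shell
# Sample mirrored page standing in for ./target_mirror (hosts are made up).
mkdir -p /tmp/host_demo
cat > /tmp/host_demo/index.html <<'EOF'
<script src="https://cdn.example.net/app.js"></script>
<a href="https://partner.example.org/login">partner</a>
EOF

# Enumerate external hosts referenced by href/src attributes.
grep -rhoP '(href|src)="https?://\K[^/"]+' /tmp/host_demo | sort -u
```

Each unique host prints once; point the grep at the real mirror directory instead of `/tmp/host_demo`.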
# Mirror API documentation
httrack https://api.example.com -r6 -O ./api_mirror

# Extract API endpoints
grep -roP '/(api|v[0-9]+)/?[a-zA-Z0-9/_-]*' ./api_mirror | sort -u

# Find parameter patterns
grep -roP '\?[a-zA-Z0-9_&=]*' ./api_mirror | sort -u
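Raw query strings are noisy; ranking the parameter names by frequency often surfaces the interesting ones first. A sketch over a stand-in directory (swap in `./api_mirror` for real use):

```shell
# Sample mirrored HTML standing in for ./api_mirror.
mkdir -p /tmp/param_demo
cat > /tmp/param_demo/page.html <<'EOF'
<a href="/search?q=test&page=2">next</a>
<a href="/search?q=other&sort=asc">sorted</a>
EOF

# Pull parameter names out of query strings and rank by frequency.
grep -rhoP '[?&]\K[A-Za-z0-9_]+(?==)' /tmp/param_demo | sort | uniq -c | sort -rn
```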
# Mirror entire site
httrack http://example.com -r10 -O ./full_mirror

# Search for config files
find ./full_mirror -name "*.conf" -o -name "*.config" -o -name "*.json" -o -name "*.xml"

# Look for hardcoded credentials (case-insensitive)
grep -ri "password\|apikey\|token\|secret" ./full_mirror

# Extract JavaScript for analysis
find ./full_mirror -name "*.js" -type f | head -20
# Mirror simple website with default settings
httrack http://example.com -O ./mirror_$(date +%Y%m%d)

# Navigate to results
cd mirror_$(date +%Y%m%d)/
ls -la

# Open in browser
firefox index.html
# Mirror with aggressive settings for app discovery
httrack \
  -r10 \
  -m100000 \
  -c16 \
  -T60 \
  http://target.local:8080 \
  -O ./deep_analysis

# Search for interesting files
find ./deep_analysis -type f \( \
  -name "*.js" -o \
  -name "*.json" -o \
  -name "*.xml" -o \
  -name "*.config" \
\) | wc -l
# Mirror API with specific URL patterns (scan rules quoted)
httrack \
  -r6 \
  https://api.example.com \
  "+api.example.com/v1/*" \
  "+api.example.com/v2/*" \
  -O ./api_analysis

# Extract endpoints
grep -roP '/(api|v[0-9]+)/[a-zA-Z0-9/_-]*' ./api_analysis | sort -u > endpoints.txt

# Count discovered endpoints
wc -l endpoints.txt
# Mirror only JavaScript and HTML files
httrack \
  -r8 \
  "-*" \
  "+*.html" \
  "+*.htm" \
  "+*.js" \
  http://example.com \
  -O ./js_analysis

# Total size of the mirrored files
du -sh ./js_analysis
find ./js_analysis -name "*.js" | wc -l
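With the JavaScript collected, quoted absolute paths inside it are good endpoint candidates. A hedged sketch over a sample file (point it at `./js_analysis` in practice; only double-quoted strings are handled here):

```shell
# Sample mirrored script standing in for ./js_analysis.
mkdir -p /tmp/js_demo
cat > /tmp/js_demo/app.js <<'EOF'
fetch("/api/v1/users");
axios.get("/api/v1/orders?id=1");
var legacy = "/cgi-bin/report";
EOF

# Extract double-quoted absolute paths from every .js file.
grep -rhoP '"(/[A-Za-z0-9/_.?=&-]+)"' /tmp/js_demo --include="*.js" \
  | tr -d '"' | sort -u
```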
mirror_example.com/
├── index.html               # Entry page linking into the mirror
├── hts-cache/               # HTTrack cache and transfer metadata
│   ├── new.txt              # Log of fetched URLs
│   ├── new.zip              # Cached page data
│   └── doit.log             # Options used for the mirror
├── hts-log.txt              # Mirror log (errors and warnings)
├── backblue.gif             # HTTrack template image
├── cookies.txt              # Saved cookies
└── example.com/             # Mirrored site content
    ├── index.html
    ├── about/
    ├── contact/
    └── ...
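The `hts-cache/new.txt` transfer log records every URL HTTrack fetched, which makes a quick crawl inventory. A sketch with a fabricated two-line log (real logs carry more fields, so the grep pulls only the URLs):

```shell
# Fabricated transfer log standing in for a real hts-cache/new.txt.
mkdir -p /tmp/mirror_demo/hts-cache
cat > /tmp/mirror_demo/hts-cache/new.txt <<'EOF'
12:00:01	200	4096	http://example.com/index.html
12:00:02	200	1024	http://example.com/app/login.html
EOF

# Pull out every URL the crawler recorded.
grep -oP 'https?://\S+' /tmp/mirror_demo/hts-cache/new.txt | sort -u
```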
# Count total files downloaded
find ./mirror -type f | wc -l

# Find all JavaScript files
find ./mirror -name "*.js" | wc -l

# List largest files
find ./mirror -type f -exec du -h {} + | sort -rh | head -20

# Extract all URLs from HTML
grep -rhoP '(href|src|action)="[^"]*"' ./mirror | cut -d'"' -f2 | sort -u

# Find commented-out code
grep -r "<!--" ./mirror | head -20

# Search for API endpoints in JS
grep -rh "fetch\|XMLHttpRequest\|axios\|jQuery.ajax" ./mirror --include="*.js" | head -20
# Launch graphical interface
webhttrack

# Or via command line
webhttrack http://example.com
  • Point-and-click URL configuration
  • Visual progress monitoring
  • Pause/resume capability
  • Browser-based interface
  • Project management and history
# Start the web interface (serves a local configuration UI)
webhttrack

# Your browser opens automatically; configure URLs, set options, and monitor progress there
# Faster download with more connections
httrack -c16 http://example.com -O ./fast_mirror

# For very large sites with many files
httrack -c32 -r10 -m200000 http://large-site.com -O ./large_mirror
# Limit bandwidth to ~1 MB/s (-A takes bytes per second)
httrack -A1000000 http://example.com

# Shorter timeout for unresponsive servers
httrack -T30 http://slow-server.com
| Issue | Solution |
| --- | --- |
| Connection refused | Check the URL, firewall, or proxy settings |
| Incomplete mirror | Increase recursion depth with `-rN` |
| Large downloads | Limit file size with `-mN` (bytes) or use scan-rule filters |
| Authentication failed | Provide credentials in the URL: `http://user:pass@host` |
| Links inside JavaScript missed | Enable extended parsing with `-%P` |
| Timeout errors | Increase the timeout: `-T120` |
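For stubborn timeouts, the mirror command can also be wrapped in a retry loop. A generic sketch; the `flaky` demo function (which fails twice, then succeeds) stands in for your real `httrack … -T120 …` invocation:

```shell
# Run a command up to N times, pausing between attempts.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    echo "attempt $i failed, retrying..." >&2
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Demo command standing in for httrack: fails twice, then succeeds.
rm -f /tmp/retry_a /tmp/retry_b
flaky() {
  [ -f /tmp/retry_b ] && return 0
  [ -f /tmp/retry_a ] && { touch /tmp/retry_b; return 1; }
  touch /tmp/retry_a
  return 1
}

retry 5 flaky && echo "mirror completed"
```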
# Comprehensive mirror for security analysis
httrack \
  -r10 \
  -m100000 \
  -c16 \
  -T60 \
  -F "Mozilla/5.0 (Windows)" \
  https://target.com \
  -O ./security_assessment

# Archive the mirror
tar -czf target_mirror_$(date +%Y%m%d_%H%M%S).tar.gz ./security_assessment

# Create inventory
find ./security_assessment -type f > inventory.txt
wc -l inventory.txt
# Mirror current version
httrack https://target.com -O ./version_current

# Later, compare with previous
diff -r ./version_previous ./version_current > changes.diff

# Or use find to identify new files
find ./version_current -newer ./version_previous -type f
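`diff -r` works on full trees, but checksum manifests are easier to archive and compare across machines. A sketch with tiny stand-in snapshot directories (replace them with `./version_previous` and `./version_current`):

```shell
# Stand-in snapshot directories for two mirror runs.
mkdir -p /tmp/v_prev /tmp/v_curr
echo "home v1" > /tmp/v_prev/index.html
echo "home v2" > /tmp/v_curr/index.html   # changed between runs
echo "admin"   > /tmp/v_curr/admin.html   # new in the current run

# One sorted "checksum  path" manifest per snapshot.
snapshot() { (cd "$1" && find . -type f -exec sha256sum {} + | sort); }
snapshot /tmp/v_prev > /tmp/prev.sums
snapshot /tmp/v_curr > /tmp/curr.sums

# Manifest lines unique to the current snapshot = added or modified files.
comm -13 /tmp/prev.sums /tmp/curr.sums | awk '{print $2}'
```

Because a modified file gets a new checksum, this flags additions and edits in one pass; deletions show up via `comm -23` instead.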
  • Authorization: Only mirror websites you own or have explicit written permission to test
  • Robots.txt Compliance: HTTrack respects robots.txt by default; override with care
  • Rate Limiting: Use appropriate concurrency settings to avoid DoS-like behavior
  • Copyright: Respect copyright laws; use mirrors for authorized security testing only
  • Confidentiality: Protect downloaded content containing sensitive information
  • wget: Command-line download utility
  • curl: HTTP client for single-file downloads and detailed request analysis
  • Burp Suite: Professional web application security testing
  • OWASP ZAP: Free automated web security scanning
  • grep/find: Content analysis and file discovery