# Katana Web Crawler Cheat Sheet
## Overview

Katana is a fast and customizable web crawling framework developed by Project Discovery. It is designed to crawl websites efficiently, gathering information and discovering endpoints. Katana stands out from other web crawlers thanks to its speed, its flexibility, and its focus on security testing use cases.

What makes Katana unique is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be customized to focus on specific resource types or follow particular patterns, which makes it adaptable to different security testing scenarios. The tool is designed to slot easily into security testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.
## Installation

### Using Go

```bash
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version
```
### Using Docker

```bash
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
```
### Using Homebrew (macOS)
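A minimal sketch, assuming the `katana` formula is available in your Homebrew installation:

```bash
# Install via Homebrew (formula name assumed to be "katana")
brew install katana

# Verify installation
katana -version
```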
### Using PDTM (Project Discovery Tools Manager)

```bash
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version
```
### On Kali Linux
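A minimal sketch, assuming Katana is available in the Kali repositories under the package name `katana`:

```bash
# Install from the Kali repositories (package name assumed)
sudo apt update && sudo apt install katana

# Verify installation
katana -version
```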
## Basic Usage

### Crawling a Single URL

```bash
# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug
```
### Crawling Multiple URLs

```bash
# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt | katana
```
### Output Options

```bash
# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent
```
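The JSON output is one object per line, which makes it easy to post-process. A minimal sketch using `jq`, assuming the crawled URL lives under a `request.endpoint` key (field names vary between Katana versions, so inspect a line of output first):

```bash
# Extract just the crawled URLs from JSON output (key path assumed)
katana -u https://example.com -json -silent | jq -r '.request.endpoint'
```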
## Crawling Options

### Crawling Depth and Scope

```bash
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Include subdomains in the crawl scope
katana -u https://example.com -crawl-scope subs

# Follow links outside the target scope
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only in-scope URLs
katana -u https://example.com -crawl-scope strict
```
### Crawling Strategies

```bash
# Use standard crawler
katana -u https://example.com -crawler standard

# Use JavaScript parser
katana -u https://example.com -crawler js

# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
```
### Field Selection

```bash
# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
```
## Advanced Usage

### URL Filtering

```bash
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
```
### Resource Filtering

```bash
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
```
### Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```
### JavaScript Parsing

```bash
# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
```
## Performance Optimization

### Concurrency and Rate Limiting

```bash
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
```
### Timeout Options

```bash
# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for the headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
```
### Optimizing Large Scans

```bash
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit the maximum number of URLs to crawl
katana -u https://example.com -max-urls 1000
```
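Putting these together, a sketch of a bounded large-scan invocation that combines the throttling and capping flags shown above:

```bash
# Faster, bounded crawl: higher concurrency, capped request rate and URL count
katana -u https://example.com -concurrency 20 -rate-limit 50 -max-urls 1000 -no-js-crawl -silent -o large-scan.txt
```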
## Integration with Other Tools

### Pipeline with Subfinder

```bash
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js
```
### Pipeline with HTTPX

```bash
# Probe URLs and crawl the active ones
httpx -l urls.txt -silent | katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent
```
### Pipeline with Nuclei

```bash
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/

# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js | nuclei -t exposures/
```
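These pipelines compose into a full reconnaissance chain. A sketch that enumerates subdomains, probes for live hosts, crawls them, and scans the results (template paths follow the examples above):

```bash
# Subdomain enumeration -> liveness probing -> crawling -> vulnerability scanning
subfinder -d example.com -silent | httpx -silent | katana -silent | nuclei -t exposures/ -silent
```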
## Output Customization

### Custom Output Format

```bash
# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent | wc -l

# Sort output alphabetically
katana -u https://example.com -silent | sort
```
### Output Filtering

```bash
# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"

# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
```
## Advanced Filtering

### URL Pattern Matching

```bash
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
```
### Content Filtering

```bash
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
```
## Proxy and Network Options

```bash
# Use an HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use a SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
```
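Combining these options, a sketch of an authenticated crawl routed through a local intercepting proxy (the cookie value and proxy address are placeholders):

```bash
# Authenticated crawl through a local proxy (e.g. Burp Suite on its default port)
katana -u https://example.com -proxy http://127.0.0.1:8080 -cookie "session=123456" -silent
```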
## Miscellaneous Features
### Crawling Specific Paths

```bash
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl paths listed in a file
katana -u https://example.com -paths-file paths.txt
```
### Storing Responses

```bash
# Store all responses
katana -u https://example.com -store-response

# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
```
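Stored responses can then be searched offline with standard shell tools; the search pattern below is just an example:

```bash
# Crawl, store responses, then grep the stored bodies for interesting strings
katana -u https://example.com -store-response -store-response-dir responses/
grep -rn "api_key" responses/
```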
## Troubleshooting

### Common Issues

- **JavaScript parsing issues**: if dynamic pages are not being crawled, enable JavaScript parsing with `-js-crawl`, increase `-headless-timeout`, and make sure a compatible Chrome/Chromium binary is available (set it explicitly with `-chrome-path` if needed).
- **Target rate limiting**: if the target throttles or blocks requests, lower `-concurrency`, add a `-delay` between requests, or cap throughput with `-rate-limit`.
- **Memory issues**: for very large crawls, reduce `-depth`, cap the crawl with `-max-urls`, and disable JavaScript parsing when it is not needed.
### Debugging

```bash
# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response
```
## Configuration

### Configuration File

Katana reads a configuration file located at `$HOME/.config/katana/config.yaml`, where various defaults can be customized:

```yaml
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
```
### Environment Variables

```bash
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
```
## Reference

The tables below summarize the options used throughout this cheat sheet; run `katana -h` for the authoritative, version-specific list.

### Command Line Options

| Option | Description |
| --- | --- |
| `-u` | Target URL(s) to crawl |
| `-list` | File containing URLs to crawl |
| `-depth` | Maximum crawl depth (default: 2) |
| `-crawl-scope` | Crawl scope (`strict`, `subs`, `out-of-scope`) |
| `-crawler` | Crawler types to use (`standard`, `js`, `sitemap`, `robots`) |
| `-field` | Fields to display in output |
| `-extension` | File extensions to include |
| `-exclude-extension` | File extensions to exclude |
| `-concurrency` | Number of concurrent requests (default: 10) |
| `-rate-limit` | Maximum requests per second |
| `-timeout` | HTTP request timeout in seconds |
| `-proxy` | HTTP or SOCKS5 proxy to route requests through |
| `-o` | Output file |
| `-json` | Emit output in JSON format |
| `-silent` | Print discovered URLs only |

### Crawling Scopes

| Scope | Description |
| --- | --- |
| `strict` | Crawl only URLs within the exact target scope |
| `subs` | Also crawl subdomains of the target |
| `out-of-scope` | Also follow links outside the target scope |

### Crawler Types

| Crawler | Description |
| --- | --- |
| `standard` | Standard HTTP crawler |
| `js` | JavaScript parser that extracts endpoints from scripts |
| `sitemap` | Crawler driven by `sitemap.xml` |
| `robots` | Crawler driven by `robots.txt` |

### Field Options

| Field | Description |
| --- | --- |
| `url` | Full URL |
| `path` | URL path |
| `method` | HTTP method |
| `host` | Host |
| `fqdn` | Fully qualified domain name |
| `scheme` | URL scheme |
| `port` | Port |
| `query` | Query string |
| `fragment` | URL fragment |
| `endpoint` | Endpoint |
## Resources

- [Official Documentation](https://docs.projectdiscovery.io/tools/katana)
- [GitHub Repository](https://github.com/projectdiscovery/katana)
- Project Discovery Discord

*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. Always consult the official documentation for the most up-to-date information.*