
# Katana Web Crawler Cheat Sheet

Overview

Katana is a fast, customizable web crawling framework developed by Project Discovery. It is designed to crawl websites efficiently, gathering information and discovering endpoints. Katana stands out from other web crawlers for its speed, flexibility, and focus on security testing use cases.

What sets Katana apart is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, which makes it an excellent tool for reconnaissance during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be customized to focus on specific resource types or to follow particular patterns, so it adapts to different security testing scenarios. The tool is designed to integrate easily into security testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.

Installation

Using Go

# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version

Using Docker

# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h

Using Homebrew (macOS)

# Install using Homebrew
brew install katana

# Verify installation
katana -version

Using PDTM (Project Discovery Tools Manager)

# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version

On Kali Linux

# Install using apt
sudo apt install katana

# Verify installation
katana -version
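
The verification step can be scripted so that later commands fail early when a tool is missing. The sketch below is one way to do that; `require_tool` is a hypothetical helper name, not something Katana ships.

```shell
# Return 0 if the named command is on PATH, 1 otherwise.
# "require_tool" is a hypothetical helper, not part of Katana.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing tool: $1" >&2
  return 1
}

# Example (not run here): print the version only when katana is installed.
# require_tool katana && katana -version
```

The same helper can guard pipelines that also depend on subfinder, httpx, or nuclei.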

Basic Usage

Crawling a Single URL

# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug

Crawling Multiple URLs

# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt|katana
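
Target lists often mix bare hostnames with full URLs. A small filter, sketched here under the assumption that scheme-less entries should be crawled over HTTPS, can normalize the list before it reaches Katana on STDIN. The name `normalize_targets` is illustrative.

```shell
# Read hostnames or URLs from stdin, one per line, and print URLs.
# Lines without a scheme get "https://" prepended (an assumption; adjust
# for plain-HTTP targets). Blank lines and "#" comment lines are skipped.
normalize_targets() {
  awk '
    { sub(/\r$/, ""); gsub(/^[ \t]+|[ \t]+$/, "") }
    $0 == "" || /^#/ { next }
    /^https?:\/\// { print; next }
    { print "https://" $0 }
  '
}

# Example (not run here):
# normalize_targets < hosts.txt | katana -silent
```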

Output Options

# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON Lines format
katana -u https://example.com -jsonl -o results.jsonl

# Silent mode (only URLs)
katana -u https://example.com -silent

Crawling Options

Crawling Depth and Scope

# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl the root domain and all its subdomains (rdn, the default field scope)
katana -u https://example.com -field-scope rdn

# Disable the host-based default scope and follow out-of-scope links
katana -u https://example.com -no-scope

# Restrict crawling to the exact hostname given
katana -u https://example.com -field-scope fqdn

Crawling Strategies

# Use the standard crawler (the default mode; no flag needed)
katana -u https://example.com

# Parse JavaScript files for endpoints
katana -u https://example.com -js-crawl

# Crawl sitemap.xml
katana -u https://example.com -known-files sitemapxml

# Crawl robots.txt
katana -u https://example.com -known-files robotstxt

# Combine JavaScript parsing with all known files
katana -u https://example.com -js-crawl -known-files all

Field Selection

# Display specific fields
katana -u https://example.com -field url,path,fqdn

# Available fields: url, path, fqdn, rdn, rurl, qurl, qpath, file, ufile, key, value, kv, dir, udir

Advanced Usage

URL Filtering

# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"

Resource Filtering

# Include specific file extensions
katana -u https://example.com -extension-match js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -extension-filter png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html

Form Filling

# Enable automatic form filling
katana -u https://example.com -automatic-form-fill

# Use custom form values from a form configuration file
katana -u https://example.com -automatic-form-fill -form-config form-config.yaml

JavaScript Parsing

# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set browser path (headless mode)
katana -u https://example.com -headless -system-chrome-path /path/to/chrome

Performance Optimization

Concurrency and Rate Limiting

# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (seconds)
katana -u https://example.com -delay 1

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
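
Because the rate limit caps requests per second, it also gives a rough lower bound on how long a crawl takes. The sketch below does that arithmetic with integer ceiling division; `min_crawl_seconds` is a hypothetical helper, and real crawls take longer once response times and delays are included.

```shell
# Print a lower bound, in seconds, for sending N requests at R requests/second.
# Uses integer ceiling division: (N + R - 1) / R.
min_crawl_seconds() {
  requests=$1
  rate=$2
  if [ "$rate" -le 0 ]; then
    echo "rate must be positive" >&2
    return 1
  fi
  echo $(( (requests + rate - 1) / rate ))
}

min_crawl_seconds 1000 50   # prints 20
```

At `-rate-limit 50`, for example, 1000 requests need at least 20 seconds.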

Timeout Options

# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30

Optimization for Large Scans

# Form filling and JavaScript parsing are off by default; omit -automatic-form-fill and -js-crawl for faster crawling
katana -u https://example.com

# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000
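
For very large target lists, one option is to split the list into chunks and crawl each chunk separately, so a failure only costs one chunk. This is a sketch using `split(1)`; the helper name `chunk_targets` and the `chunks/part_` prefix are assumptions for illustration.

```shell
# Split a target list into files of at most N lines each.
# Output files are named <prefix>aa, <prefix>ab, ... by split(1).
chunk_targets() {
  list=$1
  lines=$2
  prefix=$3
  split -l "$lines" "$list" "$prefix"
}

# Example (not run here): crawl each chunk into its own result file.
# mkdir -p chunks && chunk_targets urls.txt 500 chunks/part_
# for f in chunks/part_*; do katana -list "$f" -silent -o "$f.out"; done
```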

Integration with Other Tools

Pipeline with Subfinder

# Find subdomains and crawl them
subfinder -d example.com -silent|katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent|katana -silent -extension-match js

Pipeline with HTTPX

# Probe URLs and crawl active ones
httpx -l urls.txt -silent|katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent|httpx -silent

Pipeline with Nuclei

# Crawl and scan for vulnerabilities
katana -u https://example.com -silent|nuclei -t cves/

# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension-match js|nuclei -t exposures/
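
Crawl output often contains many URLs that differ only in their query string, which makes downstream tools repeat work. A possible intermediate stage, sketched here with the hypothetical name `dedupe_by_path`, strips query strings and fragments and keeps unique results. Skip it when the parameters themselves are what you want to test.

```shell
# Read URLs from stdin, drop the query string and fragment, print unique results.
dedupe_by_path() {
  sed -e 's/[?#].*$//' | sort -u
}

# Example (not run here):
# katana -u https://example.com -silent | dedupe_by_path | httpx -silent
```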

Output Customization

Custom Output Format

# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,fqdn -o results.txt

# Count discovered URLs
katana -u https://example.com -silent|wc -l

# Sort output alphabetically
katana -u https://example.com -silent|sort

Output Filtering

# Filter by file extension
katana -u https://example.com -silent|grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent|grep "/api/"

# Find unique domains
katana -u https://example.com -silent|awk -F/ '{print $3}'|sort -u
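
The same kind of post-processing can summarize what a crawl found. As a sketch, the hypothetical helper `count_extensions` below counts URLs per file extension, ignoring query strings and URLs with no extension in the last path segment.

```shell
# Count URLs per file extension (query strings ignored), most common first.
# Output format: "<extension> <count>", one per line.
count_extensions() {
  sed -e 's/[?#].*$//' |
    awk -F/ '{ n = split($NF, parts, "."); if (n > 1 && NF > 3) print tolower(parts[n]) }' |
    sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Example (not run here):
# katana -u https://example.com -silent | count_extensions
```

The `NF > 3` check skips bare hosts such as `https://example.com`, where the last segment is the hostname rather than a file.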

Advanced Filtering

URL Pattern Matching

# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
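
Before spending a crawl on a pattern, it can help to try the expression against a few sample URLs locally. The sketch below uses `grep -E`, which is only an approximation: Katana is written in Go, and its regular expression syntax may differ from POSIX extended regex for anything beyond simple patterns. `preview_match` is a hypothetical helper name.

```shell
# Print the lines from stdin that match an extended regular expression.
# grep -E approximates, but is not identical to, Go's regexp syntax.
preview_match() {
  grep -E -- "$1" || true   # no match is not treated as an error
}

# Example (not run here):
# preview_match 'id=[0-9]+' < sample-urls.txt
```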

Content Filtering

# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Match responses by status code
katana -u https://example.com -match-condition "status_code == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"

Proxy and Network Options

# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -headers "User-Agent: Mozilla/5.0" -headers "Cookie: session=123456"

# Set custom cookies (sent as a Cookie header)
katana -u https://example.com -headers "Cookie: session=123456; user=admin"

Miscellaneous Features

Automatic Form Filling

# Enable automatic form filling
katana -u https://example.com -automatic-form-fill

# Set custom form values from a form configuration file
katana -u https://example.com -automatic-form-fill -form-config form-config.yaml

Crawling Specific Paths

# Crawl specific paths by passing full URLs
katana -u https://example.com/admin,https://example.com/login,https://example.com/dashboard

# Crawl from a file containing paths (one per line, each starting with /)
sed 's|^|https://example.com|' paths.txt|katana

Storing Responses

# Store all responses
katana -u https://example.com -store-response

# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/

Troubleshooting

Common Issues

  1. JavaScript Parsing Issues

  2. Target Rate Limiting

  3. Memory Issues

  4. Review

Debugging

# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response

Configuration

Configuration File

Katana uses a configuration file located at $HOME/.config/katana/config.yaml. You can customize various settings in this file:

# Example configuration file
concurrency: 10
delay: 1
timeout: 10
depth: 3
field-scope: fqdn
crawl-duration: 0
field: url,path,fqdn
extension-match: js,php,aspx
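
A per-engagement configuration can also live outside the default location and be passed explicitly. The sketch below writes a minimal file and is meant to be used with Katana's `-config` flag; the helper name `write_katana_config` and the file name are assumptions, and only the `concurrency` and `timeout` keys are used.

```shell
# Write a small Katana config to the given path.
# "write_katana_config" is a hypothetical helper, not part of Katana.
write_katana_config() {
  cat > "$1" <<'EOF'
concurrency: 10
timeout: 10
EOF
}

# Example (not run here):
# write_katana_config ./katana-scan.yaml
# katana -u https://example.com -config ./katana-scan.yaml
```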

Environment Variables

# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3

Reference

Command-Line Options


Crawling Scopes


Crawler Types


Field Options


Resources


*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. For the most up-to-date information, always consult the official documentation.*