Katana Web Crawler Cheat Sheet
Overview
Katana is a fast, customizable web crawling framework developed by Project Discovery. It is designed to crawl websites efficiently in order to gather information and discover endpoints. Katana stands out from other web crawlers for its speed, its flexibility, and its focus on security testing use cases.
What makes Katana distinctive is its ability to crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript, by rendering them in a headless browser. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, which makes it an excellent reconnaissance tool for security assessments and bug bounty hunting.
Katana supports several crawling modes, including standard HTTP crawling, headless browser crawling, JavaScript file parsing, and crawling of known files such as robots.txt and sitemap.xml. It can be customized to focus on specific resource types or to follow particular patterns, which makes it adaptable to different security testing scenarios. The tool is designed to slot easily into security testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.
Installation
Using Go
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest
# Verify installation
katana -version
Using Docker
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest
# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
Using Homebrew (macOS)
# Install using Homebrew
brew install katana
# Verify installation
katana -version
Using PDTM (Project Discovery Tool Manager)
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest
# Install Katana using PDTM
pdtm -i katana
# Verify installation
katana -version
On Kali Linux
# Install using apt
sudo apt install katana
# Verify installation
katana -version
Basic Usage
Crawling a Single URL
# Crawl a single URL
katana -u https://example.com
# Crawl with increased verbosity
katana -u https://example.com -v
# Crawl with debug information
katana -u https://example.com -debug
Crawling Multiple URLs
# Crawl multiple URLs
katana -u https://example.com,https://test.com
# Crawl from a list of URLs
katana -list urls.txt
# Crawl from STDIN
cat urls.txt | katana
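Target lists collected from other tools often contain blank lines, comments, duplicates, or bare hostnames. The sketch below cleans such a list before it is piped into katana. `normalize_targets` is a hypothetical helper, not part of Katana, and defaulting scheme-less entries to `https://` is an assumption about your targets.

```bash
# Hypothetical helper (not part of katana): clean a target list.
# - strips carriage returns and surrounding whitespace
# - drops blank lines and lines starting with '#'
# - prepends https:// when no scheme is present (assumption about targets)
# - removes duplicates while keeping first-seen order
normalize_targets() {
  awk '
    {
      sub(/\r$/, "")                 # tolerate Windows line endings
      gsub(/^[ \t]+|[ \t]+$/, "")    # trim surrounding whitespace
      if ($0 == "" || $0 ~ /^#/) next
      if ($0 !~ /^https?:\/\//) $0 = "https://" $0
      if (!seen[$0]++) print
    }
  ' "$@"
}
```

Typical use: `normalize_targets urls.txt | katana -silent`.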
Output Options
# Save results to a file
katana -u https://example.com -o results.txt
# Output in JSON Lines format
katana -u https://example.com -jsonl -o results.jsonl
# Silent mode (only URLs)
katana -u https://example.com -silent
Crawling Options
Depth and Scope
# Set maximum crawl depth (default: 3)
katana -u https://example.com -depth 5
# Crawl the root domain and its subdomains (default scope: rdn)
katana -u https://example.com -field-scope rdn
# Disable host-based scope to follow out-of-scope links
katana -u https://example.com -no-scope
# Crawl only the exact hostname provided
katana -u https://example.com -field-scope fqdn
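The difference between the fqdn and rdn scopes can be modeled in a few lines of shell. This is a simplified illustration, not Katana's own scoping code: `url_host`, `naive_root`, and `in_scope` are hypothetical helpers, and the root domain is assumed to be the last two DNS labels, which is wrong for suffixes such as `co.uk`.

```bash
# Simplified model of katana's -field-scope modes (illustration only).
# Assumption: root domain = last two DNS labels (breaks for e.g. example.co.uk).

# Hostname of a URL: drop scheme, path/query/fragment, userinfo, and port.
url_host() {
  printf '%s\n' "$1" |
    sed -E 's|^[A-Za-z][A-Za-z0-9+.-]*://||; s|[/?#].*$||; s|^.*@||; s|:[0-9]+$||'
}

# Naive root domain: the last two labels of a hostname.
naive_root() {
  printf '%s\n' "$1" | awk -F. 'NF >= 2 { print $(NF-1) "." $NF; next } { print }'
}

# in_scope MODE SEED_URL CANDIDATE_URL: exit status 0 when CANDIDATE is in scope.
#   fqdn -> same hostname only
#   rdn  -> root domain and any of its subdomains
#   none -> everything (comparable to -no-scope)
in_scope() (
  mode=$1
  seed=$(url_host "$2")
  cand=$(url_host "$3")
  case $mode in
    fqdn) [ "$cand" = "$seed" ] ;;
    rdn)
      root=$(naive_root "$seed")
      [ "$cand" = "$root" ] && exit 0
      case $cand in *."$root") exit 0 ;; esac
      exit 1 ;;
    none) exit 0 ;;
    *) echo "unknown mode: $mode" >&2; exit 2 ;;
  esac
)
```

With a seed of `https://www.example.com`, the host `api.example.com` is in scope under `rdn` but not under `fqdn`.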
Crawling Strategies
# Standard crawler (default, no extra flag needed)
katana -u https://example.com
# Headless browser crawling (hybrid mode)
katana -u https://example.com -headless
# Parse JavaScript files for endpoints
katana -u https://example.com -js-crawl
# Crawl sitemap.xml
katana -u https://example.com -known-files sitemapxml
# Crawl robots.txt
katana -u https://example.com -known-files robotstxt
# Combine modes: headless crawling, JavaScript parsing, and all known files
katana -u https://example.com -headless -js-crawl -known-files all
Field Selection
# Display a specific field
katana -u https://example.com -field path
# Available fields: url, path, fqdn, rdn, rurl, qurl, qpath, file, ufile, key, value, kv, dir, udir
Advanced Usage
URL Filtering
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"
# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"
# Match URLs with a DSL condition
katana -u https://example.com -field url -match-condition 'contains(endpoint, "admin")'
Resource Filtering
# Include specific file extensions
katana -u https://example.com -extension-match js,php,aspx
# Exclude specific file extensions
katana -u https://example.com -extension-filter png,jpg,gif
# Match specific content types (via a DSL condition)
katana -u https://example.com -match-condition 'contains(content_type, "application/json") || contains(content_type, "text/html")'
Form Filling
# Enable automatic form filling (experimental)
katana -u https://example.com -automatic-form-fill
# Use custom form values from a form configuration file
katana -u https://example.com -automatic-form-fill -form-config form-config.yaml
JavaScript Parsing
# Enable endpoint parsing in JavaScript files
katana -u https://example.com -js-crawl
# Combine JavaScript parsing with headless browser crawling
katana -u https://example.com -js-crawl -headless
# Use a specific Chrome binary for headless crawling
katana -u https://example.com -headless -system-chrome-path /path/to/chrome
Performance Optimization
Concurrency and Rate Limiting
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20
# Set delay between requests (seconds)
katana -u https://example.com -delay 1
# Set rate limit (requests per second, default: 150)
katana -u https://example.com -rate-limit 50
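Because `-rate-limit` caps requests per second, it also sets a floor on how long a crawl can take. The helper below (`min_crawl_seconds`, hypothetical) computes that floor with integer ceiling division. It assumes one request per URL and ignores latency, `-delay`, and retries, so real crawls will usually take longer.

```bash
# Hypothetical helper: lower bound (in seconds) imposed by -rate-limit.
# Assumptions: each URL costs exactly one request; latency, -delay,
# retries, and headless rendering are ignored.
min_crawl_seconds() (
  requests=$1
  rate=$2
  if [ "$rate" -le 0 ]; then
    echo "rate must be a positive integer" >&2
    exit 1
  fi
  # Integer ceiling division: (requests + rate - 1) / rate
  echo $(( (requests + rate - 1) / rate ))
)
```

For example, `min_crawl_seconds 1000 50` prints `20`. At the default of 150 requests per second, the floor for the same 1,000 URLs is 7 seconds.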
Timeout Options
# Set timeout for HTTP requests (seconds, default: 10)
katana -u https://example.com -timeout 10
# Set the maximum total crawl duration (s, m, h, d)
katana -u https://example.com -crawl-duration 30m
Optimizing Large Scans
# Reduce depth; form filling, JavaScript parsing, and headless mode are off by default, so leave them off for speed
katana -u https://example.com -depth 2
# Skip re-crawling the same path with different query parameter values
katana -u https://example.com -ignore-query-params
# Cap the total crawl time
katana -u https://example.com -crawl-duration 10m
Integration with Other Tools
Pipeline with Subfinder
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent
# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension-match js
Pipeline with HTTPX
# Probe URLs and crawl live ones
httpx -l urls.txt -silent | katana -silent
# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent
Pipeline with Nuclei
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/
# Crawl, extract JavaScript files, and scan them for exposures
katana -u https://example.com -silent -extension-match js | nuclei -t exposures/
Output Customization
Custom Output Format
# Output only URLs
katana -u https://example.com -silent
# Output a specific field
katana -u https://example.com -field path -o results.txt
# Count discovered URLs
katana -u https://example.com -silent | wc -l
# Sort output alphabetically
katana -u https://example.com -silent | sort
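`wc -l` gives a single total. To see which hosts contribute the most URLs, the `-silent` output can be grouped by host. `urls_per_host` is a hypothetical helper. It assumes one absolute URL per line and treats the third `/`-separated field (host plus any port) as the host.

```bash
# Hypothetical helper: count unique URLs per host:port, most frequent first.
# Reads one absolute URL per line from files or stdin.
urls_per_host() {
  awk -F/ 'NF >= 3 && $3 != ""' "$@" |                     # keep lines with a host field
    sort -u |                                               # count each distinct URL once
    awk -F/ '{ count[$3]++ } END { for (h in count) print count[h], h }' |
    sort -k1,1nr -k2,2                                      # highest count first, then by host
}
```

For example: `katana -u https://example.com -silent | urls_per_host`.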
Output Filtering
# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"
# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"
# Find unique hosts
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
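Before choosing values for `-extension-match` or `-extension-filter`, it can help to see which extensions a crawl actually found. `extension_stats` is a hypothetical helper. It ignores query strings and fragments, looks only at the last path segment, and lowercases the extension.

```bash
# Hypothetical helper: count file extensions in discovered URLs.
extension_stats() {
  sed -E 's|[?#].*$||' "$@" |                 # drop query strings and fragments
    awk -F/ 'NF > 3 {                         # skip host-only URLs
        file = $NF                            # last path segment
        n = split(file, parts, ".")
        if (n > 1 && parts[n] != "") ext[tolower(parts[n])]++
      }
      END { for (e in ext) print ext[e], e }' |
    sort -k1,1nr -k2,2
}
```

For example: `katana -u https://example.com -silent | extension_stats`.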
Advanced Filtering
URL Pattern Matching
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example\.com/admin"
# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example\.com/static"
# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
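Matching on `id=[0-9]+` targets one known parameter. To discover which parameter names exist at all, the output can be aggregated, much like `-field key` but across the whole crawl. `param_names` is a hypothetical helper. It counts each distinct URL at most once per parameter name and ignores fragments.

```bash
# Hypothetical helper: list query-parameter names with the number of
# distinct URLs that use each one (most common first).
param_names() {
  sed -E 's|#.*$||' "$@" |        # drop fragments
    sort -u |                     # count each distinct URL once
    awk '{
      q = index($0, "?")
      if (q == 0) next            # no query string
      split("", seen)             # portable way to clear the per-URL set
      n = split(substr($0, q + 1), pairs, "&")
      for (i = 1; i <= n; i++) {
        name = pairs[i]
        sub(/=.*/, "", name)      # keep only the text before the equals sign
        if (name != "" && !(name in seen)) { seen[name] = 1; count[name]++ }
      }
    }
    END { for (p in count) print count[p], p }' |
    sort -k1,1nr -k2,2
}
```

For example: `katana -u https://example.com -silent | param_names`.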
Content Filtering
# Match responses containing specific content
katana -u https://example.com -match-condition 'contains(body, "admin")'
# Match responses by status code
katana -u https://example.com -match-condition 'status_code == 200'
# Match responses by content type
katana -u https://example.com -match-condition 'contains(content_type, "application/json")'
Proxy and Network Options
# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080
# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080
# Set custom headers (repeat -headers for each header)
katana -u https://example.com -headers "User-Agent: Mozilla/5.0" -headers "Cookie: session=123456"
# Set custom cookies (passed as a Cookie header)
katana -u https://example.com -headers "Cookie: session=123456; user=admin"
Miscellaneous Features
Crawling Specific Paths
# Crawl specific paths by passing them as seed URLs
katana -u https://example.com/admin,https://example.com/login,https://example.com/dashboard
# Crawl paths listed in a file by prefixing each with the base URL
sed 's|^/*|https://example.com/|' paths.txt | katana
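For longer path lists with comments or inconsistent leading slashes, a more careful join of the base URL and paths may help. `paths_to_urls` is a hypothetical helper, not a Katana feature. It trims whitespace, skips blank and `#` lines, adds a leading `/` to paths that lack one, and drops a single trailing `/` from the base URL.

```bash
# Hypothetical helper: turn a base URL plus a list of paths into seed URLs.
# Usage: paths_to_urls BASE_URL [FILE...]   (reads stdin when no FILE is given)
paths_to_urls() (
  base=${1%/}          # drop one trailing slash from the base URL
  shift
  awk -v base="$base" '
    { sub(/\r$/, ""); gsub(/^[ \t]+|[ \t]+$/, "") }    # trim CR and whitespace
    $0 == "" || /^#/ { next }                           # skip blanks and comments
    { if (substr($0, 1, 1) != "/") $0 = "/" $0; print base $0 }
  ' "$@"
)
```

For example: `paths_to_urls https://example.com paths.txt | katana -silent`.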
Storing Responses
# Store all responses
katana -u https://example.com -store-response
# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
Troubleshooting
Common Issues
1. **JavaScript parsing issues**
```bash
# Increase the request timeout (seconds)
katana -u https://example.com -headless -js-crawl -timeout 30
# Specify the Chrome path manually
katana -u https://example.com -headless -system-chrome-path /usr/bin/google-chrome
```
2. **Rate limiting by the target**
```bash
# Reduce concurrency
katana -u https://example.com -concurrency 5
# Add a delay between requests (seconds)
katana -u https://example.com -delay 2
```
3. **Memory issues**
```bash
# Bound the crawl by time
katana -u https://example.com -crawl-duration 5m
# Reduce depth and leave JavaScript parsing off (it only runs when -js-crawl is set)
katana -u https://example.com -depth 2
```
4. **Crawl scope issues**
```bash
# Restrict crawling to the exact hostname
katana -u https://example.com -field-scope fqdn
# Allow crawling subdomains (the default rdn scope)
katana -u https://example.com -field-scope rdn
```
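When a target keeps throttling even with lower concurrency and a delay, one option is to rerun the crawl with a progressively larger `-delay`. `backoff_delays` is a hypothetical helper that prints an exponential schedule in whole seconds (the unit `-delay` uses), capped at a maximum. It does not detect throttling itself; deciding when to stop is left to you.

```bash
# Hypothetical helper: exponential backoff schedule, one value per line.
# Usage: backoff_delays BASE MAX ATTEMPTS  -> BASE, 2*BASE, 4*BASE, ... capped at MAX
backoff_delays() (
  delay=$1
  max=$2
  attempts=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$delay" -gt "$max" ]; then delay=$max; fi   # apply the cap
    echo "$delay"
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
)
```

For example, `backoff_delays 1 16 5` prints 1, 2, 4, 8, and 16, one per line.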
### Debugging
```bash
# Enable verbose mode
katana -u https://example.com -v
# Show debug information
katana -u https://example.com -debug
# Write request errors to a log file
katana -u https://example.com -debug -error-log errors.log
```
Configuration
Configuration File
Katana reads its configuration from $HOME/.config/katana/config.yaml. You can customize various settings in this file:
# Example configuration file (keys are long flag names)
concurrency: 10
delay: 1
timeout: 10
depth: 3
field-scope: fqdn
crawl-duration: 0
field: url
extension-match:
  - js
  - php
  - aspx
Environment Variables
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
Reference
Command-Line Options
| Flag | Description |
|---|---|
| -u, -list | Target URL(s) or file containing a list of URLs to crawl |
| -o, -output | File to write output to |
| -j, -jsonl | Write output in JSON Lines format |
| -silent | Display output only |
| -v, -verbose | Display verbose output |
| -debug | Display debug output |
| -d, -depth | Maximum depth to crawl (default: 3) |
| -fs, -field-scope | Predefined scope field (dn, rdn, fqdn) or custom regex (default: rdn) |
| -cs, -crawl-scope | In-scope URL regex to be followed by the crawler |
| -ns, -no-scope | Disable host-based default scope |
| -kf, -known-files | Crawl known files (all, robotstxt, sitemapxml) |
| -hl, -headless | Enable headless hybrid crawling |
| -jc, -js-crawl | Enable endpoint parsing/crawling in JavaScript files |
| -f, -field | Field to display in output |
| -em, -extension-match | Match output for the given extensions |
| -ef, -extension-filter | Filter output for the given extensions |
| -mr, -match-regex | Regex to match on output URLs |
| -fr, -filter-regex | Regex to filter out of output URLs |
| -mdc, -match-condition | Match responses with a DSL-based condition |
| -aff, -automatic-form-fill | Enable automatic form filling (experimental) |
| -fc, -form-config | Path to a custom form configuration file |
| -scp, -system-chrome-path | Chrome browser to use for headless crawling |
| -c, -concurrency | Number of concurrent fetchers (default: 10) |
| -rd, -delay | Delay between requests (seconds) |
| -rl, -rate-limit | Maximum requests per second (default: 150) |
| -timeout | Time to wait for a request (seconds, default: 10) |
| -ct, -crawl-duration | Maximum duration to crawl the target (s, m, h, d) |
| -iqp, -ignore-query-params | Skip crawling the same path with different query parameter values |
| -proxy | HTTP/SOCKS5 proxy to use |
| -H, -headers | Custom header/cookie for all requests, in header:value format (or a file) |
| -sr, -store-response | Store HTTP requests/responses |
| -srd, -store-response-dir | Store HTTP requests/responses in a custom directory |
| -elog, -error-log | File to write request error logs to |
| -version | Display the Katana version |
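Crawl Scopes
| Option | Description |
|---|---|
| -field-scope fqdn | Crawl only the exact hostname provided |
| -field-scope rdn | Crawl the root domain and its subdomains (default) |
| -field-scope dn | Crawl any host containing the domain name keyword |
| -no-scope | Crawl any domain, regardless of the initial domain |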
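Crawler Modes
| Mode | Description |
|---|---|
| standard (default) | Standard HTTP crawler |
| headless (-headless) | Hybrid crawling with a headless browser |
| JavaScript (-js-crawl) | Endpoint parsing and crawling in JavaScript files |
| sitemap (-known-files sitemapxml) | Sitemap-based crawling |
| robots (-known-files robotstxt) | robots.txt-based crawling |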
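Field Options
| Field | Description |
|---|---|
| url | URL endpoint |
| qurl | URL including query parameters |
| qpath | Path including query parameters |
| path | URL path |
| fqdn | Fully qualified domain name |
| rdn | Root domain name |
| rurl | Root URL |
| ufile | URL with file |
| file | Filename in the URL |
| key | Parameter keys in the URL |
| value | Parameter values in the URL |
| kv | Key=value pairs in the URL |
| dir | URL directory name |
| udir | URL with directory |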
Resources
-...
*This cheat sheet provides a complete reference for using Katana, from basic crawling to advanced filtering and integration with other tools. For the most up-to-date information, always consult the official documentation.*