
Katana Web Crawler Cheat Sheet

Overview

Katana is a fast and customizable web crawling framework developed by ProjectDiscovery. It is designed to crawl websites efficiently to gather information and discover endpoints. Katana stands out from other web crawlers because of its speed, flexibility, and focus on security-testing use cases.

What makes Katana unique is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be customized to focus on specific resource types or to follow particular patterns, making it adaptable to different security-testing scenarios. The tool is designed to integrate easily into security-testing workflows and can be combined with other ProjectDiscovery tools for comprehensive reconnaissance.

Installation

Using Go

# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version

Using Docker

# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h

Using Homebrew (macOS)

# Install using Homebrew
brew install katana

# Verify installation
katana -version

Using PDTM (Project Discovery Tools Manager)

# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version

On Kali Linux

# Install using apt
sudo apt install katana

# Verify installation
katana -version

Basic Usage

Crawling a Single URL

# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug

Crawling Multiple URLs

# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt|katana

Output Options

# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent
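The JSON output is emitted as one JSON object per line, which plain UNIX tools can slice without extra dependencies. A minimal sketch; the two sample lines and the "url" key are assumptions for illustration (the exact schema varies between Katana versions), not Katana's guaranteed output:

```shell
# Stand-in for Katana's JSON-lines output (schema assumed for illustration)
cat > results.json <<'EOF'
{"url":"https://example.com/login","method":"GET"}
{"url":"https://example.com/api/users","method":"GET"}
EOF

# Pull out just the "url" values: grab the key/value pair, then split on quotes
grep -o '"url":"[^"]*"' results.json | cut -d'"' -f4
```

With jq installed, `jq -r '.url' results.json` achieves the same more robustly.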

Crawling Options

Depth and Scope

# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl subdomains (default: false)
katana -u https://example.com -crawl-scope subs

# Crawl out of scope (default: false)
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only in scope
katana -u https://example.com -crawl-scope strict

Crawling Strategies

# Use standard crawler
katana -u https://example.com -crawler standard

# Use JavaScript parser
katana -u https://example.com -crawler js

# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots

Field Selection

# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint

Advanced Usage

URL Filtering

# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"

Resource Filtering

# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html

Form Filling

# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"

JavaScript Parsing

# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome

Performance Optimization

Concurrency and Rate Limiting

# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50

Timeout Options

# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30

Optimizing Large Scans

# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000

Integration with Other Tools

Pipeline with Subfinder

# Find subdomains and crawl them
subfinder -d example.com -silent|katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent|katana -silent -extension js

Pipeline with HTTPX

# Probe URLs and crawl active ones
httpx -l urls.txt -silent|katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent|httpx -silent

Pipeline with Nuclei

# Crawl and scan for vulnerabilities
katana -u https://example.com -silent|nuclei -t cves/

# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js|nuclei -t exposures/

Output Customization

Custom Output Format

# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent|wc -l

# Sort output alphabetically
katana -u https://example.com -silent|sort

Output Filtering

# Filter by file extension
katana -u https://example.com -silent|grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent|grep "/api/"

# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
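The one-liners above can be extended to group findings per host. A self-contained sketch over a hypothetical urls.txt (in practice produced with something like `katana -u https://example.com -silent -o urls.txt`):

```shell
# Hypothetical crawl output, stored for offline analysis
cat > urls.txt <<'EOF'
https://example.com/api/users
https://example.com/login
https://app.example.com/dashboard
EOF

# Count discovered URLs per host: field 3 of a URL split on "/" is the host
awk -F/ '{print $3}' urls.txt | sort | uniq -c | sort -rn
```

The same `-F/` split powers the unique-domains one-liner above; `uniq -c` simply adds per-host counts.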

Advanced Filtering

URL Pattern Matching

# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"

Content Filtering

# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"

Proxy and Network Options

# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"

Miscellaneous Features

Automatic Form Filling

# Enable automatic form filling
katana -u https://example.com -form-fill

# Set custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"

Crawling Specific Paths

# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl from a file containing paths
katana -u https://example.com -paths-file paths.txt

Storing Responses

# Store all responses
katana -u https://example.com -store-response

# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
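Stored responses can then be searched offline. The directory layout below is a stand-in created for illustration (Katana decides its own file naming inside the response directory), but the grep pattern applies to any tree of plain-text responses:

```shell
# Stand-in for a stored-response directory (illustrative layout only)
mkdir -p responses/example.com
printf 'HTTP/1.1 200 OK\n\n{"api_key":"demo"}\n' > responses/example.com/index.txt

# List every stored response that mentions an interesting string
grep -rl "api_key" responses/
```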

Troubleshooting

Common Issues

  1. JavaScript Parsing Issues

   # Increase headless browser timeout
   katana -u https://example.com -js-crawl -headless-timeout 30

   # Specify Chrome path manually
   katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome

  2. Rate Limiting by the Target

   # Reduce concurrency
   katana -u https://example.com -concurrency 5

   # Add delay between requests
   katana -u https://example.com -delay 500

  3. Memory Issues

   # Limit maximum URLs to crawl
   katana -u https://example.com -max-urls 500

   # Disable JavaScript parsing
   katana -u https://example.com -no-js-crawl

  4. Scope Issues

   # Restrict crawling to specific domain
   katana -u https://example.com -crawl-scope strict

   # Allow crawling subdomains
   katana -u https://example.com -crawl-scope subs

Debugging
# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response

Configuration

Configuration File

Katana uses a configuration file located at $HOME/.config/katana/config.yaml. You can customize various settings in this file:

# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx

Environment Variables

# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3

Reference

Command-Line Options

| Flag | Description |
|------|-------------|
| -u, -url | Target URL to crawl |
| -list, -l | File containing list of URLs to crawl |
| -o, -output | File to write output to |
| -json | Write output in JSON format |
| -silent | Show only URLs in output |
| -v, -verbose | Show verbose output |
| -depth | Maximum depth to crawl (default: 2) |
| -crawl-scope | Crawling scope (strict, subs, out-of-scope) |
| -crawler | Crawler types to use (standard, js, sitemap, robots) |
| -field | Fields to display in output |
| -extension | File extensions to include |
| -exclude-extension | File extensions to exclude |
| -match-regex | Regex pattern to match URLs |
| -filter-regex | Regex pattern to filter URLs |
| -match-condition | Condition to match URLs |
| -form-fill | Enable automatic form filling |
| -js-crawl | Enable JavaScript parsing |
| -headless-timeout | Timeout for headless browser (seconds) |
| -chrome-path | Path to Chrome browser |
| -concurrency | Number of concurrent requests |
| -delay | Delay between requests (milliseconds) |
| -rate-limit | Maximum number of requests per second |
| -timeout | Timeout for HTTP requests (seconds) |
| -max-urls | Maximum number of URLs to crawl |
| -proxy | HTTP/SOCKS5 proxy to use |
| -header | Custom header to add to all requests |
| -cookie | Custom cookies to add to all requests |
| -paths | Specific paths to crawl |
| -paths-file | File containing paths to crawl |
| -store-response | Store all responses |
| -store-response-dir | Directory to store responses |
| -version | Show Katana version |

Crawl Scopes

| Scope | Description |
|-------|-------------|
| strict | Crawl only the exact domain provided |
| subs | Crawl the domain and its subdomains |
| out-of-scope | Crawl any domain, regardless of the initial domain |

Crawler Types

| Type | Description |
|------|-------------|
| standard | Standard HTTP crawler |
| js | JavaScript parser using a headless browser |
| sitemap | Sitemap-based crawler |
| robots | robots.txt-based crawler |

Field Options

| Field | Description |
|-------|-------------|
| url | Full URL |
| path | URL path |
| method | HTTP method |
| host | Host part of the URL |
| fqdn | Fully qualified domain name |
| scheme | URL scheme (http/https) |
| port | URL port |
| query | Query parameters |
| fragment | URL fragment |
| endpoint | URL endpoint |


*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. For the most up-to-date information, always consult the official documentation.*