Katana Web Crawler Cheat Sheet
Overview
Katana is a fast, customizable web crawling framework developed by ProjectDiscovery. It is designed to crawl websites efficiently to gather information and discover endpoints. Katana stands out from other web crawlers because of its speed, flexibility, and focus on security-testing use cases.
What makes Katana unique is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.
Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be customized to focus on specific resource types or to follow particular patterns, making it adaptable to different security-testing scenarios. The tool is designed to integrate easily into security-testing workflows and can be combined with other ProjectDiscovery tools for comprehensive reconnaissance.
Installation
Using Go
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest
# Verify installation
katana -version
Using Docker
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest
# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
Using Homebrew (macOS)
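Assuming the katana formula is available in Homebrew core:
# Install using Homebrew
brew install katana
# Verify installation
katana -version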
Using PDTM (Project Discovery Tools Manager)
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest
# Install Katana using PDTM
pdtm -i katana
# Verify installation
katana -version
On Kali Linux
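Recent Kali releases ship Katana in the default repositories; assuming the katana package is available there:
# Install from the Kali repositories
sudo apt update && sudo apt install katana
# Verify installation
katana -version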
Basic Usage
Crawling a Single URL
# Crawl a single URL
katana -u https://example.com
# Crawl with increased verbosity
katana -u https://example.com -v
# Crawl with debug information
katana -u https://example.com -debug
Crawling Multiple URLs
# Crawl multiple URLs
katana -u https://example.com,https://test.com
# Crawl from a list of URLs
katana -list urls.txt
# Crawl from STDIN
cat urls.txt | katana
Output Options
# Save results to a file
katana -u https://example.com -o results.txt
# Output in JSON format
katana -u https://example.com -json -o results.json
# Silent mode (only URLs)
katana -u https://example.com -silent
Crawling Options
Crawling Depth and Scope
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3
# Crawl subdomains (default: false)
katana -u https://example.com -crawl-scope subs
# Crawl out of scope (default: false)
katana -u https://example.com -crawl-scope out-of-scope
# Crawl only in scope
katana -u https://example.com -crawl-scope strict
Crawling Strategies
# Use standard crawler
katana -u https://example.com -crawler standard
# Use JavaScript parser
katana -u https://example.com -crawler js
# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap
# Use robots.txt-based crawler
katana -u https://example.com -crawler robots
# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
Field Selection
# Display specific fields
katana -u https://example.com -field url,path,method
# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
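For example, field selection combines with silent output to produce a deduplicated inventory; a sketch using the endpoint field listed above:
# Collect the unique endpoints discovered during a crawl
katana -u https://example.com -field endpoint -silent | sort -u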
Advanced Usage
URL Filtering
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"
# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"
# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
Resource Filtering
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx
# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif
# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
Form Filling
# Enable automatic form filling
katana -u https://example.com -form-fill
# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
JavaScript Parsing
# Enable JavaScript parsing
katana -u https://example.com -js-crawl
# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20
# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
Performance Optimization
Concurrency and Rate Limiting
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20
# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100
# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
Timeout Options
# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10
# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
Optimization for Large Scans
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill
# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl
# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000
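These options can be combined for a polite large-scale crawl; a sketch using only flags shown above, with targets.txt and crawl.txt as hypothetical file names:
# Tune concurrency, delay, and rate limit, and cap the total number of URLs
katana -list targets.txt -concurrency 5 -delay 200 -rate-limit 20 -max-urls 1000 -o crawl.txt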
Integration with Other Tools
Pipeline with Subfinder
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent
# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js
Pipeline with HTTPX
# Probe URLs and crawl active ones
httpx -l urls.txt -silent | katana -silent
# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent
Pipeline with Nuclei
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/
# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js | nuclei -t exposures/
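The pipelines above can also be chained end to end; a sketch of a full reconnaissance workflow, assuming subfinder, httpx, katana, and nuclei are all installed:
# Enumerate subdomains, probe for live hosts, crawl them, and scan for exposures
subfinder -d example.com -silent | httpx -silent | katana -silent | nuclei -t exposures/ -silent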
Output Customization
Custom Output Format
# Output only URLs
katana -u https://example.com -silent
# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt
# Count discovered URLs
katana -u https://example.com -silent | wc -l
# Sort output alphabetically
katana -u https://example.com -silent | sort
Filtering Output
# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"
# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"
# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
Advanced Filtering
URL Pattern Matching
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"
# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"
# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
Content Filtering
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"
# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"
# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
Proxy and Network Options
# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080
# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080
# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"
# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
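Combining the proxy and header options lets you inspect crawl traffic in an intercepting proxy such as Burp Suite while reusing an authenticated session; a sketch reusing the example values above:
# Route an authenticated crawl through a local intercepting proxy
katana -u https://example.com -proxy http://127.0.0.1:8080 -header "Cookie: session=123456"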
Miscellaneous Features
Automatic Form Filling
# Enable automatic form filling
katana -u https://example.com -form-fill
# Set custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
Crawling Specific Paths
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard
# Crawl from a file containing paths
katana -u https://example.com -paths-file paths.txt
Storing Responses
# Store all responses
katana -u https://example.com -store-response
# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
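Stored responses can then be searched offline with standard tools; a sketch assuming the responses/ directory from the example above and a hypothetical api_key search string:
# Search stored responses for potential secrets
grep -ri "api_key" responses/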
Troubleshooting
Common Issues
1. JavaScript Parsing Issues
# Increase headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 30
# Specify Chrome path manually
katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
2. Target Rate Limiting
# Reduce concurrency
katana -u https://example.com -concurrency 5
# Add delay between requests
katana -u https://example.com -delay 500
3. Memory Issues
# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 500
# Disable JavaScript parsing
katana -u https://example.com -no-js-crawl
4. Scope Issues
# Restrict crawling to specific domain
katana -u https://example.com -crawl-scope strict
# Allow crawling subdomains
katana -u https://example.com -crawl-scope subs
Debugging
# Enable verbose mode
katana -u https://example.com -v
# Show debug information
katana -u https://example.com -debug
# Show request and response details
katana -u https://example.com -debug -show-request -show-response
Configuration
Configuration File
Katana uses a configuration file located at $HOME/.config/katana/config.yaml. You can customize various settings in this file:
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
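A custom configuration file can also be passed explicitly at run time; this sketch assumes the -config flag supported by recent Katana releases:
# Run with an explicit configuration file
katana -u https://example.com -config $HOME/.config/katana/config.yaml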
Environment Variables
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
Reference
Command Line Options
| Flag | Description |
|---|---|
| `-u` | Target URL to crawl |
| `-list` | File containing list of URLs to crawl |
| `-o` | File to write output to |
| `-json` | Write output in JSON format |
| `-silent` | Show only URLs in output |
| `-v` | Show verbose output |
| `-depth` | Maximum depth to crawl (default: 2) |
| `-crawl-scope` | Crawling scope (strict, subs, out-of-scope) |
| `-crawler` | Crawler types to use (standard, js, sitemap, robots) |
| `-field` | Fields to display in output |
| `-extension` | File extensions to include |
| `-exclude-extension` | File extensions to exclude |
| `-match-regex` | Regex pattern to match URLs |
| `-filter-regex` | Regex pattern to filter URLs |
| `-match-condition` | Condition to match URLs |
| `-form-fill` | Enable automatic form filling |
| `-js-crawl` | Enable JavaScript parsing |
| `-headless-timeout` | Timeout for headless browser (seconds) |
| `-chrome-path` | Path to Chrome browser |
| `-concurrency` | Number of concurrent requests |
| `-delay` | Delay between requests (milliseconds) |
| `-rate-limit` | Maximum number of requests per second |
| `-timeout` | Timeout for HTTP requests (seconds) |
| `-max-urls` | Maximum number of URLs to crawl |
| `-proxy` | HTTP/SOCKS5 proxy to use |
| `-header` | Custom header to add to all requests |
| `-cookie` | Custom cookies to add to all requests |
| `-paths` | Specific paths to crawl |
| `-paths-file` | File containing paths to crawl |
| `-store-response` | Store all responses |
| `-store-response-dir` | Directory to store responses |
| `-version` | Show Katana version |
Crawling Scopes
| Scope | Description |
|---|---|
| `strict` | Crawl only the exact domain provided |
| `subs` | Crawl the domain and its subdomains |
| `out-of-scope` | Crawl any domain, regardless of the initial domain |
Crawler Types
| Type | Description |
|---|---|
| `standard` | Standard HTTP crawler |
| `js` | JavaScript parser using headless browser |
| `sitemap` | Sitemap-based crawler |
| `robots` | Robots.txt-based crawler |
Field Options
| Field | Description |
|---|---|
| `url` | Full URL |
| `path` | URL path |
| `method` | HTTP method |
| `host` | Host part of URL |
| `fqdn` | Fully qualified domain name |
| `scheme` | URL scheme (http/https) |
| `port` | URL port |
| `query` | Query parameters |
| `fragment` | URL fragment |
| `endpoint` | URL endpoint |
Resources¶
- [Documentación Oficial](URL_86__
- [Repositorio GitHub](URL_87__
- [Discord de descubrimiento del producto](URL_88__
-...
*Esta hoja de trampolín proporciona una referencia completa para el uso de Katana, desde el rastreo básico hasta el filtrado avanzado e integración con otras herramientas. Para la información más actualizada, consulte siempre la documentación oficial. *