

Katana Web Crawler Cheat Sheet

Overview

Katana is a fast, customizable web crawling framework developed by Project Discovery. It is designed to crawl websites efficiently to gather information and discover endpoints. Katana stands out from other web crawlers thanks to its speed, flexibility, and focus on security-testing use cases.

What makes Katana unique is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be customized to focus on specific resource types or follow particular patterns, making it adaptable to different security-testing scenarios. The tool is designed to integrate easily into security-testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.

Installation

Using Go

# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version

Using Docker

# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h

Using Homebrew (macOS)

# Install using Homebrew
brew install katana

# Verify installation
katana -version

Using PDTM (Project Discovery Tools Manager)

# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version

On Kali Linux

# Install using apt
sudo apt install katana

# Verify installation
katana -version

Basic Usage

Crawling a Single URL

# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug

Crawling Multiple URLs

# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt|katana

Output Options

# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent

Crawling Options

Crawling Depth and Scope

# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl subdomains (default: false)
katana -u https://example.com -crawl-scope subs

# Crawl out of scope (default: false)
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only in scope
katana -u https://example.com -crawl-scope strict

Crawling Strategies

# Use standard crawler
katana -u https://example.com -crawler standard

# Use JavaScript parser
katana -u https://example.com -crawler js

# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots

Field Selection

# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
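
Because the field output is plain text, it composes well with standard Unix tools. As an illustration, the snippet below pulls unique query-parameter names out of a URL list; the printf lines are canned sample data standing in for real `katana -field url -silent` output:

```shell
# Sample URLs simulating crawler output (replace the printf with a real crawl)
params=$(printf '%s\n' \
  'https://example.com/search?q=test&page=2' \
  'https://example.com/item?id=5' |
  grep -o '[?&][A-Za-z_]*=' |   # grab "?name=" / "&name=" fragments
  tr -d '?&=' |                 # strip the delimiters, keeping the names
  sort -u)                      # deduplicate
echo "$params"
```

Parameter names discovered this way are often useful seeds for fuzzing.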

Advanced Usage

URL Filtering

# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"

Resource Filtering

# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
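
The `-extension` filter applies during the crawl; when working from saved output instead, a similar selection can be approximated with grep. A toy example on canned URLs (the printf stands in for a real crawl):

```shell
# Keep only .js and .php URLs from simulated crawler output
urls=$(printf '%s\n' \
  'https://example.com/app.js' \
  'https://example.com/logo.png' \
  'https://example.com/login.php' |
  grep -E '\.(js|php)$')   # anchor on the extension at end of line
echo "$urls"
```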

Form Filling

# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"

JavaScript Parsing

# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome

Performance Optimization

Concurrency and Rate Limiting

# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50

Timeout Options

# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30

Optimization for Large Scans

# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000

Integration with Other Tools

Pipeline with Subfinder

# Find subdomains and crawl them
subfinder -d example.com -silent|katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent|katana -silent -extension js

Pipeline with HTTPX

# Probe URLs and crawl active ones
httpx -l urls.txt -silent|katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent|httpx -silent

Pipeline with Nuclei

# Crawl and scan for vulnerabilities
katana -u https://example.com -silent|nuclei -t cves/

# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js|nuclei -t exposures/
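
When merging several crawls before handing the list downstream, deduplicating first avoids scanning the same endpoint twice. A minimal illustration, with canned files standing in for real crawl output:

```shell
# Two overlapping crawl result files (fabricated for the example)
printf 'https://example.com/a\nhttps://example.com/b\n' > crawl1.txt
printf 'https://example.com/b\nhttps://example.com/c\n' > crawl2.txt

# Merge and deduplicate before piping into a scanner
unique=$(sort -u crawl1.txt crawl2.txt | wc -l | tr -d ' ')
echo "$unique unique URLs"
```

In practice the deduplicated list would be piped straight into nuclei rather than counted.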

Output Customization

Custom Output Format

# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent|wc -l

# Sort output alphabetically
katana -u https://example.com -silent|sort

Filtering Output

# Filter by file extension
katana -u https://example.com -silent|grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent|grep "/api/"

# Find unique domains
katana -u https://example.com -silent|awk -F/ '{print $3}'|sort -u
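
One-liners like these are easier to sanity-check on canned data first. The same host-extraction filter, run over sample URLs instead of a live crawl:

```shell
# awk -F/ splits on "/", so field 3 of an absolute URL is the host
hosts=$(printf '%s\n' \
  'https://example.com/api/users?id=1' \
  'https://cdn.example.com/static/app.js' \
  'https://example.com/api/users?id=2' |
  awk -F/ '{print $3}' | sort -u)
echo "$hosts"
```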

Advanced Filtering

URL Pattern Matching

# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
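
The same regexes also work with plain grep when post-filtering saved output. For example, matching numeric id parameters on sample URLs:

```shell
# Keep only URLs carrying a numeric "id" query parameter
matched=$(printf '%s\n' \
  'https://example.com/item?id=42' \
  'https://example.com/about' |
  grep -E 'id=[0-9]+')
echo "$matched"
```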

Content Filtering

# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"

Proxy and Network Options

# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"

Miscellaneous Features


Crawling Specific Paths

# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl from a file containing paths
katana -u https://example.com -paths-file paths.txt

Storing Responses

# Store all responses
katana -u https://example.com -store-response

# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
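
Stored responses can then be mined offline, for example for credential-like strings. A sketch using a fabricated responses/ directory; the file name and layout here are illustrative, and Katana's actual storage layout may differ:

```shell
# Fabricate a stored response for the demo (a real run would populate this dir)
mkdir -p responses
printf 'var key = "AKIA1234EXAMPLE";\n' > responses/app.js.txt

# List stored files containing AWS-access-key-shaped strings
hits=$(grep -rEl 'AKIA[0-9A-Z]{4,}' responses)
echo "$hits"
```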

Troubleshooting

Common Issues

  1. JavaScript Parsing Issues

# Increase headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 30

# Specify Chrome path manually
katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome

  2. Rate Limiting by the Target

# Reduce concurrency
katana -u https://example.com -concurrency 5

# Add delay between requests
katana -u https://example.com -delay 500

  3. Memory Issues

# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 500

# Disable JavaScript parsing
katana -u https://example.com -no-js-crawl

  4. Crawling Scope Issues

# Restrict crawling to specific domain
katana -u https://example.com -crawl-scope strict

# Allow crawling subdomains
katana -u https://example.com -crawl-scope subs

Debugging

# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response


Configuration

Configuration File

Katana uses a configuration file located at $HOME/.config/katana/config.yaml. You can customize various settings in this file:

# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx

Environment Variables

# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3

Reference

Command Line Options

Flag                  Description
-u                    Target URL to crawl
-list                 File containing list of URLs to crawl
-o                    File to write output to
-json                 Write output in JSON format
-silent               Show only URLs in output
-v                    Show verbose output
-depth                Maximum depth to crawl (default: 2)
-crawl-scope          Crawling scope (strict, subs, out-of-scope)
-crawler              Crawler types to use (standard, js, sitemap, robots)
-field                Fields to display in output
-extension            File extensions to include
-exclude-extension    File extensions to exclude
-match-regex          Regex pattern to match URLs
-filter-regex         Regex pattern to filter URLs
-match-condition      Condition to match URLs
-form-fill            Enable automatic form filling
-js-crawl             Enable JavaScript parsing
-headless-timeout     Timeout for headless browser (seconds)
-chrome-path          Path to Chrome browser
-concurrency          Number of concurrent requests
-delay                Delay between requests (milliseconds)
-rate-limit           Maximum number of requests per second
-timeout              Timeout for HTTP requests (seconds)
-max-urls             Maximum number of URLs to crawl
-proxy                HTTP/SOCKS5 proxy to use
-header               Custom header to add to all requests
-cookie               Custom cookies to add to all requests
-paths                Specific paths to crawl
-paths-file           File containing paths to crawl
-store-response       Store all responses
-store-response-dir   Directory to store responses
-version              Show Katana version

Crawling Scopes

Scope          Description
strict         Crawl only the exact domain provided
subs           Crawl the domain and its subdomains
out-of-scope   Crawl any domain, regardless of the initial domain

Crawler Types

Type       Description
standard   Standard HTTP crawler
js         JavaScript parser using headless browser
sitemap    Sitemap-based crawler
robots     Robots.txt-based crawler

Field Options

Field      Description
url        Full URL
path       URL path
method     HTTP method
host       Host part of URL
fqdn       Fully qualified domain name
scheme     URL scheme (http/https)
port       URL port
query      Query parameters
fragment   URL fragment
endpoint   URL endpoint
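
For quick scripting without re-running the crawler, roughly the same fields can be carved out of a URL with plain shell parameter expansion. A sketch that assumes an explicit port in the URL (without one, the port expansion falls back to the host value):

```shell
url='https://example.com:8443/admin/login?next=/dash'

scheme=${url%%://*}      # everything before "://"
rest=${url#*://}         # everything after "://"
hostport=${rest%%/*}     # host[:port] before the first "/"
host=${hostport%%:*}     # host without the port
port=${hostport##*:}     # port (falls back to host if none present)
path=/${rest#*/}         # path plus query string

echo "$scheme | $host | $port | $path"
```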

Resources

  • Official Documentation
  • GitHub Repository: github.com/projectdiscovery/katana
  • Project Discovery Discord


*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. For the most up-to-date information, always consult the official documentation.*