
# Katana Web Crawler Cheat Sheet

## Overview

Katana is a fast and customizable web crawling framework developed by ProjectDiscovery. It is designed to crawl websites efficiently, gathering information and discovering endpoints. Katana stands out for its speed, its flexibility, and its focus on security-testing use cases.

What makes Katana unique is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent reconnaissance tool for security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be tuned to focus on specific resource types or follow particular patterns, making it adaptable to different security-testing scenarios. The tool is designed to integrate easily into security-testing workflows and can be combined with other ProjectDiscovery tools for comprehensive reconnaissance.
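
Before diving into the options, here is a minimal sketch of how Katana typically slots into a reconnaissance workflow (the domain is a placeholder; both commands are covered in detail later in this sheet):

```bash
# Minimal recon sketch: enumerate subdomains with subfinder,
# pipe them straight into katana, and save every discovered URL.
subfinder -d example.com -silent | katana -silent -o discovered_urls.txt
```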

## Installation

### Using Go

```bash
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version
```

### Using Docker

```bash
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
```

### Using Homebrew (macOS)

```bash
# Install using Homebrew
brew install katana

# Verify installation
katana -version
```

### Using PDTM (Project Discovery Tools Manager)

```bash
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version
```

### On Kali Linux

```bash
# Install using apt
sudo apt install katana

# Verify installation
katana -version
```

## Basic Usage

### Crawling a Single URL

```bash
# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug
```

### Crawling Multiple URLs

```bash
# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt | katana
```

### Output Options

```bash
# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent
```
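
If you post-process the JSON output, a tool like jq pairs well with it. Note that Katana's JSON schema has changed between releases, so the `.request.endpoint` path below is an assumption to verify against your own output:

```bash
# Sketch: extract unique endpoints from JSON output with jq.
# The .request.endpoint field path is an assumption -- inspect one
# line of your own output first, as the schema varies by version.
katana -u https://example.com -json -o results.json
jq -r '.request.endpoint // empty' results.json | sort -u
```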

## Crawling Options

### Depth and Scope

```bash
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl subdomains (default: false)
katana -u https://example.com -crawl-scope subs

# Crawl out of scope (default: false)
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only in scope
katana -u https://example.com -crawl-scope strict
```
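
These options compose; for example, a deeper crawl that stays within the target and its subdomains might look like this sketch (all values are illustrative):

```bash
# Hypothetical combined example: crawl three levels deep,
# include subdomains, and save the results to a file
katana -u https://example.com -depth 3 -crawl-scope subs -o deep_crawl.txt
```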

### Crawling Strategies

```bash
# Use standard crawler
katana -u https://example.com -crawler standard

# Use JavaScript parser
katana -u https://example.com -crawler js

# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
```

### Field Selection

```bash
# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
```
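
Restricting output to a single field is handy for building wordlists. For example, collecting unique paths (a small illustrative sketch):

```bash
# Build a deduplicated path wordlist from a crawl
katana -u https://example.com -field path -silent | sort -u > paths.txt
```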

## Advanced Usage

### URL Filtering

```bash
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
```

### Resource Filtering

```bash
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
```
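
A common follow-up is pulling the discovered JavaScript files down for offline review. A minimal sketch (the js_files/ directory name is arbitrary):

```bash
# Download each discovered .js file into js_files/ for offline analysis
mkdir -p js_files
katana -u https://example.com -silent -extension js \
  | xargs -n 1 -P 5 wget -q -P js_files/
```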

### Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```

### JavaScript Parsing

```bash
# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
```
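
A quick way to gauge what JavaScript parsing adds on a given target is to compare URL counts with and without it (results vary by site):

```bash
# Compare discovered-URL counts with and without the headless crawler
katana -u https://example.com -silent | wc -l
katana -u https://example.com -js-crawl -silent | wc -l
```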

## Performance Optimization

### Concurrency and Rate Limiting

```bash
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
```

### Timeout Options

```bash
# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
```

### Optimizing for Large Scans

```bash
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000
```
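
Putting these together, a large-scan profile might look like the following sketch (all values are illustrative starting points, not recommendations):

```bash
# Illustrative "large scan" profile: bounded URL count, no headless
# browser, moderate concurrency and rate limit, results to a file.
katana -list targets.txt -no-js-crawl -max-urls 1000 \
  -concurrency 20 -rate-limit 50 -timeout 10 -o large_scan.txt
```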

## Integration with Other Tools

### Pipeline with Subfinder

```bash
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js
```

### Pipeline with HTTPX

```bash
# Probe URLs and crawl active ones
httpx -l urls.txt -silent | katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent
```

### Pipeline with Nuclei

```bash
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/

# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js | nuclei -t exposures/
```
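
These pipelines also chain end to end. A sketch of a full pass from subdomain enumeration to vulnerability scanning (template paths depend on your local nuclei setup):

```bash
# End-to-end sketch: subdomains -> live hosts -> crawl -> scan
subfinder -d example.com -silent \
  | httpx -silent \
  | katana -silent \
  | nuclei -t exposures/ -o findings.txt
```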

## Output Customization

### Custom Output Format

```bash
# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent | wc -l

# Sort output alphabetically
katana -u https://example.com -silent | sort
```

### Output Filtering

```bash
# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"

# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
```
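
URL lists like these also make good fuzzing input. A sketch that extracts unique query-parameter names (requires GNU grep for -P):

```bash
# Extract unique query-parameter names from crawled URLs
katana -u https://example.com -silent \
  | grep -oP '[?&]\K[^=&#]+(?==)' | sort -u
```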

## Advanced Filtering

### URL Pattern Matching

```bash
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
```

### Content Filtering

```bash
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
```

## Proxy and Network Options

```bash
# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
```

## Miscellaneous Features

### Crawling Specific Paths

```bash
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl from a file containing paths
katana -u https://example.com -paths-file paths.txt
```

### Storing Responses

```bash
# Store all responses
katana -u https://example.com -store-response

# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
```
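
Stored responses can then be searched offline. A sketch with illustrative patterns (tune them to what you are hunting for):

```bash
# Grep stored responses for potentially sensitive strings
grep -rniE 'api[_-]?key|secret|token|password' responses/
```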

## Troubleshooting

### Common Issues

1. **JavaScript parsing issues**

   ```bash
   # Increase headless browser timeout
   katana -u https://example.com -js-crawl -headless-timeout 30

   # Specify Chrome path manually
   katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
   ```

2. **Rate limiting by the target**

   ```bash
   # Reduce concurrency
   katana -u https://example.com -concurrency 5

   # Add delay between requests
   katana -u https://example.com -delay 500
   ```

3. **Memory issues**

   ```bash
   # Limit maximum URLs to crawl
   katana -u https://example.com -max-urls 500

   # Disable JavaScript parsing
   katana -u https://example.com -no-js-crawl
   ```

4. **Scope issues**

   ```bash
   # Restrict crawling to specific domain
   katana -u https://example.com -crawl-scope strict

   # Allow crawling subdomains
   katana -u https://example.com -crawl-scope subs
   ```

### Debugging

```bash
# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response
```

## Configuration

### Configuration File

Katana uses a configuration file located at `$HOME/.config/katana/config.yaml`. You can customize various settings in this file:

```yaml
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
```

### Environment Variables

```bash
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
```

## Reference

### Command-Line Options

| Flag | Description |
|------|-------------|
| `-u, -url` | Target URL to crawl |
| `-list, -l` | File containing list of URLs to crawl |
| `-o, -output` | File to write output to |
| `-json` | Write output in JSON format |
| `-silent` | Show only URLs in output |
| `-v, -verbose` | Show verbose output |
| `-depth` | Maximum depth to crawl (default: 2) |
| `-crawl-scope` | Crawling scope (strict, subs, out-of-scope) |
| `-crawler` | Crawler types to use (standard, js, sitemap, robots) |
| `-field` | Fields to display in output |
| `-extension` | File extensions to include |
| `-exclude-extension` | File extensions to exclude |
| `-match-regex` | Regex pattern to match URLs |
| `-filter-regex` | Regex pattern to filter URLs |
| `-match-condition` | Condition to match URLs |
| `-form-fill` | Enable automatic form filling |
| `-js-crawl` | Enable JavaScript parsing |
| `-headless-timeout` | Timeout for headless browser (seconds) |
| `-chrome-path` | Path to Chrome browser |
| `-concurrency` | Number of concurrent requests |
| `-delay` | Delay between requests (milliseconds) |
| `-rate-limit` | Maximum number of requests per second |
| `-timeout` | Timeout for HTTP requests (seconds) |
| `-max-urls` | Maximum number of URLs to crawl |
| `-proxy` | HTTP/SOCKS5 proxy to use |
| `-header` | Custom header to add to all requests |
| `-cookie` | Custom cookies to add to all requests |
| `-paths` | Specific paths to crawl |
| `-paths-file` | File containing paths to crawl |
| `-store-response` | Store all responses |
| `-store-response-dir` | Directory to store responses |
| `-version` | Show Katana version |

### Scope Options

| Scope | Description |
|-------|-------------|
| `strict` | Crawl only the exact domain provided |
| `subs` | Crawl the domain and its subdomains |
| `out-of-scope` | Crawl any domain, regardless of the initial domain |

### Crawler Types

| Type | Description |
|------|-------------|
| `standard` | Standard HTTP crawler |
| `js` | JavaScript parser using headless browser |
| `sitemap` | Sitemap-based crawler |
| `robots` | robots.txt-based crawler |

### Field Options

| Field | Description |
|-------|-------------|
| `url` | Full URL |
| `path` | URL path |
| `method` | HTTP method |
| `host` | Host part of the URL |
| `fqdn` | Fully qualified domain name |
| `scheme` | URL scheme (http/https) |
| `port` | URL port |
| `query` | Query parameters |
| `fragment` | URL fragment |
| `endpoint` | URL endpoint |

*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. For the most up-to-date information, always refer to the official documentation.*