Pagodo
Pagodo (Passive Google Dork) automates Google dorking using the Google Hacking Database (GHDB). It fetches dorks from the GHDB, runs multi-threaded searches against a target domain, and saves the results for later analysis.
Installation
Linux/Ubuntu
# Clone repository
git clone https://github.com/opsdisk/pagodo.git
cd pagodo
# Install requirements
pip3 install -r requirements.txt
# Install Selenium WebDriver
# Download chromedriver matching your Chrome version
# https://chromedriver.chromium.org/
# Make executable
chmod +x pagodo.py
macOS
# Install via Homebrew
brew install chromedriver
# Clone and install
git clone https://github.com/opsdisk/pagodo.git
pip3 install -r requirements.txt
Basic Usage
# Fetch GHDB and scan domain
python3 pagodo.py -d target.com -g dorks.txt
# Scan with existing dork list
python3 pagodo.py -d target.com -l dorks.txt
# Save results
python3 pagodo.py -d target.com -g dorks.txt -s results.json
Command-Line Options
| Option | Description |
|---|---|
| -d, --domain <DOMAIN> | Target domain to scan |
| -g, --google-dorks <FILE> | Fetch dorks from the GHDB and save them to a file |
| -l, --dork-list <FILE> | Use an existing dork list file |
| -s, --save-dorks <FILE> | Save discovered items to a JSON file |
| -e, --exclude <PATTERN> | Exclude results matching a pattern |
| -t, --threads <NUM> | Thread count for parallel searches |
| --proxy <IP:PORT> | HTTP proxy address |
| --timeout <SEC> | Request timeout in seconds |
| -v, --verbose | Verbose output |
| --headless | Run Selenium in headless mode |
Getting Started
Download GHDB Dorks
# Fetch latest dorks from Google Hacking Database
python3 pagodo.py -g dorks.txt
# View downloaded dorks
head -20 dorks.txt
wc -l dorks.txt # Count total dorks
Basic Domain Scan
# Scan with fetched dorks
python3 pagodo.py -d target.com -g dorks.txt -s results.json
# Scan with existing dork list
python3 pagodo.py -d target.com -l dorks.txt -s findings.json
# Verbose output
python3 pagodo.py -d target.com -l dorks.txt -v
Practical Scenarios
Complete OSINT on Company
# 1. Fetch GHDB dorks
python3 pagodo.py -g company_dorks.txt
# 2. Scan company domain
python3 pagodo.py -d company.com -l company_dorks.txt -s company_findings.json
# 3. Analyze results
python3 << 'EOF'
import json

with open('company_findings.json') as f:
    results = json.load(f)

for item in results:
    print(f"{item['dork']}: {item['url']}")
EOF
Sensitive Data Discovery
# Create sensitive data-focused dorks
cat > sensitive_dorks.txt << EOF
site:target.com filetype:pdf confidential
site:target.com filetype:xls password
site:target.com filetype:doc secret
site:target.com "api_key"
site:target.com "private_key"
site:target.com "secret"
EOF
# Execute scan
python3 pagodo.py -d target.com -l sensitive_dorks.txt -s sensitive_findings.json
# Review findings
python3 -m json.tool sensitive_findings.json
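Beyond pretty-printing, the saved JSON can be triaged by keyword. A minimal sketch, assuming each entry carries the `dork` and `url` keys shown in the Results Analysis section (the keyword list itself is illustrative):

```python
import json

# Illustrative keyword list -- extend it to match your dork set.
SENSITIVE = ("password", "api_key", "private_key", "secret", "confidential")

def flag_sensitive(results, keywords=SENSITIVE):
    """Return entries whose dork or URL mentions a sensitive keyword."""
    flagged = []
    for item in results:
        haystack = (item.get("dork", "") + " " + item.get("url", "")).lower()
        if any(kw in haystack for kw in keywords):
            flagged.append(item)
    return flagged

# Usage against the saved scan output:
# with open("sensitive_findings.json") as f:
#     for item in flag_sensitive(json.load(f)):
#         print(f"{item['dork']} -> {item['url']}")
```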
Subdomain and Infrastructure Discovery
# Infrastructure dorks
cat > infrastructure_dorks.txt << EOF
site:*.target.com
site:*.target.co.uk
site:target.com inurl:admin
site:target.com inurl:api
site:target.com inurl:dev
site:target.com inurl:staging
site:target.com inurl:test
EOF
python3 pagodo.py -d target.com -l infrastructure_dorks.txt -s infrastructure.json
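Once the scan finishes, the distinct hostnames across the result URLs give a quick subdomain inventory. A small sketch, assuming each entry has the `url` key shown in the Results Analysis examples:

```python
from urllib.parse import urlparse

def unique_hosts(results):
    """Collect the distinct hostnames seen across all result URLs."""
    hosts = set()
    for item in results:
        host = urlparse(item.get("url", "")).netloc
        if host:
            hosts.add(host.lower())
    return sorted(hosts)

# Usage:
# import json
# with open("infrastructure.json") as f:
#     print("\n".join(unique_hosts(json.load(f))))
```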
Advanced Techniques
Parallel Execution
# Increase threads for faster scanning
python3 pagodo.py -d target.com -l dorks.txt -t 20 -s results.json
# Monitor progress
python3 pagodo.py -d target.com -l dorks.txt -t 10 -v | tee scan.log
Result Filtering
# Exclude false positives
python3 pagodo.py -d target.com -l dorks.txt \
-e "404|error|not found" \
-s filtered_results.json
# Multiple exclusions
python3 pagodo.py -d target.com -l dorks.txt \
-e "404|500|error|denied" \
-s clean_results.json
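The -e pattern is applied at scan time; to re-filter an already saved JSON file with a different pattern, the same idea can be reproduced offline. A sketch, assuming the `url` and `title` keys from the analysis examples:

```python
import re

def exclude_results(results, pattern):
    """Drop entries whose URL or title matches the exclusion regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [item for item in results
            if not rx.search(item.get("url", "") + " " + item.get("title", ""))]
```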
Proxy Configuration
# Route through proxy
python3 pagodo.py -d target.com -l dorks.txt \
--proxy http://127.0.0.1:8080 \
-s results.json
# Burp Suite proxy
python3 pagodo.py -d target.com -l dorks.txt \
--proxy http://127.0.0.1:8080 \
-v
Headless Mode
# Run without browser window
python3 pagodo.py -d target.com -l dorks.txt \
--headless \
-t 15 \
-s results.json
Custom Dork Lists
Create Targeted Lists
# Database exposure dorks
cat > database_dorks.txt << EOF
site:target.com inurl:phpmyadmin
site:target.com inurl:pgadmin
site:target.com filetype:sql
site:target.com filetype:sql.bak
site:target.com "mysql" "password"
EOF
# CMS-specific dorks
cat > cms_dorks.txt << EOF
site:target.com inurl:wp-admin
site:target.com/administrator
site:target.com inurl:joomla
site:target.com inurl:drupal
EOF
# Execute scans
python3 pagodo.py -d target.com -l database_dorks.txt -s db_results.json
python3 pagodo.py -d target.com -l cms_dorks.txt -s cms_results.json
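Instead of hard-coding the domain into each dork file, a template list can be rendered per target. A sketch (the template strings below are examples, not a complete list):

```python
# Reusable templates: the {domain} placeholder is filled per target,
# so one template list can serve many engagements.
TEMPLATES = [
    "site:{domain} inurl:phpmyadmin",
    "site:{domain} filetype:sql",
    'site:{domain} "mysql" "password"',
]

def render_dorks(domain, templates=TEMPLATES):
    """Substitute the target domain into each template."""
    return [t.format(domain=domain) for t in templates]

def write_dork_file(path, domain, templates=TEMPLATES):
    """Write a dork list file, one dork per line, for use with -l."""
    with open(path, "w") as f:
        f.write("\n".join(render_dorks(domain, templates)) + "\n")

# write_dork_file("database_dorks.txt", "target.com")
```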
Results Analysis
Parse JSON Output
# Extract URLs from results
python3 << 'EOF'
import json

with open('results.json') as f:
    results = json.load(f)

for item in results:
    print(f"URL: {item.get('url', 'N/A')}")
    print(f"Dork: {item.get('dork', 'N/A')}")
    print(f"Title: {item.get('title', 'N/A')}")
    print("---")
EOF
# Count results per dork
python3 << 'EOF'
import json
from collections import Counter

with open('results.json') as f:
    results = json.load(f)

dork_counts = Counter(item['dork'] for item in results)
for dork, count in dork_counts.most_common(10):
    print(f"{count}: {dork}")
EOF
Validate and Filter Results
# Check if URLs are still accessible
python3 << 'EOF'
import json

import requests

with open('results.json') as f:
    results = json.load(f)

valid = []
for item in results:
    try:
        response = requests.head(item['url'], timeout=5, allow_redirects=True)
        if response.status_code < 400:
            valid.append(item)
    except requests.RequestException:
        pass

print(f"Valid URLs: {len(valid)}/{len(results)}")
with open('valid_results.json', 'w') as f:
    json.dump(valid, f, indent=2)
EOF
Real-World Examples
Financial Sector Assessment
# Financial-focused dorks
cat > finance_dorks.txt << EOF
site:target.com inurl:banking
site:target.com inurl:transfer
site:target.com filetype:pdf "account number"
site:target.com "SWIFT"
site:target.com "routing number"
site:target.com "credit card"
EOF
python3 pagodo.py -d target.com -l finance_dorks.txt \
-e "404|denied|not found" \
-t 10 \
-s financial_findings.json
Healthcare Organization Assessment
# Healthcare compliance dorks
cat > healthcare_dorks.txt << EOF
site:target.com inurl:patient
site:target.com inurl:medical
site:target.com filetype:pdf "patient"
site:target.com "HIPAA"
site:target.com "medical record"
site:target.com inurl:radiology
EOF
python3 pagodo.py -d target.com -l healthcare_dorks.txt -s healthcare_findings.json
Technology Company Assessment
# Tech company-specific scans
cat > tech_dorks.txt << EOF
site:target.com inurl:github
site:target.com inurl:gitlab
site:target.com filetype:java source
site:target.com filetype:py script
site:target.com "docker" "password"
site:target.com "kubernetes"
EOF
python3 pagodo.py -d target.com -l tech_dorks.txt \
-s tech_findings.json \
-v
Workflow Integration
Automated Reconnaissance Pipeline
#!/bin/bash
# Comprehensive Google dorking workflow
TARGET="target.com"
DATE=$(date +%Y%m%d)
# Step 1: Download fresh GHDB
python3 pagodo.py -g dorks_${DATE}.txt
# Step 2: Execute scan
python3 pagodo.py -d "$TARGET" -l dorks_${DATE}.txt \
-s results_${DATE}.json \
-t 15
# Step 3: Analyze results
echo "=== Scan Results ==="
python3 -m json.tool results_${DATE}.json
# Step 4: Extract URLs for further testing
python3 << EOF
import json

with open('results_${DATE}.json') as f:
    results = json.load(f)

urls = sorted(set(item['url'] for item in results))
with open('urls_${DATE}.txt', 'w') as f:
    for url in urls:
        f.write(url + '\n')
EOF
echo "Found $(wc -l < urls_${DATE}.txt) unique URLs"
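When the pipeline runs on a schedule, the interesting output is usually what changed between scans. A sketch for diffing two dated URL lists (the file names follow the urls_${DATE}.txt convention above and are illustrative):

```python
def new_findings(previous_urls, current_urls):
    """URLs that appear in the current scan but not the previous one --
    typically the first items worth triaging after a re-scan."""
    return sorted(set(current_urls) - set(previous_urls))

# Usage with two dated URL files produced by the pipeline:
# with open("urls_20250301.txt") as f:
#     old = f.read().split()
# with open("urls_20250330.txt") as f:
#     new = f.read().split()
# print("\n".join(new_findings(old, new)))
```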
Troubleshooting
WebDriver Issues
# Ensure chromedriver matches your installed Chrome version
google-chrome --version   # or: chromium --version
chromedriver --version
# Path to chromedriver
export PATH=$PATH:/path/to/chromedriver/directory
# Run in headless mode if display issues
python3 pagodo.py -d target.com -l dorks.txt --headless
Slow Scanning
# Check internet connection
ping -c 1 google.com
# Reduce thread count if rate-limited
python3 pagodo.py -d target.com -l dorks.txt -t 5
# Increase timeout for slow connections
python3 pagodo.py -d target.com -l dorks.txt --timeout 30
No Results
# Verify dork list
head -5 dorks.txt
# Test with simple dork
echo "site:target.com" > test.txt
python3 pagodo.py -d target.com -l test.txt -v
# Check target domain exists
dig target.com
Best Practices
- Update GHDB dorks regularly (-g option)
- Use specific dorks for targeted searches
- Combine multiple dork techniques
- Filter common false positives
- Verify discovered URLs before investigation
- Document all findings with timestamps
- Use appropriate thread counts (5-15)
- Combine with other reconnaissance tools
- Respect Google’s terms of service
- Follow responsible disclosure practices
- Maintain separate logs per target/date
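"Document all findings with timestamps" can be as simple as an append-only JSONL log per engagement. A sketch (the file name and extra fields are illustrative, not part of pagodo):

```python
import json
from datetime import datetime, timezone

def log_finding(item, target, path="findings_log.jsonl"):
    """Append one finding as a JSON line, tagged with target and UTC time."""
    entry = dict(item)
    entry["target"] = target
    entry["seen_at"] = datetime.now(timezone.utc).isoformat()
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```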
Last updated: 2025-03-30 | Pagodo GitHub