Overview
Goofile is a reconnaissance tool that uses Google dorks to find specific file types hosted on a target domain. It automates the process of searching for potentially sensitive files (PDFs, documents, source code, configs, backups) that may be publicly accessible. Goofile is useful for OSINT (Open Source Intelligence) gathering and authorized penetration testing.
Installation
Prerequisites
sudo apt-get update
sudo apt-get install python3 python3-pip git
Clone from GitHub
git clone https://github.com/1007/goofile.git
cd goofile
pip3 install -r requirements.txt
Alternative: Direct pip installation
pip3 install goofile
Verify Installation
python3 goofile.py --help
# or if installed via pip
goofile --help
Basic Syntax
python3 goofile.py -d <domain> -f <filetype>
Command Line Options
| Option | Description | Example |
|---|---|---|
| -d, --domain | Target domain (required) | -d example.com |
| -f, --filetype | File type to search for | -f pdf |
| -l, --limit | Max results to return | -l 50 |
| -t, --timeout | Search timeout in seconds | -t 10 |
| --proxy | Use HTTP proxy | --proxy http://proxy:8080 |
| --user-agent | Custom User-Agent string | --user-agent "Mozilla/5.0..." |
| -o, --output | Save results to file | -o results.txt |
| -v, --verbose | Verbose output | -v |
| --delay | Delay between requests (seconds) | --delay 2 |
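When Goofile is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, assuming the flags in the table above exist in the installed version (confirm with --help); `build_goofile_cmd` is a hypothetical helper name and not part of Goofile.

```python
from typing import List, Optional


def build_goofile_cmd(
    domain: str,
    filetype: str,
    limit: Optional[int] = None,
    delay: Optional[float] = None,
    output: Optional[str] = None,
    script: str = "goofile.py",
) -> List[str]:
    """Build an argv list suitable for subprocess.run (hypothetical helper).

    Flag names are copied from the options table; they are assumptions
    about the installed Goofile version.
    """
    if not domain or not filetype:
        raise ValueError("domain and filetype are required")
    cmd = ["python3", script, "-d", domain, "-f", filetype]
    if limit is not None:
        cmd += ["-l", str(limit)]
    if delay is not None:
        cmd += ["--delay", str(delay)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


print(build_goofile_cmd("example.com", "pdf", limit=20, output="pdfs_found.txt"))
```

Passing the list to `subprocess.run` without `shell=True` avoids quoting problems when a domain or file name contains unusual characters.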
Quick Start Examples
Search for PDFs
# Find all PDFs on a domain
python3 goofile.py -d example.com -f pdf
# Find PDFs with limit of 20 results
python3 goofile.py -d example.com -f pdf -l 20
# Find PDFs and save to file
python3 goofile.py -d example.com -f pdf -o pdfs_found.txt
Search for Documents
# Microsoft Word documents
python3 goofile.py -d example.com -f docx
# Excel spreadsheets
python3 goofile.py -d example.com -f xlsx
# PowerPoint presentations
python3 goofile.py -d example.com -f pptx
Search for Source Code and Config Files
# Search for JavaScript files
python3 goofile.py -d example.com -f js
# Search for configuration files
python3 goofile.py -d example.com -f conf
# Search for backup files
python3 goofile.py -d example.com -f bak
Common File Types to Search
| File Type | Typical Content |
|---|---|
| pdf | Documents, reports, manuals |
| docx / doc | Word documents, specifications |
| xlsx / xls | Spreadsheets, budgets, data |
| pptx / ppt | Presentations, slides |
| zip / rar | Archives, backups |
| sql | Database dumps |
| txt | Text files, logs, config |
| conf / config | Configuration files |
| bak / backup | Backup files |
| exe | Executable files |
| log | Log files |
| csv | CSV data files |
Advanced Search Strategies
Multi-File Type Search
# Search for multiple file types sequentially
for filetype in pdf docx xlsx txt sql; do
    echo "[*] Searching for $filetype files..."
    python3 goofile.py -d example.com -f "$filetype" -o "results_$filetype.txt"
done
# Combine results
cat results_*.txt > all_results.txt
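Concatenating with cat keeps duplicates and any non-URL lines. A minimal sketch of an order-preserving merge follows, assuming each results file holds one URL per line (the exact output format of -o is not confirmed here); `merge_results` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List


def merge_results(paths: Iterable[Path]) -> List[str]:
    """Merge result files, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    for path in paths:
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            url = line.strip()
            # Keep only lines that look like URLs; skip banners and blanks.
            if url.startswith(("http://", "https://")) and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Typical use would be `merge_results(sorted(Path(".").glob("results_*.txt")))`, writing the returned list to all_results.txt.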
Search Subdomains
# Search root domain
python3 goofile.py -d example.com -f pdf
# Search subdomain
python3 goofile.py -d mail.example.com -f pdf
# Search common subdomains
for subdomain in www mail ftp admin dev test; do
    python3 goofile.py -d "$subdomain.example.com" -f pdf -o "$subdomain.txt"
done
Batch Domain Scanning
#!/bin/bash
# Scan multiple domains for PDFs
domains=(
"target1.com"
"target2.com"
"target3.com"
)
for domain in "${domains[@]}"; do
    echo "[*] Scanning $domain"
    python3 goofile.py -d "$domain" -f pdf -o "${domain}_pdfs.txt"
    python3 goofile.py -d "$domain" -f docx -o "${domain}_docs.txt"
done
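As the number of domains and file types grows, it can be easier to plan the scan as a list of jobs first. The sketch below enumerates every domain and file type combination; `plan_jobs` and the `<domain>_<filetype>.txt` naming are illustrative assumptions, not Goofile conventions.

```python
from itertools import product
from typing import Iterator, List, Tuple


def plan_jobs(domains: List[str], filetypes: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (domain, filetype, output_file) for every combination."""
    for domain, filetype in product(domains, filetypes):
        yield domain, filetype, f"{domain}_{filetype}.txt"


# Print the commands instead of running them, so the plan can be reviewed first.
for domain, filetype, outfile in plan_jobs(["target1.com", "target2.com"], ["pdf", "docx"]):
    print(f"python3 goofile.py -d {domain} -f {filetype} -o {outfile}")
```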
Using Proxies
HTTP Proxy
# Route through proxy server
python3 goofile.py \
-d example.com \
-f pdf \
--proxy http://proxy.company.com:8080
# With authentication
python3 goofile.py \
-d example.com \
-f pdf \
--proxy http://user:pass@proxy.com:8080
SOCKS5 Proxy
# Some versions support SOCKS
python3 goofile.py \
-d example.com \
-f pdf \
--proxy socks5://127.0.0.1:9050
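A malformed proxy URL is an easy mistake to make, so a quick check before launching a long scan can save time. This is a minimal sketch using only the standard library; the accepted schemes are an assumption based on the examples above (https is added as a guess), and `check_proxy_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Assumption: http and socks5 come from the examples above; https is a guess.
ALLOWED_SCHEMES = {"http", "https", "socks5"}


def check_proxy_url(proxy: str) -> bool:
    """Return True if the URL has a supported scheme, a host, and a valid port."""
    parts = urlsplit(proxy)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    return port is not None
```

The check only validates the shape of the URL; it does not test that the proxy is reachable.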
Custom User-Agent and Headers
Change User-Agent
# Use custom User-Agent to avoid detection
python3 goofile.py \
-d example.com \
-f pdf \
--user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
Delay Between Requests
# Add delays to be less aggressive
python3 goofile.py \
-d example.com \
-f pdf \
--delay 2 \
-l 100
Output and Result Processing
Save Results
# Save to text file
python3 goofile.py -d example.com -f pdf -o results.txt
# View results
cat results.txt
# Count results
wc -l results.txt
# Extract unique hostnames from the result URLs
python3 goofile.py -d example.com -f pdf | grep -E '^https?://' | cut -d'/' -f3 | sort -u
# Filter for specific pattern
python3 goofile.py -d example.com -f pdf | grep -i "confidential"
# Download found files (with caution)
python3 goofile.py -d example.com -f pdf | xargs -I {} wget {}
Python Script for Results Processing
#!/usr/bin/env python3
import subprocess
from urllib.parse import urlparse


def search_and_process(domain, filetype):
    """Run Goofile and return the output lines that look like URLs."""
    cmd = [
        'python3', 'goofile.py',
        '-d', domain,
        '-f', filetype,
        '-v'
    ]
    results = []
    try:
        output = subprocess.check_output(cmd, text=True)
        for line in output.splitlines():
            line = line.strip()
            if line.startswith('http'):
                results.append(line)
    except subprocess.CalledProcessError as e:
        print(f"Error: {e}")
    return results


def analyze_results(results):
    """Collect the unique hosts and paths among the found URLs."""
    analysis = {
        'total': len(results),
        'domains': set(),
        'paths': set()
    }
    for url in results:
        parsed = urlparse(url)
        analysis['domains'].add(parsed.netloc)
        analysis['paths'].add(parsed.path)
    return analysis


# Usage
if __name__ == '__main__':
    files = search_and_process('example.com', 'pdf')
    print(f"Found {len(files)} PDF files")
    analysis = analyze_results(files)
    print(f"Unique domains: {len(analysis['domains'])}")
    print(f"Unique paths: {len(analysis['paths'])}")
Reconnaissance Workflow
Step 1: Domain Enumeration
# Start with basic domain information
nslookup example.com
whois example.com
# Check subdomains
python3 goofile.py -d example.com -f pdf
python3 goofile.py -d www.example.com -f pdf
Step 2: File Type Discovery
# Common sensitive file types in order of interest
filetypes=(
"pdf"
"docx"
"xlsx"
"sql"
"backup"
"conf"
"log"
)
for ftype in "${filetypes[@]}"; do
    echo "[*] Searching for $ftype..."
    python3 goofile.py -d example.com -f "$ftype" -l 50 >> findings.txt
done
Step 3: Results Analysis
# Compile and deduplicate results
sort -u findings.txt > unique_findings.txt
# Group by file type
grep "\.pdf$" unique_findings.txt > pdfs.txt
grep "\.docx$" unique_findings.txt > docs.txt
grep "\.xlsx$" unique_findings.txt > sheets.txt
# Count by type
echo "PDFs: $(wc -l < pdfs.txt)"
echo "Docs: $(wc -l < docs.txt)"
echo "Sheets: $(wc -l < sheets.txt)"
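The grep patterns above miss URLs with uppercase extensions or trailing query strings (for example report.PDF or file.pdf?download=1). A minimal sketch of a more tolerant grouping, assuming one URL per line in unique_findings.txt; `count_by_extension` is a hypothetical helper.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable
from urllib.parse import urlsplit


def count_by_extension(urls: Iterable[str]) -> Counter:
    """Count URLs by lowercase file extension, ignoring query strings and fragments."""
    counts = Counter()
    for url in urls:
        # urlsplit separates the path from ?query and #fragment parts.
        suffix = PurePosixPath(urlsplit(url.strip()).path).suffix.lower()
        counts[suffix.lstrip(".") or "(none)"] += 1
    return counts
```

Calling `count_by_extension(open("unique_findings.txt"))` returns a Counter whose `most_common()` method lists file types from most to least frequent.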
Google Dork Equivalents
Goofile automates the following Google dork searches:
# PDF files on domain
site:example.com filetype:pdf
# Word documents
site:example.com filetype:docx
# Excel spreadsheets
site:example.com filetype:xlsx
# Log files
site:example.com filetype:log
# Config files
site:example.com filetype:conf
# Backup files
site:example.com filetype:bak
# Combined search
site:example.com (filetype:pdf OR filetype:docx OR filetype:xlsx)
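The queries above follow a regular pattern, so they can be generated rather than typed. This sketch builds the same site: and filetype: strings and a URL-encoded search link for manual use in a browser; `build_dork` and `search_url` are hypothetical helpers, and nothing here sends a request.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_dork(domain: str, filetypes: Sequence[str]) -> str:
    """Return a site:/filetype: query in the same form as the examples above."""
    if not filetypes:
        raise ValueError("at least one file type is required")
    terms = [f"filetype:{ft}" for ft in filetypes]
    if len(terms) == 1:
        return f"site:{domain} {terms[0]}"
    return f"site:{domain} ({' OR '.join(terms)})"


def search_url(query: str) -> str:
    """URL-encode the query so it can be pasted into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_dork("example.com", ["pdf", "docx", "xlsx"]))
print(search_url(build_dork("example.com", ["pdf"])))
```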
Handling Rate Limiting
Graceful Rate Limiting
# Slower scan with delays
python3 goofile.py \
-d example.com \
-f pdf \
--delay 3 \
--timeout 15 \
-l 50
Multiple Search Sessions
# Split searches across time
python3 goofile.py -d example.com -f pdf -l 20 &
sleep 60
python3 goofile.py -d example.com -f docx -l 20 &
sleep 60
python3 goofile.py -d example.com -f xlsx -l 20 &
# Wait for all to complete
wait
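Fixed sleeps work, but when a search is blocked it is common to wait progressively longer before retrying. The following is a sketch of exponential backoff with optional jitter; it is a general technique rather than a Goofile feature, and the function name and default values are illustrative assumptions.

```python
import random
from typing import List, Optional


def backoff_delays(
    retries: int,
    base: float = 3.0,
    factor: float = 2.0,
    cap: float = 300.0,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait in seconds before each retry: base * factor**n, capped at cap."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * factor ** attempt)
        # Optional jitter shifts each delay by up to +/- jitter * delay.
        delay += delay * jitter * rng.uniform(-1.0, 1.0)
        delays.append(delay)
    return delays
```

A caller would `time.sleep()` for each value in turn between attempts; with the defaults, four retries wait 3, 6, 12, and 24 seconds.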
Troubleshooting
Common Issues
| Problem | Solution |
|---|---|
| No results found | Check domain spelling, try different file types |
| Connection timeout | Increase timeout: --timeout 20 |
| "403 Forbidden" | Google blocking requests, use proxy or reduce limit |
| No module found | Install deps: pip3 install -r requirements.txt |
| Slow results | Results depend on Google indexing, may take time |
Debugging
# Enable verbose output
python3 goofile.py -d example.com -f pdf -v
# Check Python version
python3 --version
# Verify internet connectivity
ping -c 1 google.com
# Test with simpler domain
python3 goofile.py -d google.com -f pdf
Ethical Considerations
Authorized Use Only
- Only search domains you own or have written authorization to scan
- Respect robots.txt and site terms of service
- Use appropriate delays to avoid overloading servers
- Do not download files without authorization
- Document all findings and report responsibly
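One way to enforce the first rule in scripted scans is to check every target against a list of authorized domains before running anything. A minimal sketch follows; `in_scope` is a hypothetical helper, and the authorized list would come from your written scope agreement.

```python
from typing import Iterable


def in_scope(host: str, authorized: Iterable[str]) -> bool:
    """True if host equals an authorized domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    for domain in authorized:
        domain = domain.lower().rstrip(".")
        # Require a dot boundary so "notexample.com" does not match "example.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

For example, `in_scope("mail.example.com", ["example.com"])` is true, while a lookalike such as `example.com.evil.net` is rejected.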
Privacy Concerns
- Files found through Goofile may contain sensitive information
- Handle discovered data responsibly
- Notify system administrators of findings
- Follow responsible disclosure practices
With OSINT Frameworks
# Combine with other reconnaissance
# 1. Enumerate domains
# 2. Run Goofile on each domain
# 3. Combine results with other tools
# Example: Nmap + Goofile workflow
nmap -sV example.com > nmap_results.txt
python3 goofile.py -d example.com -f pdf > goofile_results.txt
Automation Script
#!/bin/bash
# Complete reconnaissance script
TARGET="$1"
if [ -z "$TARGET" ]; then
    echo "Usage: $0 <domain>" >&2
    exit 1
fi
# Create the output directory if it does not exist
mkdir -p recon
echo "[*] Starting reconnaissance on $TARGET"
# DNS enumeration
nslookup "$TARGET" > "recon/$TARGET.dns"
# Goofile search (one background job per file type)
echo "[*] Running Goofile..."
python3 goofile.py -d "$TARGET" -f pdf > "recon/$TARGET.pdfs" &
python3 goofile.py -d "$TARGET" -f docx > "recon/$TARGET.docs" &
python3 goofile.py -d "$TARGET" -f xlsx > "recon/$TARGET.sheets" &
python3 goofile.py -d "$TARGET" -f sql > "recon/$TARGET.sql" &
python3 goofile.py -d "$TARGET" -f bak > "recon/$TARGET.bak" &
# Wait for all processes
wait
# Combine results (files are listed explicitly so a combined file from an earlier run is not read back in)
cat "recon/$TARGET".{dns,pdfs,docs,sheets,sql,bak} > "recon/$TARGET.combined.txt"
echo "[+] Reconnaissance complete. Results in recon/ directory"
Related Tools
| Tool | Purpose |
|---|---|
| Metagoofil | Similar Google dorks tool (older) |
| Google Dorking | Manual search using Google |
| Censys | Internet-wide database search |
| Shodan | IoT device search engine |
| OSINT Framework | Comprehensive OSINT toolkit |
Resources