Goofile

Overview

Goofile is a reconnaissance tool that uses Google dorks to find specific file types hosted on a target domain. It automates the process of searching for potentially sensitive files (PDFs, documents, source code, configs, backups) that may be publicly accessible. Goofile is useful for OSINT (Open Source Intelligence) gathering and authorized penetration testing.

Installation

Prerequisites

sudo apt-get update
sudo apt-get install python3 python3-pip git

Clone from GitHub

git clone https://github.com/1007/goofile.git
cd goofile
pip3 install -r requirements.txt

Alternative: Direct pip installation

pip3 install goofile

Verify Installation

python3 goofile.py --help
# or if installed via pip
goofile --help

Basic Syntax

python3 goofile.py -d <domain> -f <filetype>

Command Line Options

Option	Description	Example
`-d, --domain`	Target domain (required)	`-d example.com`
`-f, --filetype`	File type to search for	`-f pdf`
`-l, --limit`	Max results to return	`-l 50`
`-t, --timeout`	Search timeout in seconds	`-t 10`
`--proxy`	Use HTTP proxy	`--proxy http://proxy:8080`
`--user-agent`	Custom User-Agent string	`--user-agent "Mozilla/5.0..."`
`-o, --output`	Save results to file	`-o results.txt`
`-v, --verbose`	Verbose output	`-v`
`--delay`	Delay between requests (seconds)	`--delay 2`

Flag support varies between Goofile versions and forks; run `--help` to confirm which of these options your copy accepts.

Quick Start Examples

Search for PDFs

# Find all PDFs on a domain
python3 goofile.py -d example.com -f pdf

# Find PDFs with limit of 20 results
python3 goofile.py -d example.com -f pdf -l 20

# Find PDFs and save to file
python3 goofile.py -d example.com -f pdf -o pdfs_found.txt

Search for Documents

# Microsoft Word documents
python3 goofile.py -d example.com -f docx

# Excel spreadsheets
python3 goofile.py -d example.com -f xlsx

# PowerPoint presentations
python3 goofile.py -d example.com -f pptx

Search for Source Code and Config Files

# Search for JavaScript files
python3 goofile.py -d example.com -f js

# Search for configuration files
python3 goofile.py -d example.com -f conf

# Search for backup files
python3 goofile.py -d example.com -f bak

Common File Types to Search

File Type	Typical Content
`pdf`	Documents, reports, manuals
`docx` / `doc`	Word documents, specifications
`xlsx` / `xls`	Spreadsheets, budgets, data
`pptx` / `ppt`	Presentations, slides
`zip` / `rar`	Archives, backups
`sql`	Database dumps
`txt`	Text files, logs, config
`conf` / `config`	Configuration files
`bak` / `backup`	Backup files
`exe`	Executable files
`log`	Log files
`csv`	CSV data files

Advanced Search Strategies

Multi-File Type Search

# Search for multiple file types sequentially
for filetype in pdf docx xlsx txt sql; do
  echo "[*] Searching for $filetype files..."
  python3 goofile.py -d example.com -f $filetype -o results_$filetype.txt
done

# Combine results
cat results_*.txt > all_results.txt
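
The `cat` step above concatenates the files as they are, so a URL found under two searches appears twice. A minimal Python sketch of a deduplicating merge follows. It assumes each result is a bare URL on its own line; `unique_urls` and `merge_result_files` are hypothetical helper names, not part of Goofile.

```python
from collections.abc import Iterable
from pathlib import Path


def unique_urls(lines: Iterable[str]) -> list[str]:
    """Return URL lines without duplicates, in first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for line in lines:
        url = line.strip()
        # Assumption: results are bare URLs, one per line; other lines are skipped.
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def merge_result_files(directory: str = ".", pattern: str = "results_*.txt") -> list[str]:
    """Read every matching results file and return the deduplicated URLs."""
    lines: list[str] = []
    for path in sorted(Path(directory).glob(pattern)):
        lines.extend(path.read_text(encoding="utf-8", errors="replace").splitlines())
    return unique_urls(lines)
```

Writing the merged list back out is one line: `Path("all_results.txt").write_text("\n".join(merge_result_files()) + "\n")`.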

Search Subdomains

# Search root domain
python3 goofile.py -d example.com -f pdf

# Search subdomain
python3 goofile.py -d mail.example.com -f pdf

# Search common subdomains
for subdomain in www mail ftp admin dev test; do
  python3 goofile.py -d $subdomain.example.com -f pdf -o $subdomain.txt
done

Batch Domain Scanning

#!/bin/bash
# Scan multiple domains for PDFs

domains=(
  "target1.com"
  "target2.com"
  "target3.com"
)

for domain in "${domains[@]}"; do
  echo "[*] Scanning $domain"
  python3 goofile.py -d "$domain" -f pdf -o "${domain}_pdfs.txt"
  python3 goofile.py -d "$domain" -f docx -o "${domain}_docs.txt"
done
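
For longer target lists, it can help to generate the command lines first and review them before anything runs. The sketch below builds one argument list per domain and file type. `build_jobs` is a hypothetical helper, and the `-d`, `-f`, and `-o` flags are taken from the options table in this document, so confirm them against your version.

```python
from itertools import product


def build_jobs(domains, filetypes, script="goofile.py"):
    """Return one argv list per (domain, filetype) pair, without running anything."""
    jobs = []
    for domain, filetype in product(domains, filetypes):
        # Output name is an illustrative convention: <domain>_<filetype>.txt
        output = f"{domain}_{filetype}.txt"
        jobs.append(["python3", script, "-d", domain, "-f", filetype, "-o", output])
    return jobs
```

Each entry can be passed directly to `subprocess.run(job)`. Passing a list rather than a shell string avoids quoting problems with unusual domain names.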

Using Proxies

HTTP Proxy

# Route through proxy server
python3 goofile.py \
  -d example.com \
  -f pdf \
  --proxy http://proxy.company.com:8080

# With authentication
python3 goofile.py \
  -d example.com \
  -f pdf \
  --proxy http://user:pass@proxy.com:8080

SOCKS5 Proxy

# Some versions support SOCKS
python3 goofile.py \
  -d example.com \
  -f pdf \
  --proxy socks5://127.0.0.1:9050
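
A malformed proxy URL tends to fail in confusing ways, so it may be worth checking the value before passing it to `--proxy`. This sketch uses only the standard library. `parse_proxy` is a hypothetical helper, and the accepted schemes are an assumption based on the examples above.

```python
from urllib.parse import urlparse

# Assumption: these schemes mirror the proxy examples in this section.
SUPPORTED_SCHEMES = {"http", "https", "socks5"}


def parse_proxy(proxy_url):
    """Split a proxy URL into its parts, raising ValueError if it looks malformed."""
    parsed = urlparse(proxy_url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported or missing proxy scheme: {proxy_url!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"proxy URL needs a host and a port: {proxy_url!r}")
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        "has_credentials": parsed.username is not None,
    }
```

The helper returns a flag for embedded credentials rather than the credentials themselves, so the result is safe to log.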

Custom User-Agent and Headers

Change User-Agent

# Use custom User-Agent to avoid detection
python3 goofile.py \
  -d example.com \
  -f pdf \
  --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

Delay Between Requests

# Add delays to be less aggressive
python3 goofile.py \
  -d example.com \
  -f pdf \
  --delay 2 \
  -l 100

Output and Result Processing

Save Results

# Save to text file
python3 goofile.py -d example.com -f pdf -o results.txt

# View results
cat results.txt

# Count results
wc -l results.txt

Process Results with Command Line Tools

# Extract domain names
python3 goofile.py -d example.com -f pdf | cut -d'/' -f3 | sort -u

# Filter for specific pattern
python3 goofile.py -d example.com -f pdf | grep -i "confidential"

# Download found files (with caution)
python3 goofile.py -d example.com -f pdf | xargs -I {} wget {}
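
Before piping results into a downloader, it may be safer to drop any line that is not a URL on the target domain. The sketch below is one way to do that. `in_scope` and `filter_in_scope` are hypothetical helpers, and the code assumes one URL per output line.

```python
from urllib.parse import urlparse


def in_scope(url: str, domain: str) -> bool:
    """True if the URL's host is the target domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def filter_in_scope(lines, domain):
    """Keep only http(s) URLs whose host belongs to the target domain."""
    kept = []
    for line in lines:
        url = line.strip()
        # Skip banner or status lines before attempting to parse them as URLs.
        if url.startswith(("http://", "https://")) and in_scope(url, domain):
            kept.append(url)
    return kept
```

The comparison is on the dot-separated host, so look-alikes such as `badexample.com` or `example.com.evil.net` are rejected.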

Python Script for Results Processing

#!/usr/bin/env python3
import subprocess
from urllib.parse import urlparse

def search_and_process(domain, filetype):
    """Search for files and process results"""
    cmd = [
        'python3', 'goofile.py',
        '-d', domain,
        '-f', filetype,
        '-v'
    ]
    
    results = []
    try:
        output = subprocess.check_output(cmd, text=True)
        for line in output.strip().split('\n'):
            if line.startswith('http'):
                results.append(line)
    except subprocess.CalledProcessError as e:
        print(f"Error: {e}")
    
    return results

def analyze_results(results):
    """Analyze found URLs"""
    analysis = {
        'total': len(results),
        'domains': set(),
        'paths': set()
    }
    
    for url in results:
        parsed = urlparse(url)
        analysis['domains'].add(parsed.netloc)
        analysis['paths'].add(parsed.path)
    
    return analysis

# Usage (runs only when the script is executed directly, not when imported)
if __name__ == "__main__":
    files = search_and_process('example.com', 'pdf')
    print(f"Found {len(files)} PDF files")

    analysis = analyze_results(files)
    print(f"Unique domains: {len(analysis['domains'])}")
    print(f"Unique paths: {len(analysis['paths'])}")

Reconnaissance Workflow

Step 1: Domain Enumeration

# Start with basic domain information
nslookup example.com
whois example.com

# Run a first file search against the root domain and www
python3 goofile.py -d example.com -f pdf
python3 goofile.py -d www.example.com -f pdf

Step 2: File Type Discovery

# Common sensitive file types in order of interest
filetypes=(
  "pdf"
  "docx"
  "xlsx"
  "sql"
  "backup"
  "conf"
  "log"
)

for ftype in "${filetypes[@]}"; do
  echo "[*] Searching for $ftype..."
  python3 goofile.py -d example.com -f $ftype -l 50 >> findings.txt
done

Step 3: Results Analysis

# Compile and deduplicate results
sort -u findings.txt > unique_findings.txt

# Group by file type (case-insensitive, so .PDF matches too)
grep -i '\.pdf$' unique_findings.txt > pdfs.txt
grep -i '\.docx$' unique_findings.txt > docs.txt
grep -i '\.xlsx$' unique_findings.txt > sheets.txt

# Count by type
echo "PDFs: $(wc -l < pdfs.txt)"
echo "Docs: $(wc -l < docs.txt)"
echo "Sheets: $(wc -l < sheets.txt)"
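
A `$`-anchored grep misses URLs that carry a query string, such as `report.pdf?v=2`. As an alternative, the sketch below parses each URL and counts by the extension of its path. `count_by_extension` is a hypothetical helper and assumes one URL per line.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def count_by_extension(urls):
    """Count URLs by lowercased path extension, ignoring query strings."""
    counts = Counter()
    for url in urls:
        # urlparse separates the path from any ?query or #fragment.
        suffix = PurePosixPath(urlparse(url.strip()).path).suffix
        counts[suffix.lstrip(".").lower() or "(none)"] += 1
    return counts
```

Calling `count_by_extension(open("unique_findings.txt"))` gives a per-type tally in one pass, and `.most_common()` on the result sorts it by frequency.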

Google Dork Equivalents

Goofile automates the following Google dork searches:

# PDF files on domain
site:example.com filetype:pdf

# Word documents
site:example.com filetype:docx

# Excel spreadsheets
site:example.com filetype:xlsx

# Log files
site:example.com filetype:log

# Config files
site:example.com filetype:conf

# Backup files
site:example.com filetype:bak

# Combined search
site:example.com (filetype:pdf OR filetype:docx OR filetype:xlsx)
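
The query strings above follow a simple pattern, which this sketch reproduces. `build_dork` and `search_url` are hypothetical helpers. The search URL is the ordinary Google search form and is not a claim about the exact requests Goofile sends.

```python
from urllib.parse import urlencode


def build_dork(domain, filetypes):
    """Build a site:/filetype: query for one or more file types."""
    if not filetypes:
        raise ValueError("at least one file type is required")
    terms = [f"filetype:{ft}" for ft in filetypes]
    if len(terms) == 1:
        return f"site:{domain} {terms[0]}"
    # Multiple types are grouped with OR, as in the combined search above.
    return f"site:{domain} ({' OR '.join(terms)})"


def search_url(query):
    """Return a URL-encoded Google search link for manual checking in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

Opening the generated link in a browser is a quick way to confirm by hand what an automated run reports.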

Handling Rate Limiting

Graceful Rate Limiting

# Slower scan with delays
python3 goofile.py \
  -d example.com \
  -f pdf \
  --delay 3 \
  --timeout 15 \
  -l 50

Multiple Search Sessions

# Split searches across time
python3 goofile.py -d example.com -f pdf -l 20 &
sleep 60
python3 goofile.py -d example.com -f docx -l 20 &
sleep 60
python3 goofile.py -d example.com -f xlsx -l 20 &

# Wait for all to complete
wait
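
Fixed sleeps work, but a wrapper script can also back off progressively when a search appears to be blocked. The sketch below shows exponential backoff with optional jitter. `backoff_delays` and `run_with_backoff` are hypothetical helpers, and how a blocked search is detected is left to the caller.

```python
import random
import time


def backoff_delays(base=2.0, factor=2.0, max_delay=60.0, attempts=5, jitter=0.0):
    """Yield delays base, base*factor, ... capped at max_delay, plus random jitter."""
    delay = base
    for _ in range(attempts):
        yield min(delay, max_delay) + random.uniform(0, jitter)
        delay *= factor


def run_with_backoff(task, is_blocked, **kwargs):
    """Call task() until is_blocked(result) is false or the attempts run out."""
    result = None
    for delay in backoff_delays(**kwargs):
        result = task()
        if not is_blocked(result):
            return result
        time.sleep(delay)  # wait longer after each blocked attempt
    return result
```

Here `task` could be a function that runs one Goofile search, and `is_blocked` a check on its output. A small nonzero `jitter` keeps repeated runs from retrying on an identical schedule.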

Troubleshooting

Common Issues

Problem	Solution
No results found	Check domain spelling, try different file types
Connection timeout	Increase timeout: `--timeout 20`
"403 Forbidden"	Google is blocking requests; route through a proxy, add `--delay`, or lower `-l`
No module found	Install deps: `pip3 install -r requirements.txt`
Slow results	Results depend on Google indexing, may take time

Debugging

# Enable verbose output
python3 goofile.py -d example.com -f pdf -v

# Check Python version
python3 --version

# Verify internet connectivity
ping -c 1 google.com

# Test with simpler domain
python3 goofile.py -d google.com -f pdf

Ethical Considerations

Authorized Use Only

Only search domains you own or have written authorization to scan
Respect robots.txt and site terms of service
Use appropriate delays to avoid overloading servers
Do not download files without authorization
Document all findings and report responsibly
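
The robots.txt point can be checked programmatically before any file is fetched. This sketch uses Python's standard `urllib.robotparser`. `allowed_by_robots` is a hypothetical helper that takes the robots.txt content as a list of lines, so fetching that file is left to the caller.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules already held in memory
    return parser.can_fetch(user_agent, url)
```

For example, with the rules `["User-agent: *", "Disallow: /private/"]`, a URL under `/private/` is reported as disallowed while other paths are allowed.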

Privacy Concerns

Files found through Goofile may contain sensitive information
Handle discovered data responsibly
Notify system administrators of findings
Follow responsible disclosure practices

Integration with Other Tools

With OSINT Frameworks

# Combine with other reconnaissance
# 1. Enumerate domains
# 2. Run Goofile on each domain
# 3. Combine results with other tools

# Example: Nmap + Goofile workflow
nmap -sV example.com > nmap_results.txt
python3 goofile.py -d example.com -f pdf > goofile_results.txt

Automation Script

#!/bin/bash
# Complete reconnaissance script

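TARGET="$1"

# Require a target domain
if [ -z "$TARGET" ]; then
  echo "Usage: $0 <domain>" >&2
  exit 1
fi

# Create the output directory before writing to it
mkdir -p recon

echo "[*] Starting reconnaissance on $TARGET"

# DNS enumeration
nslookup "$TARGET" > "recon/$TARGET.dns"

# Goofile search (one background job per file type)
echo "[*] Running Goofile..."
python3 goofile.py -d "$TARGET" -f pdf > "recon/$TARGET.pdfs" &
python3 goofile.py -d "$TARGET" -f docx > "recon/$TARGET.docs" &
python3 goofile.py -d "$TARGET" -f xlsx > "recon/$TARGET.sheets" &
python3 goofile.py -d "$TARGET" -f sql > "recon/$TARGET.sql" &
python3 goofile.py -d "$TARGET" -f bak > "recon/$TARGET.bak" &

# Wait for all processes
wait

# Combine only the Goofile results (skip the DNS output and the combined file itself)
cat "recon/$TARGET".{pdfs,docs,sheets,sql,bak} > "recon/$TARGET.combined.txt"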

echo "[+] Reconnaissance complete. Results in recon/ directory"

Alternative Tools

Tool	Purpose
Metagoofil	Similar Google-based tool that also extracts document metadata (older)
Google Dorking	Manual search using Google
Censys	Search engine for internet-wide host and certificate scan data
Shodan	Search engine for internet-connected devices and services
OSINT Framework	Comprehensive OSINT toolkit

Resources

GitHub: https://github.com/1007/goofile
Google Dorks: https://www.exploit-db.com/google-hacking-database
OSINT: https://osintframework.com/
Responsible Disclosure: https://cheatsheetseries.owasp.org/cheatsheets/Vulnerability_Disclosure_Cheat_Sheet.html