EmailHarvester is an OSINT tool that gathers email addresses associated with a target domain by querying search engines such as Google, Bing, Yahoo, and Ask. It is used in reconnaissance, penetration testing, and security research to find publicly exposed email addresses for authorized social engineering tests and threat modeling.
Installation
Clone from GitHub
git clone https://github.com/maldevel/EmailHarvester.git
cd EmailHarvester
pip3 install -r requirements.txt
Install Dependencies
# Python 3.6 or higher
python3 --version
# Install required packages
pip3 install requests beautifulsoup4
pip3 install dnspython
# argparse is part of the Python 3 standard library; no separate install is needed
Docker Installation
docker build -t emailharvester .
docker run emailharvester -d example.com
System Requirements
# Verify Python and pip
python3 -m pip --version
# Install system dependencies (if needed)
sudo apt-get install python3-pip python3-dev
Basic Usage
Simple Domain Harvesting
python3 EmailHarvester.py -d example.com
Save Results to File
python3 EmailHarvester.py -d example.com -f results.txt
Specify Search Engine
python3 EmailHarvester.py -d example.com -e google
Multiple Search Engines
python3 EmailHarvester.py -d example.com -e google,bing,yahoo
Verbose Output
python3 EmailHarvester.py -d example.com -v
Command-Line Options
| Option | Description |
|---|---|
| -d, --domain | Target domain name |
| -l, --limit | Number of results per search engine |
| -e, --engine | Specific search engines (comma-separated) |
| -f, --file | Output file name |
| -v, --verbose | Verbose output |
| -h, --help | Show help message |
| --proxies | Use proxy list |
| --timeout | Request timeout in seconds |
| --delay | Delay between requests in seconds |
Supported Search Engines
Available Search Engines
| Engine | Coverage | Speed | Reliability |
|---|---|---|---|
| Google | Comprehensive | Fast | Very High |
| Bing | Good | Fast | Very High |
| Yahoo | Good | Fast | High |
| Ask | Limited | Fast | Medium |
| Baidu | China-focused | Fast | Medium |
| Yandex | Russia-focused | Fast | Medium |
Practical Examples
Basic Domain Reconnaissance
python3 EmailHarvester.py -d example.com
Harvesting with File Output
python3 EmailHarvester.py -d company.com -f emails.txt
Aggressive Harvesting (More Results)
python3 EmailHarvester.py -d target.org -l 1000
Specific Search Engine Query
python3 EmailHarvester.py -d example.com -e google
Multiple Engines with Limit
python3 EmailHarvester.py -d company.org \
    -e google,bing,yahoo \
    -l 500 \
    -f harvested_emails.txt
Batch Domain Harvesting
#!/bin/bash
# Harvest emails from multiple domains, one output file per domain
for domain in example.com test.org sample.net; do
    python3 EmailHarvester.py -d "$domain" -f "results_${domain}.txt"
done
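For longer target lists it is usually easier to keep the domains in a file than in the loop itself. The following is a minimal sketch of that idea; `read_targets` and the `targets.txt` file name are hypothetical and not part of EmailHarvester.

```bash
# read_targets FILE
# Print the domains listed in FILE, one per line. Strips "#" comments,
# surrounding whitespace, and blank lines, lowercases the names, and
# removes duplicates. (Hypothetical helper, for illustration only.)
read_targets() {
    sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
        grep -v '^$' |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}

# Example use with the flags shown in this document:
# read_targets targets.txt | while IFS= read -r domain; do
#     python3 EmailHarvester.py -d "$domain" -f "results_${domain}.txt"
# done
```

Normalizing the list first avoids harvesting the same domain twice because of case or stray whitespace.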
Subdomain Email Harvesting
python3 EmailHarvester.py -d mail.example.com
python3 EmailHarvester.py -d admin.example.com
python3 EmailHarvester.py -d support.example.com
Delayed Harvesting (Avoid Detection)
python3 EmailHarvester.py -d example.com --delay 2 -v
Search Engine Query Techniques
Google Dork Queries
# EmailHarvester uses these patterns:
# site:example.com email
# site:example.com contact
# site:example.com @example.com
Bing Search Patterns
# Bing-specific searches:
# domain:example.com
# -domain:subdomain.example.com
Filtered Searches
# Exclude unwanted results
# site:example.com -gmail.com
# site:example.com filetype:pdf
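The query patterns listed in this section can be generated for any domain, which is handy when checking them by hand in a browser. This is only a sketch: `build_queries` is a hypothetical helper, and it reproduces the patterns as written above rather than whatever EmailHarvester sends internally.

```bash
# build_queries DOMAIN
# Print one search query per line, following the patterns listed above.
# (Hypothetical helper; it does not call any search engine.)
build_queries() {
    printf 'site:%s email\n' "$1"
    printf 'site:%s contact\n' "$1"
    printf 'site:%s @%s\n' "$1" "$1"
    printf 'site:%s filetype:pdf\n' "$1"
}
```

For example, `build_queries example.com` prints four queries, the third being `site:example.com @example.com`.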
Advanced Techniques
Combine with DNS Enumeration
#!/bin/bash
# Harvest emails and enumerate DNS
python3 EmailHarvester.py -d example.com -f emails.txt
# Extract the unique domains from the harvested addresses
cut -d'@' -f2 emails.txt | sort -u > domains.txt
# DNS enumeration: resolve each domain
while IFS= read -r domain; do
    dig "$domain" +short
done < domains.txt
Harvest with Proxies
# Create proxy list
echo "http://proxy1:8080" > proxies.txt
echo "http://proxy2:8080" >> proxies.txt
# Use with EmailHarvester
python3 EmailHarvester.py -d example.com \
--proxies proxies.txt -f results.txt
Target Subsidiary Domains
#!/bin/bash
# Harvest from parent and subdomains
PARENT="example.com"
SUBDOMAINS=(
    "mail.$PARENT"
    "admin.$PARENT"
    "support.$PARENT"
    "dev.$PARENT"
)
for sub in "${SUBDOMAINS[@]}"; do
    python3 EmailHarvester.py -d "$sub" -f "results_${sub}.txt"
done
Email Validation and Filtering
#!/bin/bash
# Clean up harvested emails
python3 EmailHarvester.py -d example.com -f raw_emails.txt
# Remove duplicates
sort -u raw_emails.txt > unique_emails.txt
# Filter by domain (escape the dot and anchor the end so look-alike domains are excluded)
grep -E '@example\.com$' unique_emails.txt > domain_emails.txt
# Count results
wc -l domain_emails.txt
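The clean-up steps above can be folded into one reusable filter that also handles mixed case and output lines containing extra text. A minimal sketch, assuming the raw output contains address-shaped tokens somewhere on each line; `clean_emails` is a hypothetical name.

```bash
# clean_emails DOMAIN < raw_output
# Extract address-shaped tokens from stdin, lowercase them, drop
# duplicates, and keep only addresses whose domain part is exactly DOMAIN.
# DOMAIN is expected in lowercase. (Hypothetical helper.)
clean_emails() {
    grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' |
        tr '[:upper:]' '[:lower:]' |
        sort -u |
        awk -F'@' -v d="$1" '$2 == d'
}

# Example: clean_emails example.com < raw_emails.txt > domain_emails.txt
```

Comparing the domain field for equality, rather than matching a substring, keeps addresses such as `user@example.com.evil.net` out of the result.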
Bulk Target Harvesting
#!/bin/bash
# Harvest from multiple domains efficiently
TARGETS=(
"company1.com"
"company2.org"
"company3.net"
)
for target in "${TARGETS[@]}"; do
    echo "Harvesting: $target"
    python3 EmailHarvester.py -d "$target" \
        -e google,bing \
        -f "results_${target}.txt" \
        --delay 1
done
# Combine all results
cat results_*.txt | sort | uniq > all_emails.txt
Processing and Analysis
python3 EmailHarvester.py -d example.com -f emails.txt
sort emails.txt | uniq > unique_emails.txt
wc -l unique_emails.txt
# Get usernames (local parts) from email addresses
cut -d'@' -f1 emails.txt | sort -u > usernames.txt
Group by Department
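# Common naming patterns, matched against the local part only (before the @)
# so that a keyword in the domain name does not cause a false match
grep -iE '^[^@]*(admin|support|info)[^@]*@' emails.txt > general.txt
grep -iE '^[^@]*(dev|engineering|tech)[^@]*@' emails.txt > tech.txt
grep -iE '^[^@]*(sales|marketing|business)[^@]*@' emails.txt > business.txt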
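The same grouping can be done in a single pass that also reports addresses matching none of the keywords. This is a sketch using the keyword lists from this section; `classify_emails` and the category labels are hypothetical.

```bash
# classify_emails < emails
# Print "category<TAB>address" for each address on stdin. Keywords are
# matched against the local part only; the first matching group wins.
classify_emails() {
    awk -F'@' '
        {
            user = tolower($1)
            if (user ~ /admin|support|info/)            group = "general"
            else if (user ~ /dev|engineering|tech/)     group = "tech"
            else if (user ~ /sales|marketing|business/) group = "business"
            else                                        group = "other"
            printf "%s\t%s\n", group, $0
        }'
}

# Example: classify_emails < emails.txt | sort > emails_by_group.tsv
```

Keyword matching on usernames is only a heuristic; personal addresses such as `jane.doe@example.com` fall into the `other` group.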
Create Email Lists for Testing
#!/bin/bash
# Prepare list for penetration testing
python3 EmailHarvester.py -d target.com -f harvested.txt
# Filter and clean
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' harvested.txt \
| sort | uniq > valid_emails.txt
# Create CSV format
awk '{print $0","}' valid_emails.txt > email_list.csv
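If the CSV is going to be opened in a spreadsheet or imported elsewhere, a header row and separate columns tend to be more useful than a single column. A minimal sketch, assuming the input has already been through the filter above, so no field contains commas or quotes; `emails_to_csv` and the column names are hypothetical.

```bash
# emails_to_csv < emails
# Write a CSV with a header row and one row per address:
# full address, local part, domain. Lines without exactly one "@" are skipped.
emails_to_csv() {
    printf 'email,local_part,domain\n'
    awk -F'@' 'NF == 2 { printf "%s,%s,%s\n", $0, $1, $2 }'
}

# Example: emails_to_csv < valid_emails.txt > email_list.csv
```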
OSINT Workflow Integration
Complete Reconnaissance
#!/bin/bash
DOMAIN="target.com"
echo "=== Email Harvesting for $DOMAIN ==="
python3 EmailHarvester.py -d "$DOMAIN" -f "emails_${DOMAIN}.txt"
echo "=== Email Statistics ==="
echo "Total emails: $(wc -l < "emails_${DOMAIN}.txt")"
echo "=== Usernames (local parts) ==="
cut -d'@' -f1 "emails_${DOMAIN}.txt" | sort -u
echo "=== Email Domains ==="
cut -d'@' -f2 "emails_${DOMAIN}.txt" | sort | uniq -c
echo "=== Reconnaissance Complete ==="
Social Engineering Testing
#!/bin/bash
# Prepare for authorized social engineering test
DOMAIN="target.com"
python3 EmailHarvester.py -d "$DOMAIN" -f test_targets.txt
# Validate emails (check whether the mailboxes exist)
# Use a separate tool for SMTP validation
# (Never execute without authorization)
# Count the targets for the authorized test campaign
wc -l test_targets.txt
Threat Modeling
#!/bin/bash
# Identify attack surface
TARGETS=("example.com" "test.org" "sample.net")
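for target in "${TARGETS[@]}"; do
    # Count unique addresses, not raw output lines (which may include banners or status messages)
    EMAILS=$(python3 EmailHarvester.py -d "$target" \
        | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' | sort -u | wc -l)
    echo "$target: $EMAILS email addresses exposed"
done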
Handling Results
# Convert to CSV
python3 EmailHarvester.py -d example.com -f emails.txt
awk '{print NR","$0}' emails.txt > emails.csv
Database Import
# Import into database for tracking
sqlite3 emails.db "CREATE TABLE harvested (
id INTEGER PRIMARY KEY,
email TEXT UNIQUE,
domain TEXT,
harvested_date DATETIME
);"
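# Insert results; OR IGNORE skips addresses already stored (email is UNIQUE)
while IFS= read -r email; do
    # Double any single quotes so the value cannot break out of the SQL string
    safe=${email//\'/\'\'}
    sqlite3 emails.db "INSERT OR IGNORE INTO harvested (email, domain, harvested_date) VALUES ('$safe', '${safe#*@}', datetime('now'));"
done < emails.txt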
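Starting one `sqlite3` process per address gets slow for large lists. One alternative is to generate all statements first and send them to the database in a single transaction. The sketch below only produces SQL text for the `harvested` table defined above; `emails_to_sql` is a hypothetical helper.

```bash
# emails_to_sql < emails
# Emit a transaction with one INSERT OR IGNORE per address. Single quotes
# are doubled so a value cannot break out of the SQL string literal.
# Lines without exactly one "@" are skipped.
emails_to_sql() {
    printf 'BEGIN;\n'
    sed "s/'/''/g" |
        awk -F'@' -v q="'" 'NF == 2 {
            printf "INSERT OR IGNORE INTO harvested (email, domain, harvested_date) "
            printf "VALUES (%s%s%s, %s%s%s, datetime(%snow%s));\n", q, $0, q, q, $2, q, q, q
        }'
    printf 'COMMIT;\n'
}

# Example: emails_to_sql < emails.txt | sqlite3 emails.db
```

Because the function writes plain text, the statements can be reviewed before anything is written to the database.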
Remove Duplicates Across Campaigns
# Aggregate multiple harvesting sessions
cat results_*.txt | sort | uniq > combined_emails.txt
# Find new emails vs previous run (comm requires both inputs to be sorted)
comm -13 <(sort -u previous.txt) combined_emails.txt > new_emails.txt
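When the result files are not guaranteed to be sorted, the comparison between runs can also be done with `awk`, which has no ordering requirement. A minimal sketch; `new_emails` is a hypothetical helper name.

```bash
# new_emails PREVIOUS CURRENT
# Print each address that appears in CURRENT but not in PREVIOUS, once.
# Neither file needs to be sorted, and PREVIOUS may be empty.
new_emails() {
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
         !seen[$0]++' "$1" "$2"
}

# Example: new_emails previous.txt combined_emails.txt > new_emails.txt
```

Checking `FILENAME` instead of the common `NR == FNR` idiom keeps the result correct when the previous file is empty.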
Ethical and Legal Considerations
Authorized Use
Only harvest emails when authorized:
- Your own domain
- Penetration testing contract
- Security research
- Bug bounty program
- Legal investigation
Privacy and Compliance
Respect privacy regulations:
- GDPR: requires a lawful basis and a defined purpose for processing personal data
- CCPA: don't sell or misuse personal data
- CFAA: only access systems with authorization
- Document authorization and purpose
Email Handling
Protect harvested email addresses:
- Store securely
- Limit access
- Delete after use
- Don't share externally
- Comply with data protection laws
Troubleshooting
No Results Found
# Domain may not have indexed content
# Try different search engines:
python3 EmailHarvester.py -d example.com -e bing
# Check if domain exists:
dig example.com +short
Connection Blocked
# Search engine may block excessive requests
# Use delay between requests:
python3 EmailHarvester.py -d example.com --delay 5
# Use proxy:
python3 EmailHarvester.py -d example.com --proxies proxy_list.txt
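A fixed delay can be combined with retries that wait longer after each failure. The sketch below is a generic wrapper, not an EmailHarvester feature. It assumes the wrapped command reports failure through a nonzero exit status; whether EmailHarvester does so when a search engine blocks it is not confirmed here, so an output check may be needed instead. `retry_with_backoff` is a hypothetical name.

```bash
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# wait (in seconds) after each failure.
# Returns 0 on success, 1 if every attempt failed.
retry_with_backoff() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -lt "$max" ]; then
            echo "attempt $attempt failed; retrying in ${delay}s" >&2
            sleep "$delay"
            delay=$((delay * 2))
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (waits 5s, 10s, 20s between four attempts):
# retry_with_backoff 4 5 python3 EmailHarvester.py -d example.com -f emails.txt
```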
Timeout Issues
# Increase timeout for slow connections:
python3 EmailHarvester.py -d example.com --timeout 30
# Check network connectivity:
curl -I https://www.google.com
Invalid Results
# Filter false positives (keep only lines that are a single well-formed address)
grep -E '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' emails.txt
# Remove non-target domain emails (escape the dot, anchor the end)
grep -E '@example\.com$' emails.txt
Performance Optimization
Parallel Processing
#!/bin/bash
# Process multiple domains in parallel
for domain in example.com test.org sample.net; do
    python3 EmailHarvester.py -d "$domain" -f "results_${domain}.txt" &
done
wait
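Starting every job at once can trigger search engine rate limits when the target list is long. One way to cap concurrency is `xargs -P`, which is available in GNU and BSD `xargs` although it is not part of POSIX. A minimal sketch; `run_limited` is a hypothetical helper.

```bash
# run_limited MAX_JOBS COMMAND [ARGS...]
# Read one target per line from stdin and run "COMMAND ARGS TARGET" for
# each, with at most MAX_JOBS processes running at the same time.
run_limited() {
    max_jobs=$1
    shift
    xargs -n 1 -P "$max_jobs" "$@"
}

# Example: two harvests at a time, one output file per domain.
# The target is passed to the inline script as $1.
# run_limited 2 sh -c 'python3 EmailHarvester.py -d "$1" -f "results_$1.txt"' _ < targets.txt
```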
Limit Results Efficiently
# Reduce unnecessary search requests
python3 EmailHarvester.py -d example.com -l 100 -e google
Cache Results
# Keep previous results to avoid re-harvesting
# (-s: the file exists and is not empty)
if [ -s emails.txt ]; then
    echo "Using cached results"
    cat emails.txt
else
    python3 EmailHarvester.py -d example.com -f emails.txt
fi
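A cache that never expires will eventually hide newly indexed addresses. The check can be extended with a maximum age, as in this sketch. It relies on `find -mmin`, which GNU and BSD `find` support although it is not part of POSIX; `cache_is_fresh` is a hypothetical helper.

```bash
# cache_is_fresh FILE MAX_AGE_MINUTES
# Succeed if FILE exists, is not empty, and was modified less than
# MAX_AGE_MINUTES minutes ago.
cache_is_fresh() {
    [ -s "$1" ] && [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# Example: reuse results that are less than a day old.
# if cache_is_fresh emails.txt 1440; then
#     cat emails.txt
# else
#     python3 EmailHarvester.py -d example.com -f emails.txt
# fi
```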
Comparison with Similar Tools
| Tool | Method | Speed | Accuracy |
|---|---|---|---|
| EmailHarvester | Search engine scraping | Fast | Medium-High |
| Sherlock | Username search | Fast | Variable |
| theHarvester | DNS/Email enum | Medium | High |
| Hunter.io | Commercial database | Fast | Very High |
| Clearbit | Commercial API | Fast | Very High |
Summary
EmailHarvester is a straightforward OSINT tool for gathering email addresses associated with target domains through search engine queries. It’s valuable for penetration testing, reconnaissance, and understanding an organization’s email footprint. The tool demonstrates how much information can be discovered about target organizations through passive techniques using publicly available search engines.