EmailHarvester

EmailHarvester is an OSINT tool that automatically gathers email addresses associated with a target domain by querying multiple search engines including Google, Bing, Yahoo, and Ask. It’s used for reconnaissance, penetration testing, and security research to identify valid email addresses for social engineering testing and threat modeling.

Installation

Clone from GitHub

git clone https://github.com/maldevel/EmailHarvester.git
cd EmailHarvester
pip3 install -r requirements.txt

Install Dependencies

# Python 3.6 or higher
python3 --version

# Install required packages
pip3 install requests beautifulsoup4
pip3 install dnspython
# argparse ships with the Python 3 standard library, so no separate install is needed

Docker Installation

docker build -t emailharvester .
docker run emailharvester -d example.com

System Requirements

# Verify Python and pip
python3 -m pip --version

# Install system dependencies (if needed)
sudo apt-get install python3-pip python3-dev

Basic Usage

Simple Domain Harvesting

python3 EmailHarvester.py -d example.com

Save Results to File

python3 EmailHarvester.py -d example.com -f results.txt

Specify Search Engine

python3 EmailHarvester.py -d example.com -e google

Multiple Search Engines

python3 EmailHarvester.py -d example.com -e google,bing,yahoo

Verbose Output

python3 EmailHarvester.py -d example.com -v

Command-Line Options

| Option | Description |
| --- | --- |
| -d, --domain | Target domain name |
| -l, --limit | Number of results per search engine |
| -e, --engine | Specific search engines (comma-separated) |
| -f, --file | Output file name |
| -v, --verbose | Verbose output |
| -h, --help | Show help message |
| --proxies | Use proxy list |
| --timeout | Request timeout in seconds |
| --delay | Delay between requests |

Supported Search Engines

Available Search Engines

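| Engine | Coverage | Speed | Reliability |
| --- | --- | --- | --- |
| Google | Comprehensive | Fast | Very High |
| Bing | Good | Fast | Very High |
| Yahoo | Good | Fast | High |
| Ask | Limited | Fast | Medium |
| Baidu | China-focused | Fast | Medium |
| Yandex | Russia-focused | Fast | Medium |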

Practical Examples

Basic Domain Reconnaissance

python3 EmailHarvester.py -d example.com

Harvesting with File Output

python3 EmailHarvester.py -d company.com -f emails.txt

Aggressive Harvesting (More Results)

python3 EmailHarvester.py -d target.org -l 1000

Specific Search Engine Query

python3 EmailHarvester.py -d example.com -e google

Multiple Engines with Limit

python3 EmailHarvester.py -d company.org \
  -e google,bing,yahoo \
  -l 500 \
  -f harvested_emails.txt

Batch Domain Harvesting

#!/bin/bash
# Harvest emails from multiple domains
for domain in example.com test.org sample.net; do
  python3 EmailHarvester.py -d "$domain" -f "results_${domain}.txt"
done

Subdomain Email Harvesting

python3 EmailHarvester.py -d mail.example.com
python3 EmailHarvester.py -d admin.example.com
python3 EmailHarvester.py -d support.example.com

Delayed Harvesting (Avoid Detection)

python3 EmailHarvester.py -d example.com --delay 2 -v

Search Engine Query Techniques

Google Dork Queries

# EmailHarvester uses these patterns:
# site:example.com email
# site:example.com contact
# site:example.com @example.com

Bing Search Patterns

# Bing-specific searches:
# domain:example.com
# -domain:subdomain.example.com

Filtered Searches

# Exclude unwanted results
# site:example.com -gmail.com
# site:example.com filetype:pdf

Advanced Techniques

Combine with DNS Enumeration

#!/bin/bash
# Harvest emails and enumerate DNS
python3 EmailHarvester.py -d example.com -f emails.txt

# Extract domains from emails
cut -d'@' -f2 emails.txt | sort -u > domains.txt

# DNS enumeration (read line by line so each domain stays a single argument)
while IFS= read -r domain; do
  dig "$domain" +short
done < domains.txt
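
The domain-extraction step can also be done in Python, which makes it easier to skip malformed lines before handing the list to a DNS tool. This is a minimal sketch; the function name `email_domains` is illustrative and not part of EmailHarvester.

```python
def email_domains(lines):
    """Return the sorted, de-duplicated domain parts of the given addresses."""
    domains = set()
    for line in lines:
        address = line.strip().lower()
        # Skip blanks and anything that does not contain exactly one "@"
        if address.count("@") != 1:
            continue
        local, domain = address.split("@")
        if local and domain:
            domains.add(domain)
    return sorted(domains)


if __name__ == "__main__":
    # Assumption: emails.txt holds one address per line, as written by the harvest step
    with open("emails.txt", encoding="utf-8") as handle:
        for domain in email_domains(handle):
            print(domain)
```

Lower-casing before de-duplication treats `Example.com` and `example.com` as the same domain, which a plain `sort | uniq` does not.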

Harvest with Proxies

# Create proxy list
echo "http://proxy1:8080" > proxies.txt
echo "http://proxy2:8080" >> proxies.txt

# Use with EmailHarvester
python3 EmailHarvester.py -d example.com \
  --proxies proxies.txt -f results.txt

Target Subsidiary Domains

#!/bin/bash
# Harvest from parent and subdomains
PARENT="example.com"
SUBDOMAINS=(
  "mail.$PARENT"
  "admin.$PARENT"
  "support.$PARENT"
  "dev.$PARENT"
)

for sub in "${SUBDOMAINS[@]}"; do
  python3 EmailHarvester.py -d "$sub" -f "results_${sub}.txt"
done

Email Validation and Filtering

#!/bin/bash
# Clean up harvested emails
python3 EmailHarvester.py -d example.com -f raw_emails.txt

# Remove duplicates
sort raw_emails.txt | uniq > unique_emails.txt

# Filter by domain (escape the dot and anchor the match so example.com.evil.org is excluded)
grep -i '@example\.com$' unique_emails.txt > domain_emails.txt

# Count results
wc -l domain_emails.txt
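
The same clean-up can be sketched in Python when an exact host comparison is preferred over pattern matching. The function name `filter_by_domain` and its `include_subdomains` switch are assumptions made for this example, not EmailHarvester features.

```python
def filter_by_domain(lines, domain, include_subdomains=False):
    """De-duplicate addresses and keep only those hosted on the given domain."""
    domain = domain.lower()
    seen = set()
    kept = []
    for line in lines:
        address = line.strip().lower()
        if not address or address in seen:
            continue
        seen.add(address)
        local, separator, host = address.rpartition("@")
        # Ignore lines without an "@" or without a local part
        if not separator or not local:
            continue
        if host == domain or (include_subdomains and host.endswith("." + domain)):
            kept.append(address)
    return sorted(kept)


if __name__ == "__main__":
    sample = ["info@example.com", "INFO@example.com", "x@example.com.evil.org"]
    print(filter_by_domain(sample, "example.com"))
```

Comparing the host part exactly avoids accepting look-alike hosts that merely contain the target domain as a substring.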

Bulk Target Harvesting

#!/bin/bash
# Harvest from multiple domains efficiently
TARGETS=(
  "company1.com"
  "company2.org"
  "company3.net"
)

for target in "${TARGETS[@]}"; do
  echo "Harvesting: $target"
  python3 EmailHarvester.py -d "$target" \
    -e google,bing \
    -f "results_${target}.txt" \
    --delay 1
done

# Combine all results
cat results_*.txt | sort | uniq > all_emails.txt

Processing and Analysis

Extract Unique Emails

python3 EmailHarvester.py -d example.com -f emails.txt
sort emails.txt | uniq > unique_emails.txt
wc -l unique_emails.txt

Extract Usernames

# Get usernames from email addresses
cut -d'@' -f1 emails.txt | sort -u > usernames.txt

Group by Department

# Common naming patterns (matched only against the part before the @)
grep -iE '^[^@]*(admin|support|info)' emails.txt > general.txt
grep -iE '^[^@]*(dev|engineering|tech)' emails.txt > tech.txt
grep -iE '^[^@]*(sales|marketing|business)' emails.txt > business.txt
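
The grouping can be sketched in Python so that addresses matching no keyword are collected too, rather than silently dropped. The keyword lists mirror the grep patterns above; the `other` bucket and the function name are assumptions added for illustration.

```python
DEPARTMENT_KEYWORDS = {
    "general": ("admin", "support", "info"),
    "tech": ("dev", "engineering", "tech"),
    "business": ("sales", "marketing", "business"),
}


def group_by_department(addresses, keywords=DEPARTMENT_KEYWORDS):
    """Bucket addresses by keywords found in the local part (before the @)."""
    groups = {name: [] for name in keywords}
    groups["other"] = []
    for raw in addresses:
        address = raw.strip()
        if not address:
            continue
        local = address.lower().partition("@")[0]
        matched = False
        for name, words in keywords.items():
            if any(word in local for word in words):
                groups[name].append(address)
                matched = True
        if not matched:
            groups["other"].append(address)
    return groups


if __name__ == "__main__":
    print(group_by_department(["info@example.com", "jane.doe@example.com"]))
```

Substring matching is a rough heuristic: a personal address such as `devon@example.com` would land in the `tech` bucket, so the groups are a starting point for manual review.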

Create Email Lists for Testing

#!/bin/bash
# Prepare list for penetration testing
python3 EmailHarvester.py -d target.com -f harvested.txt

# Filter and clean
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' harvested.txt \
  | sort | uniq > valid_emails.txt

# Create CSV format
awk '{print $0","}' valid_emails.txt > email_list.csv

OSINT Workflow Integration

Complete Reconnaissance

#!/bin/bash
DOMAIN="target.com"

echo "=== Email Harvesting for $DOMAIN ==="
python3 EmailHarvester.py -d "$DOMAIN" -f "emails_${DOMAIN}.txt"

echo "=== Email Statistics ==="
echo "Total emails: $(wc -l < "emails_${DOMAIN}.txt")"

echo "=== Employee Names ==="
cut -d'@' -f1 "emails_${DOMAIN}.txt" | sort -u

echo "=== Email Domains ==="
cut -d'@' -f2 "emails_${DOMAIN}.txt" | sort | uniq -c

echo "=== Reconnaissance Complete ==="
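
The statistics step can be sketched in Python with `collections.Counter`. Unlike the shell pipeline, this sketch de-duplicates addresses case-insensitively before counting; the function name `summarize` and the returned dictionary layout are assumptions for illustration.

```python
from collections import Counter


def summarize(addresses):
    """Return total, usernames, and per-domain counts for unique addresses."""
    cleaned = {a.strip().lower() for a in addresses if "@" in a}
    domains = Counter(a.rpartition("@")[2] for a in cleaned)
    usernames = sorted({a.rpartition("@")[0] for a in cleaned})
    return {"total": len(cleaned), "usernames": usernames, "domains": dict(domains)}


if __name__ == "__main__":
    sample = ["alice@example.com", "bob@example.com", "carol@test.org"]
    report = summarize(sample)
    print("Total emails:", report["total"])
    for domain, count in sorted(report["domains"].items()):
        print(f"{count:5d} {domain}")
```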

Social Engineering Testing

#!/bin/bash
# Prepare for authorized social engineering test
DOMAIN="target.com"
python3 EmailHarvester.py -d "$DOMAIN" -f test_targets.txt

# Validate emails (check if they exist)
# Use separate tool for SMTP validation
# (Never execute without authorization)

# Create test campaign (with authorization)
wc -l test_targets.txt

Threat Modeling

#!/bin/bash
# Identify attack surface
TARGETS=("example.com" "test.org" "sample.net")

for target in "${TARGETS[@]}"; do
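  # Count unique addresses only, not every line the tool prints
  EMAILS=$(python3 EmailHarvester.py -d "$target" \
    | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' | sort -u | wc -l)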
  echo "$target: $EMAILS email addresses exposed"
done

Handling Results

CSV Export Format

# Convert to CSV
python3 EmailHarvester.py -d example.com -f emails.txt
awk '{print NR","$0}' emails.txt > emails.csv
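
When addresses may contain commas or quotes, Python's `csv` module handles the quoting that a plain `awk` print does not. The sketch below is one possible layout; the column names `id`, `email`, and `domain` are assumptions chosen for the example.

```python
import csv
import io


def emails_to_csv(addresses):
    """Render addresses as CSV text with a header and a running id column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "email", "domain"])
    for index, address in enumerate(addresses, start=1):
        # The domain is everything after the last "@"
        writer.writerow([index, address, address.rpartition("@")[2]])
    return buffer.getvalue()


if __name__ == "__main__":
    # Assumption: emails.txt holds one address per line
    with open("emails.txt", encoding="utf-8") as source:
        rows = [line.strip() for line in source if line.strip()]
    with open("emails.csv", "w", encoding="utf-8", newline="") as target:
        target.write(emails_to_csv(rows))
```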

Database Import

# Import into database for tracking
sqlite3 emails.db "CREATE TABLE harvested (
  id INTEGER PRIMARY KEY,
  email TEXT UNIQUE,
  domain TEXT,
  harvested_date DATETIME
);"

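# Insert results (INSERT OR IGNORE skips addresses that are already stored)
while IFS= read -r email; do
  safe=${email//\'/\'\'}   # double any single quotes so the SQL statement stays valid
  sqlite3 emails.db "INSERT OR IGNORE INTO harvested VALUES (NULL, '$safe', '${safe#*@}', datetime('now'));"
done < emails.txt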
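
Building SQL by string interpolation breaks on addresses that contain a single quote, so a parameterized insert is the safer route. The following is a minimal sketch using Python's built-in `sqlite3` module with the same table layout as above; the function name `import_emails` is illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS harvested (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE,
    domain TEXT,
    harvested_date DATETIME
)
"""


def import_emails(connection, addresses):
    """Insert addresses with bound parameters; return how many rows were new."""
    connection.execute(SCHEMA)
    rows = []
    for address in addresses:
        address = address.strip().lower()
        if "@" in address:
            rows.append((address, address.rpartition("@")[2]))
    before = connection.total_changes
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT OR IGNORE INTO harvested (email, domain, harvested_date) "
            "VALUES (?, ?, datetime('now'))",
            rows,
        )
    return connection.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect("emails.db")
    with open("emails.txt", encoding="utf-8") as handle:
        print("new rows:", import_emails(conn, handle))
    conn.close()
```

`INSERT OR IGNORE` relies on the `UNIQUE` constraint on `email`, so re-running the import after a later harvest only adds addresses that were not stored before.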

Remove Duplicates Across Campaigns

# Aggregate multiple harvesting sessions
cat results_*.txt | sort | uniq > combined_emails.txt

# Find new emails vs previous run (comm requires both inputs to be sorted)
comm -13 <(sort -u previous.txt) combined_emails.txt > new_emails.txt
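
The comparison between runs is a set difference, which can be sketched in Python without any sorting requirement on the inputs. The function name `new_addresses` is an assumption for this example.

```python
def new_addresses(previous, current):
    """Return addresses present in current but not in previous, sorted."""

    def normalize(lines):
        # Trim whitespace and lower-case so the comparison ignores case
        return {line.strip().lower() for line in lines if line.strip()}

    return sorted(normalize(current) - normalize(previous))


if __name__ == "__main__":
    old = ["alice@example.com", "bob@example.com"]
    new = ["bob@example.com", "carol@example.com"]
    print(new_addresses(old, new))
```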

Authorized Use

Only harvest emails when authorized:
- Your own domain
- Penetration testing contract
- Security research
- Bug bounty program
- Legal investigation

Privacy and Compliance

Respect privacy regulations:
- GDPR: Legitimate purpose required
- CCPA: Don't sell or misuse data
- CFAA: Only access with authorization
- Document authorization and purpose

Email Handling

Protect harvested email addresses:
- Store securely
- Limit access
- Delete after use
- Don't share externally
- Comply with data protection laws

Troubleshooting

No Results Found

# Domain may not have indexed content
# Try different search engines:
python3 EmailHarvester.py -d example.com -e bing

# Check if domain exists:
dig example.com +short

Connection Blocked

# Search engine may block excessive requests
# Use delay between requests:
python3 EmailHarvester.py -d example.com --delay 5

# Use proxy:
python3 EmailHarvester.py -d example.com --proxies proxy_list.txt

Timeout Issues

# Increase timeout for slow connections:
python3 EmailHarvester.py -d example.com --timeout 30

# Check network connectivity:
curl -I https://www.google.com

Invalid Results

# Filter false positives
grep -E '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' emails.txt

# Remove non-target domain emails (escaped dot, anchored at end of line)
grep -i '@example\.com$' emails.txt
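
A Python version of the syntax filter can keep the rejected lines for inspection instead of discarding them. This sketch reuses the same regular expression as the grep command; the function name `split_valid` is illustrative.

```python
import re

# Same pattern as the grep filter; fullmatch() supplies the ^...$ anchoring
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def split_valid(lines):
    """Split non-empty lines into (valid, rejected) by address syntax."""
    valid, rejected = [], []
    for line in lines:
        candidate = line.strip()
        if not candidate:
            continue
        if EMAIL_PATTERN.fullmatch(candidate):
            valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected


if __name__ == "__main__":
    good, bad = split_valid(["alice@example.com", "bob@example", "image@2x.png"])
    print("valid:", good)
    print("rejected:", bad)
```

The pattern checks syntax only. A scraped asset name such as `image@2x.png` still passes, so filtering by the target domain afterwards remains worthwhile.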

Performance Optimization

Parallel Processing

#!/bin/bash
# Process multiple domains in parallel
for domain in example.com test.org sample.net; do
  python3 EmailHarvester.py -d "$domain" -f "results_${domain}.txt" &
done
wait
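
Launching every domain at once can trip search engine rate limits, so a bounded worker pool may be a gentler option. The sketch below uses `concurrent.futures.ThreadPoolExecutor`; the helper names are hypothetical, and `run_emailharvester` assumes that `EmailHarvester.py` sits in the current directory and prints the addresses it finds to standard output.

```python
import re
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def run_emailharvester(domain):
    """Run one harvest and return the unique addresses found in its output."""
    # Assumption: the script is in the working directory and writes to stdout
    completed = subprocess.run(
        [sys.executable, "EmailHarvester.py", "-d", domain],
        capture_output=True,
        text=True,
        check=False,
    )
    return sorted(set(EMAIL_PATTERN.findall(completed.stdout)))


def harvest_many(domains, harvest=run_emailharvester, max_workers=2):
    """Call harvest(domain) for each domain, at most max_workers at a time."""
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(harvest, domains))
    return dict(zip(domains, results))


if __name__ == "__main__":
    for name, found in harvest_many(["example.com", "test.org", "sample.net"]).items():
        print(f"{name}: {len(found)} addresses")
```

Passing the harvest function as a parameter keeps the pool logic testable with a stub and lets the worker count be tuned without touching the harvesting code.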

Limit Results Efficiently

# Reduce unnecessary search engine requests
python3 EmailHarvester.py -d example.com -l 100 -e google

Cache Results

# Keep previous results to avoid re-harvesting
if [ -f emails.txt ]; then
  echo "Using cached results"
  cat emails.txt
else
  python3 EmailHarvester.py -d example.com -f emails.txt
fi
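
The caching idea extends naturally to one cache file per domain. The following is a sketch only: `cached_harvest`, the `cache` directory name, and the injected `harvest` callable are assumptions for illustration, and the domain is used as a file name, so it should come from a trusted list.

```python
from pathlib import Path


def cached_harvest(domain, harvest, cache_dir="cache"):
    """Return cached addresses for domain, calling harvest(domain) only on a miss."""
    cache_file = Path(cache_dir) / f"{domain}.txt"
    if cache_file.is_file():
        return cache_file.read_text(encoding="utf-8").splitlines()
    addresses = list(harvest(domain))
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # One address per line, matching the plain-text files used elsewhere
    text = "\n".join(addresses) + ("\n" if addresses else "")
    cache_file.write_text(text, encoding="utf-8")
    return addresses


if __name__ == "__main__":
    def stub(domain):
        # Stand-in for a real harvest call
        return [f"info@{domain}"]

    print(cached_harvest("example.com", stub))
```

Deleting a domain's cache file forces a fresh harvest on the next call.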

Comparison with Similar Tools

| Tool | Method | Speed | Accuracy |
| --- | --- | --- | --- |
| EmailHarvester | Search engine scraping | Fast | Medium-High |
| Sherlock | Username search | Fast | Variable |
| theHarvester | DNS/Email enum | Medium | High |
| Hunter.io | Commercial database | Fast | Very High |
| Clearbit | Commercial API | Fast | Very High |

Summary

EmailHarvester is a straightforward OSINT tool for gathering email addresses associated with target domains through search engine queries. It’s valuable for penetration testing, reconnaissance, and understanding an organization’s email footprint. The tool demonstrates how much information can be discovered about target organizations through passive techniques using publicly available search engines.