EmailHarvester

EmailHarvester is an OSINT tool that automatically gathers email addresses associated with a target domain by querying multiple search engines, including Google, Bing, Yahoo, and Ask. It is used in reconnaissance, penetration testing, and security research to find publicly exposed email addresses for social engineering tests and threat modeling.

git clone https://github.com/maldevel/EmailHarvester.git
cd EmailHarvester
pip3 install -r requirements.txt
# Python 3.6 or higher
python3 --version

# Install required packages
pip3 install requests beautifulsoup4
pip3 install dnspython
# argparse is part of the Python 3 standard library, so it needs no separate install
docker build -t emailharvester .
docker run emailharvester -d example.com
# Verify Python and pip
python3 -m pip --version

# Install system dependencies (if needed)
sudo apt-get install python3-pip python3-dev
python3 EmailHarvester.py -d example.com
python3 EmailHarvester.py -d example.com -f results.txt
python3 EmailHarvester.py -d example.com -e google
python3 EmailHarvester.py -d example.com -e google,bing,yahoo
python3 EmailHarvester.py -d example.com -v
| Option | Description |
| --- | --- |
| -d, --domain | Target domain name |
| -l, --limit | Number of results per search engine |
| -e, --engine | Specific search engines (comma-separated) |
| -f, --file | Output file name |
| -v, --verbose | Verbose output |
| -h, --help | Show help message |
| --proxies | Use proxy list |
| --timeout | Request timeout in seconds |
| --delay | Delay between requests |
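| Engine | Coverage | Speed | Reliability |
| --- | --- | --- | --- |
| Google | Comprehensive | Fast | Very High |
| Bing | Good | Fast | Very High |
| Yahoo | Good | Fast | High |
| Ask | Limited | Fast | Medium |
| Baidu | China-focused | Fast | Medium |
| Yandex | Russia-focused | Fast | Medium |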
python3 EmailHarvester.py -d example.com
python3 EmailHarvester.py -d company.com -f emails.txt
python3 EmailHarvester.py -d target.org -l 1000
python3 EmailHarvester.py -d example.com -e google
python3 EmailHarvester.py -d company.org \
  -e google,bing,yahoo \
  -l 500 \
  -f harvested_emails.txt
#!/bin/bash
# Harvest emails from multiple domains
for domain in example.com test.org sample.net; do
  python3 EmailHarvester.py -d "$domain" -f "results_${domain}.txt"
done
python3 EmailHarvester.py -d mail.example.com
python3 EmailHarvester.py -d admin.example.com
python3 EmailHarvester.py -d support.example.com
python3 EmailHarvester.py -d example.com --delay 2 -v
# EmailHarvester uses these patterns:
# site:example.com email
# site:example.com contact
# site:example.com @example.com
# Bing-specific searches:
# domain:example.com
# -domain:subdomain.example.com
# Exclude unwanted results
# site:example.com -gmail.com
# site:example.com filetype:pdf
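Once result pages for queries like these come back, finding addresses reduces to a pattern match over the page text. The following is a minimal sketch of that step, not EmailHarvester's actual code: `extract_emails` is a hypothetical helper that reads text on stdin and prints the unique addresses at the given domain or any of its subdomains. It assumes the domain is passed in plain form such as `example.com`.

```shell
# Print the unique addresses at a domain (or its subdomains) found in stdin.
# extract_emails is a hypothetical helper, not part of EmailHarvester.
extract_emails() {
  domain=$1
  # Pull out anything shaped like an address, then normalize case.
  grep -oiE '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}' \
    | tr '[:upper:]' '[:lower:]' \
    | awk -v d="$domain" '
        BEGIN { d = tolower(d) }
        {
          host = substr($0, index($0, "@") + 1)
          # Keep exact matches and hosts that end in ".<domain>".
          if (host == d ||
              (length(host) > length(d) &&
               substr(host, length(host) - length(d)) == "." d))
            print
        }' \
    | sort -u
}
```

For example, `curl -s https://example.com/contact | extract_emails example.com` would list the addresses on a single page. Matching the full host before comparing suffixes keeps lookalikes such as `example.community` out of the results.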
#!/bin/bash
# Harvest emails and enumerate DNS
python3 EmailHarvester.py -d example.com -f emails.txt

# Extract domains from emails
cut -d'@' -f2 emails.txt | sort | uniq > domains.txt

# DNS enumeration
while IFS= read -r domain; do
  dig +short "$domain"
done < domains.txt
# Create proxy list
echo "http://proxy1:8080" > proxies.txt
echo "http://proxy2:8080" >> proxies.txt

# Use with EmailHarvester
python3 EmailHarvester.py -d example.com \
  --proxies proxies.txt -f results.txt
#!/bin/bash
# Harvest from parent and subdomains
PARENT="example.com"
SUBDOMAINS=(
  "mail.$PARENT"
  "admin.$PARENT"
  "support.$PARENT"
  "dev.$PARENT"
)

for sub in "${SUBDOMAINS[@]}"; do
  python3 EmailHarvester.py -d "$sub" -f "results_$sub.txt"
done
#!/bin/bash
# Clean up harvested emails
python3 EmailHarvester.py -d example.com -f raw_emails.txt

# Remove duplicates
sort raw_emails.txt | uniq > unique_emails.txt

# Filter by domain (escape the dot and anchor the match so lookalike domains are excluded)
grep -iE '@example\.com$' unique_emails.txt > domain_emails.txt

# Count results
wc -l domain_emails.txt
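Plain `sort | uniq` treats `Bob@Example.com` and `bob@example.com` as different lines, and stray whitespace or Windows line endings have the same effect. A minimal sketch of a stricter cleanup pass follows. `normalize_emails` is a hypothetical helper name, and the address pattern is the same simplified one used elsewhere on this page, not a full RFC 5322 validator.

```shell
# Normalize a list of addresses read from stdin: strip carriage returns and
# surrounding whitespace, lowercase, drop malformed lines, and deduplicate.
# normalize_emails is a hypothetical helper name.
normalize_emails() {
  tr -d '\r' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -E '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$' \
    | sort -u
}
```

Usage: `normalize_emails < raw_emails.txt > unique_emails.txt`.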
#!/bin/bash
# Harvest from multiple domains efficiently
TARGETS=(
  "company1.com"
  "company2.org"
  "company3.net"
)

for target in "${TARGETS[@]}"; do
  echo "Harvesting: $target"
  python3 EmailHarvester.py -d "$target" \
    -e google,bing \
    -f "results_${target}.txt" \
    --delay 1
done

# Combine all results
cat results_*.txt | sort | uniq > all_emails.txt
python3 EmailHarvester.py -d example.com -f emails.txt
sort emails.txt | uniq > unique_emails.txt
wc -l unique_emails.txt
# Get usernames from email addresses
cut -d'@' -f1 emails.txt | sort -u > usernames.txt
# Common naming patterns
grep -E "admin|support|info" emails.txt > general.txt
grep -E "dev|engineering|tech" emails.txt > tech.txt
grep -E "sales|marketing|business" emails.txt > business.txt
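Substring patterns like these also catch personal addresses: `dev` matches `devon.smith@example.com`, for instance. A minimal sketch of a stricter split compares the whole local part against a list of role names. The helper name `split_role_accounts` and the role list itself are illustrative assumptions; adjust the list to the target organization.

```shell
# Label each address on stdin as "role" or "person" by comparing the whole
# local part (case-insensitively) with a list of common role names.
# split_role_accounts and the role list are illustrative assumptions.
split_role_accounts() {
  awk -F'@' '
    BEGIN {
      n = split("admin administrator contact help hello info marketing noreply no-reply office postmaster sales security support webmaster", names, " ")
      for (i = 1; i <= n; i++) role[names[i]] = 1
    }
    NF == 2 {
      label = (tolower($1) in role) ? "role" : "person"
      print label "\t" $0
    }
  '
}
```

Usage: `split_role_accounts < emails.txt | grep '^role' | cut -f2 > general.txt`.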
#!/bin/bash
# Prepare list for penetration testing
python3 EmailHarvester.py -d target.com -f harvested.txt

# Filter and clean
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' harvested.txt \
  | sort | uniq > valid_emails.txt

# Create a single-column CSV with a header row
{ echo "email"; cat valid_emails.txt; } > email_list.csv
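If the list is going into a spreadsheet or a reporting tool, separate columns for the local part and the domain are usually more useful than a single column. A minimal sketch follows, assuming the input has already been filtered to well-formed addresses so that no field contains a comma or quote. `emails_to_csv` is a hypothetical helper name.

```shell
# Convert addresses on stdin to CSV with a header row.
# Assumes pre-filtered input, so fields need no CSV quoting.
# emails_to_csv is a hypothetical helper name.
emails_to_csv() {
  printf 'email,local_part,domain\n'
  # Splitting on "@" gives exactly two fields for a well-formed address.
  awk -F'@' 'NF == 2 { print $0 "," $1 "," $2 }'
}
```

Usage: `emails_to_csv < valid_emails.txt > email_list.csv`.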
#!/bin/bash
DOMAIN="target.com"

echo "=== Email Harvesting for $DOMAIN ==="
python3 EmailHarvester.py -d "$DOMAIN" -f "emails_${DOMAIN}.txt"

echo "=== Email Statistics ==="
echo "Total emails: $(wc -l < "emails_${DOMAIN}.txt")"

echo "=== Usernames (local parts) ==="
cut -d'@' -f1 "emails_${DOMAIN}.txt" | sort -u

echo "=== Email Domains ==="
cut -d'@' -f2 "emails_${DOMAIN}.txt" | sort | uniq -c

echo "=== Reconnaissance Complete ==="
#!/bin/bash
# Prepare for authorized social engineering test
DOMAIN="target.com"
python3 EmailHarvester.py -d "$DOMAIN" -f test_targets.txt

# Validate emails (check if they exist)
# Use separate tool for SMTP validation
# (Never execute without authorization)

# Create test campaign (with authorization)
wc -l test_targets.txt
#!/bin/bash
# Identify attack surface
TARGETS=("example.com" "test.org" "sample.net")

for target in "${TARGETS[@]}"; do
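  # Count unique addresses only, so banner and status lines are not included
  EMAILS=$(python3 EmailHarvester.py -d "$target" \
    | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' | sort -u | wc -l)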
  echo "$target: $EMAILS email addresses exposed"
done
# Convert to CSV
python3 EmailHarvester.py -d example.com -f emails.txt
awk '{print NR","$0}' emails.txt > emails.csv
# Import into database for tracking
sqlite3 emails.db "CREATE TABLE harvested (
  id INTEGER PRIMARY KEY,
  email TEXT UNIQUE,
  domain TEXT,
  harvested_date DATETIME
);"

# Insert results
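while IFS= read -r email; do
  domain=${email#*@}
  # Double any single quotes so a value cannot break out of the SQL string
  safe_email=$(printf '%s' "$email" | sed "s/'/''/g")
  safe_domain=$(printf '%s' "$domain" | sed "s/'/''/g")
  sqlite3 emails.db "INSERT OR IGNORE INTO harvested VALUES (NULL, '$safe_email', '$safe_domain', datetime('now'));"
done < emails.txt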
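Starting one `sqlite3` process per address is slow for large lists, and any value interpolated into SQL needs its single quotes doubled. A minimal sketch of an alternative generates all statements first and pipes them to a single `sqlite3` process. `emails_to_sql` is a hypothetical helper name. It assumes the `harvested` table defined above, and it uses `INSERT OR IGNORE` so that addresses already present (the `email` column is `UNIQUE`) are skipped on later runs.

```shell
# Emit one transaction of INSERT statements for the addresses on stdin.
# Single quotes in values are doubled, which is how SQL escapes them.
# emails_to_sql is a hypothetical helper name.
emails_to_sql() {
  printf 'BEGIN;\n'
  awk -F'@' -v q="'" '
    NF == 2 {
      email = $0
      domain = $2
      gsub(q, q q, email)
      gsub(q, q q, domain)
      print "INSERT OR IGNORE INTO harvested (email, domain, harvested_date) VALUES (" q email q ", " q domain q ", datetime(" q "now" q "));"
    }
  '
  printf 'COMMIT;\n'
}
```

Usage: `emails_to_sql < emails.txt | sqlite3 emails.db`.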
# Aggregate multiple harvesting sessions
cat results_*.txt | sort | uniq > combined_emails.txt

# Find emails that are new since the previous run (comm needs both inputs sorted)
sort -u -o previous.txt previous.txt
comm -13 previous.txt combined_emails.txt > new_emails.txt
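`comm` silently gives wrong answers when either input is unsorted, so it can help to wrap the comparison in a function that sorts temporary copies and leaves the original files untouched. This is a minimal sketch, with `new_since` as a hypothetical helper name.

```shell
# Print lines that appear in the current file but not in the previous one.
# Both inputs are sorted into temporary copies first, as comm requires.
# new_since is a hypothetical helper name.
new_since() {
  prev_sorted=$(mktemp) || return 1
  curr_sorted=$(mktemp) || { rm -f "$prev_sorted"; return 1; }
  sort -u "$1" > "$prev_sorted"
  sort -u "$2" > "$curr_sorted"
  # -13 suppresses lines unique to the first file and lines common to both.
  comm -13 "$prev_sorted" "$curr_sorted"
  rm -f "$prev_sorted" "$curr_sorted"
}
```

Usage: `new_since previous.txt combined_emails.txt > new_emails.txt`.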
# Only harvest emails when authorized:
- Your own domain
- Penetration testing contract
- Security research
- Bug bounty program
- Legal investigation
# Respect privacy and computer-misuse laws:
- GDPR: Requires a lawful basis for processing personal data
- CCPA: Don't sell or misuse data
- CFAA: Only access systems with authorization
- Document authorization and purpose
# Protect harvested email addresses:
- Store securely
- Limit access
- Delete after use
- Don't share externally
- Comply with data protection laws
# Domain may not have indexed content
# Try different search engines:
python3 EmailHarvester.py -d example.com -e bing

# Check if domain exists:
dig example.com +short
# Search engine may block excessive requests
# Use delay between requests:
python3 EmailHarvester.py -d example.com --delay 5

# Use proxy:
python3 EmailHarvester.py -d example.com --proxies proxy_list.txt
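When blocks are intermittent, retrying with a growing pause is often gentler than one fixed delay. Below is a minimal sketch of a generic retry wrapper. `retry_with_backoff` is a hypothetical helper name, and the sketch assumes the wrapped command exits with a non-zero status when it fails, which may not hold for every tool or every kind of block.

```shell
# Run a command up to <attempts> times, doubling the pause after each failure.
# Usage: retry_with_backoff <attempts> <initial_delay_seconds> <command...>
# retry_with_backoff is a hypothetical helper name.
retry_with_backoff() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}
```

Usage: `retry_with_backoff 4 5 python3 EmailHarvester.py -d example.com` tries up to four times, pausing 5, 10, and then 20 seconds between attempts.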
# Increase timeout for slow connections:
python3 EmailHarvester.py -d example.com --timeout 30

# Check network connectivity:
curl -I https://www.google.com
# Filter false positives
grep -E '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' emails.txt

# Remove non-target domain emails
grep -iE '@example\.com$' emails.txt
#!/bin/bash
# Process multiple domains in parallel
for domain in example.com test.org sample.net; do
  python3 EmailHarvester.py -d "$domain" -f "results_${domain}.txt" &
done
wait
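Launching every domain at once makes search-engine blocks more likely as the list grows. A minimal sketch of a capped variant uses `xargs -P`, which is a GNU and BSD extension rather than POSIX. `run_bounded` is a hypothetical helper name. It reads one item per line from stdin and appends each item as the last argument of the command.

```shell
# Run a command once per input line, with at most <max_jobs> running at a time.
# Usage: run_bounded <max_jobs> <command...>   (items are read from stdin)
# run_bounded is a hypothetical helper name.
run_bounded() {
  max_jobs=$1
  shift
  # -I {} substitutes each input line; -P caps the number of concurrent jobs.
  xargs -P "$max_jobs" -I {} "$@" {}
}
```

Usage: `printf '%s\n' example.com test.org sample.net | run_bounded 2 python3 EmailHarvester.py -d` runs at most two harvests at a time.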
# Reduce unnecessary requests by limiting results and using a single engine
python3 EmailHarvester.py -d example.com -l 100 -e google
# Keep previous results to avoid re-harvesting
if [ -f emails.txt ]; then
  echo "Using cached results"
  cat emails.txt
else
  python3 EmailHarvester.py -d example.com -f emails.txt
fi
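A cache that only checks whether the file exists never refreshes, and it also accepts an empty file left behind by a failed run. A minimal sketch of a freshness check follows. `cache_is_fresh` is a hypothetical helper name, and `find -mmin` is supported by GNU and BSD `find` but is not part of POSIX.

```shell
# Succeed only if the file exists, is non-empty, and was modified less than
# <max_age_minutes> minutes ago.
# cache_is_fresh is a hypothetical helper name.
cache_is_fresh() {
  file=$1
  max_age_minutes=$2
  [ -s "$file" ] || return 1
  # find prints the path only when the modification time is recent enough.
  [ -n "$(find "$file" -mmin "-$max_age_minutes" 2>/dev/null)" ]
}
```

Usage: `cache_is_fresh emails.txt 1440 || python3 EmailHarvester.py -d example.com -f emails.txt` re-harvests only when the cached results are missing, empty, or more than a day old.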
| Tool | Method | Speed | Accuracy |
| --- | --- | --- | --- |
| EmailHarvester | Search engine scraping | Fast | Medium-High |
| Sherlock | Username search | Fast | Variable |
| theHarvester | DNS/Email enum | Medium | High |
| Hunter.io | Commercial database | Fast | Very High |
| Clearbit | Commercial API | Fast | Very High |

EmailHarvester is a straightforward OSINT tool for gathering email addresses associated with target domains through search engine queries. It’s valuable for penetration testing, reconnaissance, and understanding an organization’s email footprint. The tool demonstrates how much information can be discovered about target organizations through passive techniques using publicly available search engines.