ssdeep
Overview
ssdeep is a program for computing and comparing context-triggered piecewise hashing (CTPH) fuzzy hashes. A cryptographic hash such as MD5 or SHA-1 changes completely when a single byte of a file changes. A CTPH hash changes only around the modified region. ssdeep can therefore report how similar two files are, as a match score from 0 to 100, even when the files are not identical.
Key Features
- Compute fuzzy hashes using the CTPH algorithm
- Compare files for similarity matching
- Detect malware variants with minor modifications
- Batch processing of multiple files
- Generate fuzzy hash databases
- Cross-platform support (Linux, Windows, macOS)
Use Cases
- Malware analysis and variant detection
- Digital forensics and file comparison
- Identifying near-duplicate files
- Detecting code variants in security research
- File integrity monitoring with tolerance
Installation
Linux/Debian-based
sudo apt-get update
sudo apt-get install ssdeep
macOS
brew install ssdeep
Windows
Download a prebuilt Windows binary from the ssdeep project's release downloads (SourceForge, or GitHub for newer releases), or install it via a package manager.
Build from Source
wget https://sourceforge.net/projects/ssdeep/files/ssdeep-2.14.1/ssdeep-2.14.1.tar.gz
tar xzf ssdeep-2.14.1.tar.gz
cd ssdeep-2.14.1
./configure
make
sudo make install
Basic Commands
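| Command | Purpose |
|---|---|
| ssdeep file.bin | Compute the fuzzy hash of a single file |
| ssdeep -r directory/ | Recursively hash all files in a directory |
| ssdeep -m hashfile.txt file.bin | Match a file against the known hashes in hashfile.txt |
| ssdeep -k hashfile.txt other.txt | Match the hashes in other.txt against those in hashfile.txt |
| ssdeep -s -r directory/ | Silent mode: suppress error messages |
| ssdeep -t 75 -m hashfile.txt file.bin | Only report matches scoring above the threshold |
| ssdeep -a -m hashfile.txt file.bin | Report all comparisons regardless of score |
| ssdeep -r -d directory/ | Directory mode: compare every file against every other file |
| ssdeep -r -p directory/ | Pretty matching mode: like -d, but lists all matches grouped by file |
| ssdeep -h | Display help message |
| ssdeep -V | Show version information |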
Computing Fuzzy Hashes
Single File Hash
ssdeep /path/to/file.bin
Output example (the hash values are placeholders):
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr+stuvwx:abcGHI+mnostu,"/path/to/file.bin"
Recursive Directory Hash
ssdeep -r /path/to/directory/ > hashes.txt
This writes a single header line to hashes.txt, followed by one hash line for every file in the directory tree.
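Pretty Matching Mode
ssdeep -r -p /malware/samples/ > similar_files.txt
Compares every file in the tree against every other file and lists all matches with their scores, grouped by file. This mode outputs matches, not hashes, so its output cannot serve as a hash database.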
Comparing Files and Hashes
Compare Single File Against Hash Database
ssdeep -m known_hashes.txt suspicious_file.bin
Prints one line per matching known hash, ending with the match score (0 to 100) in parentheses. If no known hash matches, nothing is printed.
Silent Mode Comparison (for scripting)
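ssdeep -s -m known_hashes.txt file.bin
Option order matters. -m takes the next argument as its hash file, so "-m -s" would treat "-s" as the hash file name. The -s flag only suppresses error messages; matches are still printed. ssdeep does not document an exit status that distinguishes a match from no match, so scripts should test whether the output is empty.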
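A minimal helper for scripts follows. The name has_ssdeep_match is hypothetical. It assumes that ssdeep prints nothing in match mode when no known hash matches, so it checks the output rather than the exit status.

```shell
# has_ssdeep_match HASH_FILE FILE  (hypothetical helper name)
# Succeeds (status 0) if ssdeep reports at least one match for FILE against
# HASH_FILE. Assumption: in match mode ssdeep prints nothing when no known
# hash matches, so the helper tests the output instead of the exit status.
has_ssdeep_match() {
  local hash_file="$1" file="$2" output
  # Put -s before -m: -m consumes the next argument as its hash file
  output=$(ssdeep -s -m "$hash_file" "$file")
  [ -n "$output" ]
}
```

Usage: if has_ssdeep_match known_hashes.txt file.bin; then echo "MATCH: file.bin"; fi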
Show All Comparison Scores
ssdeep -a -m known_hashes.txt file.bin
Reports every comparison regardless of score, including zero scores. This helps when checking why an expected match did not appear.
Batch Compare Multiple Files
for file in /path/to/files/*; do
  echo "=== $file ==="
  ssdeep -m hashes.txt "$file"
done
Creating Hash Databases
Generate Hash Database from Directory
ssdeep -r -s /path/to/samples/ > malware_db.txt
Recursive mode hashes every file in the tree, and -s suppresses error messages for files that cannot be read. The resulting file can be passed to -m or -k for later comparisons.
Append Hashes to Existing Database
ssdeep -r /new/samples/ | tail -n +2 >> malware_db.txt
Adds new hashes to an existing database file. Every ssdeep run starts its output with a header line. tail -n +2 drops that line, so the database keeps a single header.
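A small wrapper makes appending safe whether or not the database already exists. The name append_ssdeep_db is hypothetical. It assumes every ssdeep run begins with a header line starting with "ssdeep,".

```shell
# append_ssdeep_db DB_FILE  (hypothetical helper name)
# Appends ssdeep hash output read from stdin to DB_FILE while keeping exactly
# one header line. Assumes each ssdeep run starts with a line beginning "ssdeep,".
append_ssdeep_db() {
  local db="$1"
  if [ -s "$db" ]; then
    # Existing, non-empty database: drop every header line from the input
    awk '!/^ssdeep,/' >> "$db"
  else
    # New or empty database: keep the first line (the header), drop repeats
    awk 'NR == 1 || !/^ssdeep,/' > "$db"
  fi
}
```

Usage: ssdeep -r /new/samples/ | append_ssdeep_db malware_db.txt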
Reuse a Baseline Database
ssdeep -r -s /samples/ > database.txt
ssdeep -r -m database.txt /path/to/compare/
Store baseline hashes once, then compare new files against them without rehashing the baseline. When the target is a directory, -m needs -r.
Malware Analysis Workflow
Establish Known Malware Baseline
# Create hash database of known malware samples
ssdeep -r -s /known/malware/collection/ > malware_baseline.txt
tail -n +2 malware_baseline.txt | wc -l # Verify hash count (skips the header line)
Analyze Suspicious File
# Calculate hash of suspicious file
ssdeep /path/to/suspicious_sample.bin > analysis.txt
# Compare against known malware
ssdeep -m malware_baseline.txt /path/to/suspicious_sample.bin
Detect Variants
# Check if file is variant of known malware
ssdeep -s -m malware_baseline.txt /suspect.exe
# Each matching line ends with the match score in parentheses
# Example (illustrative): /suspect.exe matches malware_baseline.txt:/known/malware/trojan.exe (98)
Advanced Usage
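Exclude Files by Pattern
# Hash all files except drivers (.sys) and libraries (.dll)
# -exec ... {} + batches files into as few ssdeep runs as possible; each run
# prints its own header, so awk keeps only the first header line
find /samples -type f ! -name "*.sys" ! -name "*.dll" -exec ssdeep -s {} + | awk 'NR == 1 || !/^ssdeep,/' > custom_hashes.txt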
Hash Only Specific File Types
# Hash only executable files (-iname also matches .EXE); awk keeps a single header line
find /samples -type f -iname "*.exe" -exec ssdeep -s {} + | awk 'NR == 1 || !/^ssdeep,/' > exes.txt
Compare Two Hash Databases
# Find similar hashes between two databases: -k matches the hashes in database2.txt against those in database1.txt
ssdeep -k database1.txt database2.txt
Generate HTML Report
ssdeep -r /samples/ | grep -v '^ssdeep,' > hashes.txt # Strip the ssdeep header line
# Use third-party tools to convert to HTML visualization
Hash File Format
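Fuzzy hash database files use a simple CSV-style format (the hash values below are placeholders):
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr:abcGHImno,"/samples/file1.bin"
1536:qwerty+asdfgh:qwasdf,"/samples/file2.bin"
6144:zxcvbn+poiuyt:zxpoiu,"/samples/file3.bin"
Structure:
- The first line is a header that identifies the format version
- Each following line holds one hash: blocksize:hash1:hash2, then a comma and the quoted filename
- hash1 is computed with the given block size, hash2 with twice that block size
- Block sizes are always 3 times a power of two (3, 6, 12, ..., 1536, 3072, 6144, ...)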
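The following is a minimal sketch for splitting one hash line into its fields in a script. parse_ssdeep_line is a hypothetical helper name. It assumes the layout blocksize:hash1:hash2,"filename". The hash alphabet (letters, digits, + and /) never contains commas or colons, so splitting at the first comma is safe even when the filename contains commas.

```shell
# parse_ssdeep_line LINE  (hypothetical helper name)
# Prints blocksize, hash1, hash2 and filename, tab-separated, for one line
# in the assumed format blocksize:hash1:hash2,"filename".
parse_ssdeep_line() {
  local line="$1"
  local sig="${line%%,*}"    # signature: everything before the first comma
  local name="${line#*,}"    # filename: everything after the first comma
  name=${name#\"}            # strip the opening quote
  name=${name%\"}            # strip the closing quote
  local blocksize="${sig%%:*}"
  local rest="${sig#*:}"
  local hash1="${rest%%:*}"
  local hash2="${rest#*:}"
  printf '%s\t%s\t%s\t%s\n' "$blocksize" "$hash1" "$hash2" "$name"
}
```

Usage: tail -n +2 hashes.txt | while IFS= read -r l; do parse_ssdeep_line "$l"; done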
Similarity Matching Thresholds
Interpreting Match Scores
The match score ranges from 0 to 100. It indicates how closely two hashes match, not the exact percentage of shared content. The bands below are a rough rule of thumb, not thresholds defined by ssdeep:
| Score | Meaning |
|---|---|
| 90-100 | Very similar; likely the same file or a minor variation |
| 75-89 | Similar structure; possible variant or derivative |
| 50-74 | Moderate similarity; may share code blocks |
| 25-49 | Weak similarity; possibly common libraries |
| < 25 | Little similarity; likely coincidental |
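A score of 0 does not always mean the files are unrelated. ssdeep only compares two hashes whose block sizes are equal or differ by exactly a factor of two. Any other pair scores 0 regardless of content, which often happens when the files differ greatly in size. The check can be sketched as follows; blocksizes_comparable is a hypothetical helper name.

```shell
# blocksizes_comparable BS1 BS2  (hypothetical helper name)
# Succeeds if ssdeep would compare hashes with these block sizes: they must be
# equal, or one must be exactly twice the other. Any other pair scores 0.
blocksizes_comparable() {
  local a="$1" b="$2"
  [ "$a" -eq "$b" ] || [ "$a" -eq $((b * 2)) ] || [ "$b" -eq $((a * 2)) ]
}
```

Usage: blocksizes_comparable 1536 3072 && echo "comparable"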
Setting Match Confidence
# Only report matches scoring above 75 (-t sets the threshold)
ssdeep -s -t 75 -m baseline.txt /path/to/file.bin
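You may want to apply a different threshold to match output already saved to a file without rerunning ssdeep. A small awk filter can do this. filter_matches is a hypothetical helper name. It assumes each match line ends with the score in parentheses, such as "(98)".

```shell
# filter_matches MIN_SCORE  (hypothetical helper name)
# Reads ssdeep match output on stdin and prints only the match lines whose
# trailing "(NN)" score is at least MIN_SCORE; blank separator lines are dropped.
filter_matches() {
  awk -v min="$1" 'NF > 0 { s = $NF; gsub(/[()]/, "", s); if (s + 0 >= min + 0) print }'
}
```

Usage: filter_matches 75 < saved_matches.txt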
Automation and Scripting
Monitor Directory for New Malware Variants
#!/bin/bash
BASELINE="/opt/malware_baseline.txt"
MONITOR_DIR="/quarantine"
PROCESSED_DIR="/quarantine/processed" # example path; must already exist
while true; do
  for file in "$MONITOR_DIR"/*.bin; do
    if [ -f "$file" ]; then
      # ssdeep prints nothing when no known hash matches, so test the output
      matches=$(ssdeep -s -m "$BASELINE" "$file")
      if [ -n "$matches" ]; then
        echo "$(date): VARIANT DETECTED - $file"
        echo "$matches"
        # Alert or quarantine
      fi
      # Move (rather than delete) the file so it is not rechecked on the next pass
      mv "$file" "$PROCESSED_DIR"/
    fi
  done
  sleep 300 # Check every 5 minutes
done
Bulk Hash and Compare
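#!/bin/bash
# Usage: ./script.sh SAMPLE_DIR HASH_DB
SAMPLE_DIR="$1"
HASH_DB="$2"
for file in "$SAMPLE_DIR"/*; do
  [ -f "$file" ] || continue
  # Test the output, not $?: ssdeep prints nothing when no known hash matches
  result=$(ssdeep -s -m "$HASH_DB" "$file")
  if [ -n "$result" ]; then
    echo "MATCH: $file"
    echo "$result"
  fi
done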
Generate Statistical Analysis
# Count hashes and create analysis
ssdeep -r /samples/ | grep -v '^ssdeep,' > all_hashes.txt
total=$(wc -l < all_hashes.txt)
echo "Total files hashed: $total"
# Find similar pairs: directory mode compares every file against every other
ssdeep -r -d /samples/ > similar_pairs.txt
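To rank pair-wise match output by score, a small custom tool is enough. top_pairs is a hypothetical helper name. It assumes match lines end with the score in parentheses, as in "a matches b (97)".

```shell
# top_pairs [N]  (hypothetical helper name)
# Reads ssdeep match lines on stdin and prints the N highest-scoring lines
# (default 10), highest score first.
top_pairs() {
  local n="${1:-10}"
  # Prefix each line with its score, sort numerically descending, then strip the prefix
  awk 'NF > 0 { s = $NF; gsub(/[()]/, "", s); print s "\t" $0 }' |
    sort -k1,1nr |
    head -n "$n" |
    cut -f2-
}
```

Usage: ssdeep -r -d /samples/ | top_pairs 20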
Integration with Other Tools
Use with YARA for Enhanced Detection
# Generate fuzzy hashes as part of forensic investigation
ssdeep -r /evidence/drives/ > drive_hashes.txt
# Cross-reference with YARA rules for additional context
# (yara expects rule files rather than a directory; this assumes rules are named *.yar)
yara -r /rules/*.yar /evidence/ > yara_results.txt
Combine with File Carving
# After carving files from disk image
foremost -i disk.img -o carving_results/
# Hash carved files for variant detection
ssdeep -r carving_results/ > carved_hashes.txt
ssdeep -r -m known_malware.txt carving_results/ > carved_analysis.txt
Export for Analysis Platforms
# Create hash database for uploading to analysis platform
ssdeep -r -s /samples/ > upload_hashes.txt
# Preview the header and the first hashes to confirm the format before uploading
head -10 upload_hashes.txt
Performance Considerations
Large-Scale Hashing
# Time a large recursive run; -s keeps errors for unreadable files off the terminal
time ssdeep -r -s /massive/directory/ > output.txt
# Benchmark raw hashing speed without keeping the output
time ssdeep -r -s /path > /dev/null
Memory Usage
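- Hashing reads each file as a stream, so memory use stays small regardless of file size
- The small hashing footprint suits resource-constrained environments
- Matching (-m, -k) holds the known hashes in memory, so memory grows with the size of the hash database
- Directory and pretty modes (-d, -p) compare every file with every other, so run time grows quadratically with the number of files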
Optimization Tips
| Tip | Benefit |
|---|---|
| Match against a baseline (-m) instead of -d/-p on huge sets | Avoids pairwise comparisons, whose cost grows quadratically |
| Hash to file | Avoids console bottleneck |
| Pre-filter files | Reduces unnecessary hashing |
| Use SSD storage | Faster disk I/O for large batches |
Troubleshooting
Hash Mismatch Despite Similarity
Issue: Same file produces a different hash each time.
Solution: ssdeep is deterministic, so identical bytes always produce an identical hash. A changing hash means the file's contents changed between runs (for example, a file that is still being written). Verify:
# Check file integrity
md5sum file.bin
ssdeep file.bin
# Run both again: if the MD5 also changes, the file itself is changing
Performance Issues with Large Directories
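Issue: Hashing very large directories is slow.
Solution:
# Write results to a file rather than the terminal
ssdeep -r -s /path > output.txt
# Process in parallel using GNU Parallel; each job prints a header, so awk keeps only the first
parallel ssdeep -s ::: /samples/*.bin | awk 'NR == 1 || !/^ssdeep,/' > parallel_output.txt
# Or use xargs: -n sets files per ssdeep run (without it, xargs may start a single run)
# Output from concurrent xargs jobs can interleave; GNU Parallel groups each job's output
find /samples -type f -print0 | xargs -0 -n 100 -P 4 ssdeep -s | awk 'NR == 1 || !/^ssdeep,/' > parallel_results.txt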
Database Management
Issue: Hash database becomes too large.
Solution:
# Segment databases by type
ssdeep -r -s /exe_samples/ > exes.txt
ssdeep -r -s /dll_samples/ > dlls.txt
# Compare against specific database
ssdeep -m exes.txt /suspicious.exe
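Segmenting by directory requires the samples to be sorted in advance. When a large database already exists, it can be split by filename extension instead. The sketch below is one way to do it. split_ssdeep_db and the OUT_DIR/<ext>.txt naming are hypothetical. It assumes the header-plus-hash-lines layout that ssdeep writes, with the filename after the first comma.

```shell
# split_ssdeep_db DB_FILE OUT_DIR  (hypothetical helper name)
# Splits an ssdeep hash database into OUT_DIR/<ext>.txt files by lowercase
# filename extension, copying the header line into each output file.
# Files without an extension go to OUT_DIR/noext.txt.
split_ssdeep_db() {
  local db="$1" out="$2"
  mkdir -p "$out" || return 1
  awk -v out="$out" '
    NR == 1 && /^ssdeep,/ { header = $0; next }   # remember the header
    NF == 0 { next }                              # skip blank lines
    {
      name = $0
      sub(/^[^,]*,/, "", name)     # drop the signature before the first comma
      gsub(/"/, "", name)          # drop the quotes around the filename
      base = name
      sub(/.*\//, "", base)        # keep only the last path component
      ext = "noext"
      if (match(base, /\.[A-Za-z0-9]+$/)) ext = tolower(substr(base, RSTART + 1))
      file = out "/" ext ".txt"
      if (!(file in seen)) {
        seen[file] = 1
        if (header != "") print header > file
      }
      print > file
    }' "$db"
}
```

Usage: split_ssdeep_db malware_db.txt by_type/ and then ssdeep -m by_type/exe.txt /suspicious.exe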
Security Considerations
Legitimate Uses (Authorized Testing)
- Malware variant detection in sandboxed environments
- Forensic investigation of compromised systems
- Integrity monitoring with fuzzy matching tolerance
- Research and analysis within controlled labs
Warning
NEVER use ssdeep to:
- Analyze files without authorization
- Bypass security measures or protections
- Violate intellectual property rights
- Conduct unauthorized security testing
Always obtain proper authorization before analyzing files or systems.
References
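- Official Project: ssdeep SourceForge (newer releases are published by the ssdeep-project on GitHub)
- CTPH Algorithm: Jesse Kornblum’s Research
- Academic Paper: Jesse Kornblum, "Identifying almost identical files using context triggered piecewise hashing", Digital Investigation, 2006
- Documentation: man ssdeep on Linux/Unix systems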
Quick Reference
# Fast hash generation
ssdeep -r /path > hashes.txt
# Compare file against database
ssdeep -m hashes.txt /file.bin
# Scripted check: ssdeep prints nothing when there is no match
[ -n "$(ssdeep -s -m hashes.txt /file.bin)" ] && echo "MATCH"
# Compare every file in a directory against every other, listing all matches
ssdeep -r -p /samples/
# Combine find with ssdeep, keeping a single header line
find /samples -type f -exec ssdeep -s {} + | awk 'NR == 1 || !/^ssdeep,/' > all.txt