pyinstxtractor

pyinstxtractor is a forensic analysis tool that extracts the contents of PyInstaller-generated executables. It recovers the Python bytecode (.pyc files), resource files, and other embedded content, enabling security researchers and developers to analyze compiled Python applications. This is essential for malware analysis, code review, and compliance verification.

  • Bytecode Extraction: Recover compiled Python code from executables
  • Resource Recovery: Extract embedded data files and assets
  • Cross-Platform: Works with Windows .exe, macOS .app bundles, and Linux ELF binaries
  • Archive Analysis: Inspects the PyInstaller archive structure
  • Batch Processing: Handle multiple executables by wrapping the script in a shell loop
  • Header Detection: Automatically identifies Python version and archive format
  • Decompilation Support: Prepares bytecode for tools like uncompyle6
  • Error Handling: Graceful handling of corrupted or unusual archives
# Clone repository
git clone https://github.com/extremecoders-re/pyinstxtractor
cd pyinstxtractor

# Make executable
chmod +x pyinstxtractor.py

# Verify installation (the script defines no option flags; run without arguments it prints its usage line)
python pyinstxtractor.py
# Copy to PATH
sudo cp pyinstxtractor.py /usr/local/bin/pyinstxtractor
sudo chmod +x /usr/local/bin/pyinstxtractor

# Verify (prints the usage line when run without arguments)
pyinstxtractor
# Create and activate environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
# or: venv\Scripts\activate  # Windows

# Install any dependencies
pip install uncompyle6  # optional, for decompilation
# Extract to directory
python pyinstxtractor.py application.exe

# Output structure created
# ├── application.exe_extracted/
# │   ├── base_library.zip
# │   ├── PYZ-00.pyz
# │   ├── PYZ-00.pyz_extracted/
# │   ├── [individual .pyc files]
# │   └── [resource files]
# Simple extraction
python pyinstxtractor.py myapp.exe

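# Extract into a specific output directory
# (the script writes <name>_extracted into the current working directory)
mkdir -p extracted
(cd extracted && python ../pyinstxtractor.py ../myapp.exe)

# Extract an executable from another directory (output still lands in the current directory)
python pyinstxtractor.py ./dist/application.exe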
Argument | Description                           | Example
file     | Target executable (the only argument) | pyinstxtractor.py app.exe

pyinstxtractor.py takes a single positional argument and defines no option flags. Run without arguments, it prints a usage line.
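Because the script derives its output directory from the current working directory, choosing where the files land comes down to choosing the directory it runs in. A minimal Python sketch of that idea, assuming `pyinstxtractor.py` exists at the path you pass in; the helper names `expected_output_dir` and `extract_into` are hypothetical and not part of the tool:

```python
import subprocess
import sys
from pathlib import Path


def expected_output_dir(executable, workdir):
    """Directory pyinstxtractor is expected to create: <workdir>/<basename>_extracted."""
    return Path(workdir) / (Path(executable).name + "_extracted")


def extract_into(script, executable, workdir):
    """Run pyinstxtractor with workdir as the current directory (hypothetical helper)."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # Resolve both paths first, because the child process runs in a different directory
    cmd = [sys.executable, str(Path(script).resolve()), str(Path(executable).resolve())]
    subprocess.run(cmd, cwd=workdir, check=True)
    return expected_output_dir(executable, workdir)
```

Calling `extract_into("pyinstxtractor.py", "dist/app.exe", "out")` would be expected to leave the results in `out/app.exe_extracted`.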
PyInstaller Executable
├── Bootloader (C executable)
├── Python runtime libraries
├── Archive (TOC - Table of Contents)
│   ├── PYZ archive (bytecode)
│   │   ├── Compiled modules (.pyc)
│   │   ├── Bytecode files
│   │   └── Library index
│   ├── PKG archive (resources)
│   │   ├── Data files
│   │   ├── Configuration
│   │   └── Assets
│   └── Other files
└── Metadata
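Header                  | Meaning                                   | Bytes
MEI\x0c\x0b\x0a\x0b\x0e | PyInstaller archive (cookie) marker       | 8
PYZ\x00                 | PYZ archive magic                         | 4
COOKIE                  | Archive metadata near the end of the file | 24 (PyInstaller 2.0) or 88 (2.1+)
.pyc magic              | Python compiled file version marker       | 4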
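These markers can be checked directly on the raw bytes of a file. The sketch below looks for the CArchive cookie magic (`MEI\x0c\x0b\x0a\x0b\x0e`) and the PYZ magic (`PYZ\x00`); the function names are illustrative, and treating the presence of a cookie as proof of a PyInstaller build is only a heuristic:

```python
COOKIE_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"  # CArchive cookie magic (8 bytes)
PYZ_MAGIC = b"PYZ\x00"                      # PYZ archive magic (4 bytes)


def find_signatures(data):
    """Return offsets of the last cookie magic and the first PYZ magic (-1 if absent)."""
    return {
        "cookie": data.rfind(COOKIE_MAGIC),  # the cookie sits near the end of the file
        "pyz": data.find(PYZ_MAGIC),
    }


def looks_like_pyinstaller(data):
    """Heuristic: a PyInstaller executable carries a CArchive cookie."""
    return find_signatures(data)["cookie"] != -1
```

A typical call would be `looks_like_pyinstaller(open("app.exe", "rb").read())`.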
# Extract Windows executable
python pyinstxtractor.py windows_app.exe

# Check contents
ls -la windows_app.exe_extracted/

# Find main script
find windows_app.exe_extracted -name "*.pyc" | head -5
# Extract from macOS bundle
python pyinstxtractor.py MyApp.app/Contents/MacOS/MyApp

# View extracted structure
tree -L 2 MyApp.app/Contents/MacOS/MyApp_extracted

# Examine binary
file MyApp.app/Contents/MacOS/MyApp
# Extract Linux binary
python pyinstxtractor.py ./linux_application

# Check file type
file linux_application

# Extract contents
ls linux_application_extracted/

# Find Python modules
find linux_application_extracted -type f -name "*.pyc"
#!/bin/bash

# Extract multiple executables
for exe in *.exe; do
    echo "Extracting $exe..."
    python pyinstxtractor.py "$exe"
done

# Verify all extractions
ls -d *_extracted/
# Install decompiler
pip install uncompyle6

# Decompile single .pyc file
uncompyle6 module.pyc > module.py

# Batch decompile
for pyc in *.pyc; do
    uncompyle6 "$pyc" > "${pyc%.pyc}.py"
done
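# PYZ is PyInstaller's own zlib-based archive format, not a ZIP file, so unzip cannot read it.
# pyinstxtractor already unpacks it into PYZ-00.pyz_extracted/
ls application.exe_extracted/PYZ-00.pyz_extracted/ | head -20

# Count the recovered modules
find application.exe_extracted/PYZ-00.pyz_extracted -name "*.pyc" | wc -l

# Locate a specific module
find application.exe_extracted/PYZ-00.pyz_extracted -name "module.pyc"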
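A PYZ archive starts with a 12-byte header: the `PYZ\x00` magic, the 4-byte .pyc magic of the Python that built it, and a big-endian offset to a marshalled table of contents. A minimal sketch of reading that header follows; `parse_pyz_header` is an illustrative name, and decoding the table of contents itself is left out because `marshal` data is only reliably readable by a matching Python version:

```python
import struct

PYZ_MAGIC = b"PYZ\x00"


def parse_pyz_header(data):
    """Return (pyc_magic, toc_offset) from the first 12 bytes of a PYZ archive."""
    if len(data) < 12 or data[:4] != PYZ_MAGIC:
        raise ValueError("not a PYZ archive")
    pyc_magic = data[4:8]                            # magic of the Python that built it
    (toc_offset,) = struct.unpack("!i", data[8:12])  # big-endian offset of the TOC
    return pyc_magic, toc_offset
```

Feeding it the first bytes of `PYZ-00.pyz` shows which Python version the bundled modules target before any decompiler is chosen.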
# base_library.zip contains Python standard library
unzip -l base_library.zip | head -20

# Extract entire library
unzip base_library.zip -d stdlib

# View specific module
unzip -p base_library.zip os.pyc | xxd | head -20
# Check file type
file suspicious_app.exe
# Output: PE32 executable (console) Intel 80386, for MS Windows

# Verify PyInstaller signature
strings suspicious_app.exe | grep -i pyinstaller

# Check for pyi bootloader
objdump -h suspicious_app.exe | grep -i pyi
# Extract with output
python pyinstxtractor.py suspicious_app.exe

# Verify extraction success
ls suspicious_app.exe_extracted/ | wc -l

# Check for the main module: pyinstxtractor reports candidates as "Possible entry point"
python pyinstxtractor.py suspicious_app.exe | grep "Possible entry point"
# List extracted files
tree suspicious_app.exe_extracted -L 2

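# Find entry point (the main script is extracted under its own name, usually not __main__.pyc;
# filter out PyInstaller's bootstrap files)
ls suspicious_app.exe_extracted/*.pyc | grep -v -e pyiboot -e pyi_rth -e pyimod

# Identify native dependencies
find suspicious_app.exe_extracted \( -name "*.so" -o -name "*.dll" -o -name "*.pyd" \)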
# Main script: use the file pyinstxtractor reported as "Possible entry point"
# (main.pyc below is a placeholder name)
pyc_file=suspicious_app.exe_extracted/main.pyc

# Decompile
uncompyle6 "$pyc_file" > main.py

# Review code
head -50 main.py
# Extract and analyze
python pyinstxtractor.py suspect.exe

# Check for network connections
strings suspect.exe_extracted/*.pyc | grep -E "(http|socket|request|urllib)"

# Look for encoded strings
find . -name "*.pyc" -exec strings {} \; | grep -E "([A-Za-z0-9+/]{50,}=)"

# Search for common malware patterns (-l lists matching files, since .pyc files are binary)
grep -rlE "subprocess|os\.system|eval|exec" suspect.exe_extracted/
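The same string hunting can be done in Python, which makes it easier to decode what is found. The sketch below pulls URL-like strings and long base64-looking runs out of raw bytes; the regular expressions and the 40-character threshold are assumptions chosen for illustration, not tuned detection rules:

```python
import base64
import re

URL_RE = re.compile(rb"(?:https?|ftp)://[\x21-\x7e]+")
B64_RE = re.compile(rb"[A-Za-z0-9+/]{40,}={0,2}")


def extract_urls(data):
    """Return the distinct URL-like strings found in data, as text."""
    return sorted({m.decode("ascii") for m in URL_RE.findall(data)})


def decode_base64_candidates(data):
    """Try to decode long base64-looking runs; keep the ones that decode cleanly."""
    decoded = []
    for run in B64_RE.findall(data):
        try:
            decoded.append(base64.b64decode(run, validate=True))
        except ValueError:
            continue  # wrong length or padding: not real base64
    return decoded
```

Both functions take the bytes of a single extracted file, for example `extract_urls(open("module.pyc", "rb").read())`.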
# Find resource files
find suspect.exe_extracted -type f ! -name "*.pyc" ! -name "*.zip"

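# Embedded data files are already unpacked into the extraction directory;
# list the contents of any nested ZIP archives (e.g. base_library.zip)
for file in suspect.exe_extracted/*.zip; do
    unzip -l "$file" 2>/dev/null
done

# Save extracted resources
mkdir -p resources
find suspect.exe_extracted -maxdepth 1 -type f ! -name "*.pyc" ! -name "*.zip" ! -name "*.pyz" -exec cp {} resources/ \;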
# Search for suspicious patterns (-l lists matching files, since .pyc files are binary)
grep -lE "crypto|cipher|encrypt|decrypt" *.pyc

# Find file operations
strings main.pyc | grep -E "open|write|read"

# Identify C2 infrastructure
strings *.pyc | grep -E "^(https?|ftp)://"

# Check for registry/system calls
grep -lE "winreg|ctypes|windll" *.pyc
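Byte-level grep cannot tell a referenced name from a coincidental substring. When the analysis interpreter is the same Python version that built the .pyc, the code object can be unmarshalled and its names read directly. This is a sketch under that assumption (3.7+ header layout); the `SUSPICIOUS` set and function names are illustrative, and since `marshal` is not hardened against malformed input this belongs in an isolated analysis environment:

```python
import importlib.util
import marshal
import types

SUSPICIOUS = {"subprocess", "system", "eval", "exec", "socket", "winreg", "ctypes"}


def load_code(pyc_bytes):
    """Unmarshal the code object from a .pyc built by the running Python (3.7+)."""
    if pyc_bytes[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was built by a different Python version")
    return marshal.loads(pyc_bytes[16:])  # skip the 16-byte header; nothing is executed


def collect_names(code):
    """Recursively gather every global/attribute name referenced by a code object."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # nested functions and classes
            names |= collect_names(const)
    return names


def suspicious_names(pyc_bytes):
    """Names from SUSPICIOUS that the bytecode actually references."""
    return sorted(collect_names(load_code(pyc_bytes)) & SUSPICIOUS)
```

For bytecode from a different Python version, the grep and strings commands remain the fallback.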
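Python .pyc File Structure (Python 3.7+):
├── Magic Number (4 bytes)     # Python version signature, ends in \r\n
├── Flags (4 bytes)            # 0 for timestamp-based files (PEP 552)
├── Timestamp (4 bytes)        # Source file modification time
├── Source Size (4 bytes)      # Size of the source file
└── Code Object (marshalled)
    ├── Constants
    ├── Names
    ├── Varnames
    ├── Instructions (bytecode)
    └── Nested code objects (stored among the constants)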
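For Python 3.7 and later the header before the code object is 16 bytes: magic, a flags word, and then either a timestamp plus source size or, for hash-based files, an 8-byte source hash (PEP 552). A minimal sketch of decoding it, with `parse_pyc_header` as an illustrative name; older versions use shorter headers that this does not handle:

```python
import struct
from datetime import datetime, timezone


def parse_pyc_header(data):
    """Parse the 16-byte .pyc header used by Python 3.7+ (PEP 552)."""
    if len(data) < 16:
        raise ValueError("too short for a 3.7+ .pyc header")
    magic = data[:4]
    flags, field2, field3 = struct.unpack("<III", data[4:16])
    header = {
        "version_magic": int.from_bytes(magic[:2], "little"),
        "hash_based": bool(flags & 0x1),
    }
    if header["hash_based"]:
        header["source_hash"] = data[8:16].hex()
    else:
        header["mtime"] = datetime.fromtimestamp(field2, tz=timezone.utc)
        header["source_size"] = field3
    return header
```

Files recovered from a PyInstaller archive often carry a zeroed timestamp and size, so those two fields are mainly useful for standalone .pyc files.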
# Extract and display magic numbers
xxd -l 16 extracted_module.pyc

# Example output for a Python 3.11 file (only the magic a70d 0d0a is fixed; the dotted bytes vary):
# 00000000: a70d 0d0a 0000 0000 .... .... .... ....

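# Identify Python version
python << 'EOF'
import importlib.util

with open('module.pyc', 'rb') as f:
    magic = f.read(4)
    print(f"Magic: {magic.hex()}")
    # Compare against the interpreter running this script
    if magic == importlib.util.MAGIC_NUMBER:
        print("Built by the same Python version as this interpreter")
    else:
        print("Built by a different Python version")
EOF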
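# Read PyInstaller cookie (metadata)
python << 'EOF'
import struct

MAGIC = b'MEI\x0c\x0b\x0a\x0b\x0e'  # CArchive cookie magic

with open('application.exe', 'rb') as f:
    data = f.read()

# The cookie sits near the end of the file; search backwards for its magic
pos = data.rfind(MAGIC)
if pos == -1 or pos + 88 > len(data):
    raise SystemExit("No PyInstaller 2.1+ cookie found")

# PyInstaller 2.1+ cookie: 88 bytes, big-endian
# (magic, package length, TOC offset, TOC length, Python version, Python library name)
magic, pkg_len, toc_off, toc_len, pyver, pylib = struct.unpack('!8sIIii64s', data[pos:pos + 88])
print(f"Cookie offset: {pos}")
print(f"Archive length: {pkg_len}")
print(f"TOC offset (relative to archive start): {toc_off}")
print(f"TOC length: {toc_len}")
print(f"Python version: {pyver}")
EOF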
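The cookie points at a table of contents (TOC) listing every embedded file. Each TOC entry is a big-endian record: entry size, data offset, compressed size, uncompressed size, a compression flag, a one-character type code, then a null-padded name. A sketch of walking those records, assuming the bytes of the TOC have already been sliced out of the executable; `parse_toc` is an illustrative name:

```python
import struct

ENTRY_FORMAT = "!IIIIBc"
ENTRY_FIXED = struct.calcsize(ENTRY_FORMAT)  # 18 bytes before the name


def parse_toc(toc_bytes):
    """Yield one dict per TOC entry from the raw table-of-contents bytes."""
    pos = 0
    while pos + ENTRY_FIXED <= len(toc_bytes):
        size, offset, csize, usize, cflag, typ = struct.unpack(
            ENTRY_FORMAT, toc_bytes[pos:pos + ENTRY_FIXED]
        )
        if size < ENTRY_FIXED:
            break  # malformed entry; stop instead of looping forever
        raw_name = toc_bytes[pos + ENTRY_FIXED:pos + size]
        yield {
            "name": raw_name.rstrip(b"\x00").decode("utf-8", "replace"),
            "offset": offset,  # relative to the start of the archive
            "compressed_size": csize,
            "uncompressed_size": usize,
            "compressed": bool(cflag),
            "type": typ.decode("ascii", "replace"),  # e.g. 's' script, 'z' PYZ
        }
        pos += size
```

Entries of type `s` are scripts, which is where the application's entry point is found; the type `z` entry is the PYZ archive holding the bundled modules.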
# Check file integrity
file suspicious_app.exe

# Verify it's actually a PyInstaller executable
strings suspicious_app.exe | grep -i "pyinstaller"

# Try with manual offset
python << 'EOF'
# Manual analysis if automated extraction fails
MAGIC = b'MEI\x0c\x0b\x0a\x0b\x0e'  # CArchive cookie magic

with open('app.exe', 'rb') as f:
    data = f.read()

# Search for the PyInstaller cookie, starting from the end of the file
idx = data.rfind(MAGIC)
if idx != -1:
    print(f"Found PyInstaller signature at offset: {idx}")
else:
    print("No PyInstaller signature found")
EOF
# Check extraction directory
ls -la application.exe_extracted/ | wc -l

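# Verify base_library.zip integrity
unzip -t application.exe_extracted/base_library.zip

# Test each archive member from Python
python << 'EOF'
import zipfile
zf = 'application.exe_extracted/base_library.zip'
try:
    with zipfile.ZipFile(zf, 'r') as z:
        bad = z.testzip()  # name of the first corrupt member, or None
    if bad is None:
        print("ZIP file is valid")
    else:
        print(f"Corruption detected in: {bad}")
except (zipfile.BadZipFile, OSError) as e:
    print(f"Corruption detected: {e}")
EOF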
# Check Python version compatibility
python << 'EOF'
with open('module.pyc', 'rb') as f:
    magic = f.read(4)

# The first two bytes are a little-endian version number; the last two are \r\n
print(f"Magic bytes: {magic.hex()}")
print(f"Version number: {int.from_bytes(magic[:2], 'little')}")

# Common magic bytes:
# a70d0d0a (3.11), 6f0d0d0a (3.10), 610d0d0a (3.9), 550d0d0a (3.8)
EOF

# Use a decompiler that supports the bytecode version
# (uncompyle6 covers Python up to 3.8; for newer bytecode try pycdc)
pycdc module.pyc > module.py
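Choosing a decompiler starts with mapping the magic number to a Python release. The lookup below covers a handful of CPython releases and is deliberately incomplete; the `suggest_decompiler` rule (uncompyle6 up to 3.8, pycdc afterwards) is a rough assumption based on the tools named in this guide, not a guarantee that either tool handles a given file:

```python
# Version numbers stored in the first two bytes of the .pyc magic (little-endian)
PYC_MAGIC_TO_VERSION = {
    3379: "3.6",
    3394: "3.7",
    3413: "3.8",
    3425: "3.9",
    3439: "3.10",
    3495: "3.11",
    3531: "3.12",
}


def python_version_from_magic(magic):
    """Map the 4-byte .pyc magic to a Python version string, or None if unknown."""
    number = int.from_bytes(magic[:2], "little")
    return PYC_MAGIC_TO_VERSION.get(number)


def suggest_decompiler(version):
    """Rough rule of thumb: uncompyle6 for bytecode up to 3.8, pycdc for newer."""
    major, minor = (int(part) for part in version.split("."))
    return "uncompyle6" if (major, minor) <= (3, 8) else "pycdc"
```

An unknown magic returns `None`, which is a cue to check the magic-number history in CPython's source rather than guess.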
#!/usr/bin/env python3
"""
Advanced PyInstaller extraction with analysis
"""
import os
import subprocess
import sys
import zipfile
from pathlib import Path

class PyInstallerAnalyzer:
    def __init__(self, executable):
        self.exe = executable
        # pyinstxtractor writes <basename>_extracted into the current directory
        self.extracted_dir = f"{os.path.basename(executable)}_extracted"
        self.entry_points = []

    def extract(self):
        """Extract using pyinstxtractor and record the entry points it reports"""
        result = subprocess.run(
            [sys.executable, "pyinstxtractor.py", self.exe],
            capture_output=True, text=True, check=True,
        )
        print(result.stdout)
        for line in result.stdout.splitlines():
            if "Possible entry point:" in line:
                self.entry_points.append(line.split(":", 1)[1].strip())

    def analyze_archive(self):
        """Analyze extracted archive structure"""
        base_lib = Path(self.extracted_dir) / "base_library.zip"

        if base_lib.exists():
            with zipfile.ZipFile(base_lib, 'r') as z:
                print(f"Base library files: {len(z.namelist())}")
                print("First 10 modules:")
                for name in z.namelist()[:10]:
                    info = z.getinfo(name)
                    print(f"  {name} ({info.file_size} bytes)")

    def find_main_module(self):
        """Locate main entry point, skipping PyInstaller's own bootstrap scripts"""
        for name in self.entry_points:
            if not name.startswith(("pyiboot", "pyi_rth")):
                return os.path.join(self.extracted_dir, name)
        return None

# Usage
if __name__ == '__main__':
    analyzer = PyInstallerAnalyzer('app.exe')
    analyzer.extract()
    analyzer.analyze_archive()
    main = analyzer.find_main_module()
    print(f"Main module: {main}")
#!/bin/bash

REPORT="extraction_report.txt"
> "$REPORT"

for exe in *.exe; do
    echo "=== Analyzing $exe ===" | tee -a "$REPORT"
    
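    # Extract
    output=$(python pyinstxtractor.py "$exe" 2>&1)
    echo "$output" | tee -a "$REPORT"
    
    # Find main module (pyinstxtractor reports candidates as "Possible entry point")
    main_pyc=$(echo "$output" | grep "Possible entry point" | grep -v -e pyiboot -e pyi_rth | head -1)
    echo "Main module: $main_pyc" | tee -a "$REPORT"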
    
    # Count dependencies
    dep_count=$(find "${exe}_extracted" -name "*.pyc" | wc -l)
    echo "Modules found: $dep_count" | tee -a "$REPORT"
    
    # Search for suspicious patterns
    suspicious=$(grep -r "socket\|subprocess\|eval\|exec" "${exe}_extracted" 2>/dev/null | wc -l)
    echo "Suspicious patterns: $suspicious" | tee -a "$REPORT"
    
    echo "" | tee -a "$REPORT"
done

echo "Report saved to: $REPORT"
  • Auditing your own compiled applications
  • Malware analysis and threat research
  • Code review and compliance verification
  • Educational purposes and learning
  • Code Obfuscation: Use PyArmor or Cython for compiled code
  • Encryption: Add bytecode encryption layers
  • Version Hiding: Remove Python version strings from binary
  • Custom Bootloader: Modify PyInstaller’s startup sequence
  • Code Signing: Verify executable authenticity
Tool           | Purpose            | Speed | Accuracy
pyinstxtractor | Archive extraction | Fast  | Excellent
uncompyle6     | Decompilation      | Slow  | Good
pycdc          | Decompilation      | Fast  | Excellent
Ghidra         | Binary analysis    | Slow  | Good
IDA Pro        | Binary analysis    | Slow  | Excellent
  • Authorization: Only analyze executables you own or have permission to analyze
  • Intellectual Property: Respect copyright and trade secrets
  • Responsible Disclosure: Report vulnerabilities properly
  • Legal Compliance: Follow applicable laws regarding reverse engineering
  • Attribution: Credit original authors when sharing analysis
Step       | Tool           | Command
Extract    | pyinstxtractor | pyinstxtractor.py app.exe
Analyze    | strings/grep   | grep -r "pattern" extracted/
Decompress | unzip          | unzip base_library.zip
Decompile  | uncompyle6     | uncompyle6 module.pyc
Review     | Text editor    | cat main.py