
FOCA Cheat Sheet


Overview

FOCA (Fingerprinting Organizations with Collected Archives) is a powerful metadata analysis and document intelligence tool used to extract hidden information from documents and files. It specializes in discovering metadata, network information, users, folders, software versions, and other sensitive data that organizations inadvertently expose through publicly accessible documents.

Note: FOCA is a Windows-based tool that requires the .NET Framework. Always ensure you have proper authorization before analyzing target documents.

Installation and Setup

System Requirements and Installation

```bash

System Requirements:

- Windows 7/8/10/11 (32-bit or 64-bit)

- .NET Framework 4.5 or later

- Microsoft Office (for advanced document analysis)

- Internet connection for online searches

Download FOCA:

1. Visit https://github.com/ElevenPaths/FOCA

2. Download latest release

3. Extract to desired directory

4. Run FOCA.exe as administrator

Alternative: Install via Chocolatey

choco install foca

Verify installation

Launch FOCA.exe and check version in Help > About

```
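If you want to script the prerequisite check instead of clicking through Help > About, the following Python sketch reads the registry key that Microsoft documents for .NET Framework 4.x detection (Windows only; the Release value 378389 corresponds to .NET Framework 4.5). It is an illustrative helper, not part of FOCA.

```python
# Hypothetical helper: check whether .NET Framework 4.5+ is installed (Windows only)
import winreg

NET45_RELEASE = 378389  # documented minimum "Release" value for .NET Framework 4.5

def dotnet_45_or_later() -> bool:
    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            release, _ = winreg.QueryValueEx(key, "Release")
            return release >= NET45_RELEASE
    except OSError:
        return False  # v4\Full key missing -> .NET 4.5+ not installed

if __name__ == "__main__":
    print(".NET Framework 4.5+ detected" if dotnet_45_or_later()
          else "Install .NET Framework 4.5 or later before running FOCA")
```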

Initial Configuration

```bash

Configuration steps:

1. Launch FOCA.exe

2. Go to Options > Configuration

3. Configure search engines and APIs

4. Set download directories

5. Configure proxy settings if needed

6. Set analysis preferences

Key configuration options:

- Search engines: Google, Bing, DuckDuckGo

- Download folder: C:\FOCA\Downloads

- Temporary folder: C:\FOCA\Temp

- Maximum file size: 50MB

- Timeout settings: 30 seconds

- Proxy configuration: Manual/Automatic

```

Search Engine API Configuration

```xml

<!-- Bing Search API -->
<add key="BingAPIKey" value="your_bing_api_key" />

<!-- Shodan API -->
<add key="ShodanAPIKey" value="your_shodan_api_key" />

<!-- VirusTotal API -->
<add key="VirusTotalAPIKey" value="your_virustotal_api_key" />

<!-- Download settings -->
<add key="MaxFileSize" value="52428800" /> <!-- 50MB -->
<add key="DownloadTimeout" value="30000" /> <!-- 30 seconds -->
<add key="MaxConcurrentDownloads" value="5" />

<!-- Proxy settings -->
<add key="UseProxy" value="false" />
<add key="ProxyAddress" value="proxy.company.com" />
<add key="ProxyPort" value="8080" />
<add key="ProxyUsername" value="username" />
<add key="ProxyPassword" value="password" />

```
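Assuming FOCA keeps these keys in a standard .NET appSettings section of its application config file (the file name FOCA.exe.config below is an assumption; adjust the path to your installation), the values can also be changed programmatically instead of through the GUI. A minimal sketch:

```python
# Sketch: update an <add key="..."/> entry in a .NET-style app.config file.
# "FOCA.exe.config" and the key names are assumptions; adapt them to your install.
import xml.etree.ElementTree as ET

def set_app_setting(config_path: str, key: str, value: str) -> None:
    tree = ET.parse(config_path)
    root = tree.getroot()
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == key:
            entry.set("value", value)   # update existing key
            break
    else:
        ET.SubElement(settings, "add", {"key": key, "value": value})  # add new key
    tree.write(config_path, encoding="utf-8", xml_declaration=True)

# Example: raise the download size limit to 100 MB
set_app_setting(r"C:\FOCA\FOCA.exe.config", "MaxFileSize", str(100 * 1024 * 1024))
```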

Document Discovery and Collection

Search Engine Integration

```bash

Google Search Configuration:

1. Create Google Custom Search Engine

2. Get API key from Google Cloud Console

3. Configure in FOCA Options

Search operators for document discovery:

site:target.com filetype:pdf

site:target.com filetype:doc

site:target.com filetype:docx

site:target.com filetype:xls

site:target.com filetype:xlsx

site:target.com filetype:ppt

site:target.com filetype:pptx

Advanced search operators:

site:target.com (filetype:pdf OR filetype:doc OR filetype:xls)

site:target.com "confidential" filetype:pdf

site:target.com "internal" filetype:doc

site:target.com inurl:admin filetype:pdf

```
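The same operators can be generated in bulk instead of typed by hand. A minimal Python sketch (domain, keywords, and file types are placeholders) that prints one dork per query, ready to paste into a search engine or FOCA's search dialog:

```python
# Build search-engine dork queries for document discovery (illustrative helper)
FILE_TYPES = ["pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx"]

def build_dorks(domain, keywords=None):
    dorks = [f"site:{domain} filetype:{ft}" for ft in FILE_TYPES]
    # One combined query covering the most common document types
    dorks.append(f"site:{domain} (" + " OR ".join(f"filetype:{ft}" for ft in ["pdf", "doc", "xls"]) + ")")
    # Keyword-qualified queries, e.g. for sensitive markings
    for kw in keywords or []:
        dorks.append(f'site:{domain} "{kw}" filetype:pdf')
    return dorks

for dork in build_dorks("target.com", keywords=["confidential", "internal"]):
    print(dork)
```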

Manual URL Collection

```bash

Manual URL addition in FOCA:

1. Go to Project > URLs

2. Add URLs manually or import from file

3. Use bulk import for large lists

URL format examples:

https://target.com/documents/report.pdf
https://target.com/files/presentation.pptx
https://target.com/downloads/manual.doc
https://subdomain.target.com/docs/guide.pdf

Bulk import file format (urls.txt):

https://target.com/doc1.pdf
https://target.com/doc2.docx
https://target.com/doc3.xlsx
https://target.com/doc4.pptx

Import commands:

File > Import > URLs from file

Select urls.txt file

Choose import options

```
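Before bulk importing, it helps to deduplicate the list and drop anything that is not a document link, so FOCA does not queue irrelevant downloads. A small sketch (urls.txt and urls_clean.txt are placeholder file names):

```python
# Deduplicate and filter a URL list before importing it into FOCA
from urllib.parse import urlparse

DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx")

def clean_url_list(infile="urls.txt", outfile="urls_clean.txt"):
    seen = set()
    with open(infile, encoding="utf-8") as src, open(outfile, "w", encoding="utf-8") as dst:
        for line in src:
            url = line.strip()
            if not url or url in seen:
                continue                      # skip blanks and duplicates
            path = urlparse(url).path.lower()
            if path.endswith(DOC_EXTENSIONS): # keep only document links
                dst.write(url + "\n")
                seen.add(url)
    print(f"Wrote {len(seen)} unique document URLs to {outfile}")

clean_url_list()
```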

Automated Document Discovery

```powershell

# PowerShell script for automated document discovery

param(
    [Parameter(Mandatory=$true)]
    [string]$Domain,

    [string[]]$FileTypes = @("pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx"),
    [string]$OutputFile = "discovered_documents.txt",
    [int]$MaxResults = 100
)

# Function to search Google for documents
function Search-GoogleDocuments {
    param($domain, $filetype, $maxResults)

$searchQuery = "site:$domain filetype:$filetype"
$apiKey = "your_google_api_key"
$searchEngineId = "your_search_engine_id"

$results = @()
$startIndex = 1

while ($results.Count -lt $maxResults -and $startIndex -le 100) {
    $url = "https://www.googleapis.com/customsearch/v1?key=$apiKey&cx=$searchEngineId&q=$searchQuery&start=$startIndex"

    try {
        $response = Invoke-RestMethod -Uri $url -Method Get

        if ($response.items) {
            foreach ($item in $response.items) {
                $results += $item.link
            }
            $startIndex += 10
        } else {
            break
        }
    } catch {
        Write-Warning "Error searching for $filetype files: $($_.Exception.Message)"
        break
    }

    Start-Sleep -Seconds 1  # Rate limiting
}

return $results

}

# Main execution
$allDocuments = @()

foreach ($fileType in $FileTypes) {
    Write-Host "Searching for $fileType files on $Domain..."
    $documents = Search-GoogleDocuments -domain $Domain -filetype $fileType -maxResults $MaxResults
    $allDocuments += $documents
    Write-Host "Found $($documents.Count) $fileType files"
}

# Remove duplicates and save results
$uniqueDocuments = $allDocuments | Sort-Object | Get-Unique
$uniqueDocuments | Out-File -FilePath $OutputFile -Encoding UTF8

Write-Host "Total unique documents found: $($uniqueDocuments.Count)"
Write-Host "Results saved to: $OutputFile"

# Usage example:
# .\Discover-Documents.ps1 -Domain "example.com" -OutputFile "example_docs.txt"

```

Metadata Analysis and Extraction

Metadata Analysis Process

```bash

FOCA Metadata Analysis Process:

1. Load project or create new one

2. Add documents via search or manual import

3. Download documents automatically

4. Analyze metadata from downloaded files

5. Review extracted information

Metadata types extracted by FOCA:

- Author information

- Creation and modification dates

- Software versions used

- Computer names and usernames

- Network paths and shared folders

- Printer information

- Email addresses

- Company information

- Document templates

- Revision history

- Comments and tracked changes

```
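To spot-check what FOCA should find in a single file, the PDF information dictionary can be read directly. A minimal sketch using the third-party pypdf package (install it separately; which fields are present depends on the document, and the path below is a placeholder):

```python
# Quick metadata spot-check for a single PDF (uses the third-party pypdf package)
from pypdf import PdfReader

def dump_pdf_metadata(path):
    reader = PdfReader(path)
    info = reader.metadata  # the /Info dictionary; may be None for some PDFs
    if not info:
        print(f"{path}: no document information dictionary")
        return
    for key, value in info.items():  # raw keys such as /Author, /Creator, /Producer
        print(f"{key:20} {value}")

dump_pdf_metadata(r"C:\FOCA\Downloads\report.pdf")  # placeholder path
```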

Advanced Metadata Extraction

```csharp
// C# code for custom metadata extraction
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public class MetadataExtractor
{
    public class DocumentMetadata
    {
        public string FileName { get; set; }
        public string Author { get; set; }
        public string Creator { get; set; }
        public DateTime? CreationDate { get; set; }
        public DateTime? ModificationDate { get; set; }
        public string Application { get; set; }
        public string Company { get; set; }
        public string Subject { get; set; }
        public string Title { get; set; }
        public string Keywords { get; set; }
        public string Comments { get; set; }
        public string LastModifiedBy { get; set; }
        public int? RevisionNumber { get; set; }
        public TimeSpan? TotalEditTime { get; set; }
        public string Template { get; set; }
        public List<string> UserNames { get; set; } = new List<string>();
        public List<string> ComputerNames { get; set; } = new List<string>();
        public List<string> NetworkPaths { get; set; } = new List<string>();
        public List<string> EmailAddresses { get; set; } = new List<string>();
        public List<string> PrinterNames { get; set; } = new List<string>();
    }

public DocumentMetadata ExtractPdfMetadata(string filePath)
{
    var metadata = new DocumentMetadata { FileName = Path.GetFileName(filePath) };

    try
    {
        using (var reader = new PdfReader(filePath))
        {
            var info = reader.Info;

            metadata.Author = info.ContainsKey("Author") ? info["Author"] : null;
            metadata.Creator = info.ContainsKey("Creator") ? info["Creator"] : null;
            metadata.Subject = info.ContainsKey("Subject") ? info["Subject"] : null;
            metadata.Title = info.ContainsKey("Title") ? info["Title"] : null;
            metadata.Keywords = info.ContainsKey("Keywords") ? info["Keywords"] : null;

            if (info.ContainsKey("CreationDate"))
            {
                metadata.CreationDate = ParsePdfDate(info["CreationDate"]);
            }

            if (info.ContainsKey("ModDate"))
            {
                metadata.ModificationDate = ParsePdfDate(info["ModDate"]);
            }

            // Extract text content for additional analysis
            var text = ExtractTextFromPdf(reader);
            AnalyzeTextContent(text, metadata);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error extracting PDF metadata: {ex.Message}");
    }

    return metadata;
}

public DocumentMetadata ExtractWordMetadata(string filePath)
{
    var metadata = new DocumentMetadata { FileName = Path.GetFileName(filePath) };

    try
    {
        using (var doc = WordprocessingDocument.Open(filePath, false))
        {
            var coreProps = doc.PackageProperties;
            var extendedProps = doc.ExtendedFilePropertiesPart?.Properties;
            var customProps = doc.CustomFilePropertiesPart?.Properties;

            // Core properties
            metadata.Author = coreProps.Creator;
            metadata.LastModifiedBy = coreProps.LastModifiedBy;
            metadata.CreationDate = coreProps.Created;
            metadata.ModificationDate = coreProps.Modified;
            metadata.Subject = coreProps.Subject;
            metadata.Title = coreProps.Title;
            metadata.Keywords = coreProps.Keywords;
            metadata.Comments = coreProps.Description;

            // Extended properties
            if (extendedProps != null)
            {
                metadata.Application = extendedProps.Application?.Text;
                metadata.Company = extendedProps.Company?.Text;
                metadata.Template = extendedProps.Template?.Text;

                if (extendedProps.TotalTime != null)
                {
                    metadata.TotalEditTime = TimeSpan.FromMinutes(
                        double.Parse(extendedProps.TotalTime.Text)
                    );
                }
            }

            // Extract revision information
            ExtractRevisionInfo(doc, metadata);

            // Extract text content for additional analysis
            var text = ExtractTextFromWord(doc);
            AnalyzeTextContent(text, metadata);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error extracting Word metadata: {ex.Message}");
    }

    return metadata;
}

private void AnalyzeTextContent(string text, DocumentMetadata metadata)
{
    if (string.IsNullOrEmpty(text)) return;

    // Extract email addresses
    var emailRegex = new Regex(@"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b");
    var emailMatches = emailRegex.Matches(text);
    foreach (Match match in emailMatches)
    {
        if (!metadata.EmailAddresses.Contains(match.Value))
        {
            metadata.EmailAddresses.Add(match.Value);
        }
    }

    // Extract computer names (Windows format)
    var computerRegex = new Regex(@"\\\\([A-Za-z0-9\-_]+)\\");
    var computerMatches = computerRegex.Matches(text);
    foreach (Match match in computerMatches)
    {
        var computerName = match.Groups[1].Value;
        if (!metadata.ComputerNames.Contains(computerName))
        {
            metadata.ComputerNames.Add(computerName);
        }
    }

    // Extract network paths
    var pathRegex = new Regex(@"\\\\[A-Za-z0-9\-_]+\\[A-Za-z0-9\-_\\]+");
    var pathMatches = pathRegex.Matches(text);
    foreach (Match match in pathMatches)
    {
        if (!metadata.NetworkPaths.Contains(match.Value))
        {
            metadata.NetworkPaths.Add(match.Value);
        }
    }

    // Extract usernames from paths
    var userRegex = new Regex(@"C:\\Users\\([A-Za-z0-9\-_\.]+)\\");
    var userMatches = userRegex.Matches(text);
    foreach (Match match in userMatches)
    {
        var username = match.Groups[1].Value;
        if (!metadata.UserNames.Contains(username) && username != "Public")
        {
            metadata.UserNames.Add(username);
        }
    }
}

private void ExtractRevisionInfo(WordprocessingDocument doc, DocumentMetadata metadata)
{
    try
    {
        var mainPart = doc.MainDocumentPart;
        if (mainPart?.Document?.Body != null)
        {
            // Look for revision tracking information
            var insertions = mainPart.Document.Body.Descendants<Inserted>();
            var deletions = mainPart.Document.Body.Descendants<Deleted>();

            foreach (var insertion in insertions)
            {
                var author = insertion.Author?.Value;
                if (!string.IsNullOrEmpty(author) && !metadata.UserNames.Contains(author))
                {
                    metadata.UserNames.Add(author);
                }
            }

            foreach (var deletion in deletions)
            {
                var author = deletion.Author?.Value;
                if (!string.IsNullOrEmpty(author) && !metadata.UserNames.Contains(author))
                {
                    metadata.UserNames.Add(author);
                }
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error extracting revision info: {ex.Message}");
    }
}

public void GenerateMetadataReport(List<DocumentMetadata> documents, string outputPath)
{
    var report = new StringBuilder();
    report.AppendLine("FOCA Metadata Analysis Report");
    report.AppendLine("Generated: " + DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"));
    report.AppendLine(new string('=', 50));
    report.AppendLine();

    // Summary statistics
    report.AppendLine("SUMMARY STATISTICS");
    report.AppendLine($"Total documents analyzed: {documents.Count}");

    var uniqueAuthors = documents.Where(d => !string.IsNullOrEmpty(d.Author))
                               .Select(d => d.Author).Distinct().ToList();
    report.AppendLine($"Unique authors found: {uniqueAuthors.Count}");

    var uniqueUsers = documents.SelectMany(d => d.UserNames).Distinct().ToList();
    report.AppendLine($"Unique usernames found: {uniqueUsers.Count}");

    var uniqueComputers = documents.SelectMany(d => d.ComputerNames).Distinct().ToList();
    report.AppendLine($"Unique computer names found: {uniqueComputers.Count}");

    var uniqueEmails = documents.SelectMany(d => d.EmailAddresses).Distinct().ToList();
    report.AppendLine($"Unique email addresses found: {uniqueEmails.Count}");

    report.AppendLine();

    // Detailed findings
    report.AppendLine("DETAILED FINDINGS");
    report.AppendLine();

    if (uniqueAuthors.Any())
    {
        report.AppendLine("AUTHORS:");
        foreach (var author in uniqueAuthors.OrderBy(a => a))
        {
            var docCount = documents.Count(d => d.Author == author);
            report.AppendLine($"  {author} ({docCount} documents)");
        }
        report.AppendLine();
    }

    if (uniqueUsers.Any())
    {
        report.AppendLine("USERNAMES:");
        foreach (var user in uniqueUsers.OrderBy(u => u))
        {
            report.AppendLine($"  {user}");
        }
        report.AppendLine();
    }

    if (uniqueComputers.Any())
    {
        report.AppendLine("COMPUTER NAMES:");
        foreach (var computer in uniqueComputers.OrderBy(c => c))
        {
            report.AppendLine($"  {computer}");
        }
        report.AppendLine();
    }

    if (uniqueEmails.Any())
    {
        report.AppendLine("EMAIL ADDRESSES:");
        foreach (var email in uniqueEmails.OrderBy(e => e))
        {
            report.AppendLine($"  {email}");
        }
        report.AppendLine();
    }

    // Software analysis
    var applications = documents.Where(d => !string.IsNullOrEmpty(d.Application))
                              .GroupBy(d => d.Application)
                              .OrderByDescending(g => g.Count())
                              .ToList();

    if (applications.Any())
    {
        report.AppendLine("SOFTWARE APPLICATIONS:");
        foreach (var app in applications)
        {
            report.AppendLine($"  {app.Key} ({app.Count()} documents)");
        }
        report.AppendLine();
    }

    File.WriteAllText(outputPath, report.ToString());
}

}

// Usage example
var extractor = new MetadataExtractor();
var documents = new List<MetadataExtractor.DocumentMetadata>();

// Process all documents in a directory
var documentFiles = Directory.GetFiles(@"C:\FOCA\Downloads", "*.*", SearchOption.AllDirectories)
    .Where(f => f.EndsWith(".pdf") || f.EndsWith(".docx") || f.EndsWith(".doc"));

foreach (var file in documentFiles)
{
    MetadataExtractor.DocumentMetadata metadata = null;

if (file.EndsWith(".pdf"))
{
    metadata = extractor.ExtractPdfMetadata(file);
}

else if (file.EndsWith(".docx") || file.EndsWith(".doc"))
{
    metadata = extractor.ExtractWordMetadata(file);
}

if (metadata != null)
{
    documents.Add(metadata);
}

}

// Generate report
extractor.GenerateMetadataReport(documents, @"C:\FOCA\metadata_report.txt");
```

Network Information Discovery

DNS and Network Analysis

```bash

FOCA Network Analysis Features:

1. DNS resolution of discovered domains

2. Network range identification

3. Technology fingerprinting

4. Server information extraction

5. Network infrastructure mapping

DNS Analysis in FOCA:

- Automatic DNS resolution of found domains

- Reverse DNS lookups

- DNS record enumeration (A, AAAA, MX, NS, TXT)

- Subdomain discovery from documents

- Network range calculation

Technology Fingerprinting:

- Web server identification

- Operating system detection

- Application framework identification

- Database technology discovery

- Content management system detection

```

Network Infrastructure Mapping

```python

# Python script for enhanced network analysis
import dns.resolver
import socket
import requests
import json
import ipaddress
from urllib.parse import urlparse
import whois
import ssl
import subprocess

class NetworkAnalyzer:
    def __init__(self):
        self.discovered_domains = set()
        self.discovered_ips = set()
        self.network_ranges = set()
        self.technologies = {}

def analyze_document_urls(self, document_urls):
    """Analyze URLs found in documents for network information"""

    for url in document_urls:
        try:
            parsed = urlparse(url)
            domain = parsed.netloc

            if domain:
                self.discovered_domains.add(domain)

                # Resolve domain to IP
                try:
                    ip = socket.gethostbyname(domain)
                    self.discovered_ips.add(ip)

                    # Determine network range
                    network = self.get_network_range(ip)
                    if network:
                        self.network_ranges.add(str(network))

                except socket.gaierror:
                    print(f"Could not resolve {domain}")

        except Exception as e:
            print(f"Error analyzing URL {url}: {e}")

def get_network_range(self, ip):
    """Determine network range for IP address"""
    try:
        # Use whois to get network information
        result = subprocess.run(['whois', ip], capture_output=True, text=True)
        whois_output = result.stdout

        # Parse CIDR from whois output
        for line in whois_output.split('\n'):
            if 'CIDR:' in line or 'route:' in line:
                cidr = line.split(':')[1].strip()
                if '/' in cidr:
                    return ipaddress.ip_network(cidr, strict=False)

        # Fallback to /24 network
        return ipaddress.ip_network(f"{ip}/24", strict=False)

    except Exception as e:
        print(f"Error getting network range for {ip}: {e}")
        return None

def perform_dns_enumeration(self, domain):
    """Perform comprehensive DNS enumeration"""
    dns_records = {}

    record_types = ['A', 'AAAA', 'MX', 'NS', 'TXT', 'CNAME', 'SOA']

    for record_type in record_types:
        try:
            answers = dns.resolver.resolve(domain, record_type)
            dns_records[record_type] = [str(answer) for answer in answers]
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            dns_records[record_type] = []
        except Exception as e:
            print(f"Error resolving {record_type} for {domain}: {e}")
            dns_records[record_type] = []

    return dns_records

def fingerprint_web_technology(self, url):
    """Fingerprint web technologies"""
    try:
        response = requests.get(url, timeout=10, verify=False)

        technology_info = {
            'server': response.headers.get('Server', ''),
            'x_powered_by': response.headers.get('X-Powered-By', ''),
            'content_type': response.headers.get('Content-Type', ''),
            'status_code': response.status_code,
            'technologies': []
        }

        # Analyze response headers
        headers = response.headers
        content = response.text.lower()

        # Common technology indicators
        tech_indicators = {
            'Apache': ['apache'],
            'Nginx': ['nginx'],
            'IIS': ['microsoft-iis'],
            'PHP': ['php', 'x-powered-by: php'],
            'ASP.NET': ['asp.net', 'x-aspnet-version'],
            'WordPress': ['wp-content', 'wordpress'],
            'Drupal': ['drupal'],
            'Joomla': ['joomla'],
            'jQuery': ['jquery'],
            'Bootstrap': ['bootstrap'],
            'Angular': ['angular'],
            'React': ['react'],
            'Vue.js': ['vue.js', 'vuejs']
        }

        for tech, indicators in tech_indicators.items():
            for indicator in indicators:
                if (indicator in str(headers).lower() or 
                    indicator in content):
                    technology_info['technologies'].append(tech)
                    break

        # SSL/TLS information
        if url.startswith('https://'):
            ssl_info = self.get_ssl_info(urlparse(url).netloc)
            technology_info['ssl'] = ssl_info

        return technology_info

    except Exception as e:
        print(f"Error fingerprinting {url}: {e}")
        return None

def get_ssl_info(self, hostname):
    """Get SSL certificate information"""
    try:
        context = ssl.create_default_context()
        with socket.create_connection((hostname, 443), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as ssock:
                cert = ssock.getpeercert()

                return {
                    'subject': dict(x[0] for x in cert['subject']),
                    'issuer': dict(x[0] for x in cert['issuer']),
                    'version': cert['version'],
                    'serial_number': cert['serialNumber'],
                    'not_before': cert['notBefore'],
                    'not_after': cert['notAfter'],
                    'san': cert.get('subjectAltName', [])
                }
    except Exception as e:
        print(f"Error getting SSL info for {hostname}: {e}")
        return None

def generate_network_report(self, output_file):
    """Generate comprehensive network analysis report"""

    report = {
        'summary': {
            'domains_discovered': len(self.discovered_domains),
            'ip_addresses_discovered': len(self.discovered_ips),
            'network_ranges': len(self.network_ranges)
        },
        'domains': list(self.discovered_domains),
        'ip_addresses': list(self.discovered_ips),
        'network_ranges': list(self.network_ranges),
        'technologies': self.technologies,
        'dns_records': {},
        'ssl_certificates': {}
    }

    # Perform DNS enumeration for each domain
    for domain in self.discovered_domains:
        print(f"Enumerating DNS for {domain}...")
        report['dns_records'][domain] = self.perform_dns_enumeration(domain)

    # Fingerprint technologies for each domain
    for domain in self.discovered_domains:
        print(f"Fingerprinting {domain}...")
        for protocol in ['http', 'https']:
            url = f"{protocol}://{domain}"
            tech_info = self.fingerprint_web_technology(url)
            if tech_info:
                report['technologies'][url] = tech_info

    # Save report
    with open(output_file, 'w') as f:
        json.dump(report, f, indent=2, default=str)

    print(f"Network analysis report saved to {output_file}")
    return report

# Usage example
analyzer = NetworkAnalyzer()

# Example document URLs (would come from FOCA analysis)
document_urls = [
    "https://example.com/documents/report.pdf",
    "https://subdomain.example.com/files/presentation.pptx",
    "https://internal.example.com/docs/manual.doc"
]

# Analyze network information
analyzer.analyze_document_urls(document_urls)

# Generate comprehensive report
report = analyzer.generate_network_report("network_analysis_report.json")

print(f"Discovered {len(analyzer.discovered_domains)} domains")
print(f"Discovered {len(analyzer.discovered_ips)} IP addresses")
print(f"Identified {len(analyzer.network_ranges)} network ranges")
```

User and Organization Intelligence

User Profiling and Analysis

```bash

FOCA User Intelligence Features:

1. Author extraction from document metadata

2. Username discovery from file paths

3. Email address identification

4. Organizational structure mapping

5. User behavior analysis

User Information Sources in FOCA:

- Document author fields

- Last modified by fields

- File path usernames (C:\Users\username)

- Email addresses in content

- Digital signatures

- Revision tracking information

- Comments and annotations

```

Advanced User Analysis

```python

# Python script for advanced user intelligence analysis
import re
import json
from collections import defaultdict, Counter
from datetime import datetime
import networkx as nx
import matplotlib.pyplot as plt

class UserIntelligenceAnalyzer:
    def __init__(self):
        self.users = {}
        self.email_domains = defaultdict(list)
        self.organizational_structure = defaultdict(list)
        self.user_relationships = defaultdict(set)
        self.document_timeline = []

def analyze_user_metadata(self, documents_metadata):
    """Analyze user information from document metadata"""

    for doc in documents_metadata:
        # Extract user information
        users_in_doc = set()

        # Primary author
        if doc.get('author'):
            self.add_user(doc['author'], 'author', doc)
            users_in_doc.add(doc['author'])

        # Last modified by
        if doc.get('last_modified_by'):
            self.add_user(doc['last_modified_by'], 'modifier', doc)
            users_in_doc.add(doc['last_modified_by'])

        # Users from file paths
        for path in doc.get('file_paths', []):
            username = self.extract_username_from_path(path)
            if username:
                self.add_user(username, 'file_path', doc)
                users_in_doc.add(username)

        # Users from revision tracking
        for user in doc.get('revision_users', []):
            self.add_user(user, 'revision', doc)
            users_in_doc.add(user)

        # Email addresses
        for email in doc.get('email_addresses', []):
            username = email.split('@')[0]
            domain = email.split('@')[1]
            self.add_user(username, 'email', doc, email)
            self.email_domains[domain].append(username)
            users_in_doc.add(username)

        # Build user relationships (users who worked on same documents)
        for user1 in users_in_doc:
            for user2 in users_in_doc:
                if user1 != user2:
                    self.user_relationships[user1].add(user2)

        # Document timeline
        if doc.get('creation_date'):
            self.document_timeline.append({
                'date': doc['creation_date'],
                'document': doc['filename'],
                'users': list(users_in_doc)
            })

def add_user(self, username, source_type, document, email=None):
    """Add user information to the database"""

    if username not in self.users:
        self.users[username] = {
            'username': username,
            'email': email,
            'documents': [],
            'roles': set(),
            'first_seen': None,
            'last_seen': None,
            'activity_pattern': defaultdict(int)
        }

    user = self.users[username]
    user['documents'].append(document['filename'])
    user['roles'].add(source_type)

    if email and not user['email']:
        user['email'] = email

    # Update activity timeline
    if document.get('creation_date'):
        date = document['creation_date']
        if not user['first_seen'] or date < user['first_seen']:
            user['first_seen'] = date
        if not user['last_seen'] or date > user['last_seen']:
            user['last_seen'] = date

        # Activity pattern by day of week
        day_of_week = date.strftime('%A')
        user['activity_pattern'][day_of_week] += 1

def extract_username_from_path(self, path):
    """Extract username from file path"""
    patterns = [
        r'C:\\Users\\([^\\]+)\\',
        r'/home/([^/]+)/',
        r'/Users/([^/]+)/',
        r'\\\\[^\\]+\\([^\\]+)\\',
    ]

    for pattern in patterns:
        match = re.search(pattern, path, re.IGNORECASE)
        if match:
            username = match.group(1)
            # Filter out common system accounts
            if username.lower() not in ['public', 'default', 'administrator', 'guest']:
                return username

    return None

def identify_organizational_structure(self):
    """Identify organizational structure from user data"""

    # Analyze email domains to identify departments/organizations
    for domain, users in self.email_domains.items():
        if len(users) > 1:
            self.organizational_structure[domain] = users

    # Analyze user collaboration patterns
    collaboration_groups = self.find_collaboration_groups()

    return {
        'email_domains': dict(self.email_domains),
        'collaboration_groups': collaboration_groups,
        'organizational_chart': self.build_organizational_chart()
    }

def find_collaboration_groups(self):
    """Find groups of users who frequently collaborate"""

    # Build collaboration network
    G = nx.Graph()

    for user, collaborators in self.user_relationships.items():
        for collaborator in collaborators:
            if G.has_edge(user, collaborator):
                G[user][collaborator]['weight'] += 1
            else:
                G.add_edge(user, collaborator, weight=1)

    # Find communities/groups
    try:
        communities = nx.community.greedy_modularity_communities(G)
        return [list(community) for community in communities]
    except:
        # Fallback to simple clustering
        return self.simple_clustering()

def simple_clustering(self):
    """Simple clustering based on shared documents"""
    clusters = []
    processed_users = set()

    for user, collaborators in self.user_relationships.items():
        if user not in processed_users:
            cluster = {user}
            cluster.update(collaborators)

            # Add users who collaborate with any member of the cluster
            expanded = True
            while expanded:
                expanded = False
                for cluster_user in list(cluster):
                    new_collaborators = self.user_relationships[cluster_user] - cluster
                    if new_collaborators:
                        cluster.update(new_collaborators)
                        expanded = True

            clusters.append(list(cluster))
            processed_users.update(cluster)

    return clusters

def build_organizational_chart(self):
    """Build organizational chart based on user analysis"""

    org_chart = {
        'departments': {},
        'roles': defaultdict(list),
        'hierarchy': {}
    }

    # Group by email domains (departments)
    for domain, users in self.email_domains.items():
        org_chart['departments'][domain] = {
            'users': users,
            'document_count': sum(len(self.users[user]['documents']) for user in users if user in self.users),
            'active_period': self.get_department_active_period(users)
        }

    # Identify roles based on activity patterns
    for username, user_data in self.users.items():
        role_indicators = self.analyze_user_role(user_data)
        org_chart['roles'][role_indicators['primary_role']].append(username)

    return org_chart

def analyze_user_role(self, user_data):
    """Analyze user role based on activity patterns"""

    doc_count = len(user_data['documents'])
    roles = user_data['roles']

    # Determine primary role
    if 'author' in roles and doc_count > 5:
        primary_role = 'content_creator'
    elif 'modifier' in roles and doc_count > 10:
        primary_role = 'editor'
    elif 'revision' in roles:
        primary_role = 'reviewer'
    elif doc_count > 20:
        primary_role = 'power_user'
    else:
        primary_role = 'regular_user'

    return {
        'primary_role': primary_role,
        'document_count': doc_count,
        'activity_level': 'high' if doc_count > 10 else 'medium' if doc_count > 3 else 'low'
    }

def get_department_active_period(self, users):
    """Get active period for a department"""
    all_dates = []

    for user in users:
        if user in self.users:
            user_data = self.users[user]
            if user_data['first_seen']:
                all_dates.append(user_data['first_seen'])
            if user_data['last_seen']:
                all_dates.append(user_data['last_seen'])

    if all_dates:
        return {
            'start': min(all_dates),
            'end': max(all_dates)
        }

    return None

def generate_user_intelligence_report(self, output_file):
    """Generate comprehensive user intelligence report"""

    org_structure = self.identify_organizational_structure()

    report = {
        'summary': {
            'total_users': len(self.users),
            'email_domains': len(self.email_domains),
            'collaboration_groups': len(org_structure['collaboration_groups']),
            'total_documents': len(self.document_timeline)
        },
        'users': self.users,
        'organizational_structure': org_structure,
        'user_relationships': {k: list(v) for k, v in self.user_relationships.items()},
        'timeline': sorted(self.document_timeline, key=lambda x: x['date']),
        'insights': self.generate_insights()
    }

    # Convert datetime objects to strings for JSON serialization
    def json_serial(obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        raise TypeError(f"Type {type(obj)} not serializable")

    with open(output_file, 'w') as f:
        json.dump(report, f, indent=2, default=json_serial)

    print(f"User intelligence report saved to {output_file}")
    return report

def generate_insights(self):
    """Generate actionable insights from user analysis"""

    insights = []

    # Most active users
    most_active = sorted(self.users.items(), 
                       key=lambda x: len(x[1]['documents']), 
                       reverse=True)[:5]

    insights.append({
        'type': 'most_active_users',
        'description': 'Users with highest document activity',
        'data': [(user, len(data['documents'])) for user, data in most_active]
    })

    # Largest email domains
    largest_domains = sorted(self.email_domains.items(), 
                           key=lambda x: len(x[1]), 
                           reverse=True)[:5]

    insights.append({
        'type': 'largest_departments',
        'description': 'Email domains with most users',
        'data': [(domain, len(users)) for domain, users in largest_domains]
    })

    # Users with potential security risks
    risky_users = []
    for username, user_data in self.users.items():
        if (len(user_data['documents']) > 15 and 
            'file_path' in user_data['roles'] and
            user_data['email']):
            risky_users.append(username)

    insights.append({
        'type': 'high_exposure_users',
        'description': 'Users with high document exposure and identifiable information',
        'data': risky_users
    })

    return insights

def visualize_user_network(self, output_file='user_network.png'):
    """Create visualization of user collaboration network"""

    G = nx.Graph()

    # Add nodes and edges
    for user, collaborators in self.user_relationships.items():
        G.add_node(user)
        for collaborator in collaborators:
            G.add_edge(user, collaborator)

    # Create visualization
    plt.figure(figsize=(12, 8))
    pos = nx.spring_layout(G, k=1, iterations=50)

    # Draw network
    nx.draw_networkx_nodes(G, pos, node_color='lightblue', 
                          node_size=300, alpha=0.7)
    nx.draw_networkx_edges(G, pos, alpha=0.5)
    nx.draw_networkx_labels(G, pos, font_size=8)

    plt.title("User Collaboration Network")
    plt.axis('off')
    plt.tight_layout()
    plt.savefig(output_file, dpi=300, bbox_inches='tight')
    plt.close()

    print(f"User network visualization saved to {output_file}")

# Usage example
analyzer = UserIntelligenceAnalyzer()

# Example metadata (would come from FOCA analysis)
documents_metadata = [
    {
        'filename': 'report.pdf',
        'author': 'john.smith',
        'last_modified_by': 'jane.doe',
        'creation_date': datetime(2024, 1, 15),
        'email_addresses': ['john.smith@example.com', 'jane.doe@example.com'],
        'file_paths': ['C:\\Users\\john.smith\\Documents\\report.pdf'],
        'revision_users': ['john.smith', 'jane.doe', 'bob.wilson']
    },
    # More documents...
]

# Analyze user intelligence
analyzer.analyze_user_metadata(documents_metadata)

# Generate comprehensive report
report = analyzer.generate_user_intelligence_report("user_intelligence_report.json")

# Create network visualization
analyzer.visualize_user_network("user_collaboration_network.png")

print(f"Analyzed {len(analyzer.users)} users")
print(f"Found {len(analyzer.email_domains)} email domains")
print(f"Identified {len(analyzer.user_relationships)} user relationships")
```

Advanced Analysis Techniques

Temporal Analysis and Patterns

```python

# Advanced temporal analysis for FOCA data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import numpy as np

class TemporalAnalyzer:
    def __init__(self, documents_data):
        self.df = pd.DataFrame(documents_data)
        self.df['creation_date'] = pd.to_datetime(self.df['creation_date'])
        self.df['modification_date'] = pd.to_datetime(self.df['modification_date'])

def analyze_document_creation_patterns(self):
    """Analyze document creation patterns over time"""

    # Group by month
    monthly_creation = self.df.groupby(self.df['creation_date'].dt.to_period('M')).size()

    # Group by day of week
    dow_creation = self.df.groupby(self.df['creation_date'].dt.day_name()).size()

    # Group by hour of day
    hourly_creation = self.df.groupby(self.df['creation_date'].dt.hour).size()

    return {
        'monthly_pattern': monthly_creation.to_dict(),
        'day_of_week_pattern': dow_creation.to_dict(),
        'hourly_pattern': hourly_creation.to_dict()
    }

def identify_work_patterns(self):
    """Identify organizational work patterns"""

    # Business hours analysis (9 AM - 5 PM)
    business_hours = self.df[
        (self.df['creation_date'].dt.hour >= 9) & 
        (self.df['creation_date'].dt.hour <= 17)
    ]

    # Weekend work
    weekend_work = self.df[
        self.df['creation_date'].dt.dayofweek.isin([5, 6])
    ]

    # After hours work
    after_hours = self.df[
        (self.df['creation_date'].dt.hour < 9) | 
        (self.df['creation_date'].dt.hour > 17)
    ]

    return {
        'business_hours_percentage': len(business_hours) / len(self.df) * 100,
        'weekend_work_percentage': len(weekend_work) / len(self.df) * 100,
        'after_hours_percentage': len(after_hours) / len(self.df) * 100,
        'peak_hours': self.df['creation_date'].dt.hour.mode().tolist()
    }

def detect_anomalies(self):
    """Detect temporal anomalies in document creation"""

    # Daily document counts
    daily_counts = self.df.groupby(self.df['creation_date'].dt.date).size()

    # Statistical anomaly detection
    mean_count = daily_counts.mean()
    std_count = daily_counts.std()
    threshold = mean_count + 2 * std_count

    anomalous_days = daily_counts[daily_counts > threshold]

    return {
        'anomalous_days': anomalous_days.to_dict(),
        'normal_range': (mean_count - std_count, mean_count + std_count),
        'peak_activity_days': daily_counts.nlargest(5).to_dict()
    }

def analyze_user_activity_timeline(self):
    """Analyze individual user activity timelines"""

    user_timelines = {}

    for user in self.df['author'].dropna().unique():
        user_docs = self.df[self.df['author'] == user]

        if len(user_docs) > 0:
            user_timelines[user] = {
                'first_document': user_docs['creation_date'].min(),
                'last_document': user_docs['creation_date'].max(),
                'total_documents': len(user_docs),
                'activity_span_days': (user_docs['creation_date'].max() - 
                                     user_docs['creation_date'].min()).days,
                'average_documents_per_month': len(user_docs) / max(1, 
                    (user_docs['creation_date'].max() - 
                     user_docs['creation_date'].min()).days / 30)
            }

    return user_timelines

def generate_temporal_visualizations(self, output_dir='temporal_analysis'):
    """Generate temporal analysis visualizations"""

    import os
    os.makedirs(output_dir, exist_ok=True)

    # 1. Monthly document creation trend
    plt.figure(figsize=(12, 6))
    monthly_data = self.df.groupby(self.df['creation_date'].dt.to_period('M')).size()
    monthly_data.plot(kind='line', marker='o')
    plt.title('Document Creation Trend Over Time')
    plt.xlabel('Month')
    plt.ylabel('Number of Documents')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig(f'{output_dir}/monthly_trend.png', dpi=300)
    plt.close()

    # 2. Day of week heatmap
    plt.figure(figsize=(10, 6))
    dow_hour = self.df.groupby([
        self.df['creation_date'].dt.day_name(),
        self.df['creation_date'].dt.hour
    ]).size().unstack(fill_value=0)

    # Reorder days
    day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    dow_hour = dow_hour.reindex(day_order)

    sns.heatmap(dow_hour, cmap='YlOrRd', annot=False, fmt='d')
    plt.title('Document Creation Heatmap (Day of Week vs Hour)')
    plt.xlabel('Hour of Day')
    plt.ylabel('Day of Week')
    plt.tight_layout()
    plt.savefig(f'{output_dir}/activity_heatmap.png', dpi=300)
    plt.close()

    # 3. User activity timeline
    plt.figure(figsize=(14, 8))

    # Get top 10 most active users
    top_users = self.df['author'].value_counts().head(10).index

    for i, user in enumerate(top_users):
        user_docs = self.df[self.df['author'] == user]
        plt.scatter(user_docs['creation_date'], [i] * len(user_docs), 
                   alpha=0.6, s=50, label=user)

    plt.yticks(range(len(top_users)), top_users)
    plt.xlabel('Date')
    plt.ylabel('User')
    plt.title('User Activity Timeline (Top 10 Users)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig(f'{output_dir}/user_timeline.png', dpi=300)
    plt.close()

    print(f"Temporal visualizations saved to {output_dir}/")

# Usage example
temporal_analyzer = TemporalAnalyzer(documents_data)

# Analyze patterns
creation_patterns = temporal_analyzer.analyze_document_creation_patterns()
work_patterns = temporal_analyzer.identify_work_patterns()
anomalies = temporal_analyzer.detect_anomalies()
user_timelines = temporal_analyzer.analyze_user_activity_timeline()

# Generate visualizations
temporal_analyzer.generate_temporal_visualizations()

print("Temporal Analysis Results:")
print(f"Peak creation hours: {work_patterns['peak_hours']}")
print(f"Business hours work: {work_patterns['business_hours_percentage']:.1f}%")
print(f"Weekend work: {work_patterns['weekend_work_percentage']:.1f}%")
print(f"Anomalous activity days: {len(anomalies['anomalous_days'])}")
```

Security Risk Assessment

```python

# Security risk assessment based on FOCA findings
import re

class SecurityRiskAssessment:
    def __init__(self, foca_data):
        self.documents = foca_data['documents']
        self.users = foca_data['users']
        self.network_info = foca_data['network_info']
        self.risk_score = 0
        self.risk_factors = []

def assess_information_disclosure_risk(self):
    """Assess risk from information disclosure in documents"""

    risk_indicators = {
        'high_risk': {
            'patterns': [
                r'password\s*[:=]\s*\w+',
                r'api[_-]?key\s*[:=]\s*[a-zA-Z0-9]+',
                r'secret\s*[:=]\s*\w+',
                r'confidential',
                r'internal\s+use\s+only',
                r'proprietary',
                r'ssn\s*[:=]?\s*\d{3}-?\d{2}-?\d{4}',
                r'credit\s+card\s*[:=]?\s*\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}'
            ],
            'score': 10
        },
        'medium_risk': {
            'patterns': [
                r'internal',
                r'private',
                r'restricted',
                r'employee\s+id\s*[:=]?\s*\d+',
                r'phone\s*[:=]?\s*\+?\d{10,15}',
                r'address\s*[:=]?.*\d{5}'
            ],
            'score': 5
        },
        'low_risk': {
            'patterns': [
                r'draft',
                r'preliminary',
                r'work\s+in\s+progress',
                r'todo',
                r'fixme'
            ],
            'score': 2
        }
    }

    total_risk = 0
    findings = []

    for doc in self.documents:
        content = doc.get('content', '').lower()

        for risk_level, config in risk_indicators.items():
            for pattern in config['patterns']:
                matches = re.findall(pattern, content, re.IGNORECASE)
                if matches:
                    finding = {
                        'document': doc['filename'],
                        'risk_level': risk_level,
                        'pattern': pattern,
                        'matches': len(matches),
                        'score': config['score'] * len(matches)
                    }
                    findings.append(finding)
                    total_risk += finding['score']

    return {
        'total_risk_score': total_risk,
        'findings': findings,
        'risk_level': self.categorize_risk(total_risk)
    }

def assess_metadata_exposure_risk(self):
    """Assess risk from metadata exposure"""

    risk_factors = []
    total_score = 0

    # User information exposure
    unique_users = set()
    for doc in self.documents:
        if doc.get('author'):
            unique_users.add(doc['author'])
        if doc.get('last_modified_by'):
            unique_users.add(doc['last_modified_by'])

    if len(unique_users) > 10:
        risk_factors.append({
            'factor': 'High user exposure',
            'description': f'{len(unique_users)} unique users identified',
            'score': 15
        })
        total_score += 15
    elif len(unique_users) > 5:
        risk_factors.append({
            'factor': 'Medium user exposure',
            'description': f'{len(unique_users)} unique users identified',
            'score': 8
        })
        total_score += 8

    # Email domain exposure
    email_domains = set()
    for doc in self.documents:
        for email in doc.get('email_addresses', []):
            domain = email.split('@')[1] if '@' in email else None
            if domain:
                email_domains.add(domain)

    if len(email_domains) > 3:
        risk_factors.append({
            'factor': 'Multiple email domains exposed',
            'description': f'{len(email_domains)} email domains found',
            'score': 10
        })
        total_score += 10

    # Internal path exposure
    internal_paths = []
    for doc in self.documents:
        for path in doc.get('file_paths', []):
            if any(indicator in path.lower() for indicator in ['c:\\users\\', 'internal', 'private']):
                internal_paths.append(path)

    if len(internal_paths) > 5:
        risk_factors.append({
            'factor': 'Internal file path exposure',
            'description': f'{len(internal_paths)} internal paths exposed',
            'score': 12
        })
        total_score += 12

    # Software version exposure
    software_versions = []
    for doc in self.documents:
        if doc.get('application'):
            software_versions.append(doc['application'])

    if len(set(software_versions)) > 5:
        risk_factors.append({
            'factor': 'Software fingerprinting risk',
            'description': f'{len(set(software_versions))} different applications identified',
            'score': 6
        })
        total_score += 6

    return {
        'total_risk_score': total_score,
        'risk_factors': risk_factors,
        'risk_level': self.categorize_risk(total_score)
    }

def assess_network_exposure_risk(self):
    """Assess network infrastructure exposure risk"""

    risk_factors = []
    total_score = 0

    # Subdomain exposure
    subdomains = self.network_info.get('subdomains', [])
    if len(subdomains) > 20:
        risk_factors.append({
            'factor': 'High subdomain exposure',
            'description': f'{len(subdomains)} subdomains discovered',
            'score': 15
        })
        total_score += 15
    elif len(subdomains) > 10:
        risk_factors.append({
            'factor': 'Medium subdomain exposure',
            'description': f'{len(subdomains)} subdomains discovered',
            'score': 8
        })
        total_score += 8

    # Internal subdomain exposure
    internal_subdomains = [s for s in subdomains if any(
        keyword in s.lower() for keyword in ['internal', 'intranet', 'private', 'dev', 'test', 'staging']
    )]

    if len(internal_subdomains) > 0:
        risk_factors.append({
            'factor': 'Internal subdomain exposure',
            'description': f'{len(internal_subdomains)} internal subdomains found',
            'score': 20
        })
        total_score += 20

    # Technology stack exposure
    technologies = self.network_info.get('technologies', {})
    if len(technologies) > 10:
        risk_factors.append({
            'factor': 'Technology stack fingerprinting',
            'description': f'{len(technologies)} technologies identified',
            'score': 8
        })
        total_score += 8

    return {
        'total_risk_score': total_score,
        'risk_factors': risk_factors,
        'risk_level': self.categorize_risk(total_score)
    }

def categorize_risk(self, score):
    """Categorize risk level based on score"""
    if score >= 50:
        return 'CRITICAL'
    elif score >= 30:
        return 'HIGH'
    elif score >= 15:
        return 'MEDIUM'
    elif score >= 5:
        return 'LOW'
    else:
        return 'MINIMAL'

def generate_comprehensive_risk_assessment(self):
    """Generate comprehensive risk assessment report"""

    info_disclosure = self.assess_information_disclosure_risk()
    metadata_exposure = self.assess_metadata_exposure_risk()
    network_exposure = self.assess_network_exposure_risk()

    total_risk_score = (
        info_disclosure['total_risk_score'] +
        metadata_exposure['total_risk_score'] +
        network_exposure['total_risk_score']
    )

    assessment = {
        'overall_risk_score': total_risk_score,
        'overall_risk_level': self.categorize_risk(total_risk_score),
        'risk_categories': {
            'information_disclosure': info_disclosure,
            'metadata_exposure': metadata_exposure,
            'network_exposure': network_exposure
        },
        'recommendations': self.generate_recommendations(total_risk_score),
        'executive_summary': self.generate_executive_summary(total_risk_score)
    }

    return assessment

def generate_recommendations(self, risk_score):
    """Generate security recommendations based on risk assessment"""

    recommendations = []

    if risk_score >= 50:
        recommendations.extend([
            "IMMEDIATE ACTION REQUIRED: Critical security risks identified",
            "Conduct emergency security review of all public documents",
            "Implement document classification and handling procedures",
            "Review and restrict access to internal systems and documents",
            "Consider taking affected systems offline until remediation"
        ])

    if risk_score >= 30:
        recommendations.extend([
            "Implement document metadata sanitization procedures",
            "Review and update information security policies",
            "Conduct security awareness training for all staff",
            "Implement data loss prevention (DLP) solutions",
            "Regular security audits of public-facing documents"
        ])

    if risk_score >= 15:
        recommendations.extend([
            "Establish document review process before publication",
            "Implement metadata removal tools and procedures",
            "Review subdomain and network exposure",
            "Update security awareness training materials",
            "Consider implementing document watermarking"
        ])

    recommendations.extend([
        "Regular OSINT assessments of organizational exposure",
        "Monitor for new document publications and leaks",
        "Implement automated metadata scanning tools",
        "Establish incident response procedures for information disclosure",
        "Regular review of public-facing digital assets"
    ])

    return recommendations

def generate_executive_summary(self, risk_score):
    """Generate executive summary of risk assessment"""

    risk_level = self.categorize_risk(risk_score)

    summary = f"""
    EXECUTIVE SUMMARY - FOCA Security Risk Assessment

    Overall Risk Level: {risk_level}
    Risk Score: {risk_score}/100

    This assessment analyzed {len(self.documents)} documents and associated metadata
    to identify potential security risks from information disclosure.

    Key Findings:
    - {len(set(doc.get('author', '') for doc in self.documents if doc.get('author')))} unique users identified
    - {len(self.network_info.get('subdomains', []))} subdomains discovered
    - {len(self.network_info.get('technologies', {}))} technologies fingerprinted

    Risk Level Interpretation:
    - CRITICAL (50+): Immediate action required, significant security exposure
    - HIGH (30-49): High priority remediation needed within 30 days
    - MEDIUM (15-29): Moderate risk, address within 90 days
    - LOW (5-14): Low risk, include in regular security review cycle
    - MINIMAL (0-4): Minimal risk, maintain current security posture

    Recommendation: {"Immediate remediation required" if risk_score >= 50 else 
                    "Priority security review needed" if risk_score >= 30 else
                    "Include in next security review cycle" if risk_score >= 15 else
                    "Monitor and maintain current security measures"}
    """

    return summary.strip()

# Usage example
foca_data = {
    'documents': documents_metadata,   # From previous examples
    'users': user_data,                # From user intelligence analysis
    'network_info': network_analysis   # From network analysis
}

risk_assessor = SecurityRiskAssessment(foca_data)
comprehensive_assessment = risk_assessor.generate_comprehensive_risk_assessment()

print("Security Risk Assessment Results:")
print(f"Overall Risk Level: {comprehensive_assessment['overall_risk_level']}")
print(f"Risk Score: {comprehensive_assessment['overall_risk_score']}")
print("\nTop Recommendations:")
for i, rec in enumerate(comprehensive_assessment['recommendations'][:5], 1):
    print(f"{i}. {rec}")
```

Integration and Automation

FOCA API and Automation

```python

# FOCA automation and integration framework
import os
import subprocess
import json
import time
from pathlib import Path
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class FOCAAutomation:
    def __init__(self, foca_path, workspace_dir):
        self.foca_path = foca_path
        self.workspace_dir = Path(workspace_dir)
        self.workspace_dir.mkdir(exist_ok=True)

def create_automated_project(self, target_domain, project_name):
    """Create and configure FOCA project automatically"""

    project_config = {
        'name': project_name,
        'target': target_domain,
        'search_engines': ['google', 'bing'],
        'file_types': ['pdf', 'doc', 'docx', 'xls', 'xlsx', 'ppt', 'pptx'],
        'max_results': 100,
        'download_files': True,
        'analyze_metadata': True
    }

    # Save project configuration
    config_file = self.workspace_dir / f"{project_name}_config.json"
    with open(config_file, 'w') as f:
        json.dump(project_config, f, indent=2)

    return config_file

def automated_document_discovery(self, target_domain, output_file):
    """Automated document discovery using multiple methods"""

    discovered_urls = set()

    # Method 1: Google dorking
    google_urls = self.google_dork_search(target_domain)
    discovered_urls.update(google_urls)

    # Method 2: Bing search
    bing_urls = self.bing_search(target_domain)
    discovered_urls.update(bing_urls)

    # Method 3: Site crawling
    crawled_urls = self.crawl_site_for_documents(target_domain)
    discovered_urls.update(crawled_urls)

    # Method 4: Certificate transparency logs
    ct_urls = self.search_certificate_transparency(target_domain)
    discovered_urls.update(ct_urls)

    # Save results
    with open(output_file, 'w') as f:
        for url in sorted(discovered_urls):
            f.write(f"{url}\n")

    return list(discovered_urls)

def google_dork_search(self, domain):
    """Perform Google dorking for document discovery"""

    file_types = ['pdf', 'doc', 'docx', 'xls', 'xlsx', 'ppt', 'pptx']
    discovered_urls = set()

    for file_type in file_types:
        query = f"site:{domain} filetype:{file_type}"

        # Use requests with proper headers to avoid blocking
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        }

        try:
            # Note: In practice, use Google Custom Search API
            # This is a simplified example
            search_url = f"https://www.google.com/search?q={query}"
            response = requests.get(search_url, headers=headers)

            # Parse results (simplified - use proper HTML parsing)
            # Extract URLs from search results
            # Add to discovered_urls set

            time.sleep(2)  # Rate limiting

        except Exception as e:
            print(f"Error searching for {file_type} files: {e}")

    return discovered_urls
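
# Note: google_dork_search above scrapes Google's HTML results, which is
# brittle and easily blocked. A hypothetical alternative (not part of the
# original script; api_key and cse_id are placeholders you must supply) is
# to query the Google Custom Search JSON API instead:
def google_api_search(self, domain, api_key, cse_id):
    """Illustrative variant using the Google Custom Search JSON API"""

    urls = set()
    for file_type in ['pdf', 'doc', 'docx', 'xls', 'xlsx', 'ppt', 'pptx']:
        params = {
            'key': api_key,
            'cx': cse_id,
            'q': f"site:{domain} filetype:{file_type}"
        }
        response = requests.get("https://www.googleapis.com/customsearch/v1",
                                params=params, timeout=30)
        if response.status_code == 200:
            # Each result item carries the document URL in its 'link' field
            for item in response.json().get('items', []):
                urls.add(item['link'])
        time.sleep(1)  # stay within API quota limits
    return urls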

def crawl_site_for_documents(self, domain):
    """Crawl website for document links"""

    discovered_urls = set()

    try:
        # Use Selenium for JavaScript-heavy sites
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-dev-shm-usage')

        driver = webdriver.Chrome(options=options)

        # Start from main domain
        driver.get(f"https://{domain}")

        # Find all links
        links = driver.find_elements(By.TAG_NAME, "a")

        for link in links:
            href = link.get_attribute('href')
            if href and any(ext in href.lower() for ext in ['.pdf', '.doc', '.xls', '.ppt']):
                discovered_urls.add(href)

        # Also check common document directories
        common_paths = ['/documents/', '/files/', '/downloads/', '/resources/', '/docs/']

        for path in common_paths:
            try:
                driver.get(f"https://{domain}{path}")
                links = driver.find_elements(By.TAG_NAME, "a")

                for link in links:
                    href = link.get_attribute('href')
                    if href and any(ext in href.lower() for ext in ['.pdf', '.doc', '.xls', '.ppt']):
                        discovered_urls.add(href)

            except Exception:
                continue

        driver.quit()

    except Exception as e:
        print(f"Error crawling {domain}: {e}")

    return discovered_urls

def search_certificate_transparency(self, domain):
    """Search certificate transparency logs for subdomains"""

    discovered_urls = set()

    try:
        # Query crt.sh for certificate transparency data
        ct_url = f"https://crt.sh/?q=%.{domain}&output;=json"
        response = requests.get(ct_url, timeout=30)

        if response.status_code == 200:
            certificates = response.json()

            subdomains = set()
            for cert in certificates:
                name_value = cert.get('name_value', '')
                for name in name_value.split('\n'):
                    if domain in name and not name.startswith('*'):
                        subdomains.add(name.strip())

            # Check each subdomain for documents
            for subdomain in subdomains:
                try:
                    # Quick check for common document paths
                    for path in ['/documents/', '/files/', '/downloads/']:
                        test_url = f"https://{subdomain}{path}"
                        response = requests.head(test_url, timeout=5)
                        if response.status_code == 200:
                            discovered_urls.add(test_url)
                except Exception:
                    continue

    except Exception as e:
        print(f"Error searching certificate transparency: {e}")

    return discovered_urls

def bulk_download_documents(self, url_list, download_dir):
    """Bulk download documents from URL list"""

    download_path = Path(download_dir)
    download_path.mkdir(exist_ok=True)

    downloaded_files = []

    for url in url_list:
        try:
            response = requests.get(url, timeout=30, stream=True)

            if response.status_code == 200:
                # Extract filename from URL or Content-Disposition header
                filename = self.extract_filename(url, response.headers)
                file_path = download_path / filename

                with open(file_path, 'wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)

                downloaded_files.append(str(file_path))
                print(f"Downloaded: {filename}")

            time.sleep(1)  # Rate limiting

        except Exception as e:
            print(f"Error downloading {url}: {e}")

    return downloaded_files

def extract_filename(self, url, headers):
    """Extract filename from URL or headers"""

    # Try Content-Disposition header first
    content_disposition = headers.get('Content-Disposition', '')
    if 'filename=' in content_disposition:
        filename = content_disposition.split('filename=')[1].strip('"')
        return filename

    # Extract from URL
    filename = url.split('/')[-1]
    if '?' in filename:
        filename = filename.split('?')[0]

    # Ensure valid filename
    if not filename or '.' not in filename:
        filename = f"document_{hash(url)}.pdf"

    return filename

def automated_metadata_analysis(self, file_list):
    """Perform automated metadata analysis on downloaded files"""

    analysis_results = []

    for file_path in file_list:
        try:
            # Use appropriate metadata extractor based on file type
            if file_path.lower().endswith('.pdf'):
                metadata = self.extract_pdf_metadata(file_path)
            elif file_path.lower().endswith(('.doc', '.docx')):
                metadata = self.extract_office_metadata(file_path)
            elif file_path.lower().endswith(('.xls', '.xlsx')):
                metadata = self.extract_excel_metadata(file_path)
            else:
                continue

            analysis_results.append({
                'file_path': file_path,
                'metadata': metadata
            })

        except Exception as e:
            print(f"Error analyzing {file_path}: {e}")

    return analysis_results

def generate_automated_report(self, analysis_results, output_file):
    """Generate automated FOCA analysis report"""

    report = {
        'timestamp': time.strftime('%Y-%m-%d %H:%M:%S'),
        'total_files_analyzed': len(analysis_results),
        'summary': self.generate_analysis_summary(analysis_results),
        'detailed_results': analysis_results,
        'risk_assessment': self.perform_automated_risk_assessment(analysis_results),
        'recommendations': self.generate_automated_recommendations(analysis_results)
    }

    with open(output_file, 'w') as f:
        json.dump(report, f, indent=2, default=str)

    return report

def run_full_automated_analysis(self, target_domain, project_name):
    """Run complete automated FOCA analysis"""

    print(f"Starting automated FOCA analysis for {target_domain}")

    # Step 1: Create project
    config_file = self.create_automated_project(target_domain, project_name)
    print(f"Created project configuration: {config_file}")

    # Step 2: Document discovery
    urls_file = self.workspace_dir / f"{project_name}_urls.txt"
    discovered_urls = self.automated_document_discovery(target_domain, urls_file)
    print(f"Discovered {len(discovered_urls)} document URLs")

    # Step 3: Download documents
    download_dir = self.workspace_dir / f"{project_name}_downloads"
    downloaded_files = self.bulk_download_documents(discovered_urls, download_dir)
    print(f"Downloaded {len(downloaded_files)} documents")

    # Step 4: Metadata analysis
    analysis_results = self.automated_metadata_analysis(downloaded_files)
    print(f"Analyzed {len(analysis_results)} documents")

    # Step 5: Generate report
    report_file = self.workspace_dir / f"{project_name}_report.json"
    report = self.generate_automated_report(analysis_results, report_file)
    print(f"Generated analysis report: {report_file}")

    return report

# Usage example
foca_automation = FOCAAutomation(
    foca_path=r"C:\FOCA\FOCA.exe",
    workspace_dir=r"C:\FOCA_Automation"
)

# Run full automated analysis
report = foca_automation.run_full_automated_analysis("example.com", "example_analysis")

print("Automated FOCA Analysis Complete!")
print(f"Total files analyzed: {report['total_files_analyzed']}")
print(f"Risk level: {report['risk_assessment']['overall_risk_level']}")
```

Best Practices and Optimization

Performance Optimization

```bash

FOCA Performance Optimization Tips:

1. Configure appropriate timeouts

Options > Configuration > Network

- Connection timeout: 30 seconds

- Download timeout: 60 seconds

- Maximum file size: 50MB

2. Optimize search settings

- Limit search results per engine: 100

- Use specific file type filters

- Exclude common non-target file types

3. Parallel processing

- Enable multiple concurrent downloads: 5-10

- Use multiple search engines simultaneously

- Process different file types in parallel

4. Storage optimization

- Use SSD storage for temporary files

- Regular cleanup of downloaded files

- Compress analysis results

5. Memory management

- Close unnecessary applications

- Increase virtual memory if needed

- Monitor memory usage during large analyses

```
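Outside the FOCA GUI, the same limits can be applied when scripting bulk downloads. The following is a minimal sketch; the constant values simply mirror the suggestions above and are not FOCA settings:

```python
# Minimal sketch: concurrent, size-capped downloads with the limits suggested above
import concurrent.futures
import requests

MAX_WORKERS = 5                  # concurrent downloads (5-10 suggested above)
MAX_BYTES = 50 * 1024 * 1024     # 50MB file size cap
TIMEOUT = (30, 60)               # connection / download timeouts in seconds

def fetch(url):
    response = requests.get(url, timeout=TIMEOUT, stream=True)
    # Content-Length may be missing; treat unknown sizes as 0 for this sketch
    size = int(response.headers.get('Content-Length', 0))
    if response.status_code != 200 or size > MAX_BYTES:
        return url, None
    return url, response.content

def download_all(urls):
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for url, data in pool.map(fetch, urls):
            if data is not None:
                results[url] = data
    return results
```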

Legal and Ethical Considerations

```bash

Legal and Ethical Guidelines for FOCA Usage:

1. Authorization Requirements:

- Only analyze publicly available documents

- Obtain written permission for internal assessments

- Respect robots.txt and terms of service

- Follow applicable laws and regulations

2. Data Handling:

- Secure storage of downloaded documents

- Proper disposal of sensitive information

- Encryption of analysis results

- Limited retention periods

3. Responsible Disclosure:

- Report findings to appropriate parties

- Allow reasonable time for remediation

- Follow coordinated disclosure practices

- Document all activities and findings

4. Privacy Considerations:

- Minimize collection of personal information

- Anonymize data when possible

- Respect individual privacy rights

- Comply with data protection regulations

```
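One of the points above, respecting robots.txt, is easy to check programmatically before any crawling. A minimal sketch using Python's standard library (the user-agent string and paths are illustrative):

```python
# Minimal sketch: check robots.txt before crawling a target for documents
from urllib import robotparser

def is_crawl_allowed(domain, path="/documents/", user_agent="FOCA-research"):
    rp = robotparser.RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, f"https://{domain}{path}")

# Example: skip a path entirely if it is disallowed
if not is_crawl_allowed("example.com", "/files/"):
    print("Skipping /files/ - disallowed by robots.txt")
```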

Troubleshooting Common Issues

```bash

Common FOCA Issues and Solutions:

Issue 1: Search engines not returning results

Solution:

- Verify API keys are configured correctly

- Check internet connectivity

- Verify search engine quotas

- Try alternative search engines

Issue 2: Documents not downloading

Solution:

- Check file size limits

- Verify download directory permissions

- Test individual URLs manually

- Check for anti-bot protection

Issue 3: Metadata extraction failures

Solution:

- Verify file integrity

- Check file format compatibility

- Update Microsoft Office components

- Try alternative extraction tools

Issue 4: Performance issues

Solution:

- Reduce concurrent operations

- Increase system memory

- Use faster storage (SSD)

- Close unnecessary applications

Issue 5: False positive results

Solution:

- Verify document authenticity

- Cross-reference with multiple sources

- Manual verification of findings

- Update analysis rules and filters

```
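For Issue 2, "test individual URLs manually" can be scripted so that status code, content type, and size are visible at a glance. A small sketch (the headers and example URL are illustrative):

```python
# Minimal sketch: manually test a single document URL before bulk download
import requests

def test_document_url(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    response = requests.head(url, headers=headers, timeout=15, allow_redirects=True)
    print(f"URL:            {url}")
    print(f"Status code:    {response.status_code}")
    print(f"Content-Type:   {response.headers.get('Content-Type', 'unknown')}")
    print(f"Content-Length: {response.headers.get('Content-Length', 'unknown')}")
    if response.status_code in (403, 429):
        print("Possible anti-bot protection or rate limiting")

test_document_url("https://target.com/documents/report.pdf")
```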

Resources

Documentation and Training

Related Tools and Resources

- ExifTool - Advanced metadata extraction (scriptable; see the sketch below)
- MAT2 - Metadata removal tool
- Document Analyzer - Alternative metadata analysis
- Metagoofil - Similar metadata gathering tool
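ExifTool and MAT2 can be driven from scripts to verify FOCA findings or to sanitize documents before publication. A minimal sketch, assuming both tools are installed and available on the PATH:

```python
# Minimal sketch: inspect metadata with ExifTool and strip it with MAT2
# (assumes both tools are installed and on the PATH)
import subprocess

def inspect_metadata(file_path):
    # Print all metadata tags ExifTool can read from the file
    subprocess.run(["exiftool", file_path], check=True)

def strip_metadata(file_path):
    # MAT2 typically writes a cleaned copy alongside the original
    # (e.g. report.cleaned.pdf) rather than modifying it in place
    subprocess.run(["mat2", file_path], check=True)
```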

Legal and Compliance Resources

Training and Certification