
Semgrep Static Analysis Tool Cheat Sheet


Overview

Semgrep is a fast, open-source static analysis tool for finding bugs and security vulnerabilities and for enforcing code standards across many programming languages. It uses pattern-based analysis with a simple, intuitive syntax that makes it easy for developers to write custom rules. Semgrep is especially valuable in DevSecOps pipelines because of its speed, accuracy, and extensive rule library covering security, correctness, and performance issues.

Note: Semgrep is designed for pattern-based static analysis, so organization-specific security requirements may call for custom rules. It should be integrated into CI/CD pipelines for continuous security monitoring.

Installation

Using pip (recommended)

```bash
# Install Semgrep
pip install semgrep

# Install a specific version
pip install semgrep==1.45.0

# Install from source
pip install git+https://github.com/returntocorp/semgrep.git

# Verify installation
semgrep --version
```

Using Homebrew (macOS)

```bash
# Install Semgrep
brew install semgrep

# Update Semgrep
brew upgrade semgrep
```

Using Docker

```bash
# Pull the Semgrep image
docker pull returntocorp/semgrep

# Run Semgrep in a container
docker run --rm -v $(pwd):/src returntocorp/semgrep --config=auto /src

# Create an alias for convenience
alias semgrep='docker run --rm -v $(pwd):/src returntocorp/semgrep'

# Build a custom image
cat > Dockerfile << 'EOF'
FROM returntocorp/semgrep
WORKDIR /src
ENTRYPOINT ["semgrep"]
EOF

docker build -t custom-semgrep .
```

Package managers

```bash
# Ubuntu/Debian (via pip)
sudo apt update
sudo apt install python3-pip
pip3 install semgrep

# CentOS/RHEL/Fedora
sudo dnf install python3-pip
pip3 install semgrep

# Arch Linux
sudo pacman -S python-pip
pip install semgrep
```

Binary installation

```bash
# Download binary (Linux)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-linux-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/

# Download binary (macOS)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-macos-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/
```

Basic usage

Quick start

```bash
# Scan with auto-configuration (recommended for beginners)
semgrep --config=auto .

# Scan a specific directory
semgrep --config=auto /path/to/project

# Scan a single file
semgrep --config=auto file.py

# Scan with a specific ruleset
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .

# Scan with multiple rulesets
semgrep --config=p/security-audit --config=p/owasp-top-ten .
```

Output formats

```bash
# Default text output
semgrep --config=auto .

# JSON output
semgrep --config=auto --json .

# SARIF output (for GitHub integration)
semgrep --config=auto --sarif .

# JUnit XML output
semgrep --config=auto --junit-xml .

# Emacs output format
semgrep --config=auto --emacs .

# Vim output format
semgrep --config=auto --vim .

# Save output to a file
semgrep --config=auto --json --output=results.json .
semgrep --config=auto --sarif --output=results.sarif .
```
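The `--json` report is easy to post-process. The following is a minimal sketch, assuming the result shape used by the scanner script further down (`results[]` entries with `check_id`, `path`, `start.line`, and `extra.severity`); the helper names and the sample report are made up for illustration, and field names may differ between Semgrep versions.

```python
from collections import Counter
from typing import Dict, List

# A made-up report in the assumed shape of `semgrep --json` output
sample_report: Dict = {
    "results": [
        {"check_id": "hardcoded-password", "path": "app.py",
         "start": {"line": 3}, "extra": {"severity": "ERROR"}},
        {"check_id": "unsafe-yaml-load", "path": "config.py",
         "start": {"line": 10}, "extra": {"severity": "WARNING"}},
    ]
}


def summarize_findings(report: Dict) -> Counter:
    """Count findings per severity; findings without a severity count as UNKNOWN."""
    return Counter(
        finding.get("extra", {}).get("severity", "UNKNOWN")
        for finding in report.get("results", [])
    )


def format_findings(report: Dict) -> List[str]:
    """Render each finding as 'path:line [SEVERITY] rule-id'."""
    lines = []
    for finding in report.get("results", []):
        line_no = finding.get("start", {}).get("line", "?")
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        lines.append(f"{finding.get('path')}:{line_no} [{severity}] {finding.get('check_id')}")
    return lines


print(summarize_findings(sample_report))
print("\n".join(format_findings(sample_report)))
```

For a real scan, load the file written by `semgrep --config=auto --json --output=results.json .` with `json.load` and pass the resulting dict to the same helpers.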

Filtering and targeting

```bash
# Include specific file patterns
semgrep --config=auto --include="*.py" .
semgrep --config=auto --include="*.js" --include="*.ts" .

# Exclude specific file patterns
semgrep --config=auto --exclude="test" .
semgrep --config=auto --exclude="node_modules" --exclude="vendor" .

# Scan specific languages (--lang is combined with an inline -e/--pattern query)
semgrep --lang=python -e 'eval(...)' .
semgrep --lang=javascript -e 'eval(...)' .
semgrep --lang=java -e 'Runtime.getRuntime().exec(...)' .

# Severity filtering
semgrep --config=auto --severity=ERROR .
semgrep --config=auto --severity=WARNING .
semgrep --config=auto --severity=INFO .
```

Rule configuration

Using built-in rulesets

```bash
# Security-focused rulesets
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .
semgrep --config=p/secrets .

# Language-specific rulesets
semgrep --config=p/python .
semgrep --config=p/javascript .
semgrep --config=p/java .
semgrep --config=p/go .

# Framework-specific rulesets
semgrep --config=p/django .
semgrep --config=p/flask .
semgrep --config=p/react .
semgrep --config=p/express .

# Code quality rulesets
semgrep --config=p/code-quality .
semgrep --config=p/performance .
semgrep --config=p/correctness .

# List available rulesets
semgrep --config=p/
```

Custom rules

```yaml
# custom-rules.yml
rules:
  - id: hardcoded-password
    pattern: password = "..."
    message: Hardcoded password detected
    languages: [python]
    severity: ERROR

  - id: sql-injection
    pattern-either:
      - pattern: cursor.execute("..." + $VAR)
      - pattern: cursor.execute(f"...{$VAR}...")
    message: Potential SQL injection vulnerability
    languages: [python]
    severity: ERROR

  - id: unsafe-yaml-load
    pattern: yaml.load($DATA)
    message: Use yaml.safe_load() instead of yaml.load()
    languages: [python]
    severity: WARNING
    fix: yaml.safe_load($DATA)

  - id: missing-csrf-protection
    # pattern and pattern-not must be combined under `patterns`
    patterns:
      - pattern: |
          class $CLASS(...):
            ...
            def post(self, ...):
              ...
      - pattern-not: |
          class $CLASS(...):
            ...
            @csrf_exempt
            def post(self, ...):
              ...
    message: POST method missing CSRF protection
    languages: [python]
    severity: ERROR
```
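Malformed rule files are a common source of confusion. The sketch below is a hypothetical pre-flight check (not part of Semgrep) that encodes a few structural conventions seen in the rules above: the required keys `id`, `message`, `languages`, `severity`; exactly one top-level pattern operator; and operators such as `pattern-not` nested under `patterns`. `semgrep --validate --config=custom-rules.yml` remains the authoritative check.

```python
from typing import Dict, List

REQUIRED_KEYS = {"id", "message", "languages", "severity"}
# This sketch expects exactly one of these at the top level of a non-taint rule
PATTERN_KEYS = {"pattern", "patterns", "pattern-either", "pattern-regex"}
# Operators this sketch expects inside a `patterns:` list, not at the top level
NESTED_ONLY_KEYS = {"pattern-not", "pattern-inside", "pattern-not-inside", "metavariable-regex"}


def check_rule(rule: Dict) -> List[str]:
    """Return a list of structural problems; an empty list means the rule looks OK."""
    keys = set(rule)
    problems = [f"missing '{key}'" for key in sorted(REQUIRED_KEYS - keys)]

    if rule.get("mode") == "taint":
        # Taint rules declare sources and sinks instead of a top-level pattern
        for key in ("pattern-sources", "pattern-sinks"):
            if key not in keys:
                problems.append(f"missing '{key}'")
    else:
        found = PATTERN_KEYS & keys
        if not found:
            problems.append("no top-level pattern key")
        elif len(found) > 1:
            problems.append("more than one top-level pattern key")

    for key in sorted(NESTED_ONLY_KEYS & keys):
        problems.append(f"'{key}' must be nested under 'patterns'")
    return problems


# Two made-up rules: one well-formed, one with a top-level pattern-not
good_rule = {
    "id": "unsafe-yaml-load", "pattern": "yaml.load($DATA)",
    "message": "Use yaml.safe_load()", "languages": ["python"], "severity": "WARNING",
}
bad_rule = {
    "id": "pattern-not", "pattern": "requests.get($URL)",
    "pattern-not": "requests.get($URL, verify=True)",
    "message": "No cert verification", "languages": ["python"], "severity": "WARNING",
}

print(check_rule(good_rule))
print(check_rule(bad_rule))
```

To check a real file, load it with PyYAML (`yaml.safe_load`) and call `check_rule` on each entry of the `rules` list.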

Rule syntax examples

```yaml
# Pattern matching
rules:
  - id: basic-pattern
    pattern: eval($X)
    message: Avoid using eval()
    languages: [python]
    severity: ERROR

  - id: pattern-either
    pattern-either:
      - pattern: exec($X)
      - pattern: eval($X)
    message: Avoid using exec() or eval()
    languages: [python]
    severity: ERROR

  - id: pattern-inside
    patterns:
      - pattern-inside: |
          def $FUNC(...):
            ...
      - pattern: return $X
    message: Function returns value
    languages: [python]
    severity: INFO

  - id: pattern-not
    patterns:
      - pattern: requests.get($URL)
      - pattern-not: requests.get($URL, verify=True)
    message: HTTPS request without certificate verification
    languages: [python]
    severity: WARNING

  - id: metavariable-regex
    patterns:
      - pattern: $FUNC($ARG)
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^(exec|eval)$
    message: Dangerous function call
    languages: [python]
    severity: ERROR
```
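The `^...$` anchors in the `metavariable-regex` rule matter. The sketch below uses Python's `re.search` as a stand-in for Semgrep's regex engine (an assumption: Semgrep's engine and any implicit anchoring may differ, so explicit anchors are the safe choice) to show which hypothetical function names each variant would accept.

```python
import re

# The regex from the metavariable-regex rule above, and the same alternation unanchored
ANCHORED = re.compile(r"^(exec|eval)$")
UNANCHORED = re.compile(r"(exec|eval)")

# Hypothetical function names that $FUNC might bind to
names = ["eval", "exec", "evaluate", "my_eval", "print"]

anchored_hits = [name for name in names if ANCHORED.search(name)]
unanchored_hits = [name for name in names if UNANCHORED.search(name)]

print(anchored_hits)    # ['eval', 'exec']
print(unanchored_hits)  # ['eval', 'exec', 'evaluate', 'my_eval']
```

Without anchors, the rule would also flag harmless calls such as `evaluate(...)`, which is a typical source of false positives.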

Advanced usage

Configuration files

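```yaml
# .semgrep.yml
# Note: Semgrep rule files normally define rule objects under `rules:`.
# Rulesets, path filters, and severity filters are usually passed as CLI flags
# (--config, --include, --exclude, --severity) or set in a .semgrepignore file.
rules:
  - rules/security
  - rules/performance

exclude:
  - "/tests/"
  - "/node_modules/"
  - "/vendor/"
  - "*.min.js"

include:
  - "*.py"
  - "*.js"
  - "*.java"
  - "*.go"

severity:
  - ERROR
  - WARNING
```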

Advanced rules

```yaml
# advanced-rules.yml
rules:
  - id: jwt-hardcoded-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: JWT secret should not be hardcoded
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-798: Use of Hard-coded Credentials"
      owasp: "A02:2021 - Cryptographic Failures"

  - id: unsafe-deserialization
    pattern-either:
      - pattern: pickle.loads($DATA)
      - pattern: pickle.load($FILE)
      - pattern: cPickle.loads($DATA)
    message: Unsafe deserialization with pickle
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"

  - id: command-injection
    patterns:
      - pattern-either:
          - pattern: os.system($CMD)
          - pattern: subprocess.call($CMD, shell=True)
          - pattern: subprocess.run($CMD, shell=True)
      # Skip calls whose command was assigned a constant string earlier
      - pattern-not-inside: |
          $CMD = "..."
          ...
    message: Potential command injection vulnerability
    languages: [python]
    severity: ERROR
    fix-regex:
      regex: 'shell=True'
      replacement: 'shell=False'
```
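It helps to preview what a `fix-regex` autofix will do before applying it. The sketch below approximates it with Python's `re.sub` over a matched snippet; this is an assumption about the behavior for illustration, not Semgrep's implementation, and `preview_fix_regex` is a hypothetical helper.

```python
import re


def preview_fix_regex(matched_code: str, regex: str, replacement: str) -> str:
    """Approximate a fix-regex autofix by running re.sub over the matched snippet."""
    return re.sub(regex, replacement, matched_code)


fixed = preview_fix_regex("subprocess.run(cmd, shell=True)", r"shell=True", "shell=False")
unchanged = preview_fix_regex("os.system(cmd)", r"shell=True", "shell=False")

print(fixed)      # subprocess.run(cmd, shell=False)
print(unchanged)  # os.system(cmd)
```

Two caveats follow from this: `os.system` matches get no fix at all, and if `$CMD` is a single string, switching to `shell=False` changes its meaning (on POSIX the whole string is treated as the program name), so such fixes need manual review.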

Taint analysis

```yaml
# taint-rules.yml
rules:
  - id: user-input-to-sql
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    message: User input flows to SQL query
    languages: [python]
    severity: ERROR

  - id: user-input-to-eval
    mode: taint
    pattern-sources:
      - pattern: input(...)
      - pattern: sys.argv[...]
    pattern-sinks:
      - pattern: eval($CODE)
      - pattern: exec($CODE)
    message: User input flows to code execution
    languages: [python]
    severity: ERROR
```
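The idea behind `mode: taint` is forward propagation: data from a source taints every variable it flows into, and a finding is reported when tainted data reaches a sink. The toy model below illustrates only that idea over a hand-written list of simplified statements; it is not how Semgrep's engine works internally (Semgrep analyzes real syntax trees), and `Step`, `SOURCES`, and `tainted_sinks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical source marker standing in for `request.args.get(...)`
SOURCES = {"request.args.get"}


@dataclass
class Step:
    """One simplified statement: an assignment to `target`, or a sink call."""
    target: Optional[str]   # variable written, or None for a sink call
    uses: Tuple[str, ...]   # variables (or source markers) the statement reads
    is_sink: bool = False


def tainted_sinks(steps: List[Step]) -> List[int]:
    """Return indexes of sink steps reached by tainted data (forward propagation)."""
    tainted: Set[str] = set()
    hits: List[int] = []
    for index, step in enumerate(steps):
        reads_taint = any(u in SOURCES or u in tainted for u in step.uses)
        if step.is_sink:
            if reads_taint:
                hits.append(index)
        elif step.target is not None:
            # Taint flows into the assigned variable; a clean reassignment clears it
            if reads_taint:
                tainted.add(step.target)
            else:
                tainted.discard(step.target)
    return hits


program = [
    Step("user_id", ("request.args.get",)),  # user_id = request.args.get("id")
    Step("query", ("user_id",)),             # query = "SELECT ... " + user_id
    Step(None, ("query",), is_sink=True),    # cursor.execute(query)  -> reported
    Step("query", ()),                       # query = "SELECT 1"
    Step(None, ("query",), is_sink=True),    # cursor.execute(query)  -> clean now
]

print(tainted_sinks(program))  # [2]
```

The same pattern-based `cursor.execute($QUERY)` sink fires only on the first call, which is the practical advantage of taint mode over a plain `pattern` rule.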

CI/CD Integration

GitHub Actions

```yaml
# .github/workflows/semgrep.yml
name: Semgrep Security Scan

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  semgrep:
    name: Scan
    runs-on: ubuntu-latest

    container:
      image: returntocorp/semgrep

    steps:
      - uses: actions/checkout@v3

      - name: Run Semgrep
        run: |
          semgrep \
            --config=auto \
            --sarif \
            --output=semgrep-results.sarif \
            .

      - name: Upload SARIF file
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: semgrep-results.sarif
        if: always()

      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: semgrep-report
          path: semgrep-results.sarif
```
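Before relying on the upload step, the SARIF file can be inspected locally. The sketch below flattens a SARIF 2.1.0 log into one row per result using the standard `runs[].results[]` structure (`ruleId`, `level`, `locations[].physicalLocation`). It is a simplification: only the first location is read, the sample log is made up, and how Semgrep maps its own severities to SARIF levels is not shown here.

```python
import json
from typing import Dict, List, Optional, Tuple

# A made-up, minimal SARIF 2.1.0 log with a single result
SAMPLE_SARIF = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "Semgrep"}},
    "results": [{
      "ruleId": "hardcoded-password",
      "level": "error",
      "message": {"text": "Hardcoded password detected"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "app.py"},
          "region": {"startLine": 3}
        }
      }]
    }]
  }]
}
"""

Row = Tuple[Optional[str], Optional[int], str, Optional[str]]


def sarif_findings(sarif: Dict) -> List[Row]:
    """Flatten a SARIF log into (uri, start_line, level, rule_id) rows."""
    rows: List[Row] = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Only the first location is used; SARIF allows several
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            rows.append((
                physical.get("artifactLocation", {}).get("uri"),
                physical.get("region", {}).get("startLine"),
                # Simplification: treat a missing level as SARIF's default "warning"
                result.get("level", "warning"),
                result.get("ruleId"),
            ))
    return rows


rows = sarif_findings(json.loads(SAMPLE_SARIF))
print(rows)  # [('app.py', 3, 'error', 'hardcoded-password')]
```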

GitLab CI

```yaml
# .gitlab-ci.yml
stages:
  - security

semgrep:
  stage: security
  image: returntocorp/semgrep
  script:
    # --gitlab-sast writes the GitLab SAST report format expected by reports:sast
    - semgrep --config=auto --gitlab-sast --output=semgrep-report.json .
  artifacts:
    reports:
      sast: semgrep-report.json
    paths:
      - semgrep-report.json
    expire_in: 1 week
  allow_failure: true
```

Jenkins Pipeline

```groovy
// Jenkinsfile
pipeline {
    agent any

    stages {
        stage('Security Scan') {
            steps {
                script {
                    docker.image('returntocorp/semgrep').inside {
                        sh 'semgrep --config=auto --json --output=semgrep-results.json .'
                        sh 'semgrep --config=auto --sarif --output=semgrep-results.sarif .'
                    }
                }
            }
            post {
                always {
                    archiveArtifacts artifacts: 'semgrep-results.*', fingerprint: true

                    // Parse results and fail the build if high-severity issues are found
                    // (readJSON is provided by the Pipeline Utility Steps plugin)
                    script {
                        def results = readJSON file: 'semgrep-results.json'
                        def errors = results.results.findAll { it.extra.severity == 'ERROR' }

                        if (errors.size() > 0) {
                            currentBuild.result = 'FAILURE'
                            error("Found ${errors.size()} high severity security issues")
                        }
                    }
                }
            }
        }
    }
}
```

Azure DevOps

```yaml
# azure-pipelines.yml
trigger:
  - main

pool:
  vmImage: 'ubuntu-latest'

container: returntocorp/semgrep

steps:
  - checkout: self

  - script: |
      semgrep --config=auto --json --output=$(Agent.TempDirectory)/semgrep-results.json .
      semgrep --config=auto --sarif --output=$(Agent.TempDirectory)/semgrep-results.sarif .
      semgrep --config=auto --junit-xml --output=$(Agent.TempDirectory)/semgrep-results.xml .
    displayName: 'Run Semgrep Security Scan'

  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      # PublishTestResults expects JUnit XML, not SARIF
      testResultsFiles: '$(Agent.TempDirectory)/semgrep-results.xml'
      testRunTitle: 'Semgrep Security Scan'
    condition: always()
```

Pre-commit hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: 'v1.45.0'
    hooks:
      - id: semgrep
        args: ['--config=auto', '--error']
```

Language-specific usage

Python projects

```bash
# Python security scan
semgrep --config=p/python --config=p/flask --config=p/django .

# Python-specific rules
semgrep --config=p/bandit .
semgrep --config=p/secrets .

# Custom Python rules
cat > python-rules.yml << 'EOF'
rules:
  - id: flask-debug-mode
    pattern: app.run(debug=True)
    message: Flask debug mode should not be enabled in production
    languages: [python]
    severity: ERROR

  - id: django-debug-setting
    pattern: DEBUG = True
    message: Django DEBUG should be False in production
    languages: [python]
    severity: ERROR
EOF

semgrep --config=python-rules.yml .
```

JavaScript/TypeScript projects

```bash
# JavaScript security scan
semgrep --config=p/javascript --config=p/typescript .

# Framework-specific scans
semgrep --config=p/react .
semgrep --config=p/express .
semgrep --config=p/nodejs .

# Custom JavaScript rules
cat > js-rules.yml << 'EOF'
rules:
  - id: eval-usage
    pattern-either:
      - pattern: eval($X)
      - pattern: Function($X)
    message: Avoid using eval() or Function() constructor
    languages: [javascript, typescript]
    severity: ERROR

  - id: innerHTML-xss
    pattern: $EL.innerHTML = $VAR
    message: Potential XSS vulnerability with innerHTML
    languages: [javascript, typescript]
    severity: WARNING
EOF

semgrep --config=js-rules.yml .
```

Java projects

```bash
# Java security scan
semgrep --config=p/java .
semgrep --config=p/spring .

# Custom Java rules
cat > java-rules.yml << 'EOF'
rules:
  - id: sql-injection-java
    pattern: |
      Statement $STMT = ...;
      ...
      $STMT.executeQuery($QUERY + ...)
    message: Potential SQL injection vulnerability
    languages: [java]
    severity: ERROR

  - id: hardcoded-password-java
    patterns:
      - pattern: String $VAR = "...";
      - metavariable-regex:
          metavariable: $VAR
          regex: (?i)(password|passwd|pwd)
    message: Hardcoded password detected
    languages: [java]
    severity: ERROR
EOF

semgrep --config=java-rules.yml .
```

Automation and scripting

Automated security scanner

```python
#!/usr/bin/env python3
# semgrep_scanner.py

import argparse
import json
import subprocess
import sys
from pathlib import Path


class SemgrepScanner:
    def __init__(self, project_path, config='auto'):
        self.project_path = Path(project_path)
        self.config = config
        self.results = {}

    def run_scan(self, output_format='json', severity_filter=None):
        """Run a Semgrep scan with the specified parameters."""
        cmd = [
            'semgrep',
            '--config', self.config,
            f'--{output_format}',
            str(self.project_path),
        ]

        if severity_filter:
            cmd.extend(['--severity', severity_filter])

        try:
            # check=False: the exit code is evaluated below instead of raising
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)

            if output_format == 'json':
                self.results = json.loads(result.stdout) if result.stdout else {}
            else:
                self.results = result.stdout

            return result.returncode == 0

        except FileNotFoundError:
            print("Error running Semgrep: 'semgrep' was not found on PATH")
            return False
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON output: {e}")
            return False

    def get_summary(self):
        """Get a scan summary with counts per severity."""
        if not isinstance(self.results, dict):
            return "No results available"

        findings = self.results.get('results', [])

        return {
            'total_findings': len(findings),
            'error_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'ERROR']),
            'warning_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'WARNING']),
            'info_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'INFO']),
        }

    def get_findings_by_severity(self, severity='ERROR'):
        """Get findings filtered by severity."""
        if not isinstance(self.results, dict):
            return []

        findings = self.results.get('results', [])
        return [f for f in findings if f.get('extra', {}).get('severity') == severity]

    def get_findings_by_rule(self):
        """Group findings by rule ID."""
        if not isinstance(self.results, dict):
            return {}

        by_rule = {}
        for finding in self.results.get('results', []):
            rule_id = finding.get('check_id', 'unknown')
            by_rule.setdefault(rule_id, []).append(finding)

        return by_rule

    def save_results(self, output_file='semgrep_results.json'):
        """Save results to a file."""
        with open(output_file, 'w') as f:
            if isinstance(self.results, dict):
                json.dump(self.results, f, indent=2)
            else:
                f.write(str(self.results))

    def generate_report(self, output_file='semgrep_report.txt'):
        """Write Semgrep's default text report to a file (Semgrep has no HTML output format)."""
        cmd = [
            'semgrep',
            '--config', self.config,
            '--output', output_file,
            str(self.project_path),
        ]

        try:
            subprocess.run(cmd, check=True)
            return True
        except (subprocess.CalledProcessError, FileNotFoundError):
            return False


def main():
    parser = argparse.ArgumentParser(description='Automated Semgrep Scanner')
    parser.add_argument('project_path', help='Path to project to scan')
    parser.add_argument('--config', default='auto', help='Semgrep configuration')
    parser.add_argument('--severity', choices=['ERROR', 'WARNING', 'INFO'],
                        help='Filter by severity level')
    parser.add_argument('--output', help='Output file for results')
    parser.add_argument('--format', default='json', choices=['json', 'sarif', 'text'],
                        help='Output format')

    args = parser.parse_args()

    scanner = SemgrepScanner(args.project_path, args.config)

    print(f"Scanning {args.project_path} with config {args.config}...")
    success = scanner.run_scan(output_format=args.format, severity_filter=args.severity)

    if success:
        if args.format == 'json':
            summary = scanner.get_summary()
            print("Scan completed successfully!")
            print(f"Total findings: {summary['total_findings']}")
            print(f"Errors: {summary['error_count']}")
            print(f"Warnings: {summary['warning_count']}")
            print(f"Info: {summary['info_count']}")

            # Show top issues by rule
            by_rule = scanner.get_findings_by_rule()
            if by_rule:
                print("\nTop issues by rule:")
                sorted_rules = sorted(by_rule.items(), key=lambda x: len(x[1]), reverse=True)
                for rule_id, findings in sorted_rules[:5]:
                    print(f"  {rule_id}: {len(findings)} findings")

        if args.output:
            scanner.save_results(args.output)
            print(f"Results saved to {args.output}")

        # Exit with an error code if high-severity issues were found
        if args.format == 'json':
            summary = scanner.get_summary()
            if summary['error_count'] > 0:
                print(f"Found {summary['error_count']} high severity issues!")
                sys.exit(1)
    else:
        print("Scan failed!")
        sys.exit(1)


if __name__ == '__main__':
    main()
```

Batch Processing Script

```bash
#!/bin/bash
# batch_semgrep_scan.sh

# Configuration
PROJECTS_DIR="/path/to/projects"
REPORTS_DIR="/path/to/reports"
CONFIG="auto"
DATE=$(date +%Y%m%d_%H%M%S)

# Create reports directory
mkdir -p "$REPORTS_DIR"

# Function to scan a project
scan_project() {
    local project_path="$1"
    local project_name=$(basename "$project_path")
    local report_file="$REPORTS_DIR/${project_name}_${DATE}.json"
    local sarif_report="$REPORTS_DIR/${project_name}_${DATE}.sarif"

    echo "Scanning $project_name..."

    # Run Semgrep scan
    semgrep --config="$CONFIG" --json --output="$report_file" "$project_path"
    semgrep --config="$CONFIG" --sarif --output="$sarif_report" "$project_path"

    # Check for high severity issues
    if [ -f "$report_file" ]; then
        error_count=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' "$report_file" 2>/dev/null || echo "0")

        if [ "$error_count" -gt 0 ]; then
            echo "WARNING: $project_name has $error_count high severity issues!"
            echo "$project_name" >> "$REPORTS_DIR/high_severity_projects.txt"
        fi
    fi

    echo "Scan completed for $project_name"
}

# Find and scan all projects
find "$PROJECTS_DIR" -maxdepth 1 -type d | while read -r project_dir; do
    if [ "$project_dir" != "$PROJECTS_DIR" ]; then
        scan_project "$project_dir"
    fi
done

echo "Batch scanning completed. Reports saved to $REPORTS_DIR"

# Generate summary report
echo "=== Batch Scan Summary ===" > "$REPORTS_DIR/summary_${DATE}.txt"
echo "Scan Date: $(date)" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Configuration: $CONFIG" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Total projects scanned: $(find "$REPORTS_DIR" -name "*_${DATE}.json" | wc -l)" >> "$REPORTS_DIR/summary_${DATE}.txt"

if [ -f "$REPORTS_DIR/high_severity_projects.txt" ]; then
    echo "High severity projects: $(wc -l < "$REPORTS_DIR/high_severity_projects.txt")" >> "$REPORTS_DIR/summary_${DATE}.txt"
fi
```

Best Practices

Rule management

```yaml
# .semgrep.yml - project configuration
# Path filters are usually given as --exclude flags or in a .semgrepignore file
rules:
  # Security rules
  - p/security-audit
  - p/owasp-top-ten
  - p/secrets

  # Language-specific rules
  - p/python
  - p/javascript

  # Custom rules
  - rules/custom-security.yml
  - rules/custom-performance.yml

exclude:
  - "/tests/"
  - "/test/"
  - "/.venv/"
  - "/venv/"
  - "/node_modules/"
  - "/vendor/"
  - "*.min.js"
  - "*.min.css"

severity:
  - ERROR
  - WARNING
```

Custom security rule

```yaml
# rules/custom-security.yml
rules:
  - id: custom-jwt-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: |
      JWT secret should not be hardcoded.
      Use environment variables or secure configuration.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A02:2021"
      confidence: HIGH
    # The replacement assumes `os` is imported; review fixes where the payload
    # itself contains string literals, since the regex targets quoted strings
    fix-regex:
      regex: '"[^"]*"'
      replacement: 'os.environ.get("JWT_SECRET")'
```

Performance optimization

```bash
# Optimize for large codebases
semgrep --config=auto --max-target-bytes=1000000 .

# Use specific rules instead of auto
semgrep --config=p/security-audit --config=p/owasp-top-ten .

# Exclude unnecessary files
semgrep --config=auto --exclude="/node_modules/" --exclude="/vendor/" .

# Parallel processing
semgrep --config=auto --jobs=4 .
```

Troubleshooting

Common issues

```bash
# Issue: Semgrep running slowly
# Solution: Exclude large directories and use specific rules
semgrep --config=p/security-audit --exclude="/node_modules/" .

# Issue: Too many false positives
# Solution: Use higher-confidence rules and custom exclusions
semgrep --config=p/security-audit --exclude="/tests/" .

# Issue: Missing language support
# Solution: Check supported languages and update Semgrep
semgrep --version
pip install --upgrade semgrep

# Issue: Custom rules not working
# Solution: Validate rule syntax
semgrep --validate --config=rules/custom.yml
```

Debug Mode

```bash
# Verbose output
semgrep --config=auto --verbose .

# Debug mode
semgrep --config=auto --debug .

# Dry run: show autofix changes without writing them to files
semgrep --config=auto --autofix --dryrun .

# Test a specific rule
semgrep --config=rules/custom.yml --test .
```
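`--test` compares findings against annotations in a test file: a `# ruleid: <rule-id>` comment marks the following line as expected to match, and `# ok: <rule-id>` marks it as expected not to match. The sketch below is a hypothetical helper that lists those expectations, which can be handy for sanity-checking a test file before running `semgrep --test`; it handles only single-id annotations and assumes each annotation applies to the next line.

```python
import re
from typing import Dict, List

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_lines(source: str) -> Dict[str, Dict[str, List[int]]]:
    """Map 'ruleid'/'ok' to {rule_id: [1-based lines the annotation applies to]}."""
    expected: Dict[str, Dict[str, List[int]]] = {"ruleid": {}, "ok": {}}
    for number, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            kind, rule_id = match.groups()
            # The annotation describes the line that follows it
            expected[kind].setdefault(rule_id, []).append(number + 1)
    return expected


# A made-up test file for the flask-debug-mode rule
test_file = """\
# ruleid: flask-debug-mode
app.run(debug=True)
# ok: flask-debug-mode
app.run(debug=False)
"""

print(expected_lines(test_file))
```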


--

*This cheat sheet provides comprehensive guidance on using Semgrep to find security vulnerabilities and enforce code standards. Regular rule updates and custom rule development increase security coverage.*