
Semgrep Static Analysis Tool

Overview

Semgrep is a fast, open-source static analysis tool for finding bugs, detecting security vulnerabilities, and enforcing code standards across many programming languages. It uses pattern-based analysis with a simple, intuitive syntax that lets developers write custom rules easily. Semgrep is particularly valuable in DevSecOps pipelines for its speed, precision, and extensive rule library covering security, correctness, and performance issues.

Note: Semgrep is designed for pattern-based static analysis and may require custom rules for organization-specific security requirements. It should be integrated into CI/CD pipelines for continuous security monitoring.

Installation

Using pip (Recommended)

# Install Semgrep
pip install semgrep

# Install with specific version
pip install semgrep==1.45.0

# Install from source
pip install git+https://github.com/returntocorp/semgrep.git

# Verify installation
semgrep --version

Using Homebrew (macOS)

# Install Semgrep
brew install semgrep

# Update Semgrep
brew upgrade semgrep

Using Docker

# Pull Semgrep image
docker pull returntocorp/semgrep

# Run Semgrep in container (recent images do not set semgrep as the entrypoint, so name the command)
docker run --rm -v "$(pwd):/src" returntocorp/semgrep semgrep --config=auto /src

# Create alias for convenience
alias semgrep='docker run --rm -v "$(pwd):/src" returntocorp/semgrep semgrep'

# Build custom image
cat > Dockerfile << 'EOF'
FROM returntocorp/semgrep
WORKDIR /src
ENTRYPOINT ["semgrep"]
EOF

docker build -t custom-semgrep .

Package Managers

# Ubuntu/Debian (via pip)
sudo apt update
sudo apt install python3-pip
pip3 install semgrep

# CentOS/RHEL/Fedora
sudo dnf install python3-pip
pip3 install semgrep

# Arch Linux
sudo pacman -S python-pip
pip install semgrep

Binary Installation

# Download binary (Linux)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-linux-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/

# Download binary (macOS)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-macos-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/

Basic Usage

Quick Start

# Scan with auto-configuration (recommended for beginners)
semgrep --config=auto .

# Scan specific directory
semgrep --config=auto /path/to/project

# Scan single file
semgrep --config=auto file.py

# Scan with specific ruleset
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .

# Scan with multiple rulesets
semgrep --config=p/security-audit --config=p/owasp-top-ten .

Output Formats

# Default text output
semgrep --config=auto .

# JSON output
semgrep --config=auto --json .

# SARIF output (for GitHub integration)
semgrep --config=auto --sarif .

# JUnit XML output
semgrep --config=auto --junit-xml .

# Emacs output format
semgrep --config=auto --emacs .

# Vim output format
semgrep --config=auto --vim .

# Save output to file
semgrep --config=auto --json --output=results.json .
semgrep --config=auto --sarif --output=results.sarif .
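As a rough illustration of consuming a saved report, the sketch below counts findings per rule in a SARIF document such as results.sarif. It assumes the standard SARIF 2.1.0 layout (`runs[].results[].ruleId`). The function names and the sample data are hypothetical and hand-written, not real Semgrep output.

```python
import json
from collections import Counter


def count_sarif_results(sarif: dict) -> Counter:
    """Count findings per rule ID across all runs of a SARIF document."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("ruleId", "unknown")] += 1
    return counts


def load_sarif(path: str) -> dict:
    """Read a SARIF file from disk (for example results.sarif)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical sample in SARIF 2.1.0 shape, used instead of a real report
sample = {
    "version": "2.1.0",
    "runs": [
        {
            "results": [
                {"ruleId": "hardcoded-password", "level": "error"},
                {"ruleId": "hardcoded-password", "level": "error"},
                {"ruleId": "unsafe-yaml-load", "level": "warning"},
            ]
        }
    ],
}

summary = count_sarif_results(sample)
for rule_id, count in summary.most_common():
    print(f"{rule_id}: {count}")
```

With a real report, `count_sarif_results(load_sarif("results.sarif"))` would produce the same kind of per-rule tally.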

Filtering and Targeting

# Include specific file patterns
semgrep --config=auto --include="*.py" .
semgrep --config=auto --include="*.js" --include="*.ts" .

# Exclude specific file patterns
semgrep --config=auto --exclude="*test*" .
semgrep --config=auto --exclude="node_modules" --exclude="vendor" .

# Scan a specific language with an ad-hoc pattern (--lang must be combined with -e/--pattern)
semgrep -e 'eval(...)' --lang=python .
semgrep -e 'eval(...)' --lang=javascript .
semgrep -e 'Runtime.getRuntime().exec(...)' --lang=java .

# Severity filtering
semgrep --config=auto --severity=ERROR .
semgrep --config=auto --severity=WARNING .
semgrep --config=auto --severity=INFO .

Rule Configuration

Using Built-in Rulesets

# Security-focused rulesets
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .
semgrep --config=p/secrets .

# Language-specific rulesets
semgrep --config=p/python .
semgrep --config=p/javascript .
semgrep --config=p/java .
semgrep --config=p/go .

# Framework-specific rulesets
semgrep --config=p/django .
semgrep --config=p/flask .
semgrep --config=p/react .
semgrep --config=p/express .

# Code quality rulesets
semgrep --config=p/code-quality .
semgrep --config=p/performance .
semgrep --config=p/correctness .

# Browse available rulesets in the Semgrep Registry
# https://semgrep.dev/explore

Custom Rules

# custom-rules.yml
rules:
  - id: hardcoded-password
    pattern: password = "..."
    message: Hardcoded password detected
    languages: [python]
    severity: ERROR

  - id: sql-injection
    pattern-either:
      - pattern: cursor.execute("..." + $VAR)
      - pattern: cursor.execute(f"...{$VAR}...")
    message: Potential SQL injection vulnerability
    languages: [python]
    severity: ERROR

  - id: unsafe-yaml-load
    pattern: yaml.load($DATA)
    message: Use yaml.safe_load() instead of yaml.load()
    languages: [python]
    severity: WARNING
    fix: yaml.safe_load($DATA)

  - id: missing-csrf-protection
    patterns:
      - pattern: |
          class $CLASS(...):
            ...
            def post(self, ...):
              ...
      - pattern-not: |
          class $CLASS(...):
            ...
            @csrf_protect
            def post(self, ...):
              ...
    message: POST method missing CSRF protection
    languages: [python]
    severity: ERROR
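To make the sql-injection rule above more concrete, here is a minimal sketch of the code shape it is meant to flag, next to a parameterized alternative. It uses the standard-library sqlite3 module with an in-memory database. The table, the data, and the `user_id` value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cursor.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

user_id = "1 OR 1=1"  # stands in for attacker-controlled input

# Flagged shape: a string literal concatenated with a variable inside cursor.execute(...)
cursor.execute("SELECT name FROM users WHERE id = " + user_id)
unsafe_rows = cursor.fetchall()

# Remediation: a parameterized query keeps the input as data, not SQL
cursor.execute("SELECT name FROM users WHERE id = ?", (user_id,))
safe_rows = cursor.fetchall()

print("concatenated query returned:", len(unsafe_rows), "rows")
print("parameterized query returned:", len(safe_rows), "rows")
conn.close()
```

The concatenated query returns every row because the injected `OR 1=1` becomes part of the SQL, while the parameterized query compares the id column against the literal text and returns nothing.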

Rule Syntax Examples

# Pattern matching
rules:
  - id: basic-pattern
    pattern: eval($X)
    message: Avoid using eval()
    languages: [python]
    severity: ERROR

  - id: pattern-either
    pattern-either:
      - pattern: exec($X)
      - pattern: eval($X)
    message: Avoid using exec() or eval()
    languages: [python]
    severity: ERROR

  - id: pattern-inside
    patterns:
      - pattern-inside: |
          def $FUNC(...):
            ...
      - pattern: return $X
    message: Function returns value
    languages: [python]
    severity: INFO

  - id: pattern-not
    patterns:
      - pattern: requests.get(...)
      - pattern-not: requests.get(..., verify=True, ...)
    message: requests.get() call without an explicit verify=True
    languages: [python]
    severity: WARNING

  - id: metavariable-regex
    patterns:
      - pattern: $FUNC($ARG)
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^(exec|eval)$
    message: Dangerous function call
    languages: [python]
    severity: ERROR
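The effect of the anchors in the metavariable-regex rule can be previewed with Python's re module. This is only an approximation, since Semgrep uses its own regex engine, and the candidate function names below are hypothetical.

```python
import re

# Same expression as the rule's metavariable-regex
ANCHORED = re.compile(r"^(exec|eval)$")
# The same alternatives without anchors, for comparison
UNANCHORED = re.compile(r"exec|eval")

candidates = ["eval", "exec", "evaluate", "my_exec", "literal_eval"]

anchored_hits = [name for name in candidates if ANCHORED.search(name)]
unanchored_hits = [name for name in candidates if UNANCHORED.search(name)]

print(anchored_hits)    # ['eval', 'exec']
print(unanchored_hits)  # all five names contain "exec" or "eval"
```

Without `^` and `$`, names such as `literal_eval` would also be accepted, which is usually not what a rule targeting the built-ins intends.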

Advanced Usage

Configuration Files

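# .semgrep.yml
# Illustrative layout only: the Semgrep CLI takes rules via --config and path exclusions via .semgrepignore or --exclude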
rules:
  - rules/security
  - rules/performance

exclude:
  - "*/tests/*"
  - "*/node_modules/*"
  - "*/vendor/*"
  - "*.min.js"

include:
  - "*.py"
  - "*.js"
  - "*.java"
  - "*.go"

severity:
  - ERROR
  - WARNING

Custom Rule Development

# advanced-rules.yml
rules:
  - id: jwt-hardcoded-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: JWT secret should not be hardcoded
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-798: Use of Hard-coded Credentials"
      owasp: "A02:2021 - Cryptographic Failures"

  - id: unsafe-deserialization
    pattern-either:
      - pattern: pickle.loads($DATA)
      - pattern: pickle.load($FILE)
      - pattern: cPickle.loads($DATA)
    message: Unsafe deserialization with pickle
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"

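  - id: command-injection
    patterns:
      - pattern-either:
          - pattern: os.system($CMD)
          - pattern: subprocess.call($CMD, shell=True)
          - pattern: subprocess.run($CMD, shell=True)
      # Ignore calls whose command is a constant string literal
      - pattern-not: os.system("...")
      - pattern-not: subprocess.call("...", shell=True)
      - pattern-not: subprocess.run("...", shell=True)
    message: Potential command injection vulnerability
    languages: [python]
    severity: ERROR
    fix-regex:
      regex: 'shell=True'
      replacement: 'shell=False'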
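For the command-injection rule, switching `shell=True` to `shell=False` is usually not enough on its own: on POSIX a command passed as a single string without a shell is treated as the program name. A minimal sketch of the safer shape, assuming the untrusted value can be passed as one argument in a list (the `echo_argument` helper is hypothetical):

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Pass untrusted text as a single argv element; no shell is involved."""
    completed = subprocess.run(
        # The current Python interpreter stands in for an external command
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


untrusted = "hello; echo INJECTED"
print(echo_argument(untrusted))
```

Because the value travels as a list element, the `;` is never interpreted by a shell and the child process receives the text unchanged.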

Taint Analysis

# taint-rules.yml
rules:
  - id: user-input-to-sql
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    message: User input flows to SQL query
    languages: [python]
    severity: ERROR

  - id: user-input-to-eval
    mode: taint
    pattern-sources:
      - pattern: input(...)
      - pattern: sys.argv[...]
    pattern-sinks:
      - pattern: eval($CODE)
      - pattern: exec($CODE)
    message: User input flows to code execution
    languages: [python]
    severity: ERROR
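As a sketch of what the user-input-to-eval rule is looking for, the snippet below shows a value flowing from a source (`sys.argv[...]`) into a sink (`eval(...)`), followed by one possible remediation using `ast.literal_eval`. The function names are hypothetical, and whether Semgrep reports a given flow depends on the rule and engine version.

```python
import ast
import sys


def run_unsafe() -> object:
    expression = sys.argv[1]   # source: sys.argv[...]
    return eval(expression)    # sink: eval($CODE), the flow the rule targets


def run_safe() -> object:
    expression = sys.argv[1]
    # ast.literal_eval only accepts Python literals (numbers, strings, lists, ...)
    return ast.literal_eval(expression)


# run_unsafe is defined only to show the flagged shape and is never called here
sys.argv = ["demo", "[1, 2, 3]"]
print(run_safe())

sys.argv = ["demo", "__import__('os').getcwd()"]
try:
    run_safe()
except ValueError as exc:
    print(f"rejected: {exc}")
```

The literal input is parsed into a list, while the input containing a function call is rejected with a `ValueError` instead of being executed.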

CI/CD Integration

GitHub Actions

# .github/workflows/semgrep.yml
name: Semgrep Security Scan

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  semgrep:
    name: Scan
    runs-on: ubuntu-latest

    container:
      image: returntocorp/semgrep

    steps:
    - uses: actions/checkout@v3

    - name: Run Semgrep
      run: |
        semgrep \
          --config=auto \
          --sarif \
          --output=semgrep-results.sarif \
          .

    - name: Upload SARIF file
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: semgrep-results.sarif
      if: always()

    - name: Upload results
      uses: actions/upload-artifact@v3
      with:
        name: semgrep-report
        path: semgrep-results.sarif

GitLab CI

# .gitlab-ci.yml
stages:
  - security

semgrep:
  stage: security
  image: returntocorp/semgrep
  script:
    - semgrep --config=auto --gitlab-sast --output=semgrep-report.json .
  artifacts:
    reports:
      sast: semgrep-report.json
    paths:
      - semgrep-report.json
    expire_in: 1 week
  allow_failure: true

Jenkins Pipeline

// Jenkinsfile
pipeline {
    agent any

    stages {
        stage('Security Scan') {
            steps {
                script {
                    docker.image('returntocorp/semgrep').inside {
                        sh 'semgrep --config=auto --json --output=semgrep-results.json .'
                        sh 'semgrep --config=auto --sarif --output=semgrep-results.sarif .'
                    }
                }
            }
            post {
                always {
                    archiveArtifacts artifacts: 'semgrep-results.*', fingerprint: true

                    // Parse results and fail build if high severity issues found
                    script {
                        def results = readJSON file: 'semgrep-results.json'
                        def errors = results.results.findAll { it.extra.severity == 'ERROR' }

                        if (errors.size() > 0) {
                            currentBuild.result = 'FAILURE'
                            error("Found ${errors.size()} high severity security issues")
                        }
                    }
                }
            }
        }
    }
}

Azure DevOps

# azure-pipelines.yml
trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

container: returntocorp/semgrep

steps:
- checkout: self

- script: |
    semgrep --config=auto --json --output=$(Agent.TempDirectory)/semgrep-results.json .
    semgrep --config=auto --junit-xml --output=$(Agent.TempDirectory)/semgrep-results.xml .
  displayName: 'Run Semgrep Security Scan'

- task: PublishTestResults@2
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '$(Agent.TempDirectory)/semgrep-results.xml'
    testRunTitle: 'Semgrep Security Scan'
  condition: always()

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: 'v1.45.0'
    hooks:
      - id: semgrep
        args: ['--config=auto', '--error']

Language-Specific Usage

Python Projects

# Python security scan
semgrep --config=p/python --config=p/flask --config=p/django .

# Python-specific rules
semgrep --config=p/bandit .
semgrep --config=p/secrets .

# Custom Python rules
cat > python-rules.yml << 'EOF'
rules:
  - id: flask-debug-mode
    pattern: app.run(debug=True)
    message: Flask debug mode should not be enabled in production
    languages: [python]
    severity: ERROR

  - id: django-debug-setting
    pattern: DEBUG = True
    message: Django DEBUG should be False in production
    languages: [python]
    severity: ERROR
EOF

semgrep --config=python-rules.yml .
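One way to satisfy the two debug rules above is to read the flag from the environment instead of hardcoding `True`. The following is a minimal sketch: the `env_flag` helper and the `APP_DEBUG` variable name are hypothetical, not part of Flask or Django.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Debug stays off unless APP_DEBUG is explicitly set to a truthy value
DEBUG = env_flag("APP_DEBUG")
print(f"DEBUG = {DEBUG}")
```

In a Flask entry point the same value could be passed as `app.run(debug=DEBUG)`, which no longer matches the literal `debug=True` pattern.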

JavaScript/TypeScript Projects

# JavaScript security scan
semgrep --config=p/javascript --config=p/typescript .

# Framework-specific scans
semgrep --config=p/react .
semgrep --config=p/express .
semgrep --config=p/nodejs .

# Custom JavaScript rules
cat > js-rules.yml << 'EOF'
rules:
  - id: eval-usage
    pattern-either:
      - pattern: eval($X)
      - pattern: Function($X)
    message: Avoid using eval() or Function() constructor
    languages: [javascript, typescript]
    severity: ERROR

  - id: innerHTML-xss
    pattern: $EL.innerHTML = $VAR
    message: Potential XSS vulnerability with innerHTML
    languages: [javascript, typescript]
    severity: WARNING
EOF

semgrep --config=js-rules.yml .

Java Projects

# Java security scan
semgrep --config=p/java .
semgrep --config=p/spring .

# Custom Java rules
cat > java-rules.yml << 'EOF'
rules:
  - id: sql-injection-java
    pattern: |
      Statement $STMT = ...;
      ...
      $STMT.executeQuery($QUERY + ...)
    message: Potential SQL injection vulnerability
    languages: [java]
    severity: ERROR

  - id: hardcoded-password-java
    patterns:
      - pattern: |
          String $VAR = "...";
      - metavariable-regex:
          metavariable: $VAR
          regex: (?i).*(password|passwd|pwd).*
    message: Hardcoded password detected
    languages: [java]
    severity: ERROR
EOF

semgrep --config=java-rules.yml .

Automation and Scripting

Automated Security Scanner

#!/usr/bin/env python3
# semgrep_scanner.py

import subprocess
import json
import sys
import argparse
from pathlib import Path

class SemgrepScanner:
    def __init__(self, project_path, config='auto'):
        self.project_path = Path(project_path)
        self.config = config
        self.results = {}

    def run_scan(self, output_format='json', severity_filter=None):
        """Run Semgrep scan with specified parameters"""
        cmd = [
            'semgrep',
            '--config', self.config,
            f'--{output_format}',
            str(self.project_path)
        ]

        if severity_filter:
            cmd.extend(['--severity', severity_filter])

        try:
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)

            if output_format == 'json':
                self.results = json.loads(result.stdout) if result.stdout else {}
            else:
                self.results = result.stdout

            return result.returncode == 0

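        except OSError as e:
            # Raised when the semgrep executable cannot be started (e.g. it is not installed);
            # CalledProcessError cannot occur here because check=False
            print(f"Error running Semgrep: {e}")
            return False
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON output: {e}")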
            return False

    def get_summary(self):
        """Get scan summary"""
        if not isinstance(self.results, dict):
            return "No results available"

        findings = self.results.get('results', [])

        summary = {
            'total_findings': len(findings),
            'error_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'ERROR']),
            'warning_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'WARNING']),
            'info_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'INFO'])
        }

        return summary

    def get_findings_by_severity(self, severity='ERROR'):
        """Get findings filtered by severity"""
        if not isinstance(self.results, dict):
            return []

        findings = self.results.get('results', [])
        return [f for f in findings if f.get('extra', {}).get('severity') == severity]

    def get_findings_by_rule(self):
        """Group findings by rule ID"""
        if not isinstance(self.results, dict):
            return {}

        findings = self.results.get('results', [])
        by_rule = {}

        for finding in findings:
            rule_id = finding.get('check_id', 'unknown')
            if rule_id not in by_rule:
                by_rule[rule_id] = []
            by_rule[rule_id].append(finding)

        return by_rule

    def save_results(self, output_file='semgrep_results.json'):
        """Save results to file"""
        if isinstance(self.results, dict):
            with open(output_file, 'w') as f:
                json.dump(self.results, f, indent=2)
        else:
            with open(output_file, 'w') as f:
                f.write(str(self.results))

    def generate_report(self, output_file='semgrep_report.txt'):
        """Write Semgrep's default text output to a file (Semgrep has no built-in HTML output format)"""
        cmd = [
            'semgrep',
            '--config', self.config,
            '--output', output_file,
            str(self.project_path)
        ]

        try:
            subprocess.run(cmd, check=True)
            return True
        except subprocess.CalledProcessError:
            return False

def main():
    parser = argparse.ArgumentParser(description='Automated Semgrep Scanner')
    parser.add_argument('project_path', help='Path to project to scan')
    parser.add_argument('--config', default='auto', help='Semgrep configuration')
    parser.add_argument('--severity', choices=['ERROR', 'WARNING', 'INFO'],
                       help='Filter by severity level')
    parser.add_argument('--output', help='Output file for results')
    parser.add_argument('--format', default='json',
                       choices=['json', 'sarif', 'text'],
                       help='Output format')

    args = parser.parse_args()

    scanner = SemgrepScanner(args.project_path, args.config)

    print(f"Scanning {args.project_path} with config {args.config}...")
    success = scanner.run_scan(output_format=args.format, severity_filter=args.severity)

    if success:
        if args.format == 'json':
            summary = scanner.get_summary()
            print("Scan completed successfully!")
            print(f"Total findings: {summary['total_findings']}")
            print(f"Errors: {summary['error_count']}")
            print(f"Warnings: {summary['warning_count']}")
            print(f"Info: {summary['info_count']}")

            # Show top issues by rule
            by_rule = scanner.get_findings_by_rule()
            if by_rule:
                print("\nTop issues by rule:")
                sorted_rules = sorted(by_rule.items(), key=lambda x: len(x[1]), reverse=True)
                for rule_id, findings in sorted_rules[:5]:
                    print(f"  {rule_id}: {len(findings)} findings")

        if args.output:
            scanner.save_results(args.output)
            print(f"Results saved to {args.output}")

        # Exit with error code if high severity issues found
        if args.format == 'json':
            summary = scanner.get_summary()
            if summary['error_count'] > 0:
                print(f"Found {summary['error_count']} high severity issues!")
                sys.exit(1)
    else:
        print("Scan failed!")
        sys.exit(1)

if __name__ == '__main__':
    main()

Batch Processing Script

#!/bin/bash
# batch_semgrep_scan.sh

# Configuration
PROJECTS_DIR="/path/to/projects"
REPORTS_DIR="/path/to/reports"
CONFIG="auto"
DATE=$(date +%Y%m%d_%H%M%S)

# Create reports directory
mkdir -p "$REPORTS_DIR"

# Function to scan project
scan_project() {
    local project_path="$1"
    local project_name=$(basename "$project_path")
    local report_file="$REPORTS_DIR/${project_name}_${DATE}.json"
    local sarif_report="$REPORTS_DIR/${project_name}_${DATE}.sarif"

    echo "Scanning $project_name..."

    # Run Semgrep scan
    semgrep --config="$CONFIG" --json --output="$report_file" "$project_path"
    semgrep --config="$CONFIG" --sarif --output="$sarif_report" "$project_path"

    # Check for high severity issues
    if [ -f "$report_file" ]; then
        error_count=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' "$report_file" 2>/dev/null || echo "0")

        if [ "$error_count" -gt 0 ]; then
            echo "WARNING: $project_name has $error_count high severity issues!"
            echo "$project_name" >> "$REPORTS_DIR/high_severity_projects.txt"
        fi
    fi

    echo "Scan completed for $project_name"
}

# Find and scan all projects
find "$PROJECTS_DIR" -maxdepth 1 -type d | while read -r project_dir; do
    if [ "$project_dir" != "$PROJECTS_DIR" ]; then
        scan_project "$project_dir"
    fi
done

echo "Batch scanning completed. Reports saved to $REPORTS_DIR"

# Generate summary report
echo "=== Batch Scan Summary ===" > "$REPORTS_DIR/summary_${DATE}.txt"
echo "Scan Date: $(date)" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Configuration: $CONFIG" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Total projects scanned: $(find "$REPORTS_DIR" -name "*_${DATE}.json" | wc -l)" >> "$REPORTS_DIR/summary_${DATE}.txt"

if [ -f "$REPORTS_DIR/high_severity_projects.txt" ]; then
    echo "High severity projects: $(wc -l < "$REPORTS_DIR/high_severity_projects.txt")" >> "$REPORTS_DIR/summary_${DATE}.txt"
fi

Best Practices

Rule Management

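# .semgrep.yml - Project configuration
# Illustrative layout only: the Semgrep CLI takes rules via --config and path exclusions via .semgrepignore or --exclude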
rules:
  # Security rules
  - p/security-audit
  - p/owasp-top-ten
  - p/secrets

  # Language-specific rules
  - p/python
  - p/javascript

  # Custom rules
  - rules/custom-security.yml
  - rules/custom-performance.yml

exclude:
  - "*/tests/*"
  - "*/test/*"
  - "*/.venv/*"
  - "*/venv/*"
  - "*/node_modules/*"
  - "*/vendor/*"
  - "*.min.js"
  - "*.min.css"

severity:
  - ERROR
  - WARNING

Custom Rule Development

# rules/custom-security.yml
rules:
  - id: custom-jwt-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: |
      JWT secret should not be hardcoded. Use environment variables or secure configuration.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A02:2021"
      confidence: HIGH
    fix-regex:
      regex: '"[^"]*"'
      replacement: 'os.environ.get("JWT_SECRET")'
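The fix-regex in the rule above replaces every match of the regex inside the matched code, which can touch more than the secret. The sketch below previews this with Python's `re.sub`, used here only as an approximation of Semgrep's replacement behavior. The matched line is a hypothetical example. Semgrep's fix-regex also accepts an optional `count` field, which the second call imitates.

```python
import re

# Hypothetical line of code matched by the custom-jwt-secret rule
matched_code = 'jwt.encode(payload, "s3cr3t", algorithm="HS256")'

regex = r'"[^"]*"'
replacement = 'os.environ.get("JWT_SECRET")'

# Without a limit, every quoted string is replaced, including the algorithm name
every_literal = re.sub(regex, replacement, matched_code)

# Limiting the replacement to the first match leaves later literals untouched
first_only = re.sub(regex, replacement, matched_code, count=1)

print(every_literal)
print(first_only)
```

Restricting the replacement to the first occurrence, or using a narrower regex, keeps the autofix from rewriting unrelated string arguments.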

Performance Optimization

# Optimize for large codebases
semgrep --config=auto --max-target-bytes=1000000 .

# Use specific rules instead of auto
semgrep --config=p/security-audit --config=p/owasp-top-ten .

# Exclude unnecessary files
semgrep --config=auto --exclude="*/node_modules/*" --exclude="*/vendor/*" .

# Parallel processing
semgrep --config=auto --jobs=4 .

Troubleshooting

Common Issues

# Issue: Semgrep running slowly
# Solution: Exclude large directories and use specific rules
semgrep --config=p/security-audit --exclude="*/node_modules/*" .

# Issue: Too many false positives
# Solution: Use higher confidence rules and custom exclusions
semgrep --config=p/security-audit --exclude="*/tests/*" .

# Issue: Missing language support
# Solution: Check supported languages and update Semgrep
semgrep --version
pip install --upgrade semgrep

# Issue: Custom rules not working
# Solution: Validate rule syntax
semgrep --validate --config=rules/custom.yml

Debug Mode

# Verbose output
semgrep --config=auto --verbose .

# Debug mode
semgrep --config=auto --debug .

# Dry run (preview autofix changes without writing them to files)
semgrep --config=auto --autofix --dryrun .

# Test specific rule
semgrep --config=rules/custom.yml --test .

Resources

-...

*This cheat sheet provides a comprehensive guide to using Semgrep to find security vulnerabilities and enforce code standards. Regular rule updates and custom rule development increase security coverage.*