Semgrep Static Analysis Tool Cheat Sheet

Overview

Semgrep is a fast, open-source static analysis tool for finding bugs and security vulnerabilities and for enforcing code standards across multiple programming languages. It uses pattern-based analysis with a simple syntax that resembles the code being searched, so developers can write custom rules easily. Semgrep is particularly valuable in DevSecOps pipelines for its speed, accuracy, and extensive rule library covering security, correctness, and performance issues.

⚠️ Note: Semgrep is designed for pattern-based static analysis and may require custom rules for organization-specific security requirements. It should be integrated into CI/CD pipelines for continuous security monitoring.

Installation

# Install Semgrep
pip install semgrep

# Install with specific version
pip install semgrep==1.45.0

# Install from source
pip install git+https://github.com/returntocorp/semgrep.git

# Verify installation
semgrep --version

Using Homebrew (macOS)

# Install Semgrep
brew install semgrep

# Update Semgrep
brew upgrade semgrep

Using Docker

# Pull Semgrep image
docker pull returntocorp/semgrep

# Run Semgrep in container
docker run --rm -v $(pwd):/src returntocorp/semgrep --config=auto /src

# Create alias for convenience
alias semgrep='docker run --rm -v $(pwd):/src returntocorp/semgrep'

# Build custom image
cat > Dockerfile << 'EOF'
FROM returntocorp/semgrep
WORKDIR /src
ENTRYPOINT ["semgrep"]
EOF

docker build -t custom-semgrep .

Package Managers

# Ubuntu/Debian (via pip)
sudo apt update
sudo apt install python3-pip
pip3 install semgrep

# CentOS/RHEL/Fedora
sudo dnf install python3-pip
pip3 install semgrep

# Arch Linux
sudo pacman -S python-pip
pip install semgrep

Binary Installation

# Download binary (Linux)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-linux-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/

# Download binary (macOS)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-macos-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/

Basic Usage

Quick Start

# Scan with auto-configuration (recommended for beginners)
semgrep --config=auto .

# Scan specific directory
semgrep --config=auto /path/to/project

# Scan single file
semgrep --config=auto file.py

# Scan with specific ruleset
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .

# Scan with multiple rulesets
semgrep --config=p/security-audit --config=p/owasp-top-ten .

Output Formats

# Default text output
semgrep --config=auto .

# JSON output
semgrep --config=auto --json .

# SARIF output (for GitHub integration)
semgrep --config=auto --sarif .

# JUnit XML output
semgrep --config=auto --junit-xml .

# Emacs output format
semgrep --config=auto --emacs .

# Vim output format
semgrep --config=auto --vim .

# Save output to file
semgrep --config=auto --json --output=results.json .
semgrep --config=auto --sarif --output=results.sarif .
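
The JSON format is the easiest one to post-process. The following is a minimal sketch, assuming the report contains a top-level `results` list whose entries carry `check_id` and `extra.severity`. The embedded sample is hypothetical and heavily trimmed; a real report has more fields.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down sample of what `semgrep --json` might write.
SAMPLE = """
{
  "results": [
    {"check_id": "rules.eval-usage", "path": "app.py",
     "start": {"line": 10}, "extra": {"severity": "ERROR", "message": "Avoid eval()"}},
    {"check_id": "rules.unsafe-yaml-load", "path": "cfg.py",
     "start": {"line": 3}, "extra": {"severity": "WARNING", "message": "Use safe_load"}},
    {"check_id": "rules.eval-usage", "path": "util.py",
     "start": {"line": 42}, "extra": {"severity": "ERROR", "message": "Avoid eval()"}}
  ],
  "errors": []
}
"""


def count_by_severity(report: dict) -> Counter:
    """Count findings per severity level in a parsed JSON report."""
    return Counter(
        finding.get("extra", {}).get("severity", "UNKNOWN")
        for finding in report.get("results", [])
    )


report = json.loads(SAMPLE)
counts = count_by_severity(report)
print(dict(counts))
```

To use it on a real scan, replace `json.loads(SAMPLE)` with `json.load(open("results.json"))`.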

Filtering and Targeting

# Include specific file patterns
semgrep --config=auto --include="*.py" .
semgrep --config=auto --include="*.js" --include="*.ts" .

# Exclude specific file patterns
semgrep --config=auto --exclude="*test*" .
semgrep --config=auto --exclude="node_modules" --exclude="vendor" .

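# Scan specific languages
# (--lang is only accepted together with an ad hoc pattern given via -e/--pattern;
#  the patterns below are illustrative)
semgrep --lang=python --pattern='eval(...)' .
semgrep --lang=javascript --pattern='eval(...)' .
semgrep --lang=java --pattern='Runtime.getRuntime().exec(...)' .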

# Severity filtering
semgrep --config=auto --severity=ERROR .
semgrep --config=auto --severity=WARNING .
semgrep --config=auto --severity=INFO .
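
When these filters are combined in scripts, it can help to assemble the argument list programmatically. This is a small sketch using only the flags shown above; the helper name `build_scan_command` and its parameters are hypothetical.

```python
import shlex


def build_scan_command(target, config="auto", include=(), exclude=(), severity=None):
    """Assemble a semgrep argument list from include/exclude/severity filters."""
    cmd = ["semgrep", f"--config={config}"]
    cmd += [f"--include={pattern}" for pattern in include]
    cmd += [f"--exclude={pattern}" for pattern in exclude]
    if severity:
        cmd.append(f"--severity={severity}")
    cmd.append(target)
    return cmd


cmd = build_scan_command(
    ".", include=["*.py"], exclude=["*test*", "vendor"], severity="ERROR"
)
# shlex.join quotes the glob patterns so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list directly to `subprocess.run(cmd)` avoids shell quoting issues with the glob patterns.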

Rule Configuration

Using Built-in Rulesets

# Security-focused rulesets
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .
semgrep --config=p/secrets .

# Language-specific rulesets
semgrep --config=p/python .
semgrep --config=p/javascript .
semgrep --config=p/java .
semgrep --config=p/go .

# Framework-specific rulesets
semgrep --config=p/django .
semgrep --config=p/flask .
semgrep --config=p/react .
semgrep --config=p/express .

# Code quality rulesets
semgrep --config=p/code-quality .
semgrep --config=p/performance .
semgrep --config=p/correctness .

# List available rulesets
semgrep --config=p/

Custom Rules

# custom-rules.yml
rules:
  - id: hardcoded-password
    pattern: password = "..."
    message: Hardcoded password detected
    languages: [python]
    severity: ERROR

  - id: sql-injection
    pattern-either:
      - pattern: cursor.execute("..." + $VAR)
      - pattern: cursor.execute(f"...{$VAR}...")
    message: Potential SQL injection vulnerability
    languages: [python]
    severity: ERROR

  - id: unsafe-yaml-load
    pattern: yaml.load($DATA)
    message: Use yaml.safe_load() instead of yaml.load()
    languages: [python]
    severity: WARNING
    fix: yaml.safe_load($DATA)

  - id: missing-csrf-protection
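    # pattern and pattern-not are combined under a single `patterns` list;
    # @csrf_protect is assumed here as the decorator that marks a protected view
    patterns:
      - pattern: |
          class $CLASS(...):
            ...
            def post(self, ...):
              ...
      - pattern-not: |
          class $CLASS(...):
            ...
            @csrf_protect
            def post(self, ...):
              ...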
    message: POST method missing CSRF protection
    languages: [python]
    severity: ERROR
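
To make the `sql-injection` rule concrete, here is an illustrative target snippet using the standard-library `sqlite3` module. It is a sketch, not output from Semgrep: the table, data, and input string are made up, and the first query is written in the `cursor.execute("..." + $VAR)` shape that the rule describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")]
)
cursor = conn.cursor()

# Hypothetical attacker-controlled value.
user_input = "'nobody' OR '1'='1'"

# Vulnerable shape: the query text is built by string concatenation,
# so the input is parsed as SQL and the WHERE clause is always true.
cursor.execute("SELECT name FROM users WHERE name = " + user_input)
leaked = cursor.fetchall()

# Safer shape: a parameterized query treats the input as a plain value.
cursor.execute("SELECT name FROM users WHERE name = ?", (user_input,))
safe = cursor.fetchall()

print(len(leaked), len(safe))
```

The concatenated query returns every row, while the parameterized query returns none because no user has that literal name.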

Rule Syntax Examples

# Pattern matching
rules:
  - id: basic-pattern
    pattern: eval($X)
    message: Avoid using eval()
    languages: [python]
    severity: ERROR

  - id: pattern-either
    pattern-either:
      - pattern: exec($X)
      - pattern: eval($X)
    message: Avoid using exec() or eval()
    languages: [python]
    severity: ERROR

  - id: pattern-inside
    patterns:
      - pattern-inside: |
          def $FUNC(...):
            ...
      - pattern: return $X
    message: Function returns value
    languages: [python]
    severity: INFO

  - id: pattern-not
    patterns:
      - pattern: requests.get($URL)
      - pattern-not: requests.get($URL, verify=True)
    message: HTTPS request without certificate verification
    languages: [python]
    severity: WARNING

  - id: metavariable-regex
    patterns:
      - pattern: $FUNC($ARG)
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^(exec|eval)$
    message: Dangerous function call
    languages: [python]
    severity: ERROR
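
The `metavariable-regex` rule only reports calls whose function name matches the regular expression. The sketch below reproduces just that filtering step with Python's `re` module so the effect of the `^` and `$` anchors is visible. It does not invoke Semgrep, and the candidate names are made up.

```python
import re

# Same expression as in the metavariable-regex rule.
FUNC_REGEX = re.compile(r"^(exec|eval)$")


def is_dangerous(func_name: str) -> bool:
    """Return True when the whole name is exactly 'exec' or 'eval'."""
    return FUNC_REGEX.search(func_name) is not None


names = ["eval", "exec", "evaluate", "my_exec", "literal_eval"]
flagged = [name for name in names if is_dangerous(name)]
print(flagged)
```

Because the expression is anchored at both ends, names that merely contain `eval` or `exec` are not flagged.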

Advanced Usage

Configuration Files

# .semgrep.yml
rules:
  - rules/security
  - rules/performance

exclude:
  - "*/tests/*"
  - "*/node_modules/*"
  - "*/vendor/*"
  - "*.min.js"

include:
  - "*.py"
  - "*.js"
  - "*.java"
  - "*.go"

severity:
  - ERROR
  - WARNING

Custom Rule Development

# advanced-rules.yml
rules:
  - id: jwt-hardcoded-secret
    pattern-either:
      - pattern: jwt.encode($payload, "...", ...)
      - pattern: jwt.decode($token, "...", ...)
    message: JWT secret should not be hardcoded
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-798: Use of Hard-coded Credentials"
      owasp: "A02:2021 - Cryptographic Failures"

  - id: unsafe-deserialization
    pattern-either:
      - pattern: pickle.loads($DATA)
      - pattern: pickle.load($FILE)
      - pattern: cPickle.loads($DATA)
    message: Unsafe deserialization with pickle
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"

  - id: command-injection
    patterns:
      - pattern-either:
          - pattern: os.system($CMD)
          - pattern: subprocess.call($CMD, shell=True)
          - pattern: subprocess.run($CMD, shell=True)
      - pattern-not-inside: |
          $CMD = "..."
    message: Potential command injection vulnerability
    languages: [python]
    severity: ERROR
    fix-regex:
      regex: 'shell=True'
      replacement: 'shell=False'

Taint Analysis

# taint-rules.yml
rules:
  - id: user-input-to-sql
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    message: User input flows to SQL query
    languages: [python]
    severity: ERROR

  - id: user-input-to-eval
    mode: taint
    pattern-sources:
      - pattern: input(...)
      - pattern: sys.argv[...]
    pattern-sinks:
      - pattern: eval($CODE)
      - pattern: exec($CODE)
    message: User input flows to code execution
    languages: [python]
    severity: ERROR
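
As an illustration of what the `user-input-to-eval` rule is looking for, the following hypothetical target file places a source (`input(...)`) and a sink (`eval(...)`) in the same function so the flow is easy to follow. Whether a given Semgrep version reports it is not verified here. The `parse_literal` helper is an assumed alternative for the case where only literal values are expected.

```python
import ast


def calculator_unsafe():
    """Source and sink in one function: the data flow a taint rule tracks."""
    expression = input("Expression: ")  # source: input(...)
    return eval(expression)             # sink: eval($CODE) runs arbitrary code


def parse_literal(expression: str):
    """Accept only Python literals (numbers, strings, lists, dicts, ...)."""
    return ast.literal_eval(expression)


# The restricted parser handles plain data but rejects function calls.
parsed = parse_literal("[1, 2, 3]")
try:
    parse_literal("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(parsed, rejected)
```

`calculator_unsafe` is defined but never called, so running the file does not prompt for input.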

CI/CD Integration

GitHub Actions

# .github/workflows/semgrep.yml
name: Semgrep Security Scan

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  semgrep:
    name: Scan
    runs-on: ubuntu-latest

    container:
      image: returntocorp/semgrep

    steps:
    - uses: actions/checkout@v3

    - name: Run Semgrep
      run: |
        semgrep \
          --config=auto \
          --sarif \
          --output=semgrep-results.sarif \
          .

    - name: Upload SARIF file
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: semgrep-results.sarif
      if: always()

    - name: Upload results
      uses: actions/upload-artifact@v3
      with:
        name: semgrep-report
        path: semgrep-results.sarif

GitLab CI

# .gitlab-ci.yml
stages:
  - security

semgrep:
  stage: security
  image: returntocorp/semgrep
  script:
    # --gitlab-sast writes the report in the schema GitLab expects for SAST artifacts
    - semgrep --config=auto --gitlab-sast --output=semgrep-report.json .
  artifacts:
    reports:
      sast: semgrep-report.json
    paths:
      - semgrep-report.json
    expire_in: 1 week
  allow_failure: true

Jenkins Pipeline

// Jenkinsfile
pipeline {
    agent any

    stages {
        stage('Security Scan') {
            steps {
                script {
                    docker.image('returntocorp/semgrep').inside {
                        sh 'semgrep --config=auto --json --output=semgrep-results.json .'
                        sh 'semgrep --config=auto --sarif --output=semgrep-results.sarif .'
                    }
                }
            }
            post {
                always {
                    archiveArtifacts artifacts: 'semgrep-results.*', fingerprint: true

                    // Parse results and fail build if high severity issues found
                    script {
                        def results = readJSON file: 'semgrep-results.json'
                        def errors = results.results.findAll { it.extra.severity == 'ERROR' }

                        if (errors.size() > 0) {
                            currentBuild.result = 'FAILURE'
                            error("Found ${errors.size()} high severity security issues")
                        }
                    }
                }
            }
        }
    }
}

Azure DevOps

# azure-pipelines.yml
trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

container: returntocorp/semgrep

steps:
- checkout: self

- script: |
    semgrep --config=auto --json --output=$(Agent.TempDirectory)/semgrep-results.json .
    semgrep --config=auto --sarif --output=$(Agent.TempDirectory)/semgrep-results.sarif .
    semgrep --config=auto --junit-xml --output=$(Agent.TempDirectory)/semgrep-results.xml .
  displayName: 'Run Semgrep Security Scan'

# PublishTestResults expects JUnit XML, so it reads the --junit-xml output rather than the SARIF file
- task: PublishTestResults@2
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '$(Agent.TempDirectory)/semgrep-results.xml'
    testRunTitle: 'Semgrep Security Scan'
  condition: always()

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: 'v1.45.0'
    hooks:
      - id: semgrep
        args: ['--config=auto', '--error']

Language-Specific Usage

Python Projects

# Python security scan
semgrep --config=p/python --config=p/flask --config=p/django .

# Python-specific rules
semgrep --config=p/bandit .
semgrep --config=p/secrets .

# Custom Python rules
cat > python-rules.yml << 'EOF'
rules:
  - id: flask-debug-mode
    pattern: app.run(debug=True)
    message: Flask debug mode should not be enabled in production
    languages: [python]
    severity: ERROR

  - id: django-debug-setting
    pattern: DEBUG = True
    message: Django DEBUG should be False in production
    languages: [python]
    severity: ERROR
EOF

semgrep --config=python-rules.yml .

JavaScript/TypeScript Projects

# JavaScript security scan
semgrep --config=p/javascript --config=p/typescript .

# Framework-specific scans
semgrep --config=p/react .
semgrep --config=p/express .
semgrep --config=p/nodejs .

# Custom JavaScript rules
cat > js-rules.yml << 'EOF'
rules:
  - id: eval-usage
    pattern-either:
      - pattern: eval($X)
      - pattern: Function($X)
    message: Avoid using eval() or Function() constructor
    languages: [javascript, typescript]
    severity: ERROR

  - id: innerHTML-xss
    pattern: $EL.innerHTML = $VAR
    message: Potential XSS vulnerability with innerHTML
    languages: [javascript, typescript]
    severity: WARNING
EOF

semgrep --config=js-rules.yml .

Java Projects

# Java security scan
semgrep --config=p/java .
semgrep --config=p/spring .

# Custom Java rules
cat > java-rules.yml << 'EOF'
rules:
  - id: sql-injection-java
    pattern: |
      Statement $STMT = ...;
      ...
      $STMT.executeQuery($QUERY + ...)
    message: Potential SQL injection vulnerability
    languages: [java]
    severity: ERROR

  - id: hardcoded-password-java
    patterns:
      - pattern: |
          String $VAR = "...";
      - metavariable-regex:
          metavariable: $VAR
          regex: (?i)(password|passwd|pwd)
    message: Hardcoded password detected
    languages: [java]
    severity: ERROR
EOF

semgrep --config=java-rules.yml .

Automation and Scripting

Automated Security Scanner

#!/usr/bin/env python3
# semgrep_scanner.py

import subprocess
import json
import sys
import argparse
from pathlib import Path

class SemgrepScanner:
    def __init__(self, project_path, config='auto'):
        self.project_path = Path(project_path)
        self.config = config
        self.results = {}

    def run_scan(self, output_format='json', severity_filter=None):
        """Run Semgrep scan with specified parameters"""
        cmd = [
            'semgrep',
            '--config', self.config,
            f'--{output_format}',
            str(self.project_path)
        ]

        if severity_filter:
            cmd.extend(['--severity', severity_filter])

        try:
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)

            if output_format == 'json':
                self.results = json.loads(result.stdout) if result.stdout else {}
            else:
                self.results = result.stdout

            return result.returncode == 0

        except subprocess.CalledProcessError as e:
            print(f"Error running Semgrep: {e}")
            return False
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON output: {e}")
            return False

    def get_summary(self):
        """Get scan summary"""
        if not isinstance(self.results, dict):
            return "No results available"

        findings = self.results.get('results', [])

        summary = {
            'total_findings': len(findings),
            'error_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'ERROR']),
            'warning_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'WARNING']),
            'info_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'INFO'])
        }

        return summary

    def get_findings_by_severity(self, severity='ERROR'):
        """Get findings filtered by severity"""
        if not isinstance(self.results, dict):
            return []

        findings = self.results.get('results', [])
        return [f for f in findings if f.get('extra', {}).get('severity') == severity]

    def get_findings_by_rule(self):
        """Group findings by rule ID"""
        if not isinstance(self.results, dict):
            return {}

        findings = self.results.get('results', [])
        by_rule = {}

        for finding in findings:
            rule_id = finding.get('check_id', 'unknown')
            if rule_id not in by_rule:
                by_rule[rule_id] = []
            by_rule[rule_id].append(finding)

        return by_rule

    def save_results(self, output_file='semgrep_results.json'):
        """Save results to file"""
        if isinstance(self.results, dict):
            with open(output_file, 'w') as f:
                json.dump(self.results, f, indent=2)
        else:
            with open(output_file, 'w') as f:
                f.write(str(self.results))

    def generate_report(self, output_file='semgrep_report.txt'):
        """Write a plain-text report (Semgrep's default output format) to a file"""
        cmd = [
            'semgrep',
            '--config', self.config,
            '--output', output_file,
            str(self.project_path)
        ]

        try:
            subprocess.run(cmd, check=True)
            return True
        except subprocess.CalledProcessError:
            return False

def main():
    parser = argparse.ArgumentParser(description='Automated Semgrep Scanner')
    parser.add_argument('project_path', help='Path to project to scan')
    parser.add_argument('--config', default='auto', help='Semgrep configuration')
    parser.add_argument('--severity', choices=['ERROR', 'WARNING', 'INFO'],
                       help='Filter by severity level')
    parser.add_argument('--output', help='Output file for results')
    parser.add_argument('--format', default='json',
                       choices=['json', 'sarif', 'text'],
                       help='Output format')

    args = parser.parse_args()

    scanner = SemgrepScanner(args.project_path, args.config)

    print(f"Scanning {args.project_path} with config {args.config}...")
    success = scanner.run_scan(output_format=args.format, severity_filter=args.severity)

    if success:
        if args.format == 'json':
            summary = scanner.get_summary()
            print("Scan completed successfully!")
            print(f"Total findings: {summary['total_findings']}")
            print(f"Errors: {summary['error_count']}")
            print(f"Warnings: {summary['warning_count']}")
            print(f"Info: {summary['info_count']}")

            # Show top issues by rule
            by_rule = scanner.get_findings_by_rule()
            if by_rule:
                print("\nTop issues by rule:")
                sorted_rules = sorted(by_rule.items(), key=lambda x: len(x[1]), reverse=True)
                for rule_id, findings in sorted_rules[:5]:
                    print(f"  {rule_id}: {len(findings)} findings")

        if args.output:
            scanner.save_results(args.output)
            print(f"Results saved to {args.output}")

        # Exit with error code if high severity issues found
        if args.format == 'json':
            summary = scanner.get_summary()
            if summary['error_count'] > 0:
                print(f"Found {summary['error_count']} high severity issues!")
                sys.exit(1)
    else:
        print("Scan failed!")
        sys.exit(1)

if __name__ == '__main__':
    main()

Batch Processing Script

#!/bin/bash
# batch_semgrep_scan.sh

# Configuration
PROJECTS_DIR="/path/to/projects"
REPORTS_DIR="/path/to/reports"
CONFIG="auto"
DATE=$(date +%Y%m%d_%H%M%S)

# Create reports directory
mkdir -p "$REPORTS_DIR"

# Function to scan project
scan_project() {
    local project_path="$1"
    local project_name=$(basename "$project_path")
    local report_file="$REPORTS_DIR/${project_name}_${DATE}.json"
    local sarif_report="$REPORTS_DIR/${project_name}_${DATE}.sarif"

    echo "Scanning $project_name..."

    # Run Semgrep scan
    semgrep --config="$CONFIG" --json --output="$report_file" "$project_path"
    semgrep --config="$CONFIG" --sarif --output="$sarif_report" "$project_path"

    # Check for high severity issues
    if [ -f "$report_file" ]; then
        error_count=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' "$report_file" 2>/dev/null || echo "0")

        if [ "$error_count" -gt 0 ]; then
            echo "WARNING: $project_name has $error_count high severity issues!"
            echo "$project_name" >> "$REPORTS_DIR/high_severity_projects.txt"
        fi
    fi

    echo "Scan completed for $project_name"
}

# Find and scan all projects
find "$PROJECTS_DIR" -maxdepth 1 -type d | while read -r project_dir; do
    if [ "$project_dir" != "$PROJECTS_DIR" ]; then
        scan_project "$project_dir"
    fi
done

echo "Batch scanning completed. Reports saved to $REPORTS_DIR"

# Generate summary report
echo "=== Batch Scan Summary ===" > "$REPORTS_DIR/summary_${DATE}.txt"
echo "Scan Date: $(date)" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Configuration: $CONFIG" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Total projects scanned: $(find "$REPORTS_DIR" -name "*_${DATE}.json" | wc -l)" >> "$REPORTS_DIR/summary_${DATE}.txt"

if [ -f "$REPORTS_DIR/high_severity_projects.txt" ]; then
    echo "High severity projects: $(wc -l < "$REPORTS_DIR/high_severity_projects.txt")" >> "$REPORTS_DIR/summary_${DATE}.txt"
fi

Best Practices

Rule Management

# .semgrep.yml - Project configuration
rules:
  # Security rules
  - p/security-audit
  - p/owasp-top-ten
  - p/secrets

  # Language-specific rules
  - p/python
  - p/javascript

  # Custom rules
  - rules/custom-security.yml
  - rules/custom-performance.yml

exclude:
  - "*/tests/*"
  - "*/test/*"
  - "*/.venv/*"
  - "*/venv/*"
  - "*/node_modules/*"
  - "*/vendor/*"
  - "*.min.js"
  - "*.min.css"

severity:
  - ERROR
  - WARNING

Custom Rule Development

# rules/custom-security.yml
rules:
  - id: custom-jwt-secret
    pattern-either:
      - pattern: jwt.encode($payload, "...", ...)
      - pattern: jwt.decode($token, "...", ...)
    message: |
      JWT secret should not be hardcoded. Use environment variables or secure configuration.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A02:2021"
      confidence: HIGH
    fix-regex:
      regex: '"[^"]*"'
      replacement: 'os.environ.get("JWT_SECRET")'
      # Limit the fix to the first quoted string so other string arguments are left alone
      count: 1
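
A regular expression such as `"[^"]*"` matches every double-quoted string in the matched code, so it matters how many occurrences a regex-based fix replaces. The sketch below imitates the substitution with Python's `re.sub` on a made-up line. It is an approximation of the behavior, not Semgrep's implementation.

```python
import re

# Hypothetical line that the custom-jwt-secret rule would match.
line = 'token = jwt.encode(payload, "s3cr3t", algorithm="HS256")'
pattern = r'"[^"]*"'
replacement = 'os.environ.get("JWT_SECRET")'

# Replacing every match also rewrites the algorithm argument.
replace_all = re.sub(pattern, replacement, line)

# Replacing only the first match touches just the secret.
replace_first = re.sub(pattern, replacement, line, count=1)

print(replace_all)
print(replace_first)
```

Reviewing the proposed change before applying it is a reasonable precaution with any regex-based fix.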

Performance Optimization

# Optimize for large codebases
semgrep --config=auto --max-target-bytes=1000000 .

# Use specific rules instead of auto
semgrep --config=p/security-audit --config=p/owasp-top-ten .

# Exclude unnecessary files
semgrep --config=auto --exclude="*/node_modules/*" --exclude="*/vendor/*" .

# Parallel processing
semgrep --config=auto --jobs=4 .

Troubleshooting

Common Issues

# Issue: Semgrep running slowly
# Solution: Exclude large directories and use specific rules
semgrep --config=p/security-audit --exclude="*/node_modules/*" .

# Issue: Too many false positives
# Solution: Use higher confidence rules and custom exclusions
semgrep --config=p/security-audit --exclude="*/tests/*" .

# Issue: Missing language support
# Solution: Check supported languages and update Semgrep
semgrep --version
pip install --upgrade semgrep

# Issue: Custom rules not working
# Solution: Validate rule syntax (the rule file is passed via --config)
semgrep --validate --config=rules/custom.yml
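
Many "rule not working" cases come down to structure. As a rough pre-check, the sketch below inspects a rule that has already been loaded into a Python dict. The key sets are assumptions based on the rule examples in this cheat sheet, and the helper `check_rule` is hypothetical; it does not replace Semgrep's own validation.

```python
# Keys every rule in this cheat sheet carries.
REQUIRED_KEYS = {"id", "message", "languages", "severity"}
# Assumed set of operators that may start a rule's matching logic.
TOP_LEVEL_PATTERN_KEYS = {"pattern", "patterns", "pattern-either", "pattern-regex"}
# Assumed set of operators that belong inside a `patterns` list.
NESTED_ONLY_KEYS = {"pattern-not", "pattern-inside", "pattern-not-inside", "metavariable-regex"}


def check_rule(rule: dict) -> list:
    """Return a list of human-readable problems found in one rule dict."""
    problems = [f"missing required key '{k}'" for k in sorted(REQUIRED_KEYS - set(rule))]
    if rule.get("mode") == "taint":
        for key in ("pattern-sources", "pattern-sinks"):
            if key not in rule:
                problems.append(f"taint rule missing '{key}'")
    elif len(TOP_LEVEL_PATTERN_KEYS & set(rule)) != 1:
        problems.append("expected exactly one of: " + ", ".join(sorted(TOP_LEVEL_PATTERN_KEYS)))
    for key in sorted(NESTED_ONLY_KEYS & set(rule)):
        problems.append(f"'{key}' must be nested under 'patterns'")
    return problems


good = {"id": "basic-pattern", "message": "Avoid using eval()",
        "languages": ["python"], "severity": "ERROR", "pattern": "eval($X)"}
bad = {"id": "pattern-not", "message": "No verify", "languages": ["python"],
       "severity": "WARNING", "pattern": "requests.get($URL)",
       "pattern-not": "requests.get($URL, verify=True)"}

print(check_rule(good))
print(check_rule(bad))
```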

Debug Mode

# Verbose output
semgrep --config=auto --verbose .

# Debug mode
semgrep --config=auto --debug .

# Dry run (preview autofix changes without writing them to files)
semgrep --config=auto --autofix --dryrun .

# Test specific rule
semgrep --config=rules/custom.yml --test .

This cheat sheet covers using Semgrep to find security vulnerabilities and enforce code standards. Keeping rulesets up to date and developing custom rules for project-specific risks improves security coverage.