
Linux Text Processing Cheat Sheet

"Clase de la hoja" id="copy-btn" class="copy-btn" onclick="copyAllCommands()" Copiar todos los comandos id="pdf-btn" class="pdf-btn" onclick="generatePDF()" Generar PDF seleccionado/button ■/div titulada

Overview

Linux text processing tools provide powerful capabilities for manipulating, analyzing, and transforming text data. This comprehensive guide covers essential tools such as grep, awk, and sed, along with many others that form the foundation of command-line text processing and data analysis workflows.

Warning: Text processing commands can modify files permanently. Always back up important files before performing bulk text operations.

File Viewing and Navigation

Basic File Display

# Display entire file
cat filename
cat -n filename         # With line numbers
cat -b filename         # Number non-blank lines only
cat -A filename         # Show all characters including non-printing

# Display multiple files
cat file1 file2 file3

# Create file with content
cat > newfile << EOF
Line 1
Line 2
EOF

Paginated Viewing

# Page through file
less filename
more filename

# Less navigation:
# Space/f - next page
# b - previous page
# /pattern - search forward
# ?pattern - search backward
# n - next search result
# N - previous search result
# q - quit

# More options
less +F filename        # Follow file like tail -f
less +/pattern filename # Start at first match

Partial File Display

# First lines of file
head filename
head -n 20 filename     # First 20 lines
head -c 100 filename    # First 100 characters

# Last lines of file
tail filename
tail -n 20 filename     # Last 20 lines
tail -f filename        # Follow file changes
tail -F filename        # Follow with retry

# Specific line ranges
sed -n '10,20p' filename    # Lines 10-20
awk 'NR>=10 && NR<=20' filename
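
As a quick sanity check, the range commands above should agree with each other. A minimal sketch against generated sample data (the `/tmp/range_sample.txt` scratch path is an assumption):

```shell
# Sample data: numbers 1..30, one per line (scratch path is an assumption)
seq 1 30 > /tmp/range_sample.txt

# Three equivalent ways to print lines 10-20
sed -n '10,20p' /tmp/range_sample.txt
awk 'NR>=10 && NR<=20' /tmp/range_sample.txt
head -n 20 /tmp/range_sample.txt | tail -n 11
```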

Pattern Searching with Grep

Basic Grep Usage

# Search for pattern
grep "pattern" filename
grep "pattern" file1 file2 file3

# Case-insensitive search
grep -i "pattern" filename

# Show line numbers
grep -n "pattern" filename

# Show only matching part
grep -o "pattern" filename

# Count matches
grep -c "pattern" filename

Advanced Grep Options

# Recursive search
grep -r "pattern" /path/to/directory
grep -R "pattern" /path/to/directory

# Search in specific file types
grep -r --include="*.txt" "pattern" /path
grep -r --exclude="*.log" "pattern" /path

# Invert match (show non-matching lines)
grep -v "pattern" filename

# Show context around matches
grep -A 3 "pattern" filename    # 3 lines after
grep -B 3 "pattern" filename    # 3 lines before
grep -C 3 "pattern" filename    # 3 lines before and after

# Multiple patterns
grep -E "pattern1|pattern2" filename
grep -e "pattern1" -e "pattern2" filename

Regular Expressions with Grep

# Extended regular expressions
grep -E "^start.*end$" filename
grep -E "[0-9]\\\{3\\\}-[0-9]\\\{3\\\}-[0-9]\\\{4\\\}" filename  # Phone numbers
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]\\\{2,\\\}\b" filename  # Email

# Perl-compatible regular expressions
grep -P "\d\\\{3\\\}-\d\\\{3\\\}-\d\\\{4\\\}" filename

# Word boundaries
grep -w "word" filename         # Match whole word only
grep "\bword\b" filename        # Same as -w

# Character classes
grep "[0-9]" filename           # Any digit
grep "[a-zA-Z]" filename        # Any letter
grep "[^0-9]" filename          # Not a digit

Stream Editing with Sed

Basic Sed Operations

# Substitute (replace)
sed 's/old/new/' filename              # First occurrence per line
sed 's/old/new/g' filename             # All occurrences
sed 's/old/new/2' filename             # Second occurrence per line

# In-place editing
sed -i 's/old/new/g' filename
sed -i.bak 's/old/new/g' filename      # Create backup

# Case-insensitive substitution
sed 's/old/new/gi' filename

Advanced Sed Commands

# Delete lines
sed '5d' filename               # Delete line 5
sed '5,10d' filename            # Delete lines 5-10
sed '/pattern/d' filename       # Delete lines matching pattern

# Print specific lines
sed -n '5p' filename            # Print line 5 only
sed -n '5,10p' filename         # Print lines 5-10
sed -n '/pattern/p' filename    # Print matching lines

# Insert and append
sed '5i\New line' filename      # Insert before line 5
sed '5a\New line' filename      # Append after line 5

# Multiple commands
sed -e 's/old1/new1/g' -e 's/old2/new2/g' filename
sed 's/old1/new1/g; s/old2/new2/g' filename

Sed with Regular Expressions

# Address ranges with patterns
sed '/start/,/end/d' filename           # Delete from start to end pattern
sed '/pattern/,+5d' filename            # Delete matching line and next 5

# Backreferences
sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/' filename  # Swap numbers around dash

# Multiple line operations
sed 'N;s/\n/ /' filename               # Join pairs of lines
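
To illustrate the backreference swap and the `N` join on inline sample data:

```shell
# Swap the two numbers around the dash using backreferences
printf '12-34\n' | sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/'   # prints 34-12

# Join pairs of lines: N pulls the next line into the buffer
printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /'                   # prints "a b" then "c d"
```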

Text Processing with AWK

Basic AWK Usage

# Print specific fields
awk '{print $1}' filename              # First field
awk '{print $1, $3}' filename          # First and third fields
awk '{print $NF}' filename             # Last field
awk '{print $(NF-1)}' filename         # Second-to-last field

# Field separator
awk -F: '{print $1}' /etc/passwd       # Use colon as separator
awk -F',' '{print $2}' file.csv        # Use comma as separator

# Print with custom formatting
awk '{printf "%-10s %s\n", $1, $2}' filename

AWK Pattern Matching

# Pattern matching
awk '/pattern/ {print}' filename
awk '/pattern/ {print $1}' filename
awk '$1 ~ /pattern/ {print}' filename   # First field matches pattern
awk '$1 !~ /pattern/ {print}' filename  # First field doesn't match

# Numeric comparisons
awk '$3 > 100 {print}' filename         # Third field greater than 100
awk '$2 == "value" {print}' filename    # Second field equals value
awk 'NR > 1 {print}' filename           # Skip header line

AWK Programming Constructs

# Variables and calculations
awk '{sum += $1} END {print sum}' filename      # Sum first column
awk '{count++} END {print count}' filename      # Count lines

# Conditional statements
awk '{if ($1 > 100) print "High: " $0; else print "Low: " $0}' filename

# Loops
awk '{for(i=1; i<=NF; i++) print $i}' filename # Print each field on a new line

# Built-in variables
awk '{print NR, NF, $0}' filename      # Line number, field count, whole line
awk 'END {print NR}' filename          # Total line count

Advanced AWK Features

# Multiple patterns
awk '/start/,/end/ {print}' filename    # Print from start to end pattern

# User-defined functions
awk 'function square(x) {return x*x} {print square($1)}' filename

# Arrays
awk '{count[$1]++} END {for (word in count) print word, count[word]}' filename

# String functions
awk '{print length($0)}' filename       # Line length
awk '{print substr($0, 1, 10)}' filename # First 10 characters
awk '{print toupper($0)}' filename      # Convert to uppercase
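
A few of the constructs above run against inline sample data:

```shell
# Sum a column, count lines, and tally words with an array
printf '10\n20\n30\n' | awk '{sum += $1} END {print sum}'    # prints 60
printf 'x\ny\n' | awk 'END {print NR}'                       # prints 2
printf 'a b a\n' | awk '{for(i=1;i<=NF;i++) c[$i]++} END {print c["a"]}'  # prints 2
```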

Sorting and Uniqueness

Basic Sorting

# Sort lines alphabetically
sort filename
sort -r filename                # Reverse order
sort -u filename                # Remove duplicates

# Numeric sorting
sort -n filename                # Numeric sort
sort -nr filename               # Numeric reverse sort
sort -h filename                # Human numeric sort (1K, 2M, etc.)

# Sort by specific field
sort -k2 filename               # Sort by second field
sort -k2,2 filename             # Sort by second field only
sort -k2n filename              # Numeric sort by second field

Advanced Sorting

# Multiple sort keys
sort -k1,1 -k2n filename        # Sort by field 1, then numerically by field 2

# Custom field separator
sort -t: -k3n /etc/passwd       # Sort passwd by UID

# Sort by specific columns
sort -k1.2,1.4 filename         # Sort by characters 2-4 of first field

# Stable sort
sort -s -k2 filename            # Maintain relative order of equal elements
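
A small sketch of field-based numeric sorting with a custom separator, on inline sample data:

```shell
# Numeric sort on the second colon-separated field
printf 'b:2\na:10\nc:1\n' | sort -t: -k2n   # orders by 1, 2, 10 rather than lexically
```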

Uniq Operations

# Remove duplicate lines
uniq filename                   # Remove consecutive duplicates
sort filename | uniq            # Remove all duplicates

# Count occurrences
uniq -c filename                # Count consecutive duplicates
sort filename | uniq -c         # Count all duplicates

# Show only duplicates or unique lines
uniq -d filename                # Show only duplicate lines
uniq -u filename                # Show only unique lines

# Compare fields
uniq -f1 filename               # Skip first field when comparing
uniq -s5 filename               # Skip first 5 characters
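
Because `uniq` only collapses consecutive duplicates, a global count needs `sort` first; a minimal sketch on inline sample data:

```shell
# Top duplicate: "b" appears 3 times in the sample input
printf 'b\na\nb\na\nb\n' | sort | uniq -c | sort -nr | head -1
```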

Text Transformation

Character Translation

# Character replacement
tr 'a-z' 'A-Z' < filename       # Convert to uppercase
tr 'A-Z' 'a-z' < filename       # Convert to lowercase
tr ' ' '_' < filename           # Replace spaces with underscores

# Delete characters
tr -d '0-9' < filename          # Delete all digits
tr -d '\n' < filename           # Remove newlines
tr -d '[:punct:]' < filename    # Remove punctuation

# Squeeze repeated characters
tr -s ' ' < filename            # Squeeze multiple spaces to one
tr -s '\n' < filename           # Remove blank lines
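
The main `tr` modes above, demonstrated on one inline sample string:

```shell
# Uppercase, squeeze repeated spaces, and strip digits
printf 'ab 12  cd\n' | tr 'a-z' 'A-Z'    # prints "AB 12  CD"
printf 'ab 12  cd\n' | tr -s ' '         # prints "ab 12 cd"
printf 'ab 12  cd\n' | tr -d '0-9'       # digits removed, spaces kept
```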

Cut and Paste Operations

# Extract columns
cut -c1-10 filename             # Characters 1-10
cut -c1,5,10 filename           # Characters 1, 5, and 10
cut -c10- filename              # From character 10 to end

# Extract fields
cut -d: -f1 /etc/passwd         # First field (colon delimiter)
cut -d, -f1,3 file.csv          # Fields 1 and 3 (comma delimiter)
cut -f2- filename               # From field 2 to end (tab delimiter)

# Paste files together
paste file1 file2               # Merge lines side by side
paste -d, file1 file2           # Use comma as delimiter
paste -s filename               # Merge all lines into one
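
`cut` and `paste` are roughly inverse operations; a sketch with two small generated files (the `/tmp` scratch paths are assumptions):

```shell
# Build two single-column files, merge them, then extract a column back out
printf 'a\nb\n' > /tmp/col1.txt
printf '1\n2\n' > /tmp/col2.txt
paste -d, /tmp/col1.txt /tmp/col2.txt                 # prints "a,1" then "b,2"
paste -d, /tmp/col1.txt /tmp/col2.txt | cut -d, -f2   # prints "1" then "2"
```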

Join Operations

# Join files on common field
join file1 file2                # Join on first field
join -1 2 -2 1 file1 file2      # Join field 2 of file1 with field 1 of file2
join -t: file1 file2            # Use colon as field separator

# Outer joins
join -a1 file1 file2            # Include unmatched lines from file1
join -a2 file1 file2            # Include unmatched lines from file2
join -a1 -a2 file1 file2        # Full outer join
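
Note that `join` requires both inputs to be sorted on the join field. A sketch with two small generated files (the `/tmp` paths and the user/role data are assumptions):

```shell
# Inputs are already sorted on field 1
printf '1 alice\n2 bob\n' > /tmp/users.txt
printf '1 admin\n3 guest\n' > /tmp/roles.txt
join /tmp/users.txt /tmp/roles.txt        # prints "1 alice admin"
join -a1 /tmp/users.txt /tmp/roles.txt    # also keeps the unmatched "2 bob"
```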

Text Analysis and Statistics

Word and Line Counting

# Count lines, words, characters
wc filename
wc -l filename                  # Lines only
wc -w filename                  # Words only
wc -c filename                  # Characters only
wc -m filename                  # Characters (multibyte aware)

# Count specific patterns
grep -c "pattern" filename      # Count matching lines
grep -o "pattern" filename|wc -l  # Count pattern occurrences

Frequency Analysis

# Word frequency
tr ' ' '\n' < filename | sort | uniq -c | sort -nr

# Character frequency
fold -w1 filename | sort | uniq -c | sort -nr

# Line frequency
sort filename | uniq -c | sort -nr

# Field frequency
awk '{print $1}' filename | sort | uniq -c | sort -nr
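
The word-frequency pipeline above, run on one inline sample sentence:

```shell
# Most frequent word: "the" appears twice
printf 'the cat and the dog\n' | tr ' ' '\n' | sort | uniq -c | sort -nr | head -1
```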

Advanced Text Processing

Multi-file Operations

# Process multiple files
grep "pattern" *.txt
awk '{print FILENAME, $0}' *.txt
sed 's/old/new/g' *.txt

# Combine files
cat file1 file2 > combined
sort -m sorted1 sorted2 > merged  # Merge sorted files

Complex Pipelines

# Log analysis pipeline
cat access.log | grep "404" | awk '{print $1}' | sort | uniq -c | sort -nr

# CSV processing
cut -d, -f2,4 data.csv | grep -v "^$" | sort -u

# Text statistics
cat document.txt | tr -d '[:punct:]' | tr ' ' '\n' | grep -v "^$" | sort | uniq -c | sort -nr | head -10

Regular Expression Tools

# Perl-style regex
perl -pe 's/pattern/replacement/g' filename
perl -ne 'print if /pattern/' filename

# Extended grep alternatives
egrep "pattern1|pattern2" filename   # Deprecated alias for grep -E
fgrep "literal string" filename      # Deprecated alias for grep -F; no regex interpretation

Text Processing Troubleshooting

Common Issues

# Handle different line endings
dos2unix filename               # Convert DOS to Unix line endings
unix2dos filename               # Convert Unix to DOS line endings
tr -d '\r' < filename           # Remove carriage returns

# Encoding issues
iconv -f ISO-8859-1 -t UTF-8 filename  # Convert encoding
file filename                   # Check file type and encoding

# Large file processing
split -l 1000 largefile prefix  # Split into 1000-line chunks
head -n 1000000 largefile | tail -n 1000  # Process middle section

Performance Optimization

# Faster alternatives for large files
LC_ALL=C sort filename          # Use the C locale for faster sorting
# mawk: a faster AWK implementation, a drop-in replacement for most scripts
# ripgrep (rg): a faster recursive search tool than grep

# Memory-efficient processing
sort -S 1G filename             # Use 1GB of memory for sorting
# For very large files, split the input (see split above) and process it in chunks


*This cheat sheet provides comprehensive text processing commands for Linux systems. Practice with sample data before applying these commands to important files.*