Linux Text Processing Cheat Sheet
"Clase de la hoja" id="copy-btn" class="copy-btn" onclick="copyAllCommands()" Copiar todos los comandos id="pdf-btn" class="pdf-btn" onclick="generatePDF()" Generar PDF seleccionado/button ■/div titulada
Overview
Linux text processing tools provide powerful capabilities for manipulating, analyzing, and transforming text data. This comprehensive guide covers essential tools such as grep, awk, sed, and many others that form the foundation of command-line text processing and data analysis workflows.
Warning: Text processing commands can modify files permanently. Always back up important files before performing bulk text operations.
File Viewing and Navigation
Basic File Display
# Display entire file
cat filename
cat -n filename # With line numbers
cat -b filename # Number non-blank lines only
cat -A filename # Show all characters including non-printing
# Display multiple files
cat file1 file2 file3
# Create file with content
cat > newfile << EOF
Line 1
Line 2
EOF
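As a quick check, the heredoc and numbering options above can be combined. This sketch uses a hypothetical sample.txt created only for the demonstration:

```shell
# Create a small sample file with a heredoc (hypothetical name: sample.txt)
cat > sample.txt << 'EOF'
first line

third line
EOF

# -n numbers every line; -b numbers only non-blank lines
all=$(cat -n sample.txt | wc -l)                 # 3 lines total
nonblank=$(cat -b sample.txt | grep -c '[0-9]')  # 2 numbered (non-blank) lines
echo "$all $nonblank"
rm -f sample.txt
```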
Paginated Viewing
# Page through file
less filename
more filename
# Less navigation:
# Space/f - next page
# b - previous page
# /pattern - search forward
# ?pattern - search backward
# n - next search result
# N - previous search result
# q - quit
# More options
less +F filename # Follow file like tail -f
less +/pattern filename # Start at first match
Partial File Display
# First lines of file
head filename
head -n 20 filename # First 20 lines
head -c 100 filename # First 100 characters
# Last lines of file
tail filename
tail -n 20 filename # Last 20 lines
tail -f filename # Follow file changes
tail -F filename # Follow with retry
# Specific line ranges
sed -n '10,20p' filename # Lines 10-20
awk 'NR>=10 && NR<=20' filename
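Both range commands above should produce identical output; a minimal verification against a hypothetical lines.txt:

```shell
# Build a 30-line test file (hypothetical name: lines.txt)
seq 1 30 > lines.txt

# Both commands print lines 10-20 (11 lines)
sed_out=$(sed -n '10,20p' lines.txt)
awk_out=$(awk 'NR>=10 && NR<=20' lines.txt)
echo "$sed_out" | wc -l
rm -f lines.txt
```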
Pattern Searching with Grep
Basic Grep Usage
# Search for pattern
grep "pattern" filename
grep "pattern" file1 file2 file3
# Case-insensitive search
grep -i "pattern" filename
# Show line numbers
grep -n "pattern" filename
# Show only matching part
grep -o "pattern" filename
# Count matches
grep -c "pattern" filename
Advanced Grep Options
# Recursive search
grep -r "pattern" /path/to/directory
grep -R "pattern" /path/to/directory
# Search in specific file types
grep -r --include="*.txt" "pattern" /path
grep -r --exclude="*.log" "pattern" /path
# Invert match (show non-matching lines)
grep -v "pattern" filename
# Show context around matches
grep -A 3 "pattern" filename # 3 lines after
grep -B 3 "pattern" filename # 3 lines before
grep -C 3 "pattern" filename # 3 lines before and after
# Multiple patterns
grep -E "pattern1|pattern2" filename
grep -e "pattern1" -e "pattern2" filename
Regular Expressions with Grep
# Extended regular expressions
grep -E "^start.*end$" filename
grep -E "[0-9]\\\{3\\\}-[0-9]\\\{3\\\}-[0-9]\\\{4\\\}" filename # Phone numbers
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]\\\{2,\\\}\b" filename # Email
# Perl-compatible regular expressions
grep -P "\d\\\{3\\\}-\d\\\{3\\\}-\d\\\{4\\\}" filename
# Word boundaries
grep -w "word" filename # Match whole word only
grep "\bword\b" filename # Same as -w
# Character classes
grep "[0-9]" filename # Any digit
grep "[a-zA-Z]" filename # Any letter
grep "[^0-9]" filename # Not a digit
Stream Editing with Sed
Basic Sed Operations
# Substitute (replace)
sed 's/old/new/' filename # First occurrence per line
sed 's/old/new/g' filename # All occurrences
sed 's/old/new/2' filename # Second occurrence per line
# In-place editing
sed -i 's/old/new/g' filename
sed -i.bak 's/old/new/g' filename # Create backup
# Case-insensitive substitution
sed 's/old/new/gi' filename
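The difference between per-line and global substitution is easy to miss; a sketch on a hypothetical s.txt:

```shell
# Substitution scope: first occurrence vs. global (hypothetical name: s.txt)
printf 'foo foo foo\n' > s.txt

first=$(sed 's/foo/bar/' s.txt)   # only the first foo on the line
all=$(sed 's/foo/bar/g' s.txt)    # every foo on the line
echo "$first"
echo "$all"
rm -f s.txt
```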
Advanced Sed Commands
# Delete lines
sed '5d' filename # Delete line 5
sed '5,10d' filename # Delete lines 5-10
sed '/pattern/d' filename # Delete lines matching pattern
# Print specific lines
sed -n '5p' filename # Print line 5 only
sed -n '5,10p' filename # Print lines 5-10
sed -n '/pattern/p' filename # Print matching lines
# Insert and append
sed '5i\New line' filename # Insert before line 5
sed '5a\New line' filename # Append after line 5
# Multiple commands
sed -e 's/old1/new1/g' -e 's/old2/new2/g' filename
sed 's/old1/new1/g; s/old2/new2/g' filename
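Line deletion and selective printing, verified on a hypothetical 10-line file:

```shell
# Deleting and printing line ranges (hypothetical name: r.txt)
seq 1 10 > r.txt

kept=$(sed '3,8d' r.txt | paste -sd' ' -)  # delete lines 3-8, keep the rest
only5=$(sed -n '5p' r.txt)                 # print line 5 only
echo "$kept"
rm -f r.txt
```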
Sed with Regular Expressions
# Address ranges with patterns
sed '/start/,/end/d' filename # Delete from start to end pattern
sed '/pattern/,+5d' filename # Delete matching line and next 5
# Backreferences
sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/' filename # Swap numbers around dash
# Multiple line operations
sed 'N;s/\n/ /' filename # Join pairs of lines
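The backreference swap above can be confirmed on a hypothetical one-line file:

```shell
# Backreference swap: 12-34 becomes 34-12 (hypothetical name: bref.txt)
printf '12-34\n' > bref.txt
swapped=$(sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/' bref.txt)
echo "$swapped"
rm -f bref.txt
```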
Text Processing with AWK
Basic AWK Usage
# Print specific fields
awk '{print $1}' filename # First field
awk '{print $1, $3}' filename # First and third fields
awk '{print $NF}' filename # Last field
awk '{print $(NF-1)}' filename # Second to last field
# Field separator
awk -F: '{print $1}' /etc/passwd # Use colon as separator
awk -F',' '{print $2}' file.csv # Use comma as separator
# Print with custom formatting
awk '{printf "%-10s %s\n", $1, $2}' filename
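Field extraction with a custom separator, sketched on a hypothetical colon-delimited file:

```shell
# Field extraction from colon-delimited data (hypothetical name: pw.txt)
printf 'root:x:0\ndaemon:x:1\n' > pw.txt

names=$(awk -F: '{print $1}' pw.txt | paste -sd' ' -)  # first fields
last=$(awk -F: '{print $NF}' pw.txt | paste -sd' ' -)  # last fields
echo "$names / $last"
rm -f pw.txt
```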
AWK Pattern Matching
# Pattern matching
awk '/pattern/ {print}' filename
awk '/pattern/ {print $1}' filename
awk '$1 ~ /pattern/ {print}' filename # First field matches pattern
awk '$1 !~ /pattern/ {print}' filename # First field doesn't match
# Numeric comparisons
awk '$3 > 100 {print}' filename # Third field greater than 100
awk '$2 == "value" {print}' filename # Second field equals value
awk 'NR > 1 {print}' filename # Skip header line
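Combining a numeric comparison with header skipping, on a hypothetical scores.txt:

```shell
# Numeric filtering with header skipping (hypothetical name: scores.txt)
printf 'name score\nalice 150\nbob 50\n' > scores.txt

# Skip the header (NR > 1) and keep rows where the score exceeds 100
high=$(awk 'NR > 1 && $2 > 100 {print $1}' scores.txt)
echo "$high"
rm -f scores.txt
```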
AWK Programming Constructs
# Variables and calculations
awk '{sum += $1} END {print sum}' filename # Sum first column
awk '{count++} END {print count}' filename # Count lines
# Conditional statements
awk '{if ($1 > 100) print "High: " $0; else print "Low: " $0}' filename
# Loops
awk '{for(i=1; i<=NF; i++) print $i}' filename # Print each field on new line
# Built-in variables
awk '{print NR, NF, $0}' filename # Line number, field count, whole line
awk 'END {print NR}' filename # Total line count
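Summing and counting with END blocks, verified on a hypothetical three-line file:

```shell
# Summing a column and counting lines (hypothetical name: nums.txt)
printf '10\n20\n30\n' > nums.txt

total=$(awk '{sum += $1} END {print sum}' nums.txt)  # 10+20+30
count=$(awk 'END {print NR}' nums.txt)               # total lines
echo "$total $count"
rm -f nums.txt
```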
Advanced AWK Features
# Multiple patterns
awk '/start/,/end/ {print}' filename # Print from start to end pattern
# User-defined functions
awk 'function square(x) {return x*x} {print square($1)}' filename
# Arrays
awk '{count[$1]++} END {for (word in count) print word, count[word]}' filename
# String functions
awk '{print length($0)}' filename # Line length
awk '{print substr($0, 1, 10)}' filename # First 10 characters
awk '{print toupper($0)}' filename # Convert to uppercase
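Associative arrays are the workhorse for counting; a sketch on a hypothetical fruit.txt:

```shell
# Counting occurrences with an associative array (hypothetical name: fruit.txt)
printf 'apple\nbanana\napple\n' > fruit.txt

apples=$(awk '{count[$1]++} END {print count["apple"]}' fruit.txt)
echo "$apples"
rm -f fruit.txt
```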
Sorting and Uniqueness
Basic Sorting
# Sort lines alphabetically
sort filename
sort -r filename # Reverse order
sort -u filename # Remove duplicates
# Numeric sorting
sort -n filename # Numeric sort
sort -nr filename # Numeric reverse sort
sort -h filename # Human numeric sort (1K, 2M, etc.)
# Sort by specific field
sort -k2 filename # Sort by second field
sort -k2,2 filename # Sort by second field only
sort -k2n filename # Numeric sort by second field
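The lexical/numeric distinction above matters whenever fields contain numbers of different widths:

```shell
# Lexical vs. numeric sorting (hypothetical name: n.txt)
printf '10\n9\n100\n' > n.txt

lex=$(sort n.txt | paste -sd' ' -)     # lexical: compares character by character
num=$(sort -n n.txt | paste -sd' ' -)  # numeric: compares values
echo "$lex / $num"
rm -f n.txt
```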
Advanced Sorting
# Multiple sort keys
sort -k1,1 -k2n filename # Sort by field 1, then numerically by field 2
# Custom field separator
sort -t: -k3n /etc/passwd # Sort passwd by UID
# Sort by specific columns
sort -k1.2,1.4 filename # Sort by characters 2-4 of first field
# Stable sort
sort -s -k2 filename # Maintain relative order of equal elements
Uniqueness Operations
# Remove duplicate lines
uniq filename # Remove consecutive duplicates
sort filename|uniq # Remove all duplicates
# Count occurrences
uniq -c filename # Count consecutive duplicates
sort filename|uniq -c # Count all duplicates
# Show only duplicates or unique lines
uniq -d filename # Show only duplicate lines
uniq -u filename # Show only unique lines
# Compare fields
uniq -f1 filename # Skip first field when comparing
uniq -s5 filename # Skip first 5 characters
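A common pitfall: uniq only collapses consecutive duplicates, which is why the examples above pipe through sort first:

```shell
# uniq without sort only removes adjacent duplicates (hypothetical name: dup.txt)
printf 'a\nb\na\na\n' > dup.txt

consecutive=$(uniq dup.txt | wc -l)       # a b a -> 3 lines
allunique=$(sort dup.txt | uniq | wc -l)  # a b -> 2 lines
rm -f dup.txt
```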
Text Transformation
Character Translation
# Character replacement
tr 'a-z' 'A-Z' < filename # Convert to uppercase
tr 'A-Z' 'a-z' < filename # Convert to lowercase
tr ' ' '_' < filename # Replace spaces with underscores
# Delete characters
tr -d '0-9' < filename # Delete all digits
tr -d '\n' < filename # Remove newlines
tr -d '[:punct:]' < filename # Remove punctuation
# Squeeze repeated characters
tr -s ' ' < filename # Squeeze multiple spaces to one
tr -s '\n' < filename # Remove blank lines
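tr needs no temporary file at all; case conversion and squeezing work directly on a pipe:

```shell
# Case conversion and space squeezing straight from stdin
upper=$(printf 'hello' | tr 'a-z' 'A-Z')  # HELLO
squeezed=$(printf 'a   b' | tr -s ' ')    # runs of spaces collapse to one
echo "$upper / $squeezed"
```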
Cut and Paste Operations
# Extract columns
cut -c1-10 filename # Characters 1-10
cut -c1,5,10 filename # Characters 1, 5, and 10
cut -c10- filename # From character 10 to end
# Extract fields
cut -d: -f1 /etc/passwd # First field (colon delimiter)
cut -d, -f1,3 file.csv # Fields 1 and 3 (comma delimiter)
cut -f2- filename # From field 2 to end (tab delimiter)
# Paste files together
paste file1 file2 # Merge lines side by side
paste -d, file1 file2 # Use comma as delimiter
paste -s filename # Merge all lines into one
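cut and paste are natural complements; a sketch on two hypothetical files:

```shell
# Field extraction and side-by-side merging (hypothetical names: f1.txt, f2.txt)
printf 'a:1\nb:2\n' > f1.txt
printf 'x\ny\n' > f2.txt

keys=$(cut -d: -f1 f1.txt | paste -sd' ' -)       # first fields of f1
merged=$(paste -d, f1.txt f2.txt | paste -sd' ' -) # lines merged with a comma
echo "$keys / $merged"
rm -f f1.txt f2.txt
```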
Join Operations
# Join files on common field
join file1 file2 # Join on first field
join -1 2 -2 1 file1 file2 # Join field 2 of file1 with field 1 of file2
join -t: file1 file2 # Use colon as field separator
# Outer joins
join -a1 file1 file2 # Include unmatched lines from file1
join -a2 file1 file2 # Include unmatched lines from file2
join -a1 -a2 file1 file2 # Full outer join
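join requires both inputs sorted on the join field; an inner vs. left join on hypothetical files:

```shell
# Joining two sorted files on their first field (hypothetical names: ids.txt, roles.txt)
printf '1 alice\n2 bob\n' > ids.txt
printf '1 admin\n3 guest\n' > roles.txt

inner=$(join ids.txt roles.txt)              # only key 1 appears in both
left=$(join -a1 ids.txt roles.txt | wc -l)   # unmatched key 2 is kept too
echo "$inner"
rm -f ids.txt roles.txt
```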
Text Analysis and Statistics
Word and Line Counting
# Count lines, words, characters
wc filename
wc -l filename # Lines only
wc -w filename # Words only
wc -c filename # Characters only
wc -m filename # Characters (multibyte aware)
# Count specific patterns
grep -c "pattern" filename # Count matching lines
grep -o "pattern" filename|wc -l # Count pattern occurrences
Frequency Analysis
# Word frequency
tr ' ' '\n' < filename|sort|uniq -c|sort -nr
# Character frequency
fold -w1 filename|sort|uniq -c|sort -nr
# Line frequency
sort filename|uniq -c|sort -nr
# Field frequency
awk '{print $1}' filename|sort|uniq -c|sort -nr
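A worked check of the word-frequency pipeline above, using a hypothetical words.txt:

```shell
# Find the most frequent word (hypothetical name: words.txt)
printf 'the cat and the dog\n' > words.txt

# Split into one word per line, count, sort by count, take the top word
top=$(tr ' ' '\n' < words.txt | sort | uniq -c | sort -nr | head -1 | awk '{print $2}')
echo "$top"
rm -f words.txt
```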
Advanced Text Processing
Multi-file Operations
# Process multiple files
grep "pattern" *.txt
awk '{print FILENAME, $0}' *.txt
sed 's/old/new/g' *.txt
# Combine files
cat file1 file2 > combined
sort -m sorted1 sorted2 > merged # Merge sorted files
Complex Pipelines
# Log analysis pipeline
cat access.log|grep "404"|awk '{print $1}'|sort|uniq -c|sort -nr
# CSV processing
cut -d, -f2,4 data.csv|grep -v "^$"|sort -u
# Text statistics
cat document.txt|tr -d '[:punct:]'|tr ' ' '\n'|grep -v "^$"|sort|uniq -c|sort -nr|head -10
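A miniature version of the 404 log pipeline above, on a hypothetical three-line access.log (format assumed: IP, separator, status):

```shell
# Find the IP with the most 404 responses (hypothetical log format)
printf '1.1.1.1 - 404\n2.2.2.2 - 200\n1.1.1.1 - 404\n' > access.log

topip=$(grep "404" access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -1 | awk '{print $2}')
echo "$topip"
rm -f access.log
```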
Regular Expression Tools
# Perl-style regex
perl -pe 's/pattern/replacement/g' filename
perl -ne 'print if /pattern/' filename
# Extended grep alternatives
egrep "pattern1|pattern2" filename
fgrep "literal string" filename # No regex interpretation
Text Processing Troubleshooting
Common Issues
# Handle different line endings
dos2unix filename # Convert DOS to Unix line endings
unix2dos filename # Convert Unix to DOS line endings
tr -d '\r' < filename # Remove carriage returns
# Encoding issues
iconv -f ISO-8859-1 -t UTF-8 filename # Convert encoding
file filename # Check file type and encoding
# Large file processing
split -l 1000 largefile prefix # Split into 1000-line chunks
head -n 1000000 largefile|tail -n 1000 # Process middle section
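The tr-based line-ending fix is worth knowing because dos2unix may not be installed; a sketch on hypothetical files:

```shell
# Strip DOS carriage returns with tr (hypothetical names: dos.txt, unix.txt)
printf 'line1\r\nline2\r\n' > dos.txt
tr -d '\r' < dos.txt > unix.txt

crs=$(tr -cd '\r' < unix.txt | wc -c)  # remaining CR bytes: should be 0
lines=$(wc -l < unix.txt)              # both lines survive
rm -f dos.txt unix.txt
```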
Performance Optimization
# Faster alternatives for large files
LC_ALL=C sort filename # Use C locale for faster sorting
mawk '{print $1}' filename # mawk: a faster AWK implementation
rg "pattern" filename # ripgrep: a faster grep alternative
# Memory-efficient processing
sort -S 1G filename # Use 1GB of memory for sorting
split -l 1000000 largefile chunk_ # Split very large files and process in chunks
Resources
- GNU Text Utilities Manual
- AWK Programming Guide
- Sed Manual
- Regular Expressions Tutorial
- Text Processing Examples
*This cheat sheet provides comprehensive text processing commands for Linux systems. Practice with sample data before applying these commands to important files.*