# Linux Text Processing Cheat Sheet

## Overview

Linux text processing tools provide powerful capabilities for manipulating, analyzing, and transforming text data. This guide covers essential tools such as grep, awk, sed, sort, and many others that form the foundation of text processing and data analysis workflows.

**Warning:** Text processing commands can permanently modify files. Always back up important files before performing bulk text operations.
## File Viewing and Navigation

### Basic File Display

```bash
# Display entire file
cat filename
cat -n filename              # With line numbers
cat -b filename              # Number non-blank lines only
cat -A filename              # Show all characters including non-printing

# Display multiple files
cat file1 file2 file3

# Create file with content
cat > newfile << EOF
Line 1
Line 2
EOF
```
### Paginated Viewing

```bash
# Page through file
less filename
more filename

# less navigation:
#   Space/f  - next page
#   b        - previous page
#   /pattern - search forward
#   ?pattern - search backward
#   n        - next search result
#   N        - previous search result
#   q        - quit

# More options
less +F filename             # Follow file like tail -f
less +/pattern filename      # Start at first match
```
### Partial File Display

```bash
# First lines of file
head filename
head -n 20 filename          # First 20 lines
head -c 100 filename         # First 100 characters

# Last lines of file
tail filename
tail -n 20 filename          # Last 20 lines
tail -f filename             # Follow file changes
tail -F filename             # Follow with retry

# Specific line ranges
sed -n '10,20p' filename            # Lines 10-20
awk 'NR>=10 && NR<=20' filename     # Lines 10-20
```
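A quick way to check that the two range commands agree, using `seq` to generate disposable input (a sketch assuming GNU coreutils):

```shell
# Generate lines 1-30, then extract lines 10-20 with both tools.
seq 1 30 | sed -n '10,20p'                   # prints 10 through 20
seq 1 30 | awk 'NR>=10 && NR<=20' | wc -l    # prints 11
```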
## Pattern Searching with grep

### Basic grep Usage

```bash
# Search for pattern
grep "pattern" filename
grep "pattern" file1 file2 file3

# Case-insensitive search
grep -i "pattern" filename

# Show line numbers
grep -n "pattern" filename

# Show only matching part
grep -o "pattern" filename

# Count matching lines
grep -c "pattern" filename
```
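Note that `-c` counts matching *lines*, while `-o` piped to `wc -l` counts every *occurrence*. A small self-contained check (the /tmp path is illustrative):

```shell
# Two ERROR lines, one of which contains the word twice.
printf 'INFO ok\nERROR ERROR disk\nERROR net\n' > /tmp/demo.log
grep -c "ERROR" /tmp/demo.log             # prints 2 (matching lines)
grep -o "ERROR" /tmp/demo.log | wc -l     # prints 3 (total occurrences)
```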
### Advanced grep Options

```bash
# Recursive search
grep -r "pattern" /path/to/directory
grep -R "pattern" /path/to/directory    # Also follow symlinks

# Search in specific file types
grep -r --include="*.txt" "pattern" /path
grep -r --exclude="*.log" "pattern" /path

# Invert match (show non-matching lines)
grep -v "pattern" filename

# Show context around matches
grep -A 3 "pattern" filename    # 3 lines after
grep -B 3 "pattern" filename    # 3 lines before
grep -C 3 "pattern" filename    # 3 lines before and after

# Multiple patterns
grep -E "pattern1|pattern2" filename
grep -e "pattern1" -e "pattern2" filename
```
### Regular Expressions with grep

```bash
# Extended regular expressions (braces are not escaped with -E)
grep -E "^start.*end$" filename
grep -E "[0-9]{3}-[0-9]{3}-[0-9]{4}" filename                          # Phone numbers
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" filename  # Email addresses

# Perl-compatible regular expressions
grep -P "\d{3}-\d{3}-\d{4}" filename

# Word boundaries
grep -w "word" filename      # Match whole word only
grep "\bword\b" filename     # Same as -w (GNU grep)

# Character classes
grep "[0-9]" filename        # Any digit
grep "[a-zA-Z]" filename     # Any letter
grep "[^0-9]" filename       # Any non-digit character
```
## Stream Editing with sed

### Basic sed Operations

```bash
# Substitute (replace)
sed 's/old/new/' filename       # First occurrence per line
sed 's/old/new/g' filename      # All occurrences
sed 's/old/new/2' filename      # Second occurrence per line

# In-place editing
sed -i 's/old/new/g' filename
sed -i.bak 's/old/new/g' filename    # Create backup first

# Case-insensitive substitution (GNU sed)
sed 's/old/new/gi' filename
```
### Advanced sed Commands

```bash
# Delete lines
sed '5d' filename               # Delete line 5
sed '5,10d' filename            # Delete lines 5-10
sed '/pattern/d' filename       # Delete lines matching pattern

# Print specific lines
sed -n '5p' filename            # Print line 5 only
sed -n '5,10p' filename         # Print lines 5-10
sed -n '/pattern/p' filename    # Print matching lines

# Insert and append
sed '5i\New line' filename      # Insert before line 5
sed '5a\New line' filename      # Append after line 5

# Multiple commands
sed -e 's/old1/new1/g' -e 's/old2/new2/g' filename
sed 's/old1/new1/g; s/old2/new2/g' filename
```
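The two multi-command forms behave identically; a small check on inline input:

```shell
# Both invocations apply the two substitutions in sequence.
echo "foo bar" | sed -e 's/foo/1/' -e 's/bar/2/'    # prints "1 2"
echo "foo bar" | sed 's/foo/1/; s/bar/2/'           # prints "1 2"
```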
### sed with Regular Expressions

```bash
# Address ranges with patterns
sed '/start/,/end/d' filename    # Delete from start to end pattern
sed '/pattern/,+5d' filename     # Delete matching line and next 5 (GNU sed)

# Backreferences
sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/' filename    # Swap numbers around dash

# Multi-line operations
sed 'N;s/\n/ /' filename         # Join pairs of lines
```
## Text Processing with awk

### Basic awk Usage

```bash
# Print specific fields
awk '{print $1}' filename          # First field
awk '{print $1, $3}' filename      # First and third fields
awk '{print $NF}' filename         # Last field
awk '{print $(NF-1)}' filename     # Second-to-last field

# Field separator
awk -F: '{print $1}' /etc/passwd   # Use colon as separator
awk -F',' '{print $2}' file.csv    # Use comma as separator

# Print with custom formatting
awk '{printf "%-10s %s\n", $1, $2}' filename
```
### awk Pattern Matching

```bash
# Pattern matching
awk '/pattern/ {print}' filename         # Print lines matching pattern
awk '/pattern/ {print $1}' filename      # Print first field of matching lines
awk '$1 ~ /pattern/ {print}' filename    # First field matches pattern
awk '$1 !~ /pattern/ {print}' filename   # First field doesn't match

# Numeric comparisons
awk '$3 > 100 {print}' filename          # Third field greater than 100
awk '$2 == "value" {print}' filename     # Second field equals value
awk 'NR > 1 {print}' filename            # Skip header line
```
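A worked example of the numeric comparison on inline sample data (the field values are made up for illustration):

```shell
# Keep only rows whose third field exceeds 100, printing the first field.
printf 'a x 50\nb y 150\nc z 200\n' | awk '$3 > 100 {print $1}'
# prints:
# b
# c
```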
### awk Programming Constructs

```bash
# Variables and calculations
awk '{sum += $1} END {print sum}' filename      # Sum first column
awk '{count++} END {print count}' filename      # Count lines

# Conditional statements
awk '{if ($1 > 100) print "High: " $0; else print "Low: " $0}' filename

# Loops
awk '{for (i = 1; i <= NF; i++) print $i}' filename    # Print each field on its own line

# Built-in variables
awk '{print NR, NF, $0}' filename    # Line number, field count, whole line
awk 'END {print NR}' filename        # Total line count
```
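The accumulator pattern can be checked on generated input; `NR` in the `END` block doubles as the line count:

```shell
# Sum a column and report the line count in one pass.
printf '10\n20\n30\n' | awk '{sum += $1} END {print sum, NR}'    # prints "60 3"
```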
### Advanced awk Features

```bash
# Range patterns
awk '/start/,/end/ {print}' filename    # Print from start to end pattern

# User-defined functions
awk 'function square(x) {return x*x} {print square($1)}' filename

# Arrays
awk '{count[$1]++} END {for (word in count) print word, count[word]}' filename

# String functions
awk '{print length($0)}' filename          # Line length
awk '{print substr($0, 1, 10)}' filename   # First 10 characters
awk '{print toupper($0)}' filename         # Convert to uppercase
```
## Sorting and Uniqueness

### Basic Sorting

```bash
# Sort lines alphabetically
sort filename
sort -r filename      # Reverse order
sort -u filename      # Remove duplicates

# Numeric sorting
sort -n filename      # Numeric sort
sort -nr filename     # Numeric reverse sort
sort -h filename      # Human-numeric sort (1K, 2M, etc.)

# Sort by specific field
sort -k2 filename     # Sort by second field onward
sort -k2,2 filename   # Sort by second field only
sort -k2n filename    # Numeric sort by second field
```
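Lexical and numeric sorting differ as soon as numbers have different widths, which is a common pitfall:

```shell
# Lexical sort compares character by character; -n compares values.
printf '10\n9\n100\n' | sort | tr '\n' ' '      # prints "10 100 9 "
printf '10\n9\n100\n' | sort -n | tr '\n' ' '   # prints "9 10 100 "
```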
### Advanced Sorting

```bash
# Multiple sort keys
sort -k1,1 -k2n filename    # Sort by field 1, then numerically by field 2

# Custom field separator
sort -t: -k3n /etc/passwd   # Sort passwd by UID

# Sort by specific character positions
sort -k1.2,1.4 filename     # Sort by characters 2-4 of first field

# Stable sort
sort -s -k2 filename        # Maintain relative order of equal elements
```
### Uniqueness Operations

```bash
# Remove duplicate lines
uniq filename                # Remove consecutive duplicates only
sort filename | uniq         # Remove all duplicates

# Count occurrences
uniq -c filename             # Count consecutive duplicates
sort filename | uniq -c      # Count all duplicates

# Show only duplicates or unique lines
uniq -d filename             # Show only duplicated lines
uniq -u filename             # Show only lines that appear once

# Skip fields or characters when comparing
uniq -f1 filename            # Skip first field when comparing
uniq -s5 filename            # Skip first 5 characters
```
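Because `uniq` only compares adjacent lines, sorting first is what makes the counts global:

```shell
# Count every value, not just consecutive runs.
printf 'b\na\nb\na\nb\n' | sort | uniq -c
# prints (leading indentation may vary):
#   2 a
#   3 b
```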
## Text Transformation

### Character Translation

```bash
# Character replacement
tr 'a-z' 'A-Z' < filename          # Convert to uppercase
tr 'A-Z' 'a-z' < filename          # Convert to lowercase
tr ' ' '_' < filename              # Replace spaces with underscores

# Delete characters
tr -d '0-9' < filename             # Delete all digits
tr -d '\n' < filename              # Remove newlines
tr -d '[:punct:]' < filename       # Remove punctuation

# Squeeze repeated characters
tr -s ' ' < filename               # Squeeze multiple spaces to one
tr -s '\n' < filename              # Collapse blank lines
```
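`tr` stages compose naturally in a pipeline, for example normalizing whitespace before uppercasing:

```shell
# Squeeze runs of spaces, then uppercase.
echo "hello   world" | tr -s ' ' | tr 'a-z' 'A-Z'    # prints "HELLO WORLD"
```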
### Cutting and Pasting

```bash
# Extract columns by character position
cut -c1-10 filename          # Characters 1-10
cut -c1,5,10 filename        # Characters 1, 5, and 10
cut -c10- filename           # From character 10 to end

# Extract fields
cut -d: -f1 /etc/passwd      # First field (colon delimiter)
cut -d, -f1,3 file.csv       # Fields 1 and 3 (comma delimiter)
cut -f2- filename            # From field 2 to end (tab delimiter)

# Paste files together
paste file1 file2            # Merge lines side by side
paste -d, file1 file2        # Use comma as delimiter
paste -s filename            # Merge all lines into one
```
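A short demonstration of `cut` on delimited data and `paste` stitching two columns back together (the /tmp paths and values are illustrative):

```shell
# Extract fields 1 and 3 from a comma-separated record.
echo "alice,30,berlin,de" | cut -d, -f1,3    # prints "alice,berlin"

# Paste two single-column files with a custom delimiter.
printf 'a\nb\n' > /tmp/col1.txt
printf '1\n2\n' > /tmp/col2.txt
paste -d: /tmp/col1.txt /tmp/col2.txt        # prints "a:1" then "b:2"
```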
### Join Operations

```bash
# Join files on a common field (inputs must be sorted on the join field)
join file1 file2               # Join on first field
join -1 2 -2 1 file1 file2     # Join field 2 of file1 with field 1 of file2
join -t: file1 file2           # Use colon as field separator

# Outer joins
join -a1 file1 file2           # Include unmatched lines from file1
join -a2 file1 file2           # Include unmatched lines from file2
join -a1 -a2 file1 file2       # Full outer join
```
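A minimal sketch with two tiny lookup files (the names and roles are invented); only key 1 appears in both, so the inner join keeps a single line:

```shell
# Inner join keeps only key 1; -a1 also keeps the unmatched "2 bob".
printf '1 alice\n2 bob\n' > /tmp/names.txt
printf '1 admin\n3 guest\n' > /tmp/roles.txt
join /tmp/names.txt /tmp/roles.txt       # prints "1 alice admin"
join -a1 /tmp/names.txt /tmp/roles.txt   # additionally prints "2 bob"
```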
## Text Analysis and Statistics

### Word and Line Counting

```bash
# Count lines, words, characters
wc filename
wc -l filename    # Lines only
wc -w filename    # Words only
wc -c filename    # Bytes only
wc -m filename    # Characters (multibyte-aware)

# Count specific patterns
grep -c "pattern" filename             # Count matching lines
grep -o "pattern" filename | wc -l     # Count pattern occurrences
```
### Frequency Analysis

```bash
# Word frequency
tr ' ' '\n' < filename | sort | uniq -c | sort -nr

# Character frequency
fold -w1 filename | sort | uniq -c | sort -nr

# Line frequency
sort filename | uniq -c | sort -nr

# Field frequency
awk '{print $1}' filename | sort | uniq -c | sort -nr
```
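Putting the word-frequency pipeline to work on an inline sample sentence:

```shell
# "the" appears three times and tops the frequency list.
printf 'the cat and the dog and the bird\n' \
  | tr ' ' '\n' | sort | uniq -c | sort -nr | head -n 1
# top line shows: 3 the
```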
## Advanced Text Processing

### Multi-File Operations

```bash
# Process multiple files
grep "pattern" *.txt
awk '{print FILENAME, $0}' *.txt
sed 's/old/new/g' *.txt

# Combine files
cat file1 file2 > combined
sort -m sorted1 sorted2 > merged    # Merge already-sorted files
```
### Complex Pipelines

```bash
# Log analysis: top client IPs producing 404s
grep "404" access.log | awk '{print $1}' | sort | uniq -c | sort -nr

# CSV processing: unique non-empty values of fields 2 and 4
cut -d, -f2,4 data.csv | grep -v "^$" | sort -u

# Text statistics: ten most frequent words
tr -d '[:punct:]' < document.txt | tr ' ' '\n' | grep -v "^$" | sort | uniq -c | sort -nr | head -10
```
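The same log-analysis shape can be tried on inline sample data before pointing it at a real access.log (the IPs below are made up):

```shell
# Count 404s per client IP on three sample log lines.
printf '1.1.1.1 404\n2.2.2.2 200\n1.1.1.1 404\n' \
  | grep 404 | awk '{print $1}' | sort | uniq -c | sort -nr
# top line shows: 2 1.1.1.1
```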
## Regular Expression Tools

```bash
# Perl-style regex
perl -pe 's/pattern/replacement/g' filename
perl -ne 'print if /pattern/' filename

# grep variants
egrep "pattern1|pattern2" filename    # Same as grep -E (deprecated alias)
fgrep "literal string" filename       # Same as grep -F: no regex interpretation
```
## Troubleshooting Text Processing

### Common Issues

```bash
# Handle different line endings
dos2unix filename            # Convert DOS to Unix line endings
unix2dos filename            # Convert Unix to DOS line endings
tr -d '\r' < filename        # Remove carriage returns

# Encoding issues
iconv -f ISO-8859-1 -t UTF-8 filename    # Convert encoding
file filename                            # Check file type and encoding

# Large file processing
split -l 1000 largefile prefix               # Split into 1000-line chunks
head -n 1000000 largefile | tail -n 1000     # Process a middle section
```
### Performance Optimization

```bash
# Faster alternatives for large files
LC_ALL=C sort filename       # Use C locale for faster sorting
# mawk is a faster awk implementation; ripgrep (rg) is a faster search tool

# Memory-efficient processing
sort -S 1G filename          # Allow 1 GB of memory for sorting
# For very large files, split the input and process it in chunks
```
## Resources

- GNU Text Utilities Manual
- AWK Programming Guide
- Sed Manual
- Regular Expressions Tutorial
- Text Processing Examples

---

*This cheat sheet provides comprehensive text processing commands for Linux systems. Practice with sample data before applying these commands to important files.*