Regular Expressions (RegEx) - Pattern Matching
*A complete guide to regular expressions for pattern matching and text processing*
Regular expressions (regex) are powerful pattern-matching tools used across programming languages, text editors, and command-line tools. This guide covers regex syntax, common patterns, and practical examples for effective text processing.
Basic Syntax
Literal Characters
```regex
# Exact character matching
hello          # Matches "hello" exactly
123            # Matches "123" exactly
Hello World    # Matches "Hello World" exactly

# Case sensitivity (depends on flags)
Hello          # Matches "Hello" but not "hello" (case sensitive)
(?i)Hello      # Matches "Hello", "hello", "HELLO" (case insensitive)
```
Metacharacters
```regex
# Special characters with meaning
.     # Matches any single character except newline
^     # Matches start of string/line
$     # Matches end of string/line
*     # Matches 0 or more of preceding element
+     # Matches 1 or more of preceding element
?     # Matches 0 or 1 of preceding element
|     # OR operator (alternation)
()    # Grouping
[]    # Character class
{}    # Quantifiers
\     # Escape character
```
Character Classes
```regex
# Predefined character classes
\d    # Any digit (0-9)
\D    # Any non-digit
\w    # Any word character (a-z, A-Z, 0-9, _)
\W    # Any non-word character
\s    # Any whitespace character (space, tab, newline)
\S    # Any non-whitespace character
\n    # Newline character
\t    # Tab character
\r    # Carriage return

# Custom character classes
[abc]          # Matches 'a', 'b', or 'c'
[a-z]          # Matches any lowercase letter
[A-Z]          # Matches any uppercase letter
[0-9]          # Matches any digit
[a-zA-Z]       # Matches any letter
[a-zA-Z0-9]    # Matches any alphanumeric character
[^abc]         # Matches any character except 'a', 'b', or 'c'
[^0-9]         # Matches any non-digit
```
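As a minimal Python sketch of the classes above, `re.findall` shows what each class picks out. The sample string is made up for illustration.

```python
import re

sample = "Order 66: 3 apples, 12 pears"  # hypothetical sample text

digits = re.findall(r"\d+", sample)          # runs of digits
letters = re.findall(r"[A-Za-z]+", sample)   # runs of ASCII letters
punct = re.findall(r"[^\w\s]", sample)       # neither word character nor whitespace

print(digits)   # ['66', '3', '12']
print(letters)  # ['Order', 'apples', 'pears']
print(punct)    # [':', ',']
```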
Quantifiers
Basic Quantifiers
```regex
# Exact repetition
a{3}      # Matches exactly 3 'a's: "aaa"
a{2,5}    # Matches 2 to 5 'a's: "aa", "aaa", "aaaa", "aaaaa"
a{3,}     # Matches 3 or more 'a's: "aaa", "aaaa", etc.
a{,3}     # Matches 0 to 3 'a's: "", "a", "aa", "aaa" (Python; some engines, such as Java, require a{0,3})

# Common quantifiers
a*        # Matches 0 or more 'a's (equivalent to a{0,})
a+        # Matches 1 or more 'a's (equivalent to a{1,})
a?        # Matches 0 or 1 'a' (equivalent to a{0,1})
```
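The repetition counts can be checked quickly in Python. This is a small sketch; `fits` is a hypothetical helper that uses `re.fullmatch`, so the whole string has to fit the pattern.

```python
import re

def fits(pattern, s):
    """Return True only if the entire string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

print(fits(r"a{3}", "aaa"))        # True
print(fits(r"a{3}", "aa"))         # False
print(fits(r"a{2,5}", "aaaa"))     # True
print(fits(r"a{3,}", "aaaaaaa"))   # True
print(fits(r"a?", ""))             # True
print(fits(r"a+", ""))             # False
```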
Greedy vs Lazy Quantifiers
```regex
# Greedy (default) - matches as much as possible
.*         # Matches as many characters as possible
.+         # Matches as many characters as possible (at least 1)
.{2,5}     # Matches as many characters as possible (2-5)

# Lazy (non-greedy) - matches as little as possible
.*?        # Matches as few characters as possible
.+?        # Matches as few characters as possible (at least 1)
.{2,5}?    # Matches as few characters as possible (2-5)

# Example difference
# String: "Hello World"
H.*o       # Greedy: matches "Hello Wo" (to last 'o')
H.*?o      # Lazy: matches "Hello" (to first 'o')
```
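A short Python sketch of the greedy/lazy difference on the same "Hello World" string, followed by a tag-extraction case where the distinction usually matters in practice. The HTML snippet is a made-up sample.

```python
import re

text = "Hello World"

greedy = re.search(r"H.*o", text).group()   # expands to the last 'o'
lazy = re.search(r"H.*?o", text).group()    # stops at the first 'o'

print(greedy)  # Hello Wo
print(lazy)    # Hello

html = "<b>bold</b> and <i>italic</i>"  # hypothetical sample
greedy_tags = re.findall(r"<.+>", html)
lazy_tags = re.findall(r"<.+?>", html)

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
```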
Anchors and Boundaries
Position Anchors
```regex
# String/line boundaries
^     # Start of string or line
$     # End of string or line
\A    # Start of string (not line)
\Z    # End of string (in PCRE and Java, also just before a final newline)
\z    # Very end of string (PCRE, Java; not available in every engine)

# Examples
^Hello           # Matches "Hello" at start of line
World$           # Matches "World" at end of line
^Hello World$    # Matches entire line containing only "Hello World"
```
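Whether `^` and `$` refer to the string or to each line depends on the multiline flag. A minimal Python sketch, using a made-up two-line string:

```python
import re

text = "Hello World\nHello again"  # hypothetical two-line sample

start_default = re.findall(r"^Hello", text)                  # start of string only
start_multiline = re.findall(r"^Hello", text, re.MULTILINE)  # start of every line
start_absolute = re.findall(r"\AHello", text, re.MULTILINE)  # \A ignores MULTILINE
line_ends = re.findall(r"\w+$", text, re.MULTILINE)          # last word of each line

print(start_default)    # ['Hello']
print(start_multiline)  # ['Hello', 'Hello']
print(start_absolute)   # ['Hello']
print(line_ends)        # ['World', 'again']
```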
Word Boundaries
```regex
# Word boundaries
\b    # Word boundary
\B    # Non-word boundary

# Examples
\bcat\b    # Matches "cat" as whole word, not in "category"
\Bcat\B    # Matches "cat" only when not at word boundaries
\bcat      # Matches "cat" at start of word: "cat", "category"
cat\b      # Matches "cat" at end of word: "cat", "tomcat"
```
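The boundary examples can be tried in Python as follows. This is a sketch: the word list is invented, and `\w*` is added around some patterns only so that `findall` returns the whole word.

```python
import re

text = "cat category tomcat concatenate"  # hypothetical sample words

whole_word = re.findall(r"\bcat\b", text)         # "cat" standing alone
word_start = re.findall(r"\bcat\w*", text)        # words beginning with "cat"
word_end = re.findall(r"\w*cat\b", text)          # words ending with "cat"
word_inside = re.findall(r"\w*\Bcat\B\w*", text)  # "cat" strictly inside a word

print(whole_word)   # ['cat']
print(word_start)   # ['cat', 'category']
print(word_end)     # ['cat', 'tomcat']
print(word_inside)  # ['concatenate']
```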
Groups and Capturing
Basic Groups
```regex
# Grouping with parentheses
(abc)         # Groups "abc" together
(abc)+        # Matches one or more "abc" sequences
(abc|def)     # Matches either "abc" or "def"
(abc){2,4}    # Matches 2 to 4 "abc" sequences

# Non-capturing groups
(?:abc)          # Groups without capturing
(?:abc|def)+     # Matches sequences of "abc" or "def"
```
Capturing Groups
```regex
# Numbered captures
(abc)(def)                 # Capture group 1: "abc", group 2: "def"
(\d{4})-(\d{2})-(\d{2})    # Captures date parts: year, month, day

# Named captures
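(?<name>pattern)           # Captures the match of "pattern" under the name "name" (Python spells this (?P<name>pattern))
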
# Backreferences
(\w+)\s+\1                 # Matches repeated words: "hello hello"
(["'])(.*?)\1              # Matches quoted strings with same quote type
```
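A compact Python sketch of numbered groups, named groups, and backreferences. The sample strings are made up, and note that Python writes named groups as `(?P<name>...)`.

```python
import re

iso = "2024-03-15"  # hypothetical sample date

# Numbered groups are read back by position.
numbered = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", iso)
print(numbered.groups())  # ('2024', '03', '15')

# Named groups are read back by name.
named = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", iso)
print(named.group("year"), named.group("day"))  # 2024 15

# \1 must repeat exactly what group 1 captured.
repeated = re.search(r"\b(\w+)\s+\1\b", "this is is a test")
print(repeated.group())  # is is

# The closing quote has to be the same character as the opening quote.
quoted = re.findall(r"""(["'])(.*?)\1""", """say "it's" or 'ok'""")
bodies = [body for _quote, body in quoted]
print(bodies)  # ["it's", 'ok']
```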
Lookahead and Lookbehind
Lookahead
```regex
# Positive lookahead
\d+(?=\s*dollars)    # Matches digits followed by "dollars"
\w+(?=@)             # Matches username before @ in email

# Negative lookahead
\d+(?!\s*cents)      # Matches digits NOT followed by "cents" (backtracking may leave a shorter run of digits)
\w+(?!@)             # Matches word characters NOT followed by @ (same backtracking caveat)
```
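Negative lookahead combined with a quantifier is easy to get subtly wrong, because the quantifier can give back characters until the lookahead passes. A minimal Python sketch on a made-up sentence:

```python
import re

text = "Price: 100 dollars, tax: 15 cents"  # hypothetical sample

# Positive lookahead: "dollars" must follow but is not part of the match.
dollars = re.findall(r"\d+(?=\s*dollars)", text)
print(dollars)  # ['100']

# Negative lookahead: \d+ backtracks from "15" to "1", which is not followed by "cents".
naive = re.findall(r"\d+(?!\s*cents)", text)
print(naive)  # ['100', '1']

# Word boundaries force the whole number to be tested.
strict = re.findall(r"\b\d+\b(?!\s*cents)", text)
print(strict)  # ['100']
```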
Lookbehind
```regex
# Positive lookbehind
(?<=\$)\d+    # Matches digits preceded by $
(?<=@)\w+     # Matches domain after @ in email

# Negative lookbehind
(?<!\$)\d+    # Matches digits NOT preceded by $ (can still match the later digits of a $ amount)
(?<!@)\w+     # Matches word characters NOT preceded by @
```
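The lookbehind patterns can be sketched in Python like this. The sample text is invented; the second pattern adds `\d` to the lookbehind as one possible way to stop it from matching the tail of a dollar amount.

```python
import re

text = "Cost $250, quantity 3, contact sales@example.com"  # hypothetical sample

# Digits directly preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", text)
print(prices)  # ['250']

# Numbers preceded by neither "$" nor another digit.
plain_numbers = re.findall(r"(?<![\$\d])\d+", text)
print(plain_numbers)  # ['3']

# The word right after "@".
after_at = re.findall(r"(?<=@)\w+", text)
print(after_at)  # ['example']
```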
Common Patterns
Email Validation
```regex
# Basic email pattern
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b

# More comprehensive email
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

# Simple email validation
^\S+@\S+\.\S+$
```
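A minimal sketch of wrapping the basic email pattern in a Python helper, written with the dot before the top-level domain escaped. `looks_like_email` is a hypothetical name, and the check only tests the shape of the address, not whether the mailbox exists.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def looks_like_email(s):
    """Return True if the whole string has the basic local@domain.tld shape."""
    return EMAIL_RE.fullmatch(s) is not None

print(looks_like_email("user@example.com"))                 # True
print(looks_like_email("first.last+tag@mail.example.org"))  # True
print(looks_like_email("no-at-sign.example.com"))           # False
print(looks_like_email("user@localhost"))                   # False
```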
Phone Numbers
```regex
# US phone numbers
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}                    # (123) 456-7890 or 123-456-7890
^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$      # With optional country code

# International format
^\+?[1-9]\d{1,14}$                                     # E.164 format
```
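Capture groups make it straightforward to normalize whatever format was typed. The following Python sketch assumes a hypothetical helper `normalize_us_phone` and made-up numbers.

```python
import re

# Same US pattern as above, with the three digit blocks captured.
US_PHONE = re.compile(r"\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})")
E164 = re.compile(r"\+?[1-9]\d{1,14}")

def normalize_us_phone(raw):
    """Return the number as NNN-NNN-NNNN, or None if it does not fit."""
    m = US_PHONE.fullmatch(raw.strip())
    return "-".join(m.groups()) if m else None

print(normalize_us_phone("(123) 456-7890"))        # 123-456-7890
print(normalize_us_phone("123.456.7890"))          # 123-456-7890
print(normalize_us_phone("12-3456-7890"))          # None
print(E164.fullmatch("+14155552671") is not None)  # True
```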
URLs
```regex
# Basic URL pattern
https?://[^\s]+

# More comprehensive URL
^https?://(?:[-\w.])+(?:\:[0-9]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:[\w.])*)?)?$

# Domain validation
^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$
```
Dates
```regex
# MM/DD/YYYY format
^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$

# YYYY-MM-DD format (ISO 8601)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

# Flexible date formats
\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b
```
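These date patterns check the shape of the text only, so a string such as "2023-02-31" passes. One possible approach, sketched here with a hypothetical `parse_iso_date` helper, is to let the regex extract the parts and let `datetime.date` decide whether the date exists.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(s):
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    m = ISO_DATE.fullmatch(s)
    if m is None:
        return None  # wrong shape
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # right shape, but not a real calendar date

print(parse_iso_date("2024-02-29"))  # 2024-02-29
print(parse_iso_date("2023-02-31"))  # None
print(parse_iso_date("2024-13-01"))  # None
```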
Credit Card Numbers
```regex
# Visa (starts with 4, 13 or 16 digits)
^4\d{12}(?:\d{3})?$

# MasterCard (starts with 51-55, 16 digits)
^5[1-5]\d{14}$

# American Express (starts with 34 or 37, 15 digits)
^3[47]\d{13}$

# General credit card (with optional spaces/dashes)
^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$
```
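As a sketch of how the brand patterns might be combined, the following Python helper strips separators and reports the first brand whose pattern fits. `card_brand` is a hypothetical name, the numbers are dummy values, and this is a format check only: it does not verify that a number is valid or issued.

```python
import re

# Brand patterns from the list above; fullmatch() supplies the anchoring.
CARD_PATTERNS = {
    "Visa": re.compile(r"4\d{12}(?:\d{3})?"),
    "MasterCard": re.compile(r"5[1-5]\d{14}"),
    "American Express": re.compile(r"3[47]\d{13}"),
}

def card_brand(number):
    """Return the brand name whose pattern fits, or None."""
    digits = re.sub(r"[-\s]", "", number)  # drop spaces and dashes
    for brand, pattern in CARD_PATTERNS.items():
        if pattern.fullmatch(digits):
            return brand
    return None

print(card_brand("4111 1111 1111 1111"))  # Visa
print(card_brand("5500-0000-0000-0004"))  # MasterCard
print(card_brand("3400 000000 00009"))    # American Express
print(card_brand("1234"))                 # None
```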
IP Addresses
```regex
# IPv4 address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

# IPv6 address (simplified)
^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$

# IPv4 or IPv6
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$|^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$
```
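The IPv4 pattern is easier to read when the octet is factored out. The Python sketch below does that and also shows the limit of the simplified IPv6 pattern: it does not accept compressed forms such as `::1`, which the standard-library `ipaddress` module does parse. `is_ipv4` is a hypothetical helper name.

```python
import ipaddress
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"               # 0-255
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")         # four dotted octets
IPV6_FULL = re.compile(r"(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}")

def is_ipv4(s):
    return IPV4.fullmatch(s) is not None

print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("256.1.1.1"))    # False
print(is_ipv4("1.2.3"))        # False

# The simplified IPv6 pattern only accepts the full eight-group form.
print(IPV6_FULL.fullmatch("::1") is None)    # True
print(ipaddress.ip_address("::1").version)   # 6
```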
Language-Specific Examples
JavaScript
```javascript
// Basic regex usage
const pattern = /hello/i; // Case insensitive
const text = "Hello World";
console.log(pattern.test(text)); // true

// String methods with regex
text.match(/\w+/g); // ["Hello", "World"]
text.replace(/hello/i, "Hi"); // "Hi World"
text.split(/\s+/); // ["Hello", "World"]

// Constructor syntax
const regex = new RegExp("hello", "i");
```
Python
```python
import re

# Basic usage
pattern = r'hello'
text = "Hello World"
match = re.search(pattern, text, re.IGNORECASE)

# Common methods
re.findall(r'\w+', text)  # ['Hello', 'World']
re.sub(r'hello', 'Hi', text, flags=re.IGNORECASE)  # "Hi World"
re.split(r'\s+', text)  # ['Hello', 'World']

# Compiled patterns (more efficient for repeated use)
pattern = re.compile(r'\d+')
pattern.findall("123 and 456")  # ['123', '456']
```
Java
```java
import java.util.regex.*;

// Basic usage
String pattern = "hello";
String text = "Hello World";
// Pattern.matches() requires the whole input to match, so use find() to search inside the text
boolean found = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE).matcher(text).find(); // true

// Pattern and Matcher objects (backslashes must be doubled in Java string literals)
Pattern p = Pattern.compile("\\w+");
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group()); // Prints each word
}

// String methods
text.replaceAll("(?i)hello", "Hi"); // "Hi World"
text.split("\\s+"); // ["Hello", "World"]
```
PHP
```php
// Basic usage
$pattern = '/hello/i';
$text = "Hello World";
$matches = preg_match($pattern, $text); // 1 if found

// Common functions
preg_match_all('/\w+/', $text, $matches); // Find all words
preg_replace('/hello/i', 'Hi', $text); // "Hi World"
preg_split('/\s+/', $text); // ["Hello", "World"]

// With capture groups
preg_match('/(\w+)\s+(\w+)/', $text, $matches);
// $matches[1] = "Hello", $matches[2] = "World"
```
Flags and Modifiers
Common Flags
```regex
# Case insensitive
/pattern/i       # JavaScript
(?i)pattern      # Inline flag
re.IGNORECASE    # Python

# Global (find all matches)
/pattern/g       # JavaScript
re.findall()     # Python (default behavior)

# Multiline (^ and $ match at line breaks)
/pattern/m       # JavaScript
re.MULTILINE     # Python

# Dot matches newline
/pattern/s       # JavaScript
re.DOTALL        # Python

# Extended (ignore whitespace, allow comments)
/pattern/x       # Some languages (e.g. Perl, PHP); not JavaScript
re.VERBOSE       # Python
```
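The effect of each flag is easiest to see side by side. A minimal Python sketch on a made-up two-line string:

```python
import re

text = "first line\nSecond LINE"  # hypothetical sample

case_sensitive = re.findall(r"line", text)
case_insensitive = re.findall(r"line", text, re.IGNORECASE)
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
dot_default = re.search(r"first.*LINE", text)            # '.' stops at the newline
dot_all = re.search(r"first.*LINE", text, re.DOTALL)     # '.' crosses the newline

print(case_sensitive)       # ['line']
print(case_insensitive)     # ['line', 'LINE']
print(line_starts)          # ['first', 'Second']
print(dot_default)          # None
print(dot_all is not None)  # True

# VERBOSE ignores whitespace in the pattern and allows comments.
year_month = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
print(year_month.search("due 2024-06").groups())  # ('2024', '06')
```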
Performance Tips
Optimization Strategies
```regex
# Use specific character classes instead of .
\d+          # Better than .+ for digits
[a-zA-Z]+    # Better than .+ for letters

# Anchor patterns when possible
^pattern     # Faster when pattern should be at start
pattern$     # Faster when pattern should be at end

# Use non-capturing groups when you don't need the capture
(?:abc)+     # Better than (abc)+ if you don't need the group

# Avoid catastrophic backtracking
(a+)+b       # Dangerous pattern
a+b          # Better alternative

# Use atomic groups or possessive quantifiers
(?>a+)b      # Atomic group (some languages)
a++b         # Possessive quantifier (some languages)
```
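Two of these tips can be illustrated in Python without timing anything. The sketch below shows that a capturing group changes what `re.findall` returns, and that the nested `(a+)+b` accepts exactly the same short inputs as the flat `a+b`, so the nesting adds only backtracking risk. The inputs are deliberately tiny.

```python
import re

text = "abcabc xyz abc"  # hypothetical sample

# With a capturing group, findall() returns the group, not the whole match.
captured = re.findall(r"(abc)+", text)
# With a non-capturing group, findall() returns the whole match.
whole = re.findall(r"(?:abc)+", text)

print(captured)  # ['abc', 'abc']
print(whole)     # ['abcabc', 'abc']

# Both patterns accept and reject the same strings.
samples = ["ab", "aaab", "b", "aaa"]
nested = [re.fullmatch(r"(a+)+b", s) is not None for s in samples]
flat = [re.fullmatch(r"a+b", s) is not None for s in samples]

print(nested == flat)  # True
```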
Common Pitfalls
```regex
# Greedy quantifiers can be slow
.*expensive     # Can be slow on long strings
.*?expensive    # Often faster (lazy)

# Alternation order matters
cat|catch       # "cat" will match first part of "catch"
catch|cat       # Better: longer alternative first

# Escape special characters
\.              # Literal dot
\$              # Literal dollar sign
\(              # Literal parenthesis
```
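A short Python sketch of the alternation and escaping pitfalls. The sample strings are made up; `re.escape` is the standard-library function for escaping literal text.

```python
import re

# Alternatives are tried left to right; the first one that works wins.
short_first = re.findall(r"cat|catch", "catch")
long_first = re.findall(r"catch|cat", "catch")
print(short_first)  # ['cat']
print(long_first)   # ['catch']

# An unescaped dot matches any character, not just a literal dot.
unescaped = re.findall(r"3.14", "3.14 3x14")
escaped = re.findall(r"3\.14", "3.14 3x14")
print(unescaped)  # ['3.14', '3x14']
print(escaped)    # ['3.14']

# re.escape() escapes every metacharacter in literal text.
literal = "$5.00 (approx)"
found = re.search(re.escape(literal), "total: $5.00 (approx)") is not None
print(found)  # True
```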
Testing and Debugging
Online Tools
- regex101.com - Interactive regex tester with explanations
- regexr.com - Visual regex builder and tester
- regexpal.com - Simple regex testing tool
- regexper.com - Visual regex diagrams
Testing Strategies
```regex
# Start simple and build complexity
\d            # Start with basic digit matching
\d+           # Add quantifier
\d{2,4}       # Add specific range
^\d{2,4}$     # Add anchors

# Test edge cases
""              # Empty string
"a"             # Single character
"aaa...aaa"     # Very long strings
"special!@#"    # Special characters
```
This guide covers the essential patterns and techniques needed for effective text processing across programming languages and tools. Practice with real-world examples to master these pattern-matching skills.
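One way to practice is to keep a small table of edge cases, as suggested under Testing Strategies, and run each candidate pattern against it. The Python sketch below assumes a hypothetical helper `failing_cases` and an invented case table for a pattern meant to accept 2 to 4 digits.

```python
import re

def failing_cases(pattern, cases):
    """Return (text, expected, actual) for every case the pattern gets wrong."""
    compiled = re.compile(pattern)
    failures = []
    for text, expected in cases:
        actual = compiled.fullmatch(text) is not None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

EDGE_CASES = [
    ("", False),            # empty string
    ("7", False),           # too short
    ("42", True),
    ("2024", True),
    ("12345", False),       # too long
    ("12a", False),         # unexpected character
    ("9" * 10_000, False),  # very long string
]

print(failing_cases(r"\d{2,4}", EDGE_CASES))   # []
print(len(failing_cases(r"\d+", EDGE_CASES)))  # 3
```

The looser `\d+` fails on the too-short, too-long, and very long inputs, which is the kind of gap such a table is meant to expose.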