Regular Expressions (RegEx) - Pattern Matching

*Complete guide to regular expressions for pattern matching and text processing*

Regular expressions (regex) are powerful pattern-matching tools used in programming languages, text editors, and command-line tools. This guide covers regex syntax, common patterns, and practical examples for efficient text processing.

Basic Syntax

Literal Characters

# Exact character matching
hello           # Matches "hello" exactly
123             # Matches "123" exactly
Hello World     # Matches "Hello World" exactly

# Case sensitivity (depends on flags)
Hello           # Matches "Hello" but not "hello" (case sensitive)
(?i)Hello       # Matches "Hello", "hello", "HELLO" (case insensitive)

Metacharacters

# Special characters with meaning
.               # Matches any single character except newline
^               # Matches start of string/line
$               # Matches end of string/line
*               # Matches 0 or more of preceding element
+               # Matches 1 or more of preceding element
?               # Matches 0 or 1 of preceding element
|               # OR operator (alternation)
()              # Grouping
[]              # Character class
{}              # Quantifiers
\               # Escape character
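
Because these characters carry special meaning, literal text that contains them has to be escaped before it is used as a pattern. The following is a minimal Python sketch using the standard `re` module; the sample strings are made up for illustration.

```python
import re

text = "Total: 3.50 (USD), not 3x50"

# Unescaped: '.' is a metacharacter and matches any character, so "3x50" matches too.
unescaped = re.findall(r"3.50", text)

# Escaped by hand: '\.' matches only a literal dot.
escaped = re.findall(r"3\.50", text)

# re.escape() escapes every metacharacter in an arbitrary literal string.
literal = "(USD)"
found = re.search(re.escape(literal), text)

print(unescaped)       # ['3.50', '3x50']
print(escaped)         # ['3.50']
print(found.group())   # (USD)
```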

Character Classes

# Predefined character classes
\d              # Any digit (0-9)
\D              # Any non-digit
\w              # Any word character (a-z, A-Z, 0-9, _)
\W              # Any non-word character
\s              # Any whitespace character (space, tab, newline)
\S              # Any non-whitespace character
\n              # Newline character
\t              # Tab character
\r              # Carriage return

# Custom character classes
[abc]           # Matches 'a', 'b', or 'c'
[a-z]           # Matches any lowercase letter
[A-Z]           # Matches any uppercase letter
[0-9]           # Matches any digit
[a-zA-Z]        # Matches any letter
[a-zA-Z0-9]     # Matches any alphanumeric character
[^abc]          # Matches any character except 'a', 'b', or 'c'
[^0-9]          # Matches any non-digit
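
The exact meaning of a predefined class can depend on the engine. As a small illustration, assuming Python 3 and its standard `re` module, `\d` on text strings also matches non-ASCII decimal digits unless `re.ASCII` is set, while `[0-9]` never does. The sample text is invented.

```python
import re

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
text = "Room 42, floor \u0663"

# In Python 3, \d on str patterns matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d", text)

# [0-9], or \d combined with re.ASCII, restricts the match to ASCII digits.
ascii_class = re.findall(r"[0-9]", text)
ascii_flag = re.findall(r"\d", text, flags=re.ASCII)

# A negated class removes everything that is NOT a lowercase ASCII letter.
letters_only = re.sub(r"[^a-z]", "", "R2-D2 and c3po")

print(unicode_digits)   # ['4', '2', '\u0663'] shown as the actual characters
print(ascii_class)      # ['4', '2']
print(letters_only)     # andcpo
```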

Quantifiers

Basic Quantifiers

# Exact repetition
a{3}            # Matches exactly 3 'a's: "aaa"
a{2,5}          # Matches 2 to 5 'a's: "aa", "aaa", "aaaa", "aaaaa"
a{3,}           # Matches 3 or more 'a's: "aaa", "aaaa", etc.
a{,3}           # Matches 0 to 3 'a's in Python; many other flavors require a{0,3}

# Common quantifiers
a*              # Matches 0 or more 'a's (equivalent to a{0,})
a+              # Matches 1 or more 'a's (equivalent to a{1,})
a?              # Matches 0 or 1 'a' (equivalent to a{0,1})
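
A quick way to check a quantifier range is to test whole strings against it. The sketch below assumes Python's `re.fullmatch`, which succeeds only when the entire string fits the pattern.

```python
import re

pattern = re.compile(r"a{2,5}")

# fullmatch() succeeds only if the whole string fits the 2-to-5 range.
results = {s: bool(pattern.fullmatch(s)) for s in ["a", "aa", "aaaaa", "aaaaaa"]}

# The shorthand quantifiers behave like their explicit ranges:
# both forms accept the empty string for *, and both reject it for +.
star_accepts_empty = bool(re.fullmatch(r"a*", "")) and bool(re.fullmatch(r"a{0,}", ""))
plus_rejects_empty = re.fullmatch(r"a+", "") is None and re.fullmatch(r"a{1,}", "") is None

print(results)   # {'a': False, 'aa': True, 'aaaaa': True, 'aaaaaa': False}
```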

Greedy and Lazy Quantifiers

# Greedy (default) - matches as much as possible
.*              # Matches as many characters as possible
.+              # Matches as many characters as possible (at least 1)
.{2,5}          # Matches as many characters as possible (2-5)

# Lazy (non-greedy) - matches as little as possible
.*?             # Matches as few characters as possible
.+?             # Matches as few characters as possible (at least 1)
.{2,5}?         # Matches as few characters as possible (2-5)

# Example difference
String: "Hello World"
H.*o            # Greedy: matches "Hello Wo" (to last 'o')
H.*?o           # Lazy: matches "Hello" (to first 'o')
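
The difference can be reproduced directly. This is a short Python sketch of the "Hello World" example, plus an invented HTML snippet where greediness usually matters in practice.

```python
import re

text = "Hello World"
greedy = re.search(r"H.*o", text).group()    # runs to the last 'o'
lazy = re.search(r"H.*?o", text).group()     # stops at the first 'o'

html = "<b>bold</b> and <i>italic</i>"
greedy_tags = re.findall(r"<.+>", html)      # one match spanning the whole string
lazy_tags = re.findall(r"<.+?>", html)       # each tag separately

print(greedy)      # Hello Wo
print(lazy)        # Hello
print(lazy_tags)   # ['<b>', '</b>', '<i>', '</i>']
```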

Anchors and Boundaries

Position Anchors

# String/line boundaries
^               # Start of string or line
$               # End of string or line
\A              # Start of string (never a line start)
\Z              # End of string, or just before a final newline (in Python: the very end)
\z              # Very end of string (PCRE, Java, .NET)

# Examples
^Hello          # Matches "Hello" at start of line
World$          # Matches "World" at end of line
^Hello World$   # Matches entire line containing only "Hello World"
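
Whether `^` and `$` refer to the string or to each line depends on the multiline flag. This small sketch assumes Python's `re.MULTILINE`; the sample text is invented.

```python
import re

text = "Hello World\nHello again\n"

# Without re.MULTILINE, ^ matches only at the very start of the string.
single = re.findall(r"^Hello", text)

# With re.MULTILINE, ^ and $ also match at every line break.
multi = re.findall(r"^Hello", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# \A is not affected by re.MULTILINE: it still means "start of string".
start_only = re.findall(r"\AHello", text, re.MULTILINE)

print(single)      # ['Hello']
print(multi)       # ['Hello', 'Hello']
print(line_ends)   # ['World', 'again']
print(start_only)  # ['Hello']
```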

Word Boundaries

# Word boundaries
\b              # Word boundary
\B              # Non-word boundary

# Examples
\bcat\b         # Matches "cat" as whole word, not in "category"
\Bcat\B         # Matches "cat" only inside a word: "concatenate"
\bcat           # Matches "cat" at start of word: "cat", "category"
cat\b           # Matches "cat" at end of word: "cat", "tomcat"
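
The four boundary cases can be checked against one sample sentence. This is a Python sketch; `\w*` is added around some patterns only so that the whole word is returned.

```python
import re

text = "cat category tomcat concatenate"

whole = re.findall(r"\bcat\b", text)           # "cat" as a whole word
starts = re.findall(r"\bcat\w*", text)         # words that start with "cat"
ends = re.findall(r"\w*cat\b", text)           # words that end with "cat"
inside = re.findall(r"\w*\Bcat\B\w*", text)    # words with "cat" strictly inside

print(whole)    # ['cat']
print(starts)   # ['cat', 'category']
print(ends)     # ['cat', 'tomcat']
print(inside)   # ['concatenate']
```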

Groups and Capturing

Basic Grouping

# Grouping with parentheses
(abc)           # Groups "abc" together
(abc)+          # Matches one or more "abc" sequences
(abc|def)       # Matches either "abc" or "def"
(abc){2,4}      # Matches 2 to 4 "abc" sequences

# Non-capturing groups
(?:abc)         # Groups without capturing
(?:abc|def)+    # Matches sequences of "abc" or "def"

Capture Groups

# Numbered captures
(abc)(def)      # Capture group 1: "abc", group 2: "def"
(\d{4})-(\d{2})-(\d{2})  # Captures date parts: year, month, day

# Named captures
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})  # Named groups (Java, .NET, JavaScript, PCRE)
(?P<name>\w+)   # Python-style named group

# Backreferences
(\w+)\s+\1      # Matches repeated words: "hello hello"
(["'])(.*?)\1   # Matches quoted strings with same quote type
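
Captures become useful once they are read back or reused. The sketch below uses Python, which requires the `(?P<name>...)` spelling for named groups; the sample strings are invented.

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = date_pattern.search("Released on 2024-03-15.")
parts = m.groupdict()                          # access by name
numbered = (m.group(1), m.group(2), m.group(3))  # the same groups by number

# Named groups can be reused in a replacement string with \g<name>.
reordered = date_pattern.sub(r"\g<day>/\g<month>/\g<year>", "2024-03-15")

# A backreference (\1) must match the same text that group 1 captured.
doubled = re.findall(r"\b(\w+)\s+\1\b", "this is is a test test")

print(parts)      # {'year': '2024', 'month': '03', 'day': '15'}
print(reordered)  # 15/03/2024
print(doubled)    # ['is', 'test']
```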

Lookahead and Lookbehind

Lookahead Assertions

# Positive lookahead
\d+(?=\s*dollars)    # Matches digits followed by "dollars"
\w+(?=@)             # Matches username before @ in email

# Negative lookahead
\b\d+\b(?!\s*cents)  # Matches whole numbers NOT followed by "cents" (without \b, "50 cents" still matches "5")
\b\w+\b(?!@)         # Matches whole words NOT followed by @

Lookbehind Assertions

# Positive lookbehind
(?<=\$)\d+           # Matches digits preceded by $
(?<=@)\w+            # Matches the first domain label after @ in email

# Negative lookbehind
(?<![\$\d])\d+       # Matches whole numbers NOT preceded by $ (\d blocks a match starting mid-number)
(?<![@\w])\w+        # Matches whole words NOT preceded by @
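
Lookarounds check context without consuming it, which also means that the main pattern is still free to backtrack. The Python sketch below, run on an invented price string, shows why a negative lookahead on `\d+` generally needs word boundaries.

```python
import re

text = "Price: $100 or 250 dollars, shipping 75 cents"

# Lookahead: digits followed by "dollars"; the word itself is not consumed.
before_dollars = re.findall(r"\d+(?=\s*dollars)", text)

# Lookbehind: digits preceded by a literal "$".
after_sign = re.findall(r"(?<=\$)\d+", text)

# Without \b, \d+ backtracks from "75" to "7", which is not followed by "cents".
naive = re.findall(r"\d+(?!\s*cents)", text)

# With \b on both sides, only complete numbers are considered.
strict = re.findall(r"\b\d+\b(?!\s*cents)", text)

print(before_dollars)  # ['250']
print(after_sign)      # ['100']
print(naive)           # ['100', '250', '7']
print(strict)          # ['100', '250']
```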

Common Patterns

Email Validation

# Basic email pattern
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b

# More comprehensive email
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

# Simple email validation
^\S+@\S+\.\S+$
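
A pattern like these is usually wrapped in a small helper. The following Python sketch uses the basic pattern with the top-level domain written as `[A-Za-z]` (inside a character class, `|` is a literal character, not alternation). The helper name `is_plausible_email` and the sample addresses are hypothetical, and a regex can only check the shape of an address, not whether it exists.

```python
import re

# Basic shape: local part, "@", domain, dot, alphabetic TLD of 2+ letters.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_plausible_email(value: str) -> bool:
    """Return True if the whole string has the shape local@domain.tld."""
    return EMAIL.fullmatch(value) is not None

samples = [
    "user@example.com",
    "first.last+tag@mail.example.org",
    "no-at-sign.example.com",
    "user@localhost",
]
checked = {s: is_plausible_email(s) for s in samples}

for address, ok in checked.items():
    print(f"{address}: {ok}")
```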

Phone Numbers

# US phone numbers
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}    # (123) 456-7890 or 123-456-7890
^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$  # With optional country code

# International format
^\+?[1-9]\d{1,14}$                     # E.164 format

URLs

# Basic URL pattern
https?://[^\s]+

# More comprehensive URL
^https?://(?:[-\w.])+(?:\:[0-9]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:\#(?:[\w.])*)?)?$

# Domain validation
^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$

Dates

# MM/DD/YYYY format
^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$

# YYYY-MM-DD format (ISO 8601)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

# Flexible date formats
\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b
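
These patterns check the shape of a date, not the calendar: `2023-02-30` passes the ISO pattern. One possible approach, sketched here in Python with a hypothetical helper `parse_iso_date`, is to let the regex extract the parts and let `datetime.date` decide whether the day exists.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(value):
    """Return a date, or None if the text is malformed or not a real calendar day."""
    m = ISO_DATE.fullmatch(value)
    if m is None:
        return None                      # wrong shape, e.g. month 13
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None                      # right shape, impossible day

print(parse_iso_date("2024-02-29"))   # 2024-02-29
print(parse_iso_date("2023-02-30"))   # None
print(parse_iso_date("2023-13-01"))   # None
```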

Credit Card Numbers

# Visa (starts with 4, 13 or 16 digits)
^4\d{12}(?:\d{3})?$

# MasterCard (starts with 51-55, 16 digits)
^5[1-5]\d{14}$

# American Express (starts with 34 or 37, 15 digits)
^3[47]\d{13}$

# General credit card (with optional spaces/dashes)
^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$
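
A regex can confirm that a card number has the right length and grouping, but not that its digits are consistent. A common companion check, which is not part of the patterns above, is the Luhn checksum. The Python sketch below combines the two; the helper names are hypothetical and `4111 1111 1111 1111` is a widely used test number.

```python
import re

CARD_SHAPE = re.compile(r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def looks_like_card(value: str) -> bool:
    """Shape check with the regex first, then the checksum on the bare digits."""
    if CARD_SHAPE.fullmatch(value) is None:
        return False
    return luhn_valid(re.sub(r"[-\s]", "", value))

print(looks_like_card("4111 1111 1111 1111"))  # True
print(looks_like_card("4111-1111-1111-1112"))  # False
print(looks_like_card("4111 1111 1111"))       # False
```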

IP Addresses

# IPv4 address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

# IPv6 address (simplified)
^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$

# IPv4 or IPv6
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$|^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$
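
The simplified IPv6 pattern only accepts the full eight-group form, so compressed addresses such as `::1` are rejected. When a parser is available it is often the safer choice. The sketch below compares the patterns with Python's standard `ipaddress` module; the helper names are hypothetical.

```python
import ipaddress
import re

IPV4 = re.compile(r"(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)")
IPV6_SIMPLE = re.compile(r"(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}")

def regex_ip_version(value):
    """Return 4 or 6 if one of the regex patterns accepts the whole string, else None."""
    if IPV4.fullmatch(value):
        return 4
    if IPV6_SIMPLE.fullmatch(value):
        return 6
    return None

def stdlib_ip_version(value):
    """Return 4 or 6 according to the ipaddress module, else None."""
    try:
        return ipaddress.ip_address(value).version
    except ValueError:
        return None

for candidate in ["192.168.1.1", "256.1.1.1", "::1"]:
    print(candidate, regex_ip_version(candidate), stdlib_ip_version(candidate))
```

For the three sample inputs the two helpers agree except on `::1`, which only the standard-library parser recognizes as IPv6.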

Language-Specific Examples

JavaScript

// Basic regex usage
const pattern = /hello/i;              // Case insensitive
const text = "Hello World";
console.log(pattern.test(text));      // true

// String methods with regex
text.match(/\w+/g);                    // ["Hello", "World"]
text.replace(/hello/i, "Hi");          // "Hi World"
text.split(/\s+/);                     // ["Hello", "World"]

// Constructor syntax
const regex = new RegExp("hello", "i");

Python

import re

# Basic usage
pattern = r'hello'
text = "Hello World"
match = re.search(pattern, text, re.IGNORECASE)

# Common methods
re.findall(r'\w+', text)              # ['Hello', 'World']
re.sub(r'hello', 'Hi', text, flags=re.IGNORECASE)  # "Hi World"
re.split(r'\s+', text)                # ['Hello', 'World']

# Compiled patterns (more efficient for repeated use)
pattern = re.compile(r'\d+')
pattern.findall("123 and 456")        # ['123', '456']

Java

import java.util.regex.*;

// Basic usage
String pattern = "hello";
String text = "Hello World";
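// Pattern.matches() must match the ENTIRE input, so use find() to search inside the text
boolean found = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE).matcher(text).find();  // true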

// Pattern and Matcher objects
Pattern p = Pattern.compile("\\w+");
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group());    // Prints each word
}

// String methods
text.replaceAll("(?i)hello", "Hi");   // "Hi World"
text.split("\\s+");                   // ["Hello", "World"]

PHP

// Basic usage
$pattern = '/hello/i';
$text = "Hello World";
$matches = preg_match($pattern, $text);  // 1 if found

// Common functions
preg_match_all('/\w+/', $text, $matches);     // Find all words
preg_replace('/hello/i', 'Hi', $text);        // "Hi World"
preg_split('/\s+/', $text);                   // ["Hello", "World"]

// With capture groups
preg_match('/(\w+)\s+(\w+)/', $text, $matches);
// $matches[1] = "Hello", $matches[2] = "World"

Flags and Modifiers

Common Flags

# Case insensitive
/pattern/i      # JavaScript
(?i)pattern     # Inline flag
re.IGNORECASE   # Python

# Global (find all matches)
/pattern/g      # JavaScript
re.findall()    # Python (default behavior)

# Multiline (^ and $ also match at line breaks)
/pattern/m      # JavaScript
re.MULTILINE    # Python

# Dot matches newline
/pattern/s      # JavaScript
re.DOTALL       # Python

# Extended (ignore whitespace, allow comments)
/pattern/x      # Some languages
re.VERBOSE      # Python
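
Flags can be combined, and the extended mode makes longer patterns readable. The sketch below uses Python's `re` flags; the phone pattern is the US pattern from the Phone Numbers section with group names added for illustration.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
PHONE = re.compile(r"""
    \(?(?P<area>\d{3})\)?    # area code, optional parentheses
    [-.\s]?                  # optional separator
    (?P<prefix>\d{3})        # first three digits
    [-.\s]?                  # optional separator
    (?P<line>\d{4})          # last four digits
""", re.VERBOSE)

phone_parts = PHONE.search("Call (123) 456-7890 today").groupdict()

# By default '.' stops at a newline; re.DOTALL lets it cross lines.
dot_default = re.search(r"start.*end", "start\nend")
dot_all = re.search(r"start.*end", "start\nend", re.DOTALL)

# Flags are combined with the | operator.
combined = re.findall(r"^hello", "Hello\nHELLO", re.IGNORECASE | re.MULTILINE)

print(phone_parts)   # {'area': '123', 'prefix': '456', 'line': '7890'}
print(combined)      # ['Hello', 'HELLO']
```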

Performance Tips

Optimization Strategies

# Use specific character classes instead of .
\d+             # Better than .+ for digits
[a-zA-Z]+       # Better than .+ for letters

# Anchor patterns when possible
^pattern        # Faster when pattern should be at start
pattern$        # Faster when pattern should be at end

# Use non-capturing groups when you don't need the capture
(?:abc)+        # Better than (abc)+ if you don't need the group

# Avoid catastrophic backtracking
(a+)+b          # Dangerous pattern
a+b             # Better alternative

# Use atomic groups or possessive quantifiers
(?>a+)b         # Atomic group (some languages)
a++b            # Possessive quantifier (some languages)
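
The cost of a nested quantifier shows up when a match fails. The Python sketch below times `(a+)+b` against the flat `a+b` on a short run of `a` with no `b`; the input length of 18 is an arbitrary choice to keep the run brief, the helper `timed_fullmatch` is hypothetical, and actual timings depend on the machine and engine.

```python
import re
import time

def timed_fullmatch(pattern, text):
    """Return (match_or_None, elapsed_seconds) for one fullmatch call."""
    start = time.perf_counter()
    result = re.fullmatch(pattern, text)
    return result, time.perf_counter() - start

# A run of 'a' with no trailing 'b' makes the nested pattern try many
# ways of splitting the run before it can report failure.
hostile = "a" * 18

nested_result, nested_time = timed_fullmatch(r"(a+)+b", hostile)
flat_result, flat_time = timed_fullmatch(r"a+b", hostile)

# On these samples both patterns accept and reject the same strings.
agree = all(
    bool(re.fullmatch(r"(a+)+b", s)) == bool(re.fullmatch(r"a+b", s))
    for s in ["b", "ab", "aaab", "aaa", "aba"]
)

print(f"nested: {nested_time:.4f}s  flat: {flat_time:.6f}s  same results: {agree}")
```

Each extra `a` in the input roughly doubles the work for the nested form in a backtracking engine, which is why the flat rewrite is preferred.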

Common Pitfalls

# Greedy quantifiers can be slow
.*expensive     # Can be slow on long strings
.*?expensive    # Often faster (lazy)

# Alternation order matters
cat|catch       # "cat" will match first part of "catch"
catch|cat       # Better: longer alternative first

# Escape special characters
\.              # Literal dot
\$              # Literal dollar sign
\(              # Literal parenthesis
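
The alternation pitfall is easy to reproduce. In this Python sketch on an invented sentence, the engine takes the first alternative that succeeds at each position, so order and word boundaries both change the result.

```python
import re

text = "catch the cat"

# "cat" is tried first and succeeds, so "catch" is never matched in full.
short_first = re.findall(r"cat|catch", text)

# Putting the longer alternative first fixes the result.
long_first = re.findall(r"catch|cat", text)

# Word boundaries make the order irrelevant: "cat" fails at \b inside "catch".
bounded = re.findall(r"\b(?:cat|catch)\b", text)

print(short_first)  # ['cat', 'cat']
print(long_first)   # ['catch', 'cat']
print(bounded)      # ['catch', 'cat']
```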

Testing and Debugging

Online Tools

  • regex101.com - Interactive regex tester with explanations
  • regexr.com - Visual regex builder and tester
  • regexpal.com - Simple regex testing tool
  • regexper.com - Visual regex diagrams

Testing Strategies

# Start simple and build complexity
\d              # Start with basic digit matching
\d+             # Add quantifier
\d{2,4}         # Add specific range
^\d{2,4}$       # Add anchors

# Test edge cases
""              # Empty string
"a"             # Single character
"aaa...aaa"     # Very long strings
"special!@#"    # Special characters

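The edge cases above can be kept as a small table and run against the pattern after every change. This is a minimal Python sketch; the helper `run_cases` and the expected values are illustrative assumptions for the `^\d{2,4}$` pattern built in the steps above.

```python
import re

PATTERN = re.compile(r"^\d{2,4}$")

# (input, should_match) pairs covering the edge cases listed above.
CASES = [
    ("", False),             # empty string
    ("1", False),            # single character, below the minimum
    ("12", True),
    ("1234", True),
    ("12345", False),        # above the maximum
    ("12a4", False),         # special character in the middle
    ("1" * 10_000, False),   # very long string
    ("12\n", True),          # in Python, $ also matches before a trailing newline
]

def run_cases(pattern, cases):
    """Return the inputs whose match result differs from the expectation."""
    return [text for text, expected in cases
            if bool(pattern.search(text)) != expected]

failures = run_cases(PATTERN, CASES)
print("all cases pass" if not failures else f"unexpected results: {failures!r}")
```

The last case is the kind of surprise such a table tends to reveal: if a trailing newline must be rejected, `re.fullmatch` with `\d{2,4}` is one alternative.
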
This guide covers the essential regex patterns and techniques for efficient text processing across programming languages and tools. Practice with real examples to master these pattern-matching capabilities.