1. Introduction
Regular expressions (regex) are a powerful tool for pattern matching and text manipulation. In Python, the built-in re module provides the functions for working with them.
2. Basics of Regular Expressions
2.1 Character Classes
A character class defines a set of characters that can match at a given position in a string. It is denoted by enclosing the characters in square brackets [].
import re
text = "Hello, world!"
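pattern = r'[lo]'
matches = re.findall(pattern, text)
print(matches)  # ['l', 'l', 'o', 'o', 'l']
In the above code, the pattern [lo] matches any single character that is either 'l' or 'o'. Listing a character twice, as in [llo], has no effect, because a character class is a set. The findall() function returns a list of all non-overlapping matches found in the text.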
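Character classes also accept ranges and negation. The following is a minimal sketch on the same sample string; the variable names lowercase and non_letters are chosen here for illustration only.

```python
import re

text = "Hello, world!"

# A range: any lowercase ASCII letter from 'a' to 'z'.
lowercase = re.findall(r'[a-z]', text)

# A negated class (leading ^): any character that is not an ASCII letter.
non_letters = re.findall(r'[^a-zA-Z]', text)

print(lowercase)    # ['e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd']
print(non_letters)  # [',', ' ', '!']
```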
2.2 Quantifiers
Quantifiers specify how many times the preceding character, character class, or group may repeat.
import re
text = "Hello, world!"
pattern = r'o{1,2}'
matches = re.findall(pattern, text)
print(matches)
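In the above code, the pattern 'o{1,2}' matches one or two consecutive occurrences of the letter 'o'. Quantifiers are greedy by default, so two are taken wherever two are available. In "Hello, world!" each 'o' stands alone, so the output is ['o', 'o'].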
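Besides the {m,n} form, the shorthand quantifiers +, ? and * are common. A short sketch on the same sample string follows; the variable names are illustrative and not part of the re module.

```python
import re

text = "Hello, world!"

# + means one or more: each run of consecutive 'l' characters is one match.
runs_of_l = re.findall(r'l+', text)

# ? means zero or one: an 'l' followed by an optional second 'l'.
optional_l = re.findall(r'll?', text)

# * means zero or more: an 'o' followed by any number of 'r' characters.
o_then_r = re.findall(r'or*', text)

print(runs_of_l)   # ['ll', 'l']
print(optional_l)  # ['ll', 'l']
print(o_then_r)    # ['o', 'or']
```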
3. Special Sequences
3.1 Anchors
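Anchors match positions in a string rather than characters. Commonly used anchors are:
^ - Matches the beginning of a string (with the re.MULTILINE flag, also the beginning of each line).
$ - Matches the end of a string, or just before a trailing newline (with the re.MULTILINE flag, also the end of each line).
\b - Matches a word boundary, that is, the position between a word character and a non-word character.
\B - Matches any position that is not a word boundary.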
import re
text = "Hello, world!"
pattern = r'^H'
matches = re.findall(pattern, text)
print(matches)
In the above code, the pattern '^H' matches the 'H' at the beginning of the string.
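The remaining anchors from the list can be tried the same way. The sketch below uses raw strings so that \b reaches the regex engine instead of being read as a backspace escape; the variable names are illustrative.

```python
import re

text = "Hello, world!"

# \b on both sides: 'world' must begin and end at a word boundary.
whole_word = re.findall(r'\bworld\b', text)

# \B on both sides: an 'o' with a word character on each side,
# so the 'o' in "world" matches but the final 'o' of "Hello" does not.
inner_o = re.findall(r'\Bo\B', text)

# $: a '!' only when it is at the end of the string.
ends_with_bang = re.findall(r'!$', text)

print(whole_word)      # ['world']
print(inner_o)         # ['o']
print(ends_with_bang)  # ['!']
```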
3.2 Groups
A group is a part of a pattern enclosed in parentheses (). Each group captures the text it matched, so that text can be retrieved separately from the overall match.
import re
text = "Hello, world!"
pattern = r'(Hello), (world!)'
matches = re.findall(pattern, text)
print(matches)
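In the above code, the pattern '(Hello), (world!)' defines two groups: 'Hello' and 'world!'. When a pattern contains more than one group, findall() returns a list of tuples, one tuple per match, so the output is [('Hello', 'world!')].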
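When only the first match is needed, re.search() returns a match object whose group() method gives access to each group by number. A minimal sketch with the same pattern and text:

```python
import re

text = "Hello, world!"

match = re.search(r'(Hello), (world!)', text)

# search() returns None when nothing matches, so check before using it.
if match is not None:
    print(match.group(0))  # 'Hello, world!' (the whole match)
    print(match.group(1))  # 'Hello'         (first group)
    print(match.group(2))  # 'world!'        (second group)
    print(match.groups())  # ('Hello', 'world!')
```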
4. Practical Examples
4.1 Email Validation
One practical example of using regex is email validation.
import re
def validate_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if re.match(pattern, email):
        return True
    return False
email = "test@example.com"
if validate_email(email):
    print("Valid email")
else:
    print("Invalid email")
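The above code validates an email address against a simplified pattern: a local part, an '@', a domain, and a top-level domain of at least two letters. validate_email() returns True if the address matches the pattern and False otherwise. The pattern is a practical approximation and does not cover every address that the email standards permit.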
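One detail worth knowing is that $ also matches just before a trailing newline, so a pattern anchored with ^ and $ accepts "test@example.com\n". The sketch below is one possible stricter variant, not the only way to do it: it compiles the same character classes once and uses fullmatch(), which requires the entire string to match. The names EMAIL_RE and is_valid_email and the sample addresses are assumptions made for this example.

```python
import re

# Same character classes as the pattern above, without ^ and $,
# because fullmatch() already requires the whole string to match.
EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def is_valid_email(email):
    """Return True if the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(email) is not None

# Hypothetical sample inputs.
samples = [
    "test@example.com",                # valid
    "user.name+tag@mail.example.org",  # valid
    "no-at-sign.example.com",          # invalid: no '@'
    "user@localhost",                  # invalid: no dot and top-level domain
    "test@example.com\n",              # invalid: trailing newline
]

results = [is_valid_email(s) for s in samples]
print(results)  # [True, True, False, False, False]
```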
4.2 Phone Number Extraction
Another practical example is extracting phone numbers from a text.
import re
text = "Contact us at 123-456-7890 or email@example.com"
pattern = r'\d{3}-\d{3}-\d{4}'
matches = re.findall(pattern, text)
print(matches)
The above code finds all phone numbers written in the format '123-456-7890'. Each \d{n} matches exactly n digits, and the hyphens match literally. For the given text, findall() returns ['123-456-7890']; the email address is ignored because it contains no such digit sequence.
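Real text often mixes separators. The sketch below is one possible extension, assuming that hyphens, dots, and spaces are the separators of interest; the sample text is hypothetical. It combines a character class for the separator with \b so that digits inside a longer number are not picked up.

```python
import re

# Hypothetical sample text; the last number has a four-digit first block.
text = "Call 123-456-7890, 987.654.3210 or 555 123 4567; ref 1234-567-8901."

# [-. ] accepts a hyphen, a dot, or a space between the digit blocks.
# \b keeps the pattern from matching inside a longer run of digits.
pattern = r'\b\d{3}[-. ]\d{3}[-. ]\d{4}\b'

matches = re.findall(pattern, text)
print(matches)  # ['123-456-7890', '987.654.3210', '555 123 4567']
```

Without the leading \b, the pattern would also match '234-567-8901' inside the reference number.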
5. Conclusion
The re module in Python provides a powerful and flexible way to work with regular expressions. Character classes, quantifiers, anchors, and groups cover many everyday tasks, such as data validation, text extraction, and search operations. By understanding these basics and using the re module effectively, you can greatly enhance your text processing capabilities in Python.