Parsing Regular Expressions with the re Module in Python 3 - 猿码集

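1. Overview

Python's re module processes strings with regular expressions. A regular expression is a pattern, written with ordinary and special characters, that describes a matching rule; it matches every string that conforms to that rule. The re module provides several functions, such as re.match(), re.search(), and re.findall(), which apply a pattern to a string and return a match object, a list of matched strings, or a similar result. This article shows how to use the re module in Python 3.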

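2. Basic usage of the re module

2.1 The re.match() function

re.match() tries to match a regular expression at the beginning of a string. If the pattern matches there, it returns a match object; otherwise it returns None. The following code shows how to use re.match():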


import re
pattern = r"hello"
string = "hello world"
match = re.match(pattern, string)
if match:
    print(match.group())
else:
    print("match failed")

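In the code above, pattern is the regular expression string, string is the string to match against, and match is the match object returned by re.match(). Because pattern matches at the beginning of the string, the output is "hello".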
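To make the "beginning of the string" rule concrete, here is a minimal sketch that reuses the same sample string. The variable names (at_start, not_at_start) are illustrative and not part of the original example.

```python
import re

string = "hello world"

# "hello" sits at position 0, so re.match() succeeds.
at_start = re.match(r"hello", string)
print(at_start.group(), at_start.span())  # hello (0, 5)

# "world" occurs later in the string, so re.match() returns None.
not_at_start = re.match(r"world", string)
print(not_at_start)  # None
```

The span() method reports where the match starts and ends, which is a quick way to confirm that re.match() only ever matches from index 0.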

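2.2 The re.search() function

re.search() scans the whole string for the first position where the regular expression matches. If it finds a match, it returns a match object; otherwise it returns None. The following code shows how to use re.search():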


import re
pattern = r"world"
string = "hello world"
search = re.search(pattern, string)
if search:
    print(search.group())
else:
    print("search failed")

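In the code above, pattern is the regular expression string, string is the string to search, and search is the match object returned by re.search(). Because pattern matches somewhere in the string, the output is "world".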
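The sketch below illustrates that re.search() returns only the first match, even when the pattern occurs more than once. The sample text and variable names are assumptions made for illustration.

```python
import re

text = "hello world, wonderful world"

# re.search() stops at the first place the pattern matches.
found = re.search(r"wo\w+", text)
first = found.group()    # the first word starting with "wo"
position = found.span()  # (start, end) indexes of that match
print(first, position)   # world (6, 11)

# The same pattern fails with re.match(), because the text starts with "hello".
anchored = re.match(r"wo\w+", text)
print(anchored)          # None
```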

2.3 The re.findall() function

re.findall() returns a list of all non-overlapping substrings that match the regular expression. The following code shows how to use re.findall():


import re
pattern = r"\d+"
string = "there are 123 apples and 456 pears"
findall = re.findall(pattern, string)
print(findall)

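In the code above, pattern is the regular expression string, string is the string to search, and findall is the list returned by re.findall(). Because pattern matches each run of one or more digits, the output is ['123', '456'].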
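One detail worth knowing is that the return value of re.findall() changes when the pattern contains capturing groups. The following sketch, which assumes the same sample sentence, shows the three cases; the variable names are illustrative.

```python
import re

string = "there are 123 apples and 456 pears"

# No groups: each list item is the whole match.
whole = re.findall(r"\d+ \w+", string)
print(whole)    # ['123 apples', '456 pears']

# One group: each list item is the text captured by that group.
numbers = re.findall(r"(\d+) \w+", string)
print(numbers)  # ['123', '456']

# Several groups: each list item is a tuple with one entry per group.
pairs = re.findall(r"(\d+) (\w+)", string)
print(pairs)    # [('123', 'apples'), ('456', 'pears')]
```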

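3. Advanced usage of the re module

3.1 The re.sub() function

re.sub() replaces the substrings of a string that match a regular expression and returns the resulting string. Its signature is: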


re.sub(pattern, repl, string, count=0, flags=0)

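Here, pattern is the regular expression string; repl is the replacement, either a string or a function that receives each match object and returns the replacement string; string is the string to process; count is the maximum number of replacements, where the default of 0 replaces every match; and flags sets the matching mode, for example re.I to ignore case. The following code shows how to use re.sub():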


import re
pattern = r"apple"
string = "there are 3 apples and 1 apple pie"
repl = "orange"
sub = re.sub(pattern, repl, string)
print(sub)

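In the code above, pattern is the regular expression string, string is the string to process, repl is the replacement string, and sub is the string returned by re.sub(). Because the code replaces every occurrence of "apple" with "orange", the output is "there are 3 oranges and 1 orange pie".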
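The count and flags parameters, and the option of passing a function as repl, can be sketched as follows. The helper double() is a hypothetical name introduced only for this illustration.

```python
import re

string = "there are 3 apples and 1 apple pie"

# count=1 limits the substitution to the first match.
first_only = re.sub(r"apple", "orange", string, count=1)
print(first_only)  # there are 3 oranges and 1 apple pie

# flags=re.I lets an upper-case pattern match lower-case text.
ignore_case = re.sub(r"APPLE", "orange", string, flags=re.I)
print(ignore_case)  # there are 3 oranges and 1 orange pie


def double(match):
    """Hypothetical helper: return the matched number multiplied by two."""
    return str(int(match.group()) * 2)


# repl may be a function; it is called once per match.
doubled = re.sub(r"\d+", double, string)
print(doubled)  # there are 6 apples and 2 apple pie
```

Passing count and flags as keyword arguments keeps the call readable and avoids mixing up their positions.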

3.2 The re.split() function

re.split() splits a string into substrings wherever the regular expression matches. The following code shows how to use re.split():


import re
pattern = r"[,;\s]+"
string = "apple, pear; pear  grape   orange, fruit"
split = re.split(pattern, string)
print(split)

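In the code above, pattern is the regular expression string, string is the string to split, and split is the list returned by re.split(). Because pattern matches runs of commas, semicolons, and whitespace, the output is ['apple', 'pear', 'pear', 'grape', 'orange', 'fruit']. Note that 'pear' appears twice: re.split() only splits the string and does not remove duplicates.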
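A few edge cases of re.split() are easy to overlook. The sketch below uses a shorter sample string, chosen for illustration, to show the maxsplit parameter, the effect of a capturing group, and what happens when the string starts with a separator.

```python
import re

text = "apple, pear; grape"

# maxsplit=1 splits at the first separator only; the rest stays intact.
limited = re.split(r"[,;\s]+", text, maxsplit=1)
print(limited)       # ['apple', 'pear; grape']

# A capturing group in the pattern keeps the captured separators in the result.
with_seps = re.split(r"([,;])\s*", text)
print(with_seps)     # ['apple', ',', 'pear', ';', 'grape']

# A separator at the very start produces an empty first element.
leading = re.split(r"[,;\s]+", ", apple, pear")
print(leading)       # ['', 'apple', 'pear']
```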

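3.3 The re.compile() function

re.compile() compiles a regular expression string into a regular expression object. The object can be stored and reused, which is convenient and slightly more efficient when the same expression is applied many times. The re module also caches recently compiled patterns internally, so the speed gain is usually modest. The following code shows how to use re.compile():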


import re
string = "there are 4 apples and 3 pears"
pattern = re.compile(r"\d+")
match = pattern.findall(string)
print(match)

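In the code above, string is the string to search, pattern is the regular expression object compiled from the regular expression string, and match is the list returned by pattern.findall(), which works the same way as re.findall(). Because the pattern matches each run of digits, the output is ['4', '3'].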
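The benefit of a compiled pattern shows up when one expression is applied repeatedly. A minimal sketch, assuming a small list of sample lines invented for illustration:

```python
import re

# Compile once, then reuse the object for several operations.
number = re.compile(r"\d+")

lines = ["there are 4 apples", "no digits here", "3 pears and 12 plums"]

# The compiled object offers the same methods as the module-level functions.
found = [number.findall(line) for line in lines]
print(found)     # [['4'], [], ['3', '12']]

first = number.search(lines[2]).group()
print(first)     # 3

masked = number.sub("N", lines[2])
print(masked)    # N pears and N plums
```

Giving the compiled pattern a descriptive name such as number also documents what the expression is meant to match.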

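4. Summary

This article covered basic and advanced usage of the re module in Python 3. In practice, choose the function and regular expression that fit the matching task at hand. The re module also provides other functions and flags: re.IGNORECASE ignores case, re.DOTALL makes "." match any character including a newline, and re.MULTILINE makes "^" and "$" match at the start and end of every line rather than only at the start and end of the whole string. Consult the official documentation and other references to understand and use the re module well.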
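The effect of the three flags mentioned above can be sketched on a small multi-line string. The sample text and variable names are assumptions made for this illustration.

```python
import re

text = "Apple\nbanana\napple pie"

# re.IGNORECASE: "apple" also matches "Apple".
any_case = re.findall(r"apple", text, flags=re.IGNORECASE)
print(any_case)      # ['Apple', 'apple']

# Without re.MULTILINE, "^" matches only at the start of the whole string.
first_word = re.findall(r"^\w+", text)
print(first_word)    # ['Apple']

# With re.MULTILINE, "^" matches at the start of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(line_starts)   # ['Apple', 'banana', 'apple']

# By default "." does not match a newline, so this search fails.
no_dotall = re.search(r"Apple.banana", text)
print(no_dotall)     # None

# With re.DOTALL, "." matches the newline as well.
with_dotall = re.search(r"Apple.banana", text, flags=re.DOTALL)
print(repr(with_dotall.group()))  # 'Apple\nbanana'
```

Flags can be combined with the | operator, for example flags=re.IGNORECASE | re.MULTILINE.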
