Python Regular Expressions - 猿码集

Introduction to Regular Expressions

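A regular expression is a powerful tool for matching patterns in strings. You can use it to extract the information you need from text, or to check whether text conforms to a particular format.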

Regular Expression Syntax

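Regular expression syntax is a set of rules built from special characters and their combinations, used to match parts of a text string. Here are some commonly used symbols and what they mean:

.    Matches any single character except a newline
^    Matches the start of the string
$    Matches the end of the string (also just before a final newline)
*    Matches the preceding element zero or more times
+    Matches the preceding element one or more times
?    Matches the preceding element zero or one time; placed after a quantifier (as in *?) it makes the match non-greedy
{}   Sets a repetition count for the preceding element: {m} means exactly m times, {m,n} means m to n times
[]   A character set that matches any one character listed inside it, e.g. [abc]; [^abc] matches any character not in the set
|    Alternation: matches either the expression on its left or the one on its right
()   Groups a subexpression and captures the text it matches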
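To see the symbols above in action, here is a small sketch that applies each one with re.findall or re.search. The sample strings and variable names are made up purely for illustration:

```python
import re

# . matches any single character except a newline
dot = re.findall(r'a.c', 'abc a-c ac')              # ['abc', 'a-c']

# ^ and $ anchor the match to the start and end of the string
starts = re.search(r'^hello', 'hello world') is not None   # True
ends = re.search(r'world$', 'hello world') is not None     # True

# * = zero or more, + = one or more, ? = zero or one
star = re.findall(r'ab*', 'a ab abbb')               # ['a', 'ab', 'abbb']
plus = re.findall(r'ab+', 'a ab abbb')               # ['ab', 'abbb']
optional = re.findall(r'colou?r', 'color colour')    # ['color', 'colour']

# {m} = exact repetition count
exact = re.findall(r'\d{3}', '1234567')              # ['123', '456']

# [] = one character from a set
vowels = re.findall(r'[aeiou]', 'regex')             # ['e', 'e']

# | = either alternative
either = re.findall(r'cat|dog', 'cat and dog')       # ['cat', 'dog']

# () = groups; with several groups, findall returns tuples of the captures
groups = re.findall(r'(\d+)-(\d+)', '10-20 30-40')   # [('10', '20'), ('30', '40')]

print(dot, starts, ends, star, plus, optional, exact, vowels, either, groups)
```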

Using Regular Expressions

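In Python, regular expressions are handled by the re module. Its main functions include:

search(pattern, string, flags=0): scans the string and returns a Match object for the first match, or None if there is none
findall(pattern, string, flags=0): returns a list of all non-overlapping matches (if the pattern contains groups, the captured groups are returned instead)
sub(pattern, repl, string, count=0, flags=0): returns a new string in which matches are replaced by repl; count limits the number of replacements (0 means replace all)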
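A minimal sketch of the three functions on one made-up sample string. It also passes the optional count and flags arguments, using the standard re.IGNORECASE flag:

```python
import re

text = 'order 66, room 101'

# search: first match as a Match object, or None if nothing matches
m = re.search(r'\d+', text)
first = m.group() if m else None      # '66'
start = m.start() if m else None      # 6
missing = re.search(r'xyz', text)     # None

# findall: every non-overlapping match as a list of strings
numbers = re.findall(r'\d+', text)    # ['66', '101']

# sub: replace matches; count=1 limits it to the first replacement
masked = re.sub(r'\d+', '#', text)                  # 'order #, room #'
masked_once = re.sub(r'\d+', '#', text, count=1)    # 'order #, room 101'

# flags: re.IGNORECASE makes the match case-insensitive
greetings = re.findall(r'hello', 'Hello HELLO hello', flags=re.IGNORECASE)

print(first, start, missing, numbers, masked, masked_once, greetings)
```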

Regular Expression Example

Suppose we have a string and want to extract all the numbers from it. We can use a regular expression that matches digits and return the results:

import re
s = 'hello 123 world 456'
pattern = r'\d+'
result = re.findall(pattern, s)
print(result) #['123', '456']

Running this code in the Python interactive interpreter prints ['123', '456']: all the numbers have been extracted from the string.
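When the same pattern is used many times, it can be compiled once with re.compile and reused. The sketch below reuses the string from the example above and also uses re.finditer, which yields Match objects, so the position of each number is available as well:

```python
import re

s = 'hello 123 world 456'
number_re = re.compile(r'\d+')   # compile once, reuse many times

# A compiled pattern offers the same methods as the module-level functions
print(number_re.findall(s))      # ['123', '456']

# finditer yields Match objects, giving the text and its (start, end) span
spans = [(m.group(), m.span()) for m in number_re.finditer(s)]
print(spans)                     # [('123', (6, 9)), ('456', (16, 19))]

# The matches are strings; convert them to int for arithmetic
total = sum(int(x) for x in number_re.findall(s))
print(total)                     # 579
```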

Common Regular Expression Patterns

Matching Digits

\d matches any single digit; to match a run of several digits, use \d+:

import re
s = '123 hello world 456'
pattern = r'\d+'
result = re.findall(pattern, s)
print(result) #['123', '456']
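One detail worth knowing: for str patterns in Python 3, \d matches any Unicode decimal digit, not only 0-9, so full-width digits such as those often found in Chinese text are matched too. If only ASCII digits are wanted, use [0-9] or the re.ASCII flag. The sample string below is made up for illustration:

```python
import re

s = 'price １２３ yuan, stock 45'   # '１２３' uses full-width digits

unicode_digits = re.findall(r'\d+', s)               # ['１２３', '45']
ascii_only = re.findall(r'\d+', s, flags=re.ASCII)   # ['45']
char_class = re.findall(r'[0-9]+', s)                # ['45']

print(unicode_digits, ascii_only, char_class)
```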

Matching Letters

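\w matches any single word character: a letter, a digit, or an underscore (in Python 3, letters and digits from other scripts, such as Chinese characters, also count). \w+ matches a run of them. Because digits are word characters, the result below also contains '123':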

import re
s = '123 hello world ABC'
pattern = r'\w+'
result = re.findall(pattern, s)
print(result) #['123', 'hello', 'world', 'ABC']
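To match letters only, a character class such as [A-Za-z] is more precise than \w, which also matches digits and the underscore. A short sketch, assuming only the ASCII letters are of interest:

```python
import re

s = '123 hello world ABC'

# [A-Za-z]+ matches runs of ASCII letters only, so '123' is skipped
letters = re.findall(r'[A-Za-z]+', s)      # ['hello', 'world', 'ABC']

# \w also covers digits and the underscore
words = re.findall(r'\w+', 'foo_bar 42')   # ['foo_bar', '42']

print(letters, words)
```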

Matching Words

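To match a whole word, surround it with \b. \b matches a word boundary: the zero-width position between a word character and a non-word character, or between a word character and the start or end of the string. For example: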

import re
s = 'hello world'
pattern = r'\bworld\b'
result = re.findall(pattern, s)
print(result) #['world']

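The code above matches "world" only as a standalone word. Without \b, the pattern would also match the "world" inside a longer word such as "helloworld".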
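The difference is easiest to see on a string where "world" also appears inside a longer word. The sample string is made up for illustration:

```python
import re

s = 'helloworld world'

# Without \b, 'world' is also found inside 'helloworld'
plain = re.findall(r'world', s)          # ['world', 'world']

# With \b on both sides, only the standalone word matches
bounded = re.findall(r'\bworld\b', s)    # ['world']

# re.search shows where the standalone match starts
m = re.search(r'\bworld\b', s)
position = m.start() if m else None

print(plain, bounded, position)          # ['world', 'world'] ['world'] 11
```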

Matching Whitespace

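Whitespace characters include spaces, tabs, newlines, and so on. \s matches any single whitespace character, and \s+ matches a run of consecutive whitespace: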

import re
s = 'hello   world \n'
pattern = r'\s+'
result = re.findall(pattern, s)
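print(result) #['   ', ' \n']

The code above finds every run of consecutive whitespace in s. The space and the newline at the end are adjacent, so \s+ returns them together as a single match, ' \n'.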
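Matching whitespace is often used to clean up text. A sketch using re.sub to collapse each whitespace run into a single space and re.split to break a string on whitespace; the inputs are made-up examples:

```python
import re

s = 'hello   world \n'

# Collapse every run of whitespace into one space, then trim the ends
normalized = re.sub(r'\s+', ' ', s).strip()    # 'hello world'

# Split on any run of whitespace (spaces, tabs, newlines)
parts = re.split(r'\s+', 'a  b\tc\nd')         # ['a', 'b', 'c', 'd']

# A single \s matches exactly one whitespace character
singles = re.findall(r'\s', 'a b\tc')          # [' ', '\t']

print(repr(normalized), parts, singles)
```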

Matching Email Addresses

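Patterns for email addresses can get fairly complicated: a valid address consists of a user name, the @ sign, and a domain name with a suffix. Suppose we want to find all the email addresses in a piece of text: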

import re
s = 'my email is abc@def.com, and his email is xyz@uvw.com'
pattern = r'\b\w+@\w+\.\w+\b'
result = re.findall(pattern, s)
print(result) #['abc@def.com', 'xyz@uvw.com']

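The code above finds both email addresses in the sample text. This simple pattern only covers addresses of the form name@domain.suffix: addresses with a dot in the user name or with a subdomain (such as first.last@mail.example.com) are not matched in full.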
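If addresses with dots, plus signs, or hyphens in the user name, or with subdomains, also need to be found, the pattern can be widened. The version below is only a sketch that assumes those extra characters; it is still far from a full implementation of the email address standard, and the sample addresses are made up:

```python
import re

s = 'contact first.last@mail.example.com or a+b@x.org'

# Simple pattern from above: only name@domain.suffix
simple = re.findall(r'\b\w+@\w+\.\w+\b', s)

# Wider pattern: the user name may contain . + -, and the domain may have
# several dot-separated parts. (?:...) is a non-capturing group, so
# findall still returns the whole match rather than the group.
wider = re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', s)

print(simple)   # ['last@mail.example', 'b@x.org']
print(wider)    # ['first.last@mail.example.com', 'a+b@x.org']
```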

Matching Phone Numbers

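Phone number formats are also fairly complicated: a full number may include an area code, the subscriber number, and an extension. Here we simplify and match any standalone run of exactly 11 digits, the length of a mainland China mobile number: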

import re
s = 'my phone number is 12345678900, and his phone number is 98765432100'
pattern = r'\b\d{11}\b'
result = re.findall(pattern, s)
print(result) #['12345678900', '98765432100']

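The code above finds every standalone 11-digit number in the text. It does not check whether a number is actually a valid mobile number, so the results are only candidates.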
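A slightly stricter sketch, assuming as a simplification that mainland China mobile numbers start with 1 followed by a digit from 3 to 9. It uses the lookarounds (?<!\d) and (?!\d) instead of \b, because in Python 3 Chinese characters count as word characters, so \b finds no boundary between a Chinese character and a digit. The sample string is made up:

```python
import re

s = '我的手机13812345678，他的12345678900'

# \b does not help here: '机' and '1' are both word characters
with_boundary = re.findall(r'\b1[3-9]\d{9}\b', s)       # []

# (?<!\d) / (?!\d): not preceded / not followed by another digit
mobile = re.findall(r'(?<!\d)1[3-9]\d{9}(?!\d)', s)     # ['13812345678']

print(with_boundary, mobile)
```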

Conclusion

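This article introduced the basic syntax of regular expressions, how to use them in Python, and some common patterns. Regular expressions are one of the most widely used string-processing tools in Python, with many applications in text processing and data cleaning. After reading this article, you should have a basic understanding of regular expressions and be able to start applying them in real projects.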
