
Python Advanced (3): Regular Expressions (the re Module)

news 2026/5/22 21:15:43


❤️ Blog home: 水滴技术
🌸 Column: Python 入门核心技术 (Python Core Techniques for Beginners)

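Contents

1. Importing the re module
2. Common functions in the re module
- 2.1 re.search()
- 2.2 re.findall()
- 2.3 re.sub()
- 2.4 re.compile()
- 2.5 re.split()
3. Regular expression syntax
4. Match object attributes and methods
- 4.1 group()
- 4.2 start()
- 4.3 end()
- 4.4 span()
5. Common examples
- 5.1 Matching digits
- 5.2 Matching email addresses
- 5.3 Matching URLs
- 5.4 Matching dates
- 5.5 Matching IP addresses
- 5.6 Matching HTML tags
- 5.7 Matching mobile phone numbers
- 5.8 Matching ID card numbers
- 5.9 Matching QQ numbers
- 5.10 Matching WeChat IDs
- 5.11 Matching postal codes
- 5.12 Matching Chinese characters
- 5.13 Matching whitespace
- 5.14 Matching non-whitespace
- 5.15 Matching multi-line text
- 5.16 Matching a character set
- 5.17 Matching the complement of a character set
- 5.18 Matching repeated characters
6. Summary
Series articles
Popular columns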

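Hi everyone, I'm 水滴 (Shuidi).

The re module in the Python standard library implements regular expressions. A regular expression describes a pattern of text, which makes it a convenient way to search strings and to extract or transform data in them. This tutorial covers the basic usage of the re module with examples.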

1. Importing the re module

The re module must be imported before use:

import re

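2. Common functions in the re module

The re module provides many functions for working with regular expressions. The most commonly used ones are described below.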

2.1 re.search()

re.search() scans a string for the first location where the regular expression matches. It returns a match object on success and None otherwise.

import re

string = "The quick brown fox jumps over the lazy dog."
match = re.search("fox", string)
if match:
    print("Found:", match.group())
else:
    print("Not found")

Output:

Found: fox

2.2 re.findall()

re.findall() finds all non-overlapping matches in a string and returns them as a list. If nothing matches, it returns an empty list.

import re

string = "The quick brown fox jumps over the lazy dog."
matches = re.findall("o", string)
if matches:
    print("Found:", matches)
else:
    print("Not found")

Output:

Found: ['o', 'o', 'o', 'o']
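
One detail worth knowing about re.findall() is that its return value changes when the pattern contains capture groups. The sketch below uses a made-up sample string (text is an assumption for illustration) to show both shapes:

```python
import re

# Hypothetical sample data, not from the original article.
text = "alice=30, bob=25"

# No capture groups: each list item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# Two capture groups: each list item is a tuple of the group contents.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)  # ['alice=30', 'bob=25']
print(pairs)  # [('alice', '30'), ('bob', '25')]
```

Non-capturing groups, written (?:...), do not change the return shape, which is why several patterns in section 5 use them.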

2.3 re.sub()

re.sub() replaces every substring that matches the pattern with the given replacement and returns the new string. The original string is left unchanged.

import re

string = "The quick brown fox jumps over the lazy dog."
new_string = re.sub("fox", "cat", string)
print("Old string:", string)
print("New string:", new_string)

Output:

Old string: The quick brown fox jumps over the lazy dog.
New string: The quick brown cat jumps over the lazy dog.
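
The replacement passed to re.sub() can also refer back to capture groups, and the count argument limits how many matches are replaced. A minimal sketch, using a sample string chosen here for illustration:

```python
import re

# Hypothetical sample data, not from the original article.
text = "2023-07-29 and 2024-01-05"

# \1, \2 and \3 refer to the year, month and day groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# count=1 replaces only the first match.
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)

print(swapped)     # 29/07/2023 and 05/01/2024
print(first_only)  # YYYY-07-29 and 2024-01-05
```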

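2.4 re.compile()

re.compile() compiles a regular expression into a pattern object so that it can be reused in later matching operations. The pattern object has the same methods as the module-level functions, such as search(), findall() and sub().

import re

pattern = re.compile("fox")
string = "The quick brown fox jumps over the lazy dog."
match = pattern.search(string)
if match:
    print("Found:", match.group())
else:
    print("Not found")

Output:

Found: fox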
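
Compiling pays off mostly when the same pattern is applied to many strings. The sketch below reuses one compiled pattern across a list; the sample lines and the name word_fox are assumptions made for this example:

```python
import re

# Compile once; \b restricts the match to the whole word "fox".
word_fox = re.compile(r"\bfox\b", re.IGNORECASE)

# Hypothetical input lines.
lines = ["The quick brown fox", "Firefox is a browser", "A FOX ran by"]

# Reuse the same compiled pattern for every line.
hits = [line for line in lines if word_fox.search(line)]

print(hits)  # ['The quick brown fox', 'A FOX ran by']
```

Flags such as re.IGNORECASE are passed once at compile time and then apply to every call on the pattern object.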

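2.5 re.split()

re.split() splits a string wherever the pattern matches and returns the pieces as a list. Because the sample string ends with a period, which \W+ treats as a separator, the result ends with an empty string.

import re

string = "The quick brown fox jumps over the lazy dog."
words = re.split(r"\W+", string)
print(words)

Output:

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '']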
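
The trailing empty string in that result is often unwanted. One possible way to handle it, shown here as a sketch rather than the only approach, is to filter out empty pieces; the maxsplit argument can also cap the number of splits:

```python
import re

string = "The quick brown fox jumps over the lazy dog."

# Drop empty pieces produced by a separator at the start or end.
words = [w for w in re.split(r"\W+", string) if w]

# maxsplit=2 splits at the first two separators only.
head = re.split(r"\W+", string, maxsplit=2)

print(words)  # ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
print(head)   # ['The', 'quick', 'brown fox jumps over the lazy dog.']
```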

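3. Regular expression syntax

The functions in the re module take a regular expression that describes the pattern to match. A regular expression is made up of ordinary characters and special characters. Some commonly used special characters are:

Character	Description
.	Matches any single character except a newline
*	Matches the preceding character or group zero or more times
+	Matches the preceding character or group one or more times
?	Matches the preceding character or group zero or one time
{n}	Matches the preceding character or group exactly n times
{n,}	Matches the preceding character or group at least n times
{n,m}	Matches the preceding character or group at least n times and at most m times
[]	Matches any single character listed inside the brackets
[^]	Matches any single character not listed inside the brackets
()	Groups part of the pattern; the group can be referenced in the match

Some examples:

Regular expression	Description
abc	Matches the string "abc"
.	Matches any single character except a newline
a*	Matches zero or more "a" characters
a+	Matches one or more "a" characters
a?	Matches zero or one "a" character
a{3}	Matches exactly three "a" characters
a{3,}	Matches at least three "a" characters
a{3,6}	Matches three to six "a" characters
[abc]	Matches a single "a", "b" or "c"
[^abc]	Matches any single character other than "a", "b" and "c"
(ab)+	Matches one or more repetitions of "ab"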
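
To check a few rows of the tables above, the sketch below uses re.fullmatch(), which succeeds only when the entire string matches the pattern. The sample strings are chosen here for illustration:

```python
import re

# a{3,6}: three to six "a" characters, nothing else.
samples = ["aa", "aaa", "aaaaaa", "aaaaaaa"]
results = {s: bool(re.fullmatch(r"a{3,6}", s)) for s in samples}
print(results)
# {'aa': False, 'aaa': True, 'aaaaaa': True, 'aaaaaaa': False}

# (ab)+: one or more repetitions of the group "ab".
repeated = re.fullmatch(r"(ab)+", "ababab")
print(repeated.group())  # ababab

# [^abc]: any single character other than a, b or c.
print(bool(re.fullmatch(r"[^abc]", "d")))  # True
print(bool(re.fullmatch(r"[^abc]", "a")))  # False
```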

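4. Match object attributes and methods

Functions such as re.search() return a match object when the pattern matches. The match object provides the following commonly used methods.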

4.1 group()

group() returns the matched substring.

import re

string = "The quick brown fox jumps over the lazy dog."
match = re.search("fox", string)
if match:
    print("Found:", match.group())
else:
    print("Not found")

Output:

Found: fox
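
When the pattern contains capture groups, group() can also return individual groups by number or by name. A short sketch, using a date string as sample input and a named group year introduced here for illustration:

```python
import re

# (?P<year>...) defines a named group; the other two groups are numbered only.
match = re.search(r"(?P<year>\d{4})-(\d{2})-(\d{2})", "Today is 2023-07-29.")

print(match.group())        # 2023-07-29  (same as group(0): the whole match)
print(match.group(1))       # 2023
print(match.group("year"))  # 2023
print(match.group(2))       # 07
print(match.groups())       # ('2023', '07', '29')
```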

4.2 start()

start() returns the index in the original string at which the matched substring begins.

import re

string = "The quick brown fox jumps over the lazy dog."
match = re.search("fox", string)
if match:
    print("Start index:", match.start())
else:
    print("Not found")

Output:

Start index: 16

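4.3 end()

end() returns the index in the original string at which the matched substring ends. The end index is exclusive: it points just past the last matched character.

import re

string = "The quick brown fox jumps over the lazy dog."
match = re.search("fox", string)
if match:
    print("End index:", match.end())
else:
    print("Not found")

Output:

End index: 19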

4.4 span()

span() returns the start and end indexes of the matched substring as a tuple (start, end).

import re

string = "The quick brown fox jumps over the lazy dog."
match = re.search("fox", string)
if match:
    print("Span:", match.span())
else:
    print("Not found")

Output:

Span: (16, 19)
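
Because the end index is exclusive, the values from span() can be used directly as slice bounds. A small sketch on the same sample sentence:

```python
import re

string = "The quick brown fox jumps over the lazy dog."
match = re.search("fox", string)

start, end = match.span()

# Slicing with the span recovers the matched text and its surroundings.
matched = string[start:end]
before = string[:start]
after = string[end:]

print(repr(matched))  # 'fox'
print(repr(before))   # 'The quick brown '
print(repr(after))    # ' jumps over the lazy dog.'
```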

5. Common examples

The examples below show typical text-matching tasks handled with the re module. The patterns are deliberately simple: they illustrate the syntax and are not complete validators.

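5.1 Matching digits

Matching digits is a very common task. The \d metacharacter matches any single digit, so \d+ matches a run of one or more digits:

import re

string = "There are 123 apples in the basket."
matches = re.findall(r"\d+", string)
print(matches)

Output:

['123']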

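5.2 Matching email addresses

The pattern below matches email addresses in the common local@domain.tld form. It is a simplified check and does not cover every address that the email standards allow:

import re

email = "example@domain.com"
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
matches = re.findall(pattern, email)
print(matches)

Output:

['example@domain.com']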

5.3 Matching URLs

The pattern below matches the scheme and host of an http or https URL. It stops at the first "/", so the path and query string are not part of the match:

import re

url = "https://www.google.com/search?q=python"
pattern = r"https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+"
matches = re.findall(pattern, url)
print(matches)

Output:

['https://www.google.com']
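
When the goal is to take a URL apart rather than to find it inside text, the standard library's urllib.parse module is an alternative to a regular expression. This is a different technique from the pattern above, sketched here on the same sample URL:

```python
from urllib.parse import urlsplit

url = "https://www.google.com/search?q=python"

# urlsplit() returns a named tuple with the URL components.
parts = urlsplit(url)

print(parts.scheme)  # https
print(parts.netloc)  # www.google.com
print(parts.path)    # /search
print(parts.query)   # q=python
```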

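5.4 Matching dates

The pattern below matches dates written as YYYY-MM-DD. It checks only the shape of the text (four digits, two digits, two digits), not whether the month and day are valid:

import re

date = "Today is 2023-07-29."
pattern = r"\d{4}-\d{2}-\d{2}"
matches = re.findall(pattern, date)
print(matches)

Output:

['2023-07-29']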
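
Since the pattern would also accept text such as 2023-13-45, one possible follow-up step is to pass each candidate to datetime.strptime() and keep only the ones that parse. The helper name find_valid_dates and the sample text are assumptions made for this sketch:

```python
import re
from datetime import datetime


def find_valid_dates(text):
    """Return YYYY-MM-DD substrings of text that are real calendar dates."""
    valid = []
    for candidate in re.findall(r"\d{4}-\d{2}-\d{2}", text):
        try:
            # strptime raises ValueError for impossible dates.
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue
        valid.append(candidate)
    return valid


found = find_valid_dates("Good: 2023-07-29. Bad: 2023-13-45.")
print(found)  # ['2023-07-29']
```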

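5.5 Matching IP addresses

The pattern below matches text shaped like an IPv4 address: four groups of one to three digits separated by dots. It does not check that each group is in the range 0 to 255, so it also matches strings such as 999.999.999.999:

import re

ip = "192.168.1.1"
pattern = r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
matches = re.findall(pattern, ip)
print(matches)

Output:

['192.168.1.1']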
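
To reject out-of-range groups, the numeric check can be done in Python after the regular expression has found the candidates. A minimal sketch, where the names IP_PATTERN and find_ipv4 and the sample text are assumptions for illustration:

```python
import re

# Same shape as the pattern above, with each group captured.
IP_PATTERN = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def find_ipv4(text):
    """Return IPv4-shaped substrings whose four parts are all 0 to 255."""
    return [
        m.group()
        for m in IP_PATTERN.finditer(text)
        if all(int(part) <= 255 for part in m.groups())
    ]


addresses = find_ipv4("ok 192.168.1.1, bad 999.1.1.1")
print(addresses)  # ['192.168.1.1']
```

finditer() is used instead of findall() because the pattern now has capture groups, and findall() would return tuples of the groups rather than the whole match.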

5.6 Matching HTML tags

The pattern below matches HTML tags. The ? after .* makes the quantifier non-greedy, so each match stops at the nearest ">":

import re

html = "<p>This is a paragraph.</p>"
pattern = r"<.*?>"
matches = re.findall(pattern, html)
print(matches)

Output:

['<p>', '</p>']
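
The role of the ? in that pattern is easiest to see by comparing it with the greedy form. The sketch below runs both versions on the same sample string:

```python
import re

html = "<p>This is a paragraph.</p>"

# Greedy: .* extends as far as possible, up to the last ">".
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first ">" that allows a match.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<p>This is a paragraph.</p>']
print(lazy)    # ['<p>', '</p>']
```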

5.7 Matching mobile phone numbers

The pattern below matches 11-digit mobile phone numbers that start with 1 and have a second digit from 3 to 9:

import re

phone = "13812345678"
pattern = r"1[3-9]\d{9}"
matches = re.findall(pattern, phone)
print(matches)

Output:

['13812345678']

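5.8 Matching ID card numbers

The pattern below matches 18-character ID card numbers: six digits, a birth date starting with 19 or 20, three more digits, and a final digit or X. It checks the format only:

import re

idcard = "310101198001010001"
pattern = r"\d{6}(?:19|20)\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[1-2]\d|3[0-1])\d{3}[0-9Xx]"
matches = re.findall(pattern, idcard)
print(matches)

Output:

['310101198001010001']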

5.9 Matching QQ numbers

The pattern below matches a string of 5 to 11 digits that does not start with 0:

import re

qq = "123456"
pattern = r"^[1-9]\d{4,10}$"
matches = re.findall(pattern, qq)
print(matches)

Output:

['123456']

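5.10 Matching WeChat IDs

The pattern below matches a string of 6 to 20 characters that starts with a letter and continues with letters, digits, hyphens or underscores:

import re

wechat = "wx123456"
pattern = r"^[a-zA-Z][-_a-zA-Z0-9]{5,19}$"
matches = re.findall(pattern, wechat)
print(matches)

Output:

['wx123456']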

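5.11 Matching postal codes

The pattern below matches US-style ZIP codes: five digits, optionally followed by a hyphen and four more digits:

import re

zipcode = "12345-6789"
pattern = r"\d{5}(?:-\d{4})?"
matches = re.findall(pattern, zipcode)
print(matches)

Output:

['12345-6789']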

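5.12 Matching Chinese characters

The range \u4e00-\u9fa5 covers the commonly used Chinese characters. The full-width period 。 is outside that range, so it is not part of the match:

import re

text = "这是一段中文文本。"
pattern = r"[\u4e00-\u9fa5]+"
matches = re.findall(pattern, text)
print(matches)

Output:

['这是一段中文文本']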

5.13 Matching whitespace

\s matches a whitespace character such as a space, tab or newline:

import re

text = "This is a sentence with spaces."
matches = re.findall(r"\s+", text)
print(matches)

Output:

[' ', ' ', ' ', ' ', ' ']

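5.14 Matching non-whitespace

\S matches any character that is not whitespace, so \S+ matches each run of characters between spaces:

import re

text = "This is a sentence with spaces."
matches = re.findall(r"\S+", text)
print(matches)

Output:

['This', 'is', 'a', 'sentence', 'with', 'spaces.']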

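5.15 Matching multi-line text

With the re.MULTILINE flag, ^ and $ match at the start and end of every line instead of only at the start and end of the whole string:

import re

text = "Line 1\nLine 2\nLine 3"
matches = re.findall(r"^.*$", text, re.MULTILINE)
print(matches)

Output:

['Line 1', 'Line 2', 'Line 3']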
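
The effect of the flag is clearer when the same pattern is run without it and with re.DOTALL, which lets "." match newlines. A short comparison on the same sample text:

```python
import re

text = "Line 1\nLine 2\nLine 3"

# No flags: ^ and $ anchor to the whole string and "." stops at "\n",
# so nothing matches.
no_flags = re.findall(r"^.*$", text)

# re.MULTILINE: ^ and $ anchor to each line.
multiline = re.findall(r"^.*$", text, re.MULTILINE)

# re.DOTALL: "." also matches "\n", so the whole string is one match.
dotall = re.findall(r"^.*$", text, re.DOTALL)

print(no_flags)   # []
print(multiline)  # ['Line 1', 'Line 2', 'Line 3']
print(dotall)     # ['Line 1\nLine 2\nLine 3']
```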

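5.16 Matching a character set

A character set in square brackets matches any one of the listed characters. The pattern below finds every lowercase vowel:

import re

text = "The quick brown fox jumps over the lazy dog."
matches = re.findall("[aeiou]", text)
print(matches)

Output:

['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']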

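5.17 Matching the complement of a character set

A ^ at the start of a character set negates it. The pattern below finds every character that is not a lowercase vowel, including spaces and punctuation:

import re

text = "The quick brown fox jumps over the lazy dog."
matches = re.findall("[^aeiou]", text)
print(matches)

Output:

['T', 'h', ' ', 'q', 'c', 'k', ' ', 'b', 'r', 'w', 'n', ' ', 'f', 'x', ' ', 'j', 'm', 'p', 's', ' ', 'v', 'r', ' ', 't', 'h', ' ', 'l', 'z', 'y', ' ', 'd', 'g', '.']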

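5.18 Matching repeated characters

The pattern o+ matches each run of one or more consecutive "o" characters. The sample sentence contains no doubled "o", so every match is a single character:

import re

text = "The quick brown fox jumps over the lazy dog."
matches = re.findall("o+", text)
print(matches)

Output:

['o', 'o', 'o', 'o']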
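
To see longer runs, the input needs consecutive repeats. The sketch below uses two sample strings made up for this example, and also shows a backreference, \1, which matches whatever the first group captured and so finds runs of any repeated word character:

```python
import re

# Hypothetical sample text containing runs of "o".
text = "Zoom in on the balloon, sooo cool"
o_runs = re.findall(r"o+", text)
print(o_runs)  # ['oo', 'o', 'oo', 'ooo', 'oo']

# (\w)\1+ : a word character followed by one or more copies of itself.
# finditer() is used so that group() returns the whole run, not just the group.
doubled = [m.group() for m in re.finditer(r"(\w)\1+", "balloon committee")]
print(doubled)  # ['ll', 'oo', 'mm', 'tt', 'ee']
```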

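These examples show how to use regular expressions in Python for text matching. Regular expressions are especially useful when cleaning or extracting data from large amounts of text, where a single pattern can replace a lot of manual string handling.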

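6. Summary

The re module is the part of the Python standard library that handles regular expressions. It provides functions and methods for searching, replacing and splitting strings. Using it well requires familiarity with regular expression syntax and the common special characters. A successful match returns a match object, whose methods give access to the matched substring and its position.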

Series articles

🔥 Python Advanced (1): Downloading, Installing and Using PyCharm
🔥 Python Advanced (2): Common String Methods

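Popular columns

👍 "Python Core Techniques for Beginners"
👍 "IDEA Tutorial: From Beginner to Expert"
👍 "Java Tutorial: From Beginner to Expert"
👍 "MySQL Tutorial: From Beginner to Expert"
👍 "Big Data Core Techniques: From Beginner to Expert"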