Examples of using Python's re.match()

(Editor: jimmy, Date: 2026/1/28)

While learning Python web scraping, I ran into a question. The book gives the following example:

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*)are(.*?)(.*)', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
    print("matchObj.group(3):", matchObj.group(3))
else:
    print('No match!\n')




The output is:

matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats 
matchObj.group(2): 
matchObj.group(3):  smarter than dogs



So the content of the second group came out empty by default. After deleting that `?` (giving `(.*)are(.*)(.*)`), the output becomes:

matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats 
matchObj.group(2):  smarter than dogs
matchObj.group(3): 
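A compact way to compare the two runs is `Match.groups()`, which returns every captured group as one tuple. The sketch below just re-runs `(.*)are(.*?)(.*)` and `(.*)are(.*)(.*)` on the same line; the variable names `lazy` and `greedy` are only illustrative. It reflects how `*?` works in Python's `re`: it is the non-greedy (lazy) form of `*`, so it tries zero characters first and takes more only if the rest of the pattern would otherwise fail. Here the trailing `(.*)` can absorb everything, so the lazy group never has to grow.

```python
import re

line = 'Cats are smarter than dogs'

# Lazy middle group: (.*?) matches as few characters as possible
lazy = re.match(r'(.*)are(.*?)(.*)', line)
# Greedy middle group: (.*) matches as many characters as possible
greedy = re.match(r'(.*)are(.*)(.*)', line)

print(lazy.groups())    # ('Cats ', '', ' smarter than dogs')
print(greedy.groups())  # ('Cats ', ' smarter than dogs', '')
```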



Does this mean that `?` most likely defaults to matching zero times? And if so, what is the `?` symbol actually for?
On reflection, that claim is not very rigorous. Next I tried a plain `.*` group with the `?` placed outside the parentheses:

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are(.*)?', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))




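This also puts everything after `are` into group 2. But with one small change, moving the `?` inside the second group as `(.*?)`, nothing is captured, and group(0) stops at `Cats are`. (The second group does not fail here; it successfully matches the empty string.)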
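What the `?` after `*` is for becomes visible once something after the lazy group forces it to grow. A small sketch on the same line; the `$` and trailing-space variants are added here for illustration and are not from the book. With nothing after it, `(.*?)` stays empty; with `$` it must reach the end of the string; with a space after it, it stops at the first space.

```python
import re

line = 'Cats are smarter than dogs'

# Nothing follows the lazy group, so it settles for zero characters
m1 = re.match(r'(.*) are(.*?)', line)
print(repr(m1.group()), repr(m1.group(2)))  # 'Cats are' ''

# '$' requires the match to reach the end, so the lazy group must expand
m2 = re.match(r'(.*) are(.*?)$', line)
print(repr(m2.group(2)))  # ' smarter than dogs'

# A space after the lazy group: it expands only up to the first following space
m3 = re.match(r'(.*) are (.*?) ', line)
print(repr(m3.group(2)))  # 'smarter'
```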
Strangely, if the code above is changed to:


import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are (.*)+', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))




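that is, just changing the `?` to `+`, the pattern does match the whole line, yet group(2) is empty.
Moving the `+` inside the second group, as `(.*+)`, raises an error and the match fails. (That error, `re.error: multiple repeat`, occurs on Python versions before 3.11; since Python 3.11, `*+` is a possessive quantifier and the pattern is accepted.)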
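One rule of Python's `re` helps explain the empty group(2): a capture group inside a repetition keeps only what its last iteration matched. For `(.*)+`, the last pass can only match the empty string at the end of the line, which is consistent with the output above. The sketch below shows the rule with a made-up pattern, `(\w+ ?)+`, chosen so that the last iteration is easy to see. It also checks how the running interpreter treats `(.*+)`.

```python
import re
import sys

line = 'Cats are smarter than dogs'

# A repeated capture group keeps only the text of its LAST iteration:
# (\w+ ?) matches 'smarter ', then 'than ', then 'dogs'
m = re.match(r'(\w+ ?)+', 'smarter than dogs')
print(repr(m.group()), repr(m.group(1)))  # 'smarter than dogs' 'dogs'

# '.*+' inside the group: a possessive quantifier on Python 3.11+,
# a "multiple repeat" re.error on older versions
try:
    m2 = re.match(r'(.*) are (.*+)', line)
    result = m2.group(2)
except re.error as exc:
    result = None
    print('re.error:', exc)

print(sys.version_info[:2], repr(result))
```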
So can we conclude that `.*?`…

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are (.*r).*', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
    #print("matchObj.group(3):", matchObj.group(3))
else:
    print('No match!\n')
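Why `(.*r)` yields `smarter` comes down to greediness. A sketch comparing it with a lazy `(.*?r)`, which is added here only for contrast and is not from the original experiment: the greedy `.*` first takes the rest of the line and then backs off to the last `r` it can use, while the lazy `.*?` grows one character at a time and stops at the first `r`.

```python
import re

line = 'Cats are smarter than dogs'

# Greedy: .* grabs the rest of the line, then backs off to the LAST usable 'r'
greedy = re.match(r'(.*) are (.*r).*', line).group(2)
# Lazy: .*? grows one character at a time and stops at the FIRST 'r'
lazy = re.match(r'(.*) are (.*?r).*', line).group(2)

print(repr(greedy), repr(lazy))  # 'smarter' 'smar'
```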




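To make it more general, I tried changing the `r` to a space, `(.* )`, but the result was 'smarter than ', because the greedy `.*` stretches to the last space it can use. So I replaced `.` with `[a-zA-Z]`, which matches any ASCII letter, and that extracted the single word smarter (strictly, group(2) is 'smarter ', since the space is still inside the group). The code is: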


import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are ([a-zA-Z]* ).*', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
    #print("matchObj.group(3):", matchObj.group(3))
else:
    print('No match!\n')
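Here are a few variants that capture the word after `are` while leaving any trailing space out of the group. This is a sketch, not the book's code. The first keeps the `[a-zA-Z]` idea but moves the space outside the parentheses and uses `+` so that at least one letter is required. The second uses `\w+`, which matches word characters (letters, digits, underscore). The third uses a lazy `.*?` followed by a space.

```python
import re

line = 'Cats are smarter than dogs'

# Space outside the group: matched but not captured; '+' requires >= 1 letter
m1 = re.match(r'(.*) are ([a-zA-Z]+) .*', line)
# \w+ captures one run of word characters
m2 = re.match(r'(.*) are (\w+)', line)
# Lazy version: stop at the first space after 'are '
m3 = re.match(r'(.*) are (.*?) ', line)

print(repr(m1.group(2)), repr(m2.group(2)), repr(m3.group(2)))
# 'smarter' 'smarter' 'smarter'
```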
