
Kaggle Python Exercise: Strings and Dictionaries

news 2026/6/2 0:18:22

Contents

- Problem: search for a specific word and locate it
- Approach
  - Implementation
  - Official solution
  - Code walkthrough
- Going further

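Problem: search for a specific word and locate it

A researcher has gathered thousands of news articles, but she wants to focus her attention on articles that include a specific word. Complete the function below to help her filter her list of articles.

Your function should meet the following criteria:

- Do not include documents where the keyword string shows up only as part of a larger word. For example, if she were looking for the keyword "closed", you would not include the string "enclosed".
- She does not want you to distinguish upper case from lower case letters. So the phrase "Closed the case." would be included when the keyword is "closed".
- Do not let periods or commas affect what is matched. "It is closed." would be included when the keyword is "closed". But you can assume there are no other types of punctuation.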

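Approach

1. Read each string in the list and convert it to lowercase.
2. Remove the interfering characters ",.?" from both ends with strip().
3. Remove any commas left in the middle, then use split() to break the string into words.
4. Compare each word with the keyword; on a match, save the document's index in an initially empty list.
5. Return the result list.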

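# doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
doc_list = ['The Learn Python Challenge Casino', 'They bought a car, and a horse', 'Casinoville?']
keyword = 'Casino'
indices = []  # renamed from `list` so the built-in type is not shadowed
for i in range(len(doc_list)):
    # Step 1: lowercase the whole document
    words = doc_list[i].lower()
    print(words)
    # Step 2: trim punctuation from both ends of the document
    words = words.strip('.,?')
    print(words)
    # Step 3: drop the remaining commas and split on whitespace
    wordlist = words.replace(",", "").split()
    print(wordlist)
    # Step 4: compare each word with the lowercased keyword
    for word in wordlist:
        if word == keyword.lower():
            indices.append(i)
            print(i)
#     if keyword in wordlist:
#         print(i)
print(indices)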


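Implementation

def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword.
    Returns list of the index values into the original list for all documents
    containing the keyword.

    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    [0]
    """
    indices = []  # renamed from `list` so the built-in type is not shadowed
    for i in range(len(doc_list)):
        # Lowercase the document and trim punctuation from both ends
        words = doc_list[i].lower().strip(',.?')
        # Drop the remaining commas and split on whitespace.
        # Note: a period in the middle of a document is not removed here.
        wordlist = words.replace(",", "").split()
        for word in wordlist:
            # Lowercase the keyword too, so that 'Casino' matches 'casino'
            if word == keyword.lower():
                indices.append(i)
                break  # one match per document is enough
    return indices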

Official solution

def word_search(doc_list, keyword):
    # list to hold the indices of matching documents
    indices = []
    # Iterate through the indices (i) and elements (doc) of documents
    for i, doc in enumerate(doc_list):
        # Split the string doc into a list of words (according to whitespace)
        tokens = doc.split()
        # Make a transformed list where we 'normalize' each word to facilitate matching.
        # Periods and commas are removed from the end of each word, and it's set to all lowercase.
        normalized = [token.rstrip('.,').lower() for token in tokens]
        # Is there a match? If so, update the list of matching indices.
        if keyword.lower() in normalized:
            indices.append(i)
    return indices
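As a quick sanity check, the official approach can be run against the three requirements from the problem statement. This is only a sketch: the function body is a condensed copy of the solution above, and the `docs` list is made up for illustration.

```python
def word_search(doc_list, keyword):
    """Condensed copy of the official solution, repeated so this block runs on its own."""
    indices = []
    for i, doc in enumerate(doc_list):
        # Strip trailing periods/commas from each word and lowercase it
        normalized = [token.rstrip('.,').lower() for token in doc.split()]
        if keyword.lower() in normalized:
            indices.append(i)
    return indices


# Hypothetical test documents, one per requirement
docs = [
    "It is closed.",        # trailing period must not block the match
    "Closed the case.",     # case must be ignored
    "An enclosed space",    # keyword inside a larger word must NOT match
    "Closed, for now",      # trailing comma must not block the match
]

result = word_search(docs, "closed")
print(result)  # [0, 1, 3]
```

Document 2 is skipped because "enclosed" is compared as a whole word, not searched as a substring.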

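Code walkthrough

enumerate() is a Python built-in function that attaches an automatic counter to an iterable (such as a list, tuple, or string) while you loop over it. It returns an iterator of index/value pairs and is most often used in for loops.
enumerate(iterable, start=0)

iterable: any object that can be iterated over, such as a list or a string.
start (optional): the value the counter starts from. The default is 0, but any other starting value can be given.
enumerate() returns an iterator object; each iteration yields a tuple containing the current count and the corresponding element.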
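A small illustration of both forms of the call; the `words` list is made up for the example.

```python
words = ["casino", "car", "horse"]

# Default: counting starts at 0, so the counter doubles as the list index
pairs = list(enumerate(words))
print(pairs)  # [(0, 'casino'), (1, 'car'), (2, 'horse')]

# start=1 only changes the counter, not which elements are visited
numbered = [f"{n}: {w}" for n, w in enumerate(words, start=1)]
print(numbered)  # ['1: casino', '2: car', '3: horse']
```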
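Adding a key-value pair to a dictionary
dictionary[key] = value
key: the dictionary key.
value: the value stored under that key. If the key already exists, its old value is overwritten.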
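For instance, building up a keyword-to-indices mapping one assignment at a time (the keys and values here are illustrative only):

```python
keyword_to_indices = {}

# Each assignment adds a new key...
keyword_to_indices["casino"] = [0, 1]
keyword_to_indices["they"] = [1]

# ...and assigning to an existing key replaces its value
keyword_to_indices["casino"] = [0]

print(keyword_to_indices)  # {'casino': [0], 'they': [1]}
```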

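The str.split() method splits a string into a list of substrings on a given separator. By default, it splits on any run of whitespace (spaces, tabs, or newlines).
str.split(sep=None, maxsplit=-1)
sep (optional): the separator string. If it is omitted, the string is split on whitespace.
maxsplit (optional): the maximum number of splits. The default is -1, which means no limit.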
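The difference between the default behaviour and an explicit separator is easy to miss, so here is a short sketch with made-up strings:

```python
text = "They  bought a car,\tand a horse\n"

# No separator: runs of spaces, tabs and newlines all count as one break
default_split = text.split()
print(default_split)  # ['They', 'bought', 'a', 'car,', 'and', 'a', 'horse']

# Explicit separator: consecutive separators produce empty strings
comma_split = "a,b,,c".split(",")
print(comma_split)  # ['a', 'b', '', 'c']

# maxsplit=1: split once, leave the rest untouched
limited = "a b c d".split(" ", 1)
print(limited)  # ['a', 'b c d']
```

Note that the default split keeps the comma attached to "car,", which is why the solutions above still need a punctuation-stripping step.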

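str.rstrip() is a Python string method that removes the given characters (whitespace by default) from the end of a string.
str.rstrip([chars])
chars (optional): the characters to remove. It is treated as a set of characters, not as a suffix. If it is omitted, all trailing whitespace (spaces, newlines, tabs, and so on) is removed.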
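A few calls that show how the chars argument behaves; the sample strings are made up for illustration.

```python
# Trailing period removed, as in the official solution
a = "Casino.".rstrip(".,")
print(a)  # Casino

# Any mix of the listed characters is removed from the end
b = "closed.,.".rstrip(".,")
print(b)  # closed

# Only the right-hand side is touched
c = ".hidden.".rstrip(".")
print(c)  # .hidden

# chars is a set of characters, not a suffix: every trailing 'a' or 'n' goes
d = "banana".rstrip("an")
print(d)  # b
```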

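str.strip() is a Python method that removes the given characters (whitespace by default) from both ends of a string, that is, from the beginning and the end at once.
str.strip([chars])
chars (optional): the characters to remove. If it is omitted, all whitespace at both ends (spaces, newlines, tabs, and so on) is removed.
result = text.strip("，。？")  # removes '，', '。' and '？' from both ends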
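The following sketch uses ASCII punctuation, as in the exercise, to show that strip() only looks at the two ends of the string. The sample strings are illustrative.

```python
# Default: whitespace removed from both ends
a = "  Casino  ".strip()
print(a)  # Casino

# Listed characters removed from both ends
b = "?Casinoville?".strip(".,?")
print(b)  # Casinoville

# Characters in the middle are left alone: the inner comma survives
c = "They bought a car, and a horse.".strip(".,?")
print(c)  # They bought a car, and a horse
```

This is why stripping a whole document is not enough on its own: punctuation attached to words in the middle of the text has to be handled separately, for example per word after split().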

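Going further

Now the researcher wants to supply multiple keywords to search for. Complete the function below to help her.

(You're encouraged to use the word_search function you just wrote when implementing this function. Reusing code in this way makes your programs more robust and readable, and it saves typing!)
1. Rewrite the search logic inline, looping over the keywords: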

def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    dictionary = {}
    for keyword in keywords:
        # list to hold the indices of matching documents
        indices = []
        # Iterate through the indices (i) and elements (doc) of documents
        for i, doc in enumerate(doc_list):
            # Split the string doc into a list of words (according to whitespace)
            tokens = doc.split()
            # Make a transformed list where we 'normalize' each word to facilitate matching.
            # Periods and commas are removed from the end of each word, and it's set to all lowercase.
            normalized = [token.rstrip('.,').lower() for token in tokens]
            # Is there a match? If so, update the list of matching indices.
            if keyword.lower() in normalized:
                indices.append(i)
        dictionary[keyword] = indices
    return dictionary

# Check your answer
q3.check()

2. Call the word_search(doc_list, keyword) function implemented earlier:

def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    keyword_to_indices = {}
    for keyword in keywords:
        keyword_to_indices[keyword] = word_search(doc_list, keyword)
    return keyword_to_indices
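The loop in the second approach can also be written as a dictionary comprehension. The sketch below is one possible variant, not part of the exercise: the name `multi_word_search_compact` is hypothetical, and `word_search` is a condensed copy of the official solution so that the block runs on its own.

```python
def word_search(doc_list, keyword):
    """Condensed copy of the official single-keyword search."""
    indices = []
    for i, doc in enumerate(doc_list):
        normalized = [token.rstrip('.,').lower() for token in doc.split()]
        if keyword.lower() in normalized:
            indices.append(i)
    return indices


def multi_word_search_compact(doc_list, keywords):
    """Hypothetical variant: map each keyword to its matching document indices."""
    return {keyword: word_search(doc_list, keyword) for keyword in keywords}


doc_list = [
    "The Learn Python Challenge Casino.",
    "They bought a car and a casino",
    "Casinoville",
]
result = multi_word_search_compact(doc_list, ['casino', 'they'])
print(result)  # {'casino': [0, 1], 'they': [1]}
```

Either way, the per-keyword work stays inside word_search, so a fix to the matching rules only has to be made in one place.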
