
Build a simple CLI-based voice assistant with PyAudio, SpeechRecognition, pyttsx3, and SerpApi

news 2026/2/9 15:54:05

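1. Introduction

As the title suggests, this is a demo project: a very basic voice assistant script that answers your questions in the terminal, based on Google search results.

You can find the full code in the GitHub repository: dimitryzub/serpapi-demo-projects/speech-recognition/cli-based/

Follow-up blog posts will cover:

A web-based solution using Flask, some HTML, CSS, and JavaScript.
An Android and Windows solution using Flutter and Dart.

2. What we will build in this blog post

2.1 Environment preparation

First, let's make sure we are working in a separate (virtual) environment and that the libraries the project needs are installed correctly. The hardest part will (probably) be installing pyaudio; the following post may help with that:

[Solved] Fixing the PyAudio pip installation error on Windows 32/64-bit

2.2 Virtual environment and library installation

Before installing the libraries, we need to create and activate a new environment for this project: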

# if you're on Linux based systems
$ python -m venv env && source env/bin/activate
$ (env) <path>

# if you're on Windows and using Bash terminal
$ python -m venv env && source env/Scripts/activate
$ (env) <path>

# if you're on Windows and using CMD
python -m venv env && .\env\Scripts\activate
$ (env) <path>

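Explanation: python -m venv env tells Python to run the module (-m) venv and create a folder named env. && stands for "and": the second command runs only if the first one succeeds. source <venv_name>/bin/activate activates the environment, so libraries are installed only into that environment.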
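To confirm that the activation step worked before installing anything, one option is to ask the interpreter itself. The sketch below is not part of the original project; in_virtualenv is a hypothetical helper name. It relies only on the standard library: inside an environment created with python -m venv, sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; run the activate command first.")
```

If the check reports no environment, pip install would place the libraries into the global Python installation instead of the env folder.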

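Now install all the required libraries (python-dotenv provides the dotenv module that the script imports):

pip install rich pyttsx3 SpeechRecognition google-search-results python-dotenv

Now for pyaudio. Keep in mind that installing pyaudio may raise an error, and you may need to do some additional research.

If you are on Linux, some development dependencies have to be installed before pyaudio can be used: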

$ sudo apt-get install -y libasound-dev portaudio19-dev
$ pip install pyaudio

If you are on Windows, it is simpler (tested with CMD and Git Bash):

pip install pyaudio

3. Full code

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

load_dotenv('.env')
console = Console()


def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    recognizer = speech_recognition.Recognizer()

    while True:
        with console.status(status='Listening you...', spinner='point') as progress_bar:
            try:
                with speech_recognition.Microphone() as mic:
                    recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                    audio = recognizer.listen(mic)
                    text = recognizer.recognize_google(audio_data=audio).lower()
                    console.print(f'[bold]Recognized text[/bold]: {text}')

                    progress_bar.update(status='Looking for answers...', spinner='line')
                    params = {
                        'api_key': os.getenv('API_KEY'),
                        'device': 'desktop',
                        'engine': 'google',
                        'q': text,
                        'google_domain': 'google.com',
                        'gl': 'us',
                        'hl': 'en'
                    }
                    search = GoogleSearch(params)
                    results = search.get_dict()

                    try:
                        if 'answer_box' in results:
                            try:
                                primary_answer = results['answer_box']['answer']
                            except KeyError:
                                primary_answer = results['answer_box']['result']
                            console.print(f'[bold]The answer is[/bold]: {primary_answer}')
                        elif 'knowledge_graph' in results:
                            secondary_answer = results['knowledge_graph']['description']
                            console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
                        else:
                            # 'answer_box' is absent here, so this raises KeyError
                            # and control passes to the handler below
                            tertiary_answer = results['answer_box']['list']
                            console.print(f'[bold]The answer is[/bold]: {tertiary_answer}')

                        progress_bar.stop()  # answer found -> stop progress bar
                        user_promnt_to_contiune_if_answer_is_success = input('Would you like to to search for something again? (y/n) ')

                        if user_promnt_to_contiune_if_answer_is_success == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue  # run speech recognition again until `user_promt` == 'n'
                        else:
                            console.rule('[bold yellow]Thank you for cheking SerpApi Voice Assistant Demo Project')
                            break
                    except KeyError:
                        progress_bar.stop()
                        error_user_promt = input("Sorry, didn't found the answer. Would you like to rephrase it? (y/n) ")

                        if error_user_promt == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue  # run speech recognition again until `user_promt` == 'n'
                        else:
                            console.rule('[bold yellow]Thank you for cheking SerpApi Voice Assistant Demo Project')
                            break
            except speech_recognition.UnknownValueError:
                progress_bar.stop()
                user_promt_to_continue = input('Sorry, not quite understood you. Could say it again? (y/n) ')

                if user_promt_to_continue == 'y':
                    recognizer = speech_recognition.Recognizer()
                    continue  # run speech recognition again until `user_promt` == 'n'
                else:
                    progress_bar.stop()
                    console.rule('[bold yellow]Thank you for cheking SerpApi Voice Assistant Demo Project')
                    break


if __name__ == '__main__':
    main()

4. Code explanation

Import the libraries:

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

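rich: a Python library for rich, nicely formatted terminal output.
pyttsx3: a text-to-speech converter for Python that works offline.
SpeechRecognition: a Python library for converting speech to text.
google-search-results: SerpApi's Python API wrapper, which parses data from 15+ search engines.
os: reads the secret environment variable, in this case the SerpApi API key.
dotenv: loads the environment variables (the SerpApi API key) from the .env file. The .env file can be renamed to anything (for example, .napoleon); the leading dot (.) is the usual convention for an environment variable file.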
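One detail worth guarding against: os.getenv('API_KEY') returns None when the variable is missing, so the script would send a request without a key. The sketch below shows one possible guard. get_api_key is a hypothetical helper, not part of the original code, and it assumes the variable is named API_KEY as in the script.

```python
import os


def get_api_key(env=os.environ, name="API_KEY"):
    """Return the API key stored under `name`, or raise a readable error."""
    # `env` defaults to the process environment; a plain dict can be passed for testing.
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add a line such as {name}=<your key> to the .env file "
            "or export it in the shell before starting the script."
        )
    return key
```

Called after load_dotenv('.env'), this fails early with a clear message instead of failing later inside the search request.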

Define the rich Console(). It is used to prettify the terminal output (animations and so on):

console = Console()

Define the main function, where everything happens:

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    recognizer = speech_recognition.Recognizer()

At the beginning of the function we create speech_recognition.Recognizer(), and console.rule produces the following output:

───────────────────────────────────── SerpApi Voice Assistant Demo Project ─────────────────────────────────────

The next step is a while loop that keeps listening to the microphone input in order to recognize speech:

while True:
    with console.status(status='Listening you...', spinner='point') as progress_bar:
        try:
            with speech_recognition.Microphone() as mic:
                recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                audio = recognizer.listen(mic)
                text = recognizer.recognize_google(audio_data=audio).lower()
                console.print(f'[bold]Recognized text[/bold]: {text}')

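console.status: a rich progress bar (status spinner), used for decoration only.
speech_recognition.Microphone(): starts picking up input from the microphone.
recognizer.adjust_for_ambient_noise: calibrates the energy threshold to the ambient energy level.
recognizer.listen: listens for the actual user speech.
recognizer.recognize_google: performs speech recognition using the Google Speech Recognition API. lower() lowercases the recognized text.
console.print: the rich print statement, which allows text modifications such as bold, italic, and so on.

spinner='point' selects the spinner animation (run python -m rich.spinner to see the list of available spinners).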

After that, we need to initialize the SerpApi search parameters for the search:

progress_bar.update(status='Looking for answers...', spinner='line')
params = {
    'api_key': os.getenv('API_KEY'),  # serpapi api key
    'device': 'desktop',              # device used for the search
    'engine': 'google',               # serpapi parsing engine: https://serpapi.com/status
    'q': text,                        # search query
    'google_domain': 'google.com',    # google domain:          https://serpapi.com/google-domains
    'gl': 'us',                       # country of the search:  https://serpapi.com/google-countries
    'hl': 'en'                        # language of the search: https://serpapi.com/google-languages
    # other parameters such as locations: https://serpapi.com/locations-api
}
search = GoogleSearch(params)         # where data extraction happens on the SerpApi backend
results = search.get_dict()           # JSON -> Python dict

progress_bar.update updates progress_bar with a new status (the text printed in the console), and spinner='line' switches to a different spinner animation.
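The parameter dictionary can also be built in a small function, which keeps the loop shorter and makes the country and language easy to change. This is a sketch only: build_search_params is a hypothetical helper name, and the empty-query check is an added assumption. The keys and default values are the ones the script already uses.

```python
def build_search_params(query, api_key, country="us", language="en"):
    """Build the parameter dict that the script passes to GoogleSearch."""
    query = query.strip()
    if not query:
        # Assumption: an empty recognized text is not worth a search request.
        raise ValueError("The search query is empty.")
    return {
        "api_key": api_key,            # SerpApi API key
        "device": "desktop",           # device used for the search
        "engine": "google",            # SerpApi parsing engine
        "q": query,                    # search query (the recognized text)
        "google_domain": "google.com", # Google domain
        "gl": country,                 # country of the search
        "hl": language,                # language of the search
    }
```

In the loop this would be used as GoogleSearch(build_search_params(text, os.getenv('API_KEY'))).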

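After that, the data is extracted from Google Search using SerpApi's Google Search Engine API.

The next part of the code picks an answer from the results, prints it, and asks whether to search again: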

try:
    if 'answer_box' in results:
        try:
            primary_answer = results['answer_box']['answer']
        except KeyError:
            primary_answer = results['answer_box']['result']
        console.print(f'[bold]The answer is[/bold]: {primary_answer}')
    elif 'knowledge_graph' in results:
        secondary_answer = results['knowledge_graph']['description']
        console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
    else:
        # 'answer_box' is absent here, so this raises KeyError
        # and control passes to the handler below
        tertiary_answer = results['answer_box']['list']
        console.print(f'[bold]The answer is[/bold]: {tertiary_answer}')

    progress_bar.stop()  # answer found -> stop progress bar
    user_promnt_to_contiune_if_answer_is_success = input('Would you like to to search for something again? (y/n) ')

    if user_promnt_to_contiune_if_answer_is_success == 'y':
        recognizer = speech_recognition.Recognizer()
        continue  # run speech recognition again until `user_promt` == 'n'
    else:
        console.rule('[bold yellow]Thank you for cheking SerpApi Voice Assistant Demo Project')
        break
except KeyError:
    progress_bar.stop()  # answer not found -> stop progress bar
    error_user_promt = input("Sorry, didn't found the answer. Would you like to rephrase it? (y/n) ")

    if error_user_promt == 'y':
        recognizer = speech_recognition.Recognizer()
        continue  # run speech recognition again until `user_promt` == 'n'
    else:
        console.rule('[bold yellow]Thank you for cheking SerpApi Voice Assistant Demo Project')
        break
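The answer lookup can be separated from the prompting so that it can be checked without a microphone or an API key. The function below is a sketch, not the author's code: extract_answer is a hypothetical name, and its behavior differs slightly from the script. It returns None instead of raising KeyError, and it falls back to the knowledge graph when the answer box has none of the expected keys. The keys themselves (answer, result, list, description) are the ones the script reads.

```python
def extract_answer(results):
    """Return the first answer found in a SerpApi-style results dict, else None."""
    answer_box = results.get("answer_box") or {}
    # Try the answer box keys in the order the script mentions them.
    for key in ("answer", "result", "list"):
        if answer_box.get(key):
            return answer_box[key]
    # Fall back to the knowledge graph description, if there is one.
    knowledge_graph = results.get("knowledge_graph") or {}
    return knowledge_graph.get("description")
```

With this helper the loop body reduces to one call followed by an "if answer is None" check, in place of the nested try/except blocks.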

The final step is to handle the error raised when the microphone picked up no intelligible speech:

# while True:
#     with console.status(status='Listening you...', spinner='point') as progress_bar:
#         try:
                # speech recognition code
                # data extraction code
            except speech_recognition.UnknownValueError:
                progress_bar.stop()  # speech was not understood -> stop progress bar
                user_promt_to_continue = input('Sorry, not quite understood you. Could say it again? (y/n) ')

                if user_promt_to_continue == 'y':
                    recognizer = speech_recognition.Recognizer()
                    continue  # run speech recognition again until `user_promt` == 'n'
                else:
                    progress_bar.stop()  # if want to quit -> stop progress bar
                    console.rule('[bold yellow]Thank you for cheking SerpApi Voice Assistant Demo Project')
                    break
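The script asks a y/n question in three places and compares the raw input with 'y' each time, so a reply such as "Y" or "yes" ends the program. A small helper is one way to make the prompts consistent. ask_yes_no is a hypothetical name, and accepting "yes" as well as "y" is an assumption rather than the script's behavior.

```python
def ask_yes_no(question, ask=input):
    """Return True for 'y' or 'yes' (case-insensitive), False for anything else."""
    # `ask` defaults to the built-in input; another callable can be passed for testing.
    reply = ask(f"{question} (y/n) ").strip().lower()
    return reply in ("y", "yes")
```

Each branch could then read: if ask_yes_no('Would you like to search for something again?'): continue, and otherwise print the closing rule and break.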

console.rule() produces the following output:

───────────────────── Thank you for cheking SerpApi Voice Assistant Demo Project ──────────────────────

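Add the if __name__ == '__main__' idiom, which protects users from accidentally invoking the script when they did not intend to, and call the main function, which runs the whole script:

if __name__ == '__main__':
    main()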
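The script imports pyttsx3 but only prints its answers. If the assistant should also say the answer aloud, a minimal sketch could look like the following. speak is a hypothetical helper that is not in the original code; it uses pyttsx3's init(), say() and runAndWait() calls, and the optional engine argument exists only so that the function can be exercised without audio hardware.

```python
def speak(text, engine=None):
    """Say `text` aloud; `engine` may be injected, otherwise pyttsx3 is initialized."""
    if engine is None:
        import pyttsx3  # imported lazily so the helper also loads where pyttsx3 is absent
        engine = pyttsx3.init()
    engine.say(text)       # queue the sentence
    engine.runAndWait()    # block until the queued speech has been spoken
```

It would be called right after console.print, for example speak(str(primary_answer)).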

5. Links

rich
pyttsx3
SpeechRecognition
google-search-results
os
dotenv
