
PaddleSpeech ASR script demo

news 2026/3/30 18:26:41

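Overview

PaddleSpeech is an open-source toolkit from Baidu's PaddlePaddle platform for analyzing and processing speech and audio. It ships with several selectable models and provides speech recognition, speech synthesis, speaker verification, keyword spotting, audio classification, and speech translation.

This article presents a demo that uses the ASR feature of PaddleSpeech to batch-process audio files.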

Environment

CentOS 7.9

Python 3.10.3

paddlepaddle 2.5.1

paddlespeech 1.4.1

Demo code

The demo code is shown below; run it with Python 3.10.

# -*- coding: utf-8 -*-
# Requires Python 3.10
#
# PaddleSpeech ASR demo. Equivalent CLI call for a single file:
#   paddlespeech asr -y --lang zh --model conformer_wenetspeech --input $audiofile
#
# Basic flow: scan the given directory for audio files, run ASR on each one,
# and append the outcome to a result file.

import os

import soundfile as sf
from paddlespeech.cli.asr.infer import ASRExecutor

srcPath = r'/home/admin/test'
resultFile = r'/home/admin/test/asr-result-file.txt'

# Create the ASR executor once and reuse it for every file
asr = ASRExecutor()

# Open the result file in append mode; the with block closes it on exit
with open(resultFile, 'a', encoding='utf-8') as rfile:
    for filename in os.listdir(srcPath):
        if filename.endswith(('.wav', '.mp3')):
            audio_file_path = os.path.join(srcPath, filename)

            # Read the audio and compute its duration in seconds
            audio_data, sample_rate = sf.read(audio_file_path)
            duration = len(audio_data) / sample_rate

            # The ASR interface as used here cannot handle audio longer than
            # 50 seconds, so such files are skipped automatically
            if duration >= 50:
                resultStr = 'srcFile:{}, duration >= 50, skip.'.format(audio_file_path)
                print(resultStr)
                rfile.write(resultStr + '\n')
            else:
                # force_yes=True answers the resampling prompt without user input
                result = asr(audio_file=audio_file_path, model='conformer_wenetspeech', lang='zh', force_yes=True)
                print('srcFile:{}, asrResult:{}.'.format(audio_file_path, result))
                rfile.write('srcFile:{}, asrResult:{}.\n'.format(audio_file_path, result))
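The result file written by the demo is plain text with one line per audio file, so it may be handy to read it back for later processing. The following is a minimal sketch, assuming the two line formats used by the demo ('srcFile:..., asrResult:....' and 'srcFile:..., duration >= 50, skip.'). The names RESULT_RE, SKIP_RE and parse_result_line are hypothetical helpers and not part of PaddleSpeech.

```python
import re

# Line formats assumed from the demo script above (hypothetical helper names).
RESULT_RE = re.compile(r'^srcFile:(?P<path>.+?), asrResult:(?P<text>.*)\.$')
SKIP_RE = re.compile(r'^srcFile:(?P<path>.+?), duration >= 50, skip\.$')


def parse_result_line(line):
    """Parse one line of the result file.

    Returns (path, text) for a recognized file, (path, None) for a skipped
    file, and None for a line that matches neither format.
    """
    line = line.rstrip('\n')
    match = RESULT_RE.match(line)
    if match:
        return match.group('path'), match.group('text')
    match = SKIP_RE.match(line)
    if match:
        return match.group('path'), None
    return None
```

A path that itself contains the text ', asrResult:' would be split at the wrong place; the sketch does not try to handle that case.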

Test

The demo produces the following output.

$ python3 ps-asr-demo.py

/usr/local/python3/lib/python3.10/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

dtype=np.complex,

2023-09-11 16:10:12.299 | INFO | paddlespeech.s2t.modules.embedding:__init__:150 - max len: 5000

/usr/local/python3/lib/python3.10/site-packages/paddle/fluid/dygraph/math_op_patch.py:275: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.int64, but right dtype is paddle.bool, the right dtype will convert to paddle.int64

warnings.warn(

srcFile:/home/admin/test/zh.wav, asrResult:我认为跑步最重要的就是给我带来了身体健康.

srcFile:/home/admin/test/en.wav, asrResult:那摘了的标准.

[2023-09-11 16:10:20,223] [ WARNING] - The sample rate of the input file is not 16000.

The program will resample the wav file to 16000.

If the result does not meet your expectations，

Please input the 16k 16 bit 1 channel wav file.

warnings.warn(

srcFile:/home/admin/test/output.wav, asrResult:你好欢迎使用百度非讲深度学习框架.

srcFile:/home/admin/test/test-long-file.mp3, duration >= 50, skip.

...
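The warning in the log asks for a 16 kHz, 16-bit, single-channel wav file. A quick way to check a file before sending it to ASR is to read its header. The sketch below uses only the standard-library wave module, which handles uncompressed PCM wav files and not mp3. The function names describe_wav and is_16k_16bit_mono are hypothetical and not part of PaddleSpeech.

```python
import wave


def describe_wav(path):
    """Return (sample_rate, bits_per_sample, channels) read from a PCM wav header."""
    with wave.open(path, 'rb') as wf:
        return wf.getframerate(), wf.getsampwidth() * 8, wf.getnchannels()


def is_16k_16bit_mono(path):
    """True if the file already matches the format the warning asks for."""
    return describe_wav(path) == (16000, 16, 1)
```

Files that fail this check are still accepted by the demo, since the log shows PaddleSpeech resampling them itself; the check only tells you in advance which files will trigger that path.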

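Summary

PaddleSpeech's ASR feature offers several models to choose from. In the tests so far, "conformer_wenetspeech" gave relatively high recognition accuracy.

Recognition speed still needs improvement, and the limit on audio length remains to be solved.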
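One possible workaround for the length limit, not tested in this article, is to cut a long file into pieces shorter than 50 seconds and recognize each piece separately. The sketch below only plans the cut points. The function name plan_chunks and the 45-second default are assumptions chosen for illustration, and cutting at fixed positions can split a word in the middle, so a real solution would probably cut at silences instead.

```python
def plan_chunks(total_frames, sample_rate, max_seconds=45.0):
    """Return a list of (start, end) frame ranges covering the whole audio.

    Each range is at most max_seconds long; end is exclusive.
    """
    if total_frames < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError('total_frames must be >= 0; sample_rate and max_seconds must be > 0')
    frames_per_chunk = int(max_seconds * sample_rate)
    if frames_per_chunk < 1:
        raise ValueError('max_seconds is too small for this sample rate')
    return [
        (start, min(start + frames_per_chunk, total_frames))
        for start in range(0, total_frames, frames_per_chunk)
    ]
```

Each range could then be loaded with soundfile's read function using its start and stop arguments, written to a temporary wav file, and passed to the ASR call from the demo.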


