[Tools] ffmpeg notes

news 2026/2/11 0:41:49

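A friend needed to convert video files into audio files, and then transcribe the audio into text files.

Approach: use ffmpeg to convert the video to audio, then extract the speech text through the voice_t_text interface provided by Baidu AI. That interface requires the audio to be split into small PCM-encoded files of at most one minute each, sampled at 16000 Hz. The key steps are as follows: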
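The splitting step can be planned before any ffmpeg call is made. Below is a minimal sketch, assuming the total duration in seconds is already known (for instance from ffprobe). The function name segment_bounds and its parameters are made up for illustration and are not part of ffmpeg or the Baidu API.

```python
def segment_bounds(total_seconds, max_len=60.0):
    """Return (start, length) pairs covering total_seconds in chunks of at most max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    bounds = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shorter when the duration is not a multiple of max_len.
        length = min(max_len, total_seconds - start)
        bounds.append((start, length))
        start += max_len
    return bounds


if __name__ == "__main__":
    # A 150-second file yields two full chunks and one 30-second remainder.
    for start, length in segment_bounds(150):
        print(f"start={start:.0f}s length={length:.0f}s")
```

Each pair maps directly onto ffmpeg's -ss (start) and -t (duration) options.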

Setup

1. Copy ffmpeg, ffplay, and ffprobe into a folder and add that folder to the PATH environment variable.

2. Install ffmpeg-python

pip install ffmpeg-python

# Usage
import ffmpeg

3. Under the hood, this library still just runs the three command-line tools above. For example, the function that retrieves video or audio information and returns JSON is implemented as follows:

def probe(filename, cmd='ffprobe', **kwargs):
    """Run ffprobe on the specified file and return a JSON representation of the output.

    Raises:
        :class:`ffmpeg.Error`: if ffprobe returns a non-zero exit code,
            an :class:`Error` is returned with a generic error message.
            The stderr output can be retrieved by accessing the
            ``stderr`` property of the exception.
    """
    args = [cmd, '-show_format', '-show_streams', '-of', 'json']
    args += convert_kwargs_to_cmd_line_args(kwargs)
    args += [filename]

    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode != 0:
        raise Error('ffprobe', out, err)
    return json.loads(out.decode('utf-8'))
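To make the "it is just a command line" point concrete, here is a small sketch of how keyword arguments could be turned into flags. kwargs_to_args and build_probe_args are simplified stand-ins written for this note. They are not ffmpeg-python's own helpers, whose real behaviour may differ in details.

```python
def kwargs_to_args(kwargs):
    """Turn keyword arguments into a flat list of command-line flags.

    Simplified stand-in for illustration only; not ffmpeg-python's helper.
    """
    args = []
    for key in sorted(kwargs):
        args.append(f"-{key}")
        value = kwargs[key]
        if value is not None:  # None is treated as a bare flag without a value
            args.append(str(value))
    return args


def build_probe_args(filename, cmd="ffprobe", **kwargs):
    """Assemble the argument list that a probe-style call would execute."""
    return [cmd, "-show_format", "-show_streams", "-of", "json",
            *kwargs_to_args(kwargs), filename]


if __name__ == "__main__":
    print(build_probe_args("xdhyxl.mp3", v="quiet"))
```

Under this sketch, passing v="quiet" simply adds -v quiet to the ffprobe invocation.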

Calling probe() is roughly equivalent to running this command:

ffprobe -v quiet -print_format json -show_format -show_streams xdhyxl.mp3
{
  "streams": [
    {
      "index": 0,
      "codec_name": "mp3",
      "codec_long_name": "MP3 (MPEG audio layer 3)",
      "codec_type": "audio",
      "codec_tag_string": "[0][0][0][0]",
      "codec_tag": "0x0000",
      "sample_fmt": "fltp",
      "sample_rate": "44100",
      "channels": 2,
      "channel_layout": "stereo",
      "bits_per_sample": 0,
      "initial_padding": 0,
      "r_frame_rate": "0/0",
      "avg_frame_rate": "0/0",
      "time_base": "1/14112000",
      "start_pts": 353600,
      "start_time": "0.025057",
      "duration_ts": 37154488320,
      "duration": "2632.829388",
      "bit_rate": "128000",
      "disposition": {
        "default": 0, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0,
        "forced": 0, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0,
        "attached_pic": 0, "timed_thumbnails": 0, "non_diegetic": 0, "captions": 0,
        "descriptions": 0, "metadata": 0, "dependent": 0, "still_image": 0, "multilayer": 0
      },
      "tags": {"encoder": "Lavf"}
    }
  ],
  "format": {
    "filename": "xdhyxl.mp3",
    "nb_streams": 1,
    "nb_programs": 0,
    "nb_stream_groups": 0,
    "format_name": "mp3",
    "format_long_name": "MP2/3 (MPEG audio layer 2/3)",
    "start_time": "0.025057",
    "duration": "2632.829388",
    "size": "42125732",
    "bit_rate": "128001",
    "probe_score": 51,
    "tags": {"encoder": "Lavf57.83.100"}
  }
}
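Only a few of these fields matter for the conversion. The sketch below pulls them out of the JSON text, assuming it has already been captured as a string. summarize_probe is a hypothetical helper name, and the sample string is a trimmed version of the output above.

```python
import json


def summarize_probe(probe_json):
    """Extract the fields needed to plan the conversion from ffprobe's JSON output."""
    info = json.loads(probe_json)
    # Use the first audio stream. ffprobe reports sample_rate and duration
    # as strings, so they are converted to numbers explicitly.
    audio = next(s for s in info["streams"] if s["codec_type"] == "audio")
    return {
        "codec": audio["codec_name"],
        "sample_rate": int(audio["sample_rate"]),
        "channels": audio["channels"],
        "duration": float(info["format"]["duration"]),
    }


if __name__ == "__main__":
    sample = (
        '{"streams": [{"codec_name": "mp3", "codec_type": "audio", '
        '"sample_rate": "44100", "channels": 2}], '
        '"format": {"duration": "2632.829388"}}'
    )
    print(summarize_probe(sample))
```

For this file the summary shows 44100 Hz stereo audio, so both the sample rate and the channel count have to be changed before the audio meets the 16000 Hz PCM requirement.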

Option reference

1. Audio options

-aframes number (output)
Set the number of audio frames to output. This is an obsolete alias for -frames:a, which you should use instead.

-ar[:stream_specifier] freq (input/output,per-stream)
Set the audio sampling frequency. For output streams it is set by default to the frequency of the corresponding input stream. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.

-aq q (output)
Set the audio quality (codec-specific, VBR). This is an alias for -q:a.

-ac[:stream_specifier] channels (input/output,per-stream)
Set the number of audio channels. For output streams it is set by default to the number of input audio channels. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.

-an (input/output)
As an input option, blocks all audio streams of a file from being filtered or being automatically selected or mapped for any output. See -discard option to disable streams individually.
As an output option, disables audio recording i.e. automatic selection or mapping of any audio stream. For full manual control see the -map option.

-acodec codec (input/output)
Set the audio codec. This is an alias for -codec:a.

-sample_fmt[:stream_specifier] sample_fmt (output,per-stream)
Set the audio sample format. Use -sample_fmts to get a list of supported sample formats.

-af filtergraph (output)
Create the filtergraph specified by filtergraph and use it to filter the stream.
This is an alias for -filter:a, see the -filter option.

2. Video options

-vframes number (output)
Set the number of video frames to output. This is an obsolete alias for -frames:v, which you should use instead.

-r[:stream_specifier] fps (input/output,per-stream)
Set frame rate (Hz value, fraction or abbreviation).
As an input option, ignore any timestamps stored in the file and instead generate timestamps assuming constant frame rate fps. This is not the same as the -framerate option used for some input formats like image2 or v4l2 (it used to be the same in older versions of FFmpeg). If in doubt use -framerate instead of the input option -r.
As an output option:
video encoding: Duplicate or drop frames right before encoding them to achieve constant output frame rate fps.
video streamcopy: Indicate to the muxer that fps is the stream frame rate. No data is dropped or duplicated in this case. This may produce invalid files if fps does not match the actual stream frame rate as determined by packet timestamps. See also the setts bitstream filter.

-fpsmax[:stream_specifier] fps (output,per-stream)
Set maximum frame rate (Hz value, fraction or abbreviation).
Clamps output frame rate when output framerate is auto-set and is higher than this value. Useful in batch processing or when input framerate is wrongly detected as very high. It cannot be set together with -r. It is ignored during streamcopy.

-s[:stream_specifier] size (input/output,per-stream)
Set frame size.
As an input option, this is a shortcut for the video_size private option, recognized by some demuxers for which the frame size is either not stored in the file or is configurable – e.g. raw video or video grabbers.
As an output option, this inserts the scale video filter to the end of the corresponding filtergraph. Please use the scale filter directly to insert it at the beginning or some other place.
The format is ‘wxh’ (default - same as source).

-aspect[:stream_specifier] aspect (output,per-stream)
Set the video display aspect ratio specified by aspect.
aspect can be a floating point number string, or a string of the form num:den, where num and den are the numerator and denominator of the aspect ratio. For example "4:3", "16:9", "1.3333", and "1.7777" are valid argument values.
If used together with -vcodec copy, it will affect the aspect ratio stored at container level, but not the aspect ratio stored in encoded frames, if it exists.

-display_rotation[:stream_specifier] rotation (input,per-stream)
Set video rotation metadata.
rotation is a decimal number specifying the amount in degree by which the video should be rotated counter-clockwise before being displayed.
This option overrides the rotation/display transform metadata stored in the file, if any. When the video is being transcoded (rather than copied) and -autorotate is enabled, the video will be rotated at the filtering stage. Otherwise, the metadata will be written into the output file if the muxer supports it.
If the -display_hflip and/or -display_vflip options are given, they are applied after the rotation specified by this option.

-display_hflip[:stream_specifier] (input,per-stream)
Set whether on display the image should be horizontally flipped.
See the -display_rotation option for more details.

-display_vflip[:stream_specifier] (input,per-stream)
Set whether on display the image should be vertically flipped.
See the -display_rotation option for more details.

-vn (input/output)
As an input option, blocks all video streams of a file from being filtered or being automatically selected or mapped for any output. See -discard option to disable streams individually.
As an output option, disables video recording i.e. automatic selection or mapping of any video stream. For full manual control see the -map option.

-vcodec codec (output)
Set the video codec. This is an alias for -codec:v.

-pass[:stream_specifier] n (output,per-stream)
Select the pass number (1 or 2). It is used to do two-pass video encoding. The statistics of the video are recorded in the first pass into a log file (see also the option -passlogfile), and in the second pass that log file is used to generate the video at the exact requested bitrate. On pass 1, you may just deactivate audio and set output to null, examples for Windows and Unix:
ffmpeg -i foo.mov -c:v libxvid -pass 1 -an -f rawvideo -y NUL
ffmpeg -i foo.mov -c:v libxvid -pass 1 -an -f rawvideo -y /dev/null

-passlogfile[:stream_specifier] prefix (output,per-stream)
Set two-pass log file name prefix to prefix, the default file name prefix is “ffmpeg2pass”. The complete file name will be PREFIX-N.log, where N is a number specific to the output stream.

-vf filtergraph (output)
Create the filtergraph specified by filtergraph and use it to filter the stream.
This is an alias for -filter:v, see the -filter option.

-autorotate
Automatically rotate the video according to file metadata. Enabled by default, use -noautorotate to disable it.

-autoscale
Automatically scale the video according to the resolution of first frame. Enabled by default, use -noautoscale to disable it. When autoscale is disabled, all output frames of filter graph might not be in the same resolution and may be inadequate for some encoder/muxer. Therefore, it is not recommended to disable it unless you really know what you are doing. Disable autoscale at your own risk.

3. Commands

Get multimedia metadata:
ffprobe -v quiet -print_format json -show_format -show_streams xd.mp3
Simplified to:
ffprobe -show_format -show_streams -of json xd.mp3

Audio conversion (first minute of the input, as 16000 Hz 16-bit PCM in a WAV file):
ffmpeg -i xd.mp3 -ss 00:00:00 -t 00:01:00 -acodec pcm_s16le -ar 16000 output.wav

Example

ffmpeg -i in_file -codec:a pcm_s16le -ac 1 -ar 16000 out_file -loglevel quiet

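Option	Meaning
-i filename.fmt	Sets the input file to filename.fmt
-f fmt	Forces the format; here it sets the output format to fmt
-c/-codec codec	Codec name (for WAV output use pcm_s16le: signed 16-bit little-endian)
-ar samplerate	Sets the audio sample rate (Hz)
-ac channels	Sets the number of audio channels, e.g. -ac 1 for mono
-acodec copy	Specifies the audio codec; the value copy copies the stream as-is without re-encoding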
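The options in this table can be assembled into an argument list programmatically. The following is a sketch under stated assumptions: build_pcm_command and its default values are hypothetical, chosen to match the 16000 Hz mono PCM target described in these notes, and the extra -y flag (overwrite the output without asking) is an addition not listed in the table.

```python
def build_pcm_command(src, dst, start=0, length=60, rate=16000, channels=1):
    """Build an ffmpeg argument list that writes one raw PCM chunk of src to dst."""
    return [
        "ffmpeg", "-y",             # -y: overwrite dst if it already exists
        "-i", src,                  # input file
        "-ss", str(start),          # chunk start, in seconds
        "-t", str(length),          # chunk length, in seconds
        "-f", "s16le",              # raw PCM output format (letter l, not digit 1)
        "-acodec", "pcm_s16le",     # signed 16-bit little-endian PCM
        "-ac", str(channels),       # number of channels
        "-ar", str(rate),           # sample rate in Hz
        dst,
    ]


if __name__ == "__main__":
    # Print the command for the second one-minute chunk of xd.mp3.
    print(" ".join(build_pcm_command("xd.mp3", "chunk_001.pcm", start=60)))
```

Passing the result as a list to subprocess.run avoids shell quoting problems with file names that contain spaces.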

Problems

In my case ffmpeg-python did not pass some parameters through correctly, while ffmpy3 worked fine.

pip install ffmpy3

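# Works
# fls is the input file path, pcmpath is the output .pcm path
import ffmpy3

ff = ffmpy3.FFmpeg(
    # executable is the path to ffmpeg; it can be omitted when ffmpeg is on the PATH
    inputs={fls: '-y'},
    outputs={pcmpath: '-acodec pcm_s16le -f s16le -ac 1 -ar 16000'}
)
ff.run()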

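# Runs fine on the command line, but raised an error in the script and the conversion failed.
# Adding the PCM parameters here failed, so os.system or subprocess was used instead.
import ffmpeg

stream = ffmpeg.input(fls)
stream = ffmpeg.output(stream, pcmpath, ac=1, ar=16000)
ffmpeg.run(stream)

# The same conversion through subprocess.
# The format name must be spelled s16le (letter l); a misspelling such as
# s161e (digit 1) makes ffmpeg reject the output format and the run fails.
import subprocess

command = [
    'ffmpeg',
    '-i', fls,               # input audio file
    '-f', 's16le',           # raw PCM output format
    '-acodec', 'pcm_s16le',  # encode audio as 16-bit little-endian PCM
    '-ac', '1',              # mono
    '-ar', '16000',          # 16000 Hz sample rate
    pcmpath                  # output file path
]
# Run the FFmpeg command
subprocess.run(command)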
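When a command works in the terminal but fails from a script, seeing ffmpeg's stderr usually reveals the reason. A minimal sketch follows, assuming the command is given as an argument list. run_checked is a hypothetical wrapper, and the demo uses the Python interpreter as a stand-in for a failing ffmpeg call so that it runs without ffmpeg installed.

```python
import subprocess
import sys


def run_checked(command):
    """Run a command and raise RuntimeError carrying its stderr if it exits non-zero."""
    result = subprocess.run(
        command, capture_output=True, text=True, errors="replace"
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a failing ffmpeg invocation: writes a message to stderr and exits 1.
    failing = [sys.executable, "-c", "import sys; sys.exit('unknown format')"]
    try:
        run_checked(failing)
    except RuntimeError as exc:
        print(exc)
```

With a real ffmpeg argument list in place of the stand-in, the exception message would contain whatever ffmpeg printed before exiting.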
