20240202: Using whisper.cpp on Windows 10
2024/2/2 14:15
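[Conclusion: on Windows 10, recognizing a 7-minute Chinese video took 83.7284 seconds, roughly 1.4 minutes. The command logged below loads models/ggml-medium.bin, so this timing appears to be for the medium model rather than large. The efficiency is too poor!]
83.7284 / 420 ≈ 0.199 (processing time divided by audio length, i.e. about one fifth of real time)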
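The ratio above is what is usually called a real-time factor. A minimal sketch of the same arithmetic, assuming the 420 s audio length used in this post (the function name is made up for illustration):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Values from this post: RunComplete 83.7284 s for roughly 7 minutes (420 s) of audio.
rtf = real_time_factor(83.7284, 420.0)
speedup = 1.0 / rtf  # seconds of audio processed per second of wall-clock time
print(f"RTF = {rtf:.3f}, about {speedup:.1f}x faster than real time")
```

A factor below 1 means the transcription finishes faster than the audio plays; here it is about 0.2.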
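Prerequisite: you need a working way to access overseas websites!
1. First, you need an NVIDIA graphics card. I use a second-hand GTX 1080 bought on Pinduoduo (PDD) for ¥800. [It is very likely an ex-mining card!]
2. Install the latest NVIDIA driver (version 545), together with CUDA and cuDNN.
3. Install Torch.
4. Configure whisper.
The generated subtitle file chs.srt comes out in Traditional Chinese; it still needs to be converted to Simplified Chinese later!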
1
00:00:00,000 --> 00:00:01,400
前段時間有個巨石恆虎
2
00:00:01,400 --> 00:00:03,000
某某是男人最好的醫妹
3
00:00:03,000 --> 00:00:04,800
這裡的某某可以替換為減肥
4
00:00:04,800 --> 00:00:07,800
長髮 西裝 考研 速唱 永潔無間等等等等
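One way to approach the Traditional-to-Simplified conversion is to rewrite only the text lines of the SRT file and leave the index and timestamp lines untouched. The sketch below is an illustration under assumptions: `DEMO_T2S` is a tiny hand-written mapping covering a few characters from the sample above, not a complete table, and a real conversion would need a full dictionary (for example from a dedicated converter such as OpenCC, which is not used here).

```python
import re

# Tiny illustrative mapping only; it covers a handful of characters
# from the sample subtitles and is NOT a complete conversion table.
DEMO_T2S = str.maketrans({"時": "时", "間": "间", "個": "个", "這": "这", "裡": "里", "為": "为"})

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:01,400".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def convert_srt_text(srt: str, table=DEMO_T2S) -> str:
    """Convert only subtitle text lines; keep index and timestamp lines as-is."""
    out = []
    for line in srt.splitlines():
        stripped = line.strip()
        if stripped.isdigit() or TIMESTAMP.match(stripped):
            out.append(line)  # cue number or timing line
        else:
            out.append(line.translate(table))
    return "\n".join(out)


sample = "1\n00:00:00,000 --> 00:00:01,400\n前段時間有個巨石恆虎\n"
print(convert_srt_text(sample))
```

Characters missing from the mapping pass through unchanged, so an incomplete table never corrupts the file; it only leaves some Traditional characters in place.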
https://github.com/Const-me/Whisper/releases
https://www.cnblogs.com/jike9527/p/17545484.html
Batch-generating subtitles with whisper (whisper.cpp)
c:\>
c:\>git clone https://github.com/ggerganov/whisper.cpp
Cloning into 'whisper.cpp'...
remote: Enumerating objects: 6773, done.
remote: Counting objects: 100% (1995/1995), done.
remote: Compressing objects: 100% (275/275), done.
remote: Total 6773 (delta 1826), reused 1810 (delta 1714), pack-reused 4778
Receiving objects: 100% (6773/6773), 10.18 MiB | 6.55 MiB/s, done.
Resolving deltas: 100% (4368/4368), done.
c:\>cd whisper.cpp
c:\whisper.cpp>dir
驱动器 C 中的卷是 WIN10
卷的序列号是 9273-D6A8
c:\whisper.cpp 的目录
2024/02/02 14:20 <DIR> .
2024/02/02 14:20 <DIR> ..
2024/02/02 14:20 <DIR> .devops
2024/02/02 14:20 <DIR> .github
2024/02/02 14:20 863 .gitignore
2024/02/02 14:20 99 .gitmodules
2024/02/02 14:20 <DIR> bindings
2024/02/02 14:20 <DIR> cmake
2024/02/02 14:20 19,729 CMakeLists.txt
2024/02/02 14:20 <DIR> coreml
2024/02/02 14:20 <DIR> examples
2024/02/02 14:20 <DIR> extra
2024/02/02 14:20 32,539 ggml-alloc.c
2024/02/02 14:20 4,149 ggml-alloc.h
2024/02/02 14:20 5,996 ggml-backend-impl.h
2024/02/02 14:20 69,048 ggml-backend.c
2024/02/02 14:20 11,932 ggml-backend.h
2024/02/02 14:20 451,408 ggml-cuda.cu
2024/02/02 14:20 2,156 ggml-cuda.h
2024/02/02 14:20 7,813 ggml-impl.h
2024/02/02 14:20 2,425 ggml-metal.h
2024/02/02 14:20 152,813 ggml-metal.m
2024/02/02 14:20 231,753 ggml-metal.metal
2024/02/02 14:20 87,989 ggml-opencl.cpp
2024/02/02 14:20 1,422 ggml-opencl.h
2024/02/02 14:20 411,673 ggml-quants.c
2024/02/02 14:20 13,983 ggml-quants.h
2024/02/02 14:20 696,627 ggml.c
2024/02/02 14:20 87,399 ggml.h
2024/02/02 14:20 <DIR> grammars
2024/02/02 14:20 1,093 LICENSE
2024/02/02 14:20 15,341 Makefile
2024/02/02 14:20 <DIR> models
2024/02/02 14:20 <DIR> openvino
2024/02/02 14:20 1,835 Package.swift
2024/02/02 14:20 39,942 README.md
2024/02/02 14:20 <DIR> samples
2024/02/02 14:20 <DIR> spm-headers
2024/02/02 14:20 <DIR> tests
2024/02/02 14:20 239,648 whisper.cpp
2024/02/02 14:20 30,873 whisper.h
26 个文件 2,620,548 字节
15 个目录 128,119,971,840 可用字节
c:\whisper.cpp>cd models
c:\whisper.cpp\models>dir
驱动器 C 中的卷是 WIN10
卷的序列号是 9273-D6A8
c:\whisper.cpp\models 的目录
2024/02/02 14:20 <DIR> .
2024/02/02 14:20 <DIR> ..
2024/02/02 14:20 7 .gitignore
2024/02/02 14:20 4,980 convert-h5-to-coreml.py
2024/02/02 14:20 7,584 convert-h5-to-ggml.py
2024/02/02 14:20 10,955 convert-pt-to-ggml.py
2024/02/02 14:20 12,761 convert-whisper-to-coreml.py
2024/02/02 14:20 1,799 convert-whisper-to-openvino.py
2024/02/02 14:20 2,272 download-coreml-model.sh
2024/02/02 14:20 1,440 download-ggml-model.cmd
2024/02/02 14:20 3,039 download-ggml-model.sh
2024/02/02 14:20 575,451 for-tests-ggml-base.bin
2024/02/02 14:20 586,836 for-tests-ggml-base.en.bin
2024/02/02 14:20 575,451 for-tests-ggml-large.bin
2024/02/02 14:20 575,451 for-tests-ggml-medium.bin
2024/02/02 14:20 586,836 for-tests-ggml-medium.en.bin
2024/02/02 14:20 575,451 for-tests-ggml-small.bin
2024/02/02 14:20 586,836 for-tests-ggml-small.en.bin
2024/02/02 14:20 575,451 for-tests-ggml-tiny.bin
2024/02/02 14:20 586,836 for-tests-ggml-tiny.en.bin
2024/02/02 14:20 1,506 generate-coreml-interface.sh
2024/02/02 14:20 1,355 generate-coreml-model.sh
2024/02/02 14:20 3,711 ggml_to_pt.py
2024/02/02 14:20 42 openvino-conversion-requirements.txt
2024/02/02 14:20 5,615 README.md
23 个文件 5,281,665 字节
2 个目录 105,396,047,872 可用字节
c:\whisper.cpp\models>main.exe -f samples\jfk.wav
'main.exe' 不是内部或外部命令,也不是可运行的程序
或批处理文件。
c:\whisper.cpp\models>dir
驱动器 C 中的卷是 WIN10
卷的序列号是 9273-D6A8
c:\whisper.cpp\models 的目录
2024/02/02 14:23 <DIR> .
2024/02/02 14:23 <DIR> ..
2024/02/02 14:20 7 .gitignore
2024/02/02 14:20 4,980 convert-h5-to-coreml.py
2024/02/02 14:20 7,584 convert-h5-to-ggml.py
2024/02/02 14:20 10,955 convert-pt-to-ggml.py
2024/02/02 14:20 12,761 convert-whisper-to-coreml.py
2024/02/02 14:20 1,799 convert-whisper-to-openvino.py
2024/02/02 14:20 2,272 download-coreml-model.sh
2024/02/02 14:20 1,440 download-ggml-model.cmd
2024/02/02 14:20 3,039 download-ggml-model.sh
2024/02/02 14:20 575,451 for-tests-ggml-base.bin
2024/02/02 14:20 586,836 for-tests-ggml-base.en.bin
2024/02/02 14:20 575,451 for-tests-ggml-large.bin
2024/02/02 14:20 575,451 for-tests-ggml-medium.bin
2024/02/02 14:20 586,836 for-tests-ggml-medium.en.bin
2024/02/02 14:20 575,451 for-tests-ggml-small.bin
2024/02/02 14:20 586,836 for-tests-ggml-small.en.bin
2024/02/02 14:20 575,451 for-tests-ggml-tiny.bin
2024/02/02 14:20 586,836 for-tests-ggml-tiny.en.bin
2024/02/02 14:20 1,506 generate-coreml-interface.sh
2024/02/02 14:20 1,355 generate-coreml-model.sh
2024/02/02 13:23 37,922,638 ggml-base-encoder.mlmodelc.zip
2024/02/02 13:23 59,707,625 ggml-base-q5_1.bin
2024/02/02 13:24 147,951,465 ggml-base.bin
2024/02/02 13:24 37,950,917 ggml-base.en-encoder.mlmodelc.zip
2024/02/02 13:24 59,721,011 ggml-base.en-q5_1.bin
2024/02/02 13:24 147,964,211 ggml-base.en.bin
2024/02/02 13:30 1,177,529,527 ggml-large-v1-encoder.mlmodelc.zip
2024/02/02 13:35 3,094,623,691 ggml-large-v1.bin
2024/02/02 13:31 1,174,643,458 ggml-large-v2-encoder.mlmodelc.zip
2024/02/02 13:30 1,080,732,091 ggml-large-v2-q5_0.bin
2024/02/02 13:35 3,094,623,691 ggml-large-v2.bin
2024/02/02 13:31 1,175,711,232 ggml-large-v3-encoder.mlmodelc.zip
2024/02/02 13:32 1,081,140,203 ggml-large-v3-q5_0.bin
2024/02/02 13:35 3,095,033,483 ggml-large-v3.bin
2024/02/02 13:57 567,829,413 ggml-medium-encoder.mlmodelc.zip
2024/02/02 13:57 539,212,467 ggml-medium-q5_0.bin
2024/02/02 14:03 1,533,763,059 ggml-medium.bin
2024/02/02 13:59 566,993,085 ggml-medium.en-encoder.mlmodelc.zip
2024/02/02 13:59 539,225,533 ggml-medium.en-q5_0.bin
2024/02/02 14:04 1,533,774,781 ggml-medium.en.bin
2024/02/02 14:08 163,083,239 ggml-small-encoder.mlmodelc.zip
2024/02/02 14:07 190,085,487 ggml-small-q5_1.bin
2024/02/02 14:09 487,601,967 ggml-small.bin
2024/02/02 14:09 162,952,446 ggml-small.en-encoder.mlmodelc.zip
2024/02/02 14:09 190,098,681 ggml-small.en-q5_1.bin
2024/02/02 14:11 487,614,201 ggml-small.en.bin
2024/02/02 14:10 15,037,446 ggml-tiny-encoder.mlmodelc.zip
2024/02/02 14:10 32,152,673 ggml-tiny-q5_1.bin
2024/02/02 14:11 77,691,713 ggml-tiny.bin
2024/02/02 14:11 15,034,655 ggml-tiny.en-encoder.mlmodelc.zip
2024/02/02 14:11 32,166,155 ggml-tiny.en-q5_1.bin
2024/02/02 14:12 43,550,795 ggml-tiny.en-q8_0.bin
2024/02/02 14:12 77,704,715 ggml-tiny.en.bin
2024/02/02 14:20 3,711 ggml_to_pt.py
2024/02/02 13:23 1,477 gitattributes
2024/02/02 14:20 42 openvino-conversion-requirements.txt
2024/02/02 13:23 1,311 README.md
57 个文件 22,726,106,592 字节
2 个目录 105,396,191,232 可用字节
c:\whisper.cpp\models>cd ..
c:\whisper.cpp>dir
c:\whisper.cpp>main.exe -f samples\jfk.wav
Using GPU "NVIDIA GeForce GTX 1080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51864 strings, 3050.6 kb RAM
Loaded 245 GPU tensors, 140.539 MB VRAM
Computed CPU base frequency: 2.29469 GHz
Loaded model from "models/ggml-base.en.bin" to VRAM
Created source reader from the file "samples\jfk.wav"
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
CPU Tasks
LoadModel 577.635 milliseconds
RunComplete 422.9 milliseconds
Run 319.505 milliseconds
Callbacks 5.4751 milliseconds, 2 calls, 2.73755 milliseconds average
Spectrogram 52.7935 milliseconds, 3 calls, 17.5978 milliseconds average
Sample 7.6473 milliseconds, 27 calls, 283.233 microseconds average
Encode 188.011 milliseconds
Decode 125.975 milliseconds
DecodeStep 118.306 milliseconds, 27 calls, 4.38169 milliseconds average
GPU Tasks
LoadModel 249.459 milliseconds
Run 231.117 milliseconds
Encode 99.0044 milliseconds
EncodeLayer 77.7554 milliseconds, 6 calls, 12.9592 milliseconds average
Decode 132.112 milliseconds
DecodeStep 132.103 milliseconds, 27 calls, 4.89271 milliseconds average
DecodeLayer 87.4824 milliseconds, 162 calls, 540.015 microseconds average
Compute Shaders
mulMatTiled 63.4898 milliseconds, 60 calls, 1.05816 milliseconds average
mulMatByRowTiled 50.9198 milliseconds, 1959 calls, 25.9928 microseconds average
softMaxLong 27.5314 milliseconds, 27 calls, 1.01968 milliseconds average
norm 12.3785 milliseconds, 526 calls, 23.5333 microseconds average
addRepeatGelu 11.9749 milliseconds, 170 calls, 70.4406 microseconds average
fmaRepeat1 7.652 milliseconds, 526 calls, 14.5475 microseconds average
addRepeatEx 7.4319 milliseconds, 498 calls, 14.9235 microseconds average
softMaxFixed 6.913 milliseconds, 168 calls, 41.1488 microseconds average
copyConvert 5.397 milliseconds, 348 calls, 15.5086 microseconds average
convolutionMain 5.3903 milliseconds
convolutionMain2Fixed 5.2572 milliseconds
copyTranspose 4.6246 milliseconds, 336 calls, 13.7637 microseconds average
scaleInPlace 4.5107 milliseconds, 168 calls, 26.8494 microseconds average
addRepeatScale 3.7607 milliseconds, 324 calls, 11.6071 microseconds average
softMax 2.9733 milliseconds, 162 calls, 18.3537 microseconds average
addRepeat 1.8574 milliseconds, 180 calls, 10.3189 microseconds average
diagMaskInf 1.3711 milliseconds, 162 calls, 8.46358 microseconds average
convolutionPrep1 439.3 microseconds, 2 calls, 219.65 microseconds average
convolutionPrep2 229.4 microseconds, 2 calls, 114.7 microseconds average
addRows 191.5 microseconds, 27 calls, 7.09259 microseconds average
add 60.4 microseconds
mulMatByScalar 29.7 microseconds, 6 calls, 4.95 microseconds average
mulMatByRow 27.6 microseconds, 6 calls, 4.6 microseconds average
Memory Usage
Model 858.5 KB RAM, 140.539 MB VRAM
Context 1.19063 MB RAM, 186.732 MB VRAM
Total 2.02901 MB RAM, 327.271 MB VRAM
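The timing report mixes seconds, milliseconds and microseconds, which makes it awkward to compare stages by eye. Below is a minimal sketch that normalizes such lines to seconds and prints each stage's share of `Run`. It assumes the line shape seen in this log (name, number, unit, optional call counts); `parse_timings` is a helper made up for this example, not part of the tool.

```python
import re

UNIT_TO_SECONDS = {"seconds": 1.0, "milliseconds": 1e-3, "microseconds": 1e-6}
# Name, total time, unit; anything after the unit (call counts, averages) is ignored.
LINE = re.compile(r"^\s*(\w+)\s+([\d.]+) (seconds|milliseconds|microseconds)\b")


def parse_timings(lines):
    """Map task name -> total seconds; the first occurrence of a name wins."""
    timings = {}
    for line in lines:
        m = LINE.match(line)
        if m and m.group(1) not in timings:
            timings[m.group(1)] = float(m.group(2)) * UNIT_TO_SECONDS[m.group(3)]
    return timings


# A few "CPU Tasks" lines copied from the jfk.wav run above.
cpu_tasks = [
    "Run 319.505 milliseconds",
    "Encode 188.011 milliseconds",
    "Decode 125.975 milliseconds",
    "Sample 7.6473 milliseconds, 27 calls, 283.233 microseconds average",
]
t = parse_timings(cpu_tasks)
for name in ("Encode", "Decode"):
    print(f"{name}: {t[name] / t['Run']:.1%} of Run")
```

Keeping the first occurrence matters because the "CPU Tasks" and "GPU Tasks" sections reuse names such as Run and Encode; feed the sections separately if both are needed.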
c:\whisper.cpp>main.exe -l zh -osrt -m models/ggml-medium.bin chs.wav
Using GPU "NVIDIA GeForce GTX 1080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51865 strings, 3037.1 kb RAM
Loaded 947 GPU tensors, 1462.12 MB VRAM
Computed CPU base frequency: 2.29469 GHz
Loaded model from "models/ggml-medium.bin" to VRAM
Created source reader from the file "chs.wav"
[00:00:00.000 --> 00:00:01.400] ?????????????
[00:00:01.400 --> 00:00:03.000] ????????????
[00:00:03.000 --> 00:00:04.800] ?????????????????
[00:00:04.800 --> 00:00:07.800] ??? ?? ??? ?? ?????????
(... segments from 00:00:07.800 to 00:06:56.000 omitted: the console printed every Chinese character as "?", so only timestamps, digits and Latin letters were readable; the recognized text is in the generated chs.srt ...)
[00:06:56.000 --> 00:07:01.600] ????????????IC????? ????? ??????
CPU Tasks
LoadModel 1.43866 seconds
RunComplete 83.7284 seconds
Run 83.6255 seconds
Callbacks 457.784 milliseconds, 187 calls, 2.44804 milliseconds average
Spectrogram 1.21106 seconds, 90 calls, 13.4562 milliseconds average
Sample 1.01043 seconds, 3535 calls, 285.836 microseconds average
Encode 15.2296 seconds, 17 calls, 895.858 milliseconds average
Decode 67.9228 seconds, 17 calls, 3.99546 seconds average
DecodeStep 66.9103 seconds, 3535 calls, 18.928 milliseconds average
GPU Tasks
LoadModel 1.03839 seconds
Run 83.4773 seconds
Encode 15.3219 seconds, 17 calls, 901.288 milliseconds average
EncodeLayer 13.0778 seconds, 408 calls, 32.0533 milliseconds average
Decode 68.1554 seconds, 17 calls, 4.00914 seconds average
DecodeStep 68.1535 seconds, 3535 calls, 19.2796 milliseconds average
DecodeLayer 61.7764 seconds, 84840 calls, 728.152 microseconds average
Compute Shaders
mulMatByRowTiled 38.8209 seconds, 1016702 calls, 38.1831 microseconds average
mulMatTiled 15.8527 seconds, 8993 calls, 1.76278 milliseconds average
fmaRepeat1 3.71454 seconds, 258888 calls, 14.348 microseconds average
addRepeatEx 3.43395 seconds, 255336 calls, 13.4487 microseconds average
normFixed 3.29705 seconds, 258888 calls, 12.7354 microseconds average
softMaxLong 2.62421 seconds, 3535 calls, 742.351 microseconds average
copyConvert 2.6175 seconds, 171312 calls, 15.2791 microseconds average
addRepeatScale 2.43674 seconds, 169680 calls, 14.3608 microseconds average
copyTranspose 2.43484 seconds, 170496 calls, 14.2809 microseconds average
softMaxFixed 1.78188 seconds, 85248 calls, 20.9023 microseconds average
addRepeatGelu 1.39165 seconds, 85282 calls, 16.3182 microseconds average
softMax 1.27396 seconds, 84840 calls, 15.0161 microseconds average
scaleInPlace 1.00817 seconds, 85248 calls, 11.8264 microseconds average
addRepeat 954.089 milliseconds, 86064 calls, 11.0858 microseconds average
diagMaskInf 652.093 milliseconds, 84840 calls, 7.68616 microseconds average
convolutionMain2Fixed 388.382 milliseconds, 17 calls, 22.846 milliseconds average
convolutionMain 163.663 milliseconds, 17 calls, 9.62722 milliseconds average
convolutionPrep1 24.0373 milliseconds, 34 calls, 706.979 microseconds average
addRows 21.3709 milliseconds, 3535 calls, 6.04552 microseconds average
convolutionPrep2 7.0976 milliseconds, 34 calls, 208.753 microseconds average
add 1.8821 milliseconds, 17 calls, 110.712 microseconds average
Memory Usage
Model 877.966 KB RAM, 1.42785 GB VRAM
Context 109.465 MB RAM, 785.219 MB VRAM
Total 110.322 MB RAM, 2.19467 GB VRAM
c:\whisper.cpp>
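For batch subtitle generation, the single command above can be repeated for every audio file in a folder. The sketch below only builds and prints the command lines (a dry run); it reuses the flags from this log (`-l zh -osrt -m models/ggml-medium.bin`), while the function name and the folder layout are assumptions made for illustration.

```python
from pathlib import Path


def build_commands(audio_dir, exe="main.exe", model="models/ggml-medium.bin", language="zh"):
    """Return one argument list per .wav file in audio_dir, sorted by file name."""
    commands = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        # Same flags as the single-file run in this post.
        commands.append([exe, "-l", language, "-osrt", "-m", model, str(wav)])
    return commands


# Dry run: print the commands for the current directory without executing them.
for cmd in build_commands("."):
    print(" ".join(cmd))
```

To actually run the batch, each argument list could be passed to `subprocess.run(cmd, check=True)` from the directory that contains main.exe.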
https://github.com/ggerganov/whisper.cpp/tree/master/models
https://github.com/ggerganov/whisper.cpp
ggerganov/whisper.cpp
https://blog.csdn.net/aiyolo/article/details/129674728?share_token=2c48b804-37f6-43a8-9159-08b28147ad67
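Building and using whisper.cpp
whisper.cpp is a C++ reimplementation of OpenAI's Whisper speech recognition model by ggerganov, open-sourced on GitHub. It is lightweight, fast and practical. This article mainly records how to use the model for local speech recognition on Windows.
The source is at ggerganov/whisper.cpp: Port of OpenAI's Whisper model in C/C++ (github.com). First, clone the project locally.
git clone https://github.com/ggerganov/whisper.cpp
The whisper.cpp project provides several ready-made models. Models of size small or larger are recommended; below that, the recognition quality is unusable.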
https://huggingface.co/ggerganov/whisper.cpp
ggerganov/whisper.cpp
OpenAI's Whisper models converted to ggml format
Available models
Model      Disk     Mem       SHA
tiny       75 MB    ~390 MB   bd577a113a864445d4c299885e0cb97d4ba92b5f
tiny.en    75 MB    ~390 MB   c78c86eb1a8faa21b369bcd33207cc90d64ae9df
base       142 MB   ~500 MB   465707469ff3a37a2b9b8d8f89f2f99de7299dac
base.en    142 MB   ~500 MB   137c40403d78fd54d454da0f9bd998f78703390c
small      466 MB   ~1.0 GB   55356645c2b361a969dfd0ef2c5a50d530afd8d5
small.en   466 MB   ~1.0 GB   db8a495a91d927739e50b3fc1cc4c6b8f6c2d022
medium     1.5 GB   ~2.6 GB   fd9727b6e1217c2f614f9b698455c4ffd82463b4
medium.en  1.5 GB   ~2.6 GB   8c30f0e44ce9560643ebd10bbe50cd20eafd3723
large-v1   2.9 GB   ~4.7 GB   b1caaf735c4cc1429223d5a74f0f4d0b9b59a299
large-v2   2.9 GB   ~4.7 GB   0f4c8e34f21cf1a914c59d8b3ce882345ad349d6
large      2.9 GB   ~4.7 GB   ad82bf6a9043ceed055076d0fd39f5f186ff8062
note: large corresponds to the latest Large v3 model
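The Mem column can serve as a rough guide when picking a model for a given amount of free memory. The sketch below copies the approximate figures from the table and returns the largest model that fits a budget. The budget value and function name are illustrative assumptions, and actual usage may differ: the medium run logged in this post reported 2.19467 GB of VRAM in total.

```python
# Approximate runtime memory in MB, copied from the table above
# (1 GB is treated as 1000 MB; the .en and large-v1/v2 variants list the same sizes).
MODEL_MEMORY_MB = [
    ("tiny", 390),
    ("base", 500),
    ("small", 1000),
    ("medium", 2600),
    ("large", 4700),
]


def largest_model_within(budget_mb):
    """Return the largest listed model whose memory estimate fits the budget, or None."""
    best = None
    for name, mem_mb in MODEL_MEMORY_MB:  # ordered from smallest to largest
        if mem_mb <= budget_mb:
            best = name
    return best


print(largest_model_within(3000))
```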
For more information, visit:
https://github.com/ggerganov/whisper.cpp/tree/master/models
https://huggingface.co/ggerganov/whisper.cpp/tree/main
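After downloading multi-gigabyte model files, it can be worth checking them against the SHA column of the table. The sketch below assumes those 40-hex-character values are SHA-1 digests of the model files (the table does not name the algorithm) and that the file on disk is the same revision the table describes; both helper names are made up for this example.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected value, ignoring case and whitespace."""
    return sha1_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("models/ggml-medium.bin", "fd9727b6e1217c2f614f9b698455c4ffd82463b4")` would compare a local medium model with the value listed above; a mismatch may simply mean the published file has been updated since the table was written.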