STT, speech recognition
https://github.com/SergeyShk/Speech-to-Text-Russian
Models: https://alphacephei.com/vosk/models (the archive contains the file graph/HCLG.fst).
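A minimal Python sketch for checking a downloaded model archive before unpacking it. It assumes the archive is a .zip with the model files under one top-level folder, and it only looks for the graph/HCLG.fst entry mentioned above. The function name archive_has_hclg and the archive name in the usage line are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def archive_has_hclg(archive_path: str) -> bool:
    """Return True if the zip archive contains a graph/HCLG.fst entry.

    Assumption: entries look like "<model-folder>/graph/HCLG.fst".
    Only the last two path components are compared, so the name of
    the top-level folder does not matter.
    """
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Zip entry names always use "/" as the separator.
            parts = PurePosixPath(name).parts
            if parts[-2:] == ("graph", "HCLG.fst"):
                return True
    return False
```

Usage, with a hypothetical file name: archive_has_hclg("vosk-model-ru.zip"). Reading the zip index is cheap, so this avoids extracting a multi-gigabyte archive just to find out which graph files it ships.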
Kaldi, the foundation of many STT projects:
https://kaldi-asr.org/
https://github.com/kaldi-asr/kaldi
https://hub.docker.com/r/kaldiasr/kaldi
Other options: DeepSpeech, Wav2letter, SpeechBrain, Coqui STT, Vosk.
https://alphacephei.com/vosk/index
usage: vosk-transcriber.exe [-h] [--model MODEL] [--list-models]
                            [--list-languages] [--model-name MODEL_NAME]
                            [--lang LANG] [--input INPUT] [--output OUTPUT]
                            [--output-type OUTPUT_TYPE]
                            [--log-level LOG_LEVEL]

Transcribe audio file and save result in selected format

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL, -m MODEL
                        model path
  --list-models         list available models
  --list-languages      list available languages
  --model-name MODEL_NAME, -n MODEL_NAME
                        select model by name
  --lang LANG, -l LANG  select model by language
  --input INPUT, -i INPUT
                        audiofile
  --output OUTPUT, -o OUTPUT
                        optional output filename path
  --output-type OUTPUT_TYPE, -t OUTPUT_TYPE
                        optional arg output data type
  --log-level LOG_LEVEL
                        logging level
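A minimal Python sketch that assembles a vosk-transcriber invocation using only the flags listed in the help text above and runs it with subprocess. Assumptions: the executable is on PATH as vosk-transcriber (the help above shows vosk-transcriber.exe, the Windows name), and any output-type value you pass (for example srt) is one the installed version accepts. The helper names build_transcriber_cmd and transcribe are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_transcriber_cmd(
    input_path: str,
    output_path: Optional[str] = None,
    model_path: Optional[str] = None,
    lang: Optional[str] = None,
    output_type: Optional[str] = None,
    exe: str = "vosk-transcriber",
) -> List[str]:
    """Build an argument list for vosk-transcriber from the documented flags."""
    cmd = [exe, "--input", input_path]
    if model_path is not None:
        # An explicit model directory takes priority over language selection.
        cmd += ["--model", model_path]
    elif lang is not None:
        # Let the tool pick a model for this language.
        cmd += ["--lang", lang]
    if output_path is not None:
        cmd += ["--output", output_path]
    if output_type is not None:
        cmd += ["--output-type", output_type]
    return cmd


def transcribe(cmd: List[str]) -> int:
    """Run the command without raising on failure and return its exit code."""
    return subprocess.run(cmd, check=False).returncode
```

Usage: transcribe(build_transcriber_cmd("lecture.mp3", output_path="lecture.srt", lang="ru", output_type="srt")). Passing an argument list instead of a shell string avoids quoting problems with file names that contain spaces.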
FFmpeg asr filter (automatic speech recognition): this filter uses PocketSphinx for speech recognition. To enable compilation of this filter, FFmpeg must be configured with --enable-pocketsphinx.
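A minimal Python sketch that builds an ffmpeg command running the asr filter and printing the recognized text. Assumptions: ffmpeg was built with --enable-pocketsphinx, PocketSphinx's default model files are installed (otherwise the filter's hmm, dict and lm options would have to be set), and the filter stores its result in frame metadata under the key lavfi.asr.text, which ametadata then prints. The helper name build_ffmpeg_asr_cmd is hypothetical.

```python
import subprocess
from typing import List


def build_ffmpeg_asr_cmd(input_path: str, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg command that runs asr and prints its metadata."""
    # asr produces the text; ametadata=mode=print logs the chosen key.
    filtergraph = "asr,ametadata=mode=print:key=lavfi.asr.text"
    # "-f null -" decodes and filters the audio but writes no output file.
    return [ffmpeg, "-i", input_path, "-af", filtergraph, "-f", "null", "-"]


def run_asr(input_path: str) -> str:
    """Run ffmpeg and return its stderr, where the printed metadata appears."""
    result = subprocess.run(
        build_ffmpeg_asr_cmd(input_path), capture_output=True, text=True
    )
    return result.stderr
```

The recognized phrases then have to be picked out of ffmpeg's log output, which is noisier than a dedicated tool like vosk-transcriber but needs nothing beyond an ffmpeg build with PocketSphinx enabled.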