STT, speech recognition
https://github.com/SergeyShk/Speech-to-Text-Russian
Models: https://alphacephei.com/vosk/models (the model archive contains the file graph/HCLG.fst)
.
Kaldi, the basis of many STT projects:
https://kaldi-asr.org/
https://github.com/kaldi-asr/kaldi
https://hub.docker.com/r/kaldiasr/kaldi
Other options: DeepSpeech, Wav2letter, SpeechBrain, Coqui STT, Vosk.
https://alphacephei.com/vosk/index
usage: vosk-transcriber.exe [-h] [--model MODEL] [--list-models] [--list-languages]
                            [--model-name MODEL_NAME] [--lang LANG] [--input INPUT]
                            [--output OUTPUT] [--output-type OUTPUT_TYPE]
                            [--log-level LOG_LEVEL]

Transcribe audio file and save result in selected format

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL, -m MODEL
                        model path
  --list-models         list available models
  --list-languages      list available languages
  --model-name MODEL_NAME, -n MODEL_NAME
                        select model by name
  --lang LANG, -l LANG  select model by language
  --input INPUT, -i INPUT
                        audiofile
  --output OUTPUT, -o OUTPUT
                        optional output filename path
  --output-type OUTPUT_TYPE, -t OUTPUT_TYPE
                        optional arg output data type
  --log-level LOG_LEVEL
                        logging level
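
A minimal sketch of driving vosk-transcriber from Python, using only the flags listed in the usage text above. The helper names (build_transcriber_cmd, transcribe), the file names, and the "srt"/"txt" output types are assumptions for illustration; the usage text does not list the accepted OUTPUT_TYPE values, so check them against your installed version.

```python
import shlex
import subprocess
from typing import List, Optional


def build_transcriber_cmd(
    input_path: str,
    output_path: Optional[str] = None,
    *,
    model_path: Optional[str] = None,
    model_name: Optional[str] = None,
    lang: Optional[str] = None,
    output_type: Optional[str] = None,
    executable: str = "vosk-transcriber",  # "vosk-transcriber.exe" on Windows
) -> List[str]:
    """Assemble an argument list from the flags shown in the usage text.

    Hypothetical helper: it only builds the command, it does not check
    that the model, language, or output type actually exists.
    """
    cmd = [executable, "--input", input_path]
    if model_path:
        cmd += ["--model", model_path]
    if model_name:
        cmd += ["--model-name", model_name]
    if lang:
        cmd += ["--lang", lang]
    if output_path:
        cmd += ["--output", output_path]
    if output_type:
        # Assumed values such as "txt" or "srt"; not confirmed by the usage text.
        cmd += ["--output-type", output_type]
    return cmd


def transcribe(input_path: str, output_path: str, **options: Optional[str]) -> None:
    """Run the transcriber and raise CalledProcessError if it fails."""
    cmd = build_transcriber_cmd(input_path, output_path, **options)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even when vosk is not installed.
    demo = build_transcriber_cmd("talk.wav", "talk.srt", lang="ru", output_type="srt")
    print(shlex.join(demo))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.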
FFmpeg asr filter: this filter uses PocketSphinx for speech recognition. To enable compilation of this filter, you need to configure FFmpeg with --enable-pocketsphinx