Speech to text

STT, speech recognition

https://github.com/SergeyShk/Speech-to-Text-Russian

Models: https://alphacephei.com/vosk/models; the archive contains the file graph/HCLG.fst.
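
To confirm that a downloaded model archive really contains graph/HCLG.fst, something like the following sketch could be used. It assumes the model is a .zip archive with a top-level model folder; the function name `has_decoding_graph` and the archive name in the usage note are made up for illustration.

```python
import zipfile


def has_decoding_graph(archive, member_suffix="graph/HCLG.fst"):
    """Return True if any entry in the zip archive ends with member_suffix.

    `archive` may be a filesystem path or a binary file-like object.
    Matching by suffix allows for a top-level model folder inside the zip
    (an assumption about the archive layout).
    """
    with zipfile.ZipFile(archive) as zf:
        return any(name.endswith(member_suffix) for name in zf.namelist())
```

Usage would be along the lines of `has_decoding_graph("model.zip")`, where `model.zip` is a placeholder for the downloaded file.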

Kaldi

The foundation of many STT projects
https://kaldi-asr.org/
https://github.com/kaldi-asr/kaldi
https://hub.docker.com/r/kaldiasr/kaldi

Other options: DeepSpeech, Wav2letter, SpeechBrain, Coqui STT, Vosk.

Vosk

https://alphacephei.com/vosk/index
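
For calling Vosk from Python, a minimal sketch with the `vosk` package might look like this. It assumes a mono 16-bit PCM WAV file and an already unpacked model directory; `transcribe_wav` and `collect_text` are illustrative helper names, not part of Vosk.

```python
import json
import wave


def collect_text(json_results):
    """Join the "text" fields of Vosk JSON result strings, skipping empty ones."""
    parts = []
    for raw in json_results:
        text = json.loads(raw).get("text", "")
        if text:
            parts.append(text)
    return " ".join(parts)


def transcribe_wav(wav_path, model_dir):
    """Transcribe a mono 16-bit PCM WAV file with a Vosk model directory."""
    # Imported lazily so the helper above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV")
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        results = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        results.append(rec.FinalResult())
    return collect_text(results)
```

Audio in other formats would first need converting to WAV, for example with ffmpeg.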

usage: vosk-transcriber.exe [-h] [--model MODEL] [--list-models]
                            [--list-languages] [--model-name MODEL_NAME]
                            [--lang LANG] [--input INPUT] [--output OUTPUT]
                            [--output-type OUTPUT_TYPE]
                            [--log-level LOG_LEVEL]

Transcribe audio file and save result in selected format

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL, -m MODEL
                        model path
  --list-models         list available models
  --list-languages      list available languages
  --model-name MODEL_NAME, -n MODEL_NAME
                        select model by name
  --lang LANG, -l LANG  select model by language
  --input INPUT, -i INPUT
                        audiofile
  --output OUTPUT, -o OUTPUT
                        optional output filename path
  --output-type OUTPUT_TYPE, -t OUTPUT_TYPE
                        optional arg output data type
  --log-level LOG_LEVEL
                        logging level
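
The flags from the help text above can be assembled into a command line from a script. This is only a sketch: the executable name `vosk-transcriber` (without `.exe`), the file names, and the language value `ru` are assumptions, and `build_transcriber_cmd` is an illustrative helper.

```python
import subprocess


def build_transcriber_cmd(input_path, output_path=None, lang=None,
                          model_path=None, model_name=None,
                          output_type=None, exe="vosk-transcriber"):
    """Build an argument list using only flags shown in the help text."""
    cmd = [exe, "--input", str(input_path)]
    optional = (
        ("--model", model_path),
        ("--model-name", model_name),
        ("--lang", lang),
        ("--output", output_path),
        ("--output-type", output_type),
    )
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_transcriber(*args, **kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_transcriber_cmd(*args, **kwargs), check=True)
```

For example, `run_transcriber("talk.wav", output_path="talk.txt", lang="ru")` would transcribe a hypothetical `talk.wav`; the languages and models actually available can be checked with `--list-languages` and `--list-models`.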

ffmpeg asr filter

This filter uses PocketSphinx for speech recognition. To enable compilation of this filter, you need to configure FFmpeg with --enable-pocketsphinx.

https://ffmpeg.org/ffmpeg-all.html#asr
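
A sketch of an ffmpeg invocation for this filter, assuming a build configured with --enable-pocketsphinx. The recognized text is exported as the frame metadata key lavfi.asr.text, which the ametadata filter can print. The helper name `build_asr_cmd` and all file paths are illustrative, and paths containing `:` or `,` would need escaping that this sketch does not do.

```python
import subprocess


def build_asr_cmd(input_path, hmm=None, dict_path=None, lm=None, exe="ffmpeg"):
    """Build an ffmpeg command that runs the asr filter and prints its text."""
    # Optional PocketSphinx resources: acoustic model, dictionary, language model.
    opts = [f"{key}={value}"
            for key, value in (("hmm", hmm), ("dict", dict_path), ("lm", lm))
            if value is not None]
    asr = "asr" + ("=" + ":".join(opts) if opts else "")
    audio_filter = asr + ",ametadata=mode=print:key=lavfi.asr.text"
    # "-f null -" discards the audio output; only the printed metadata matters.
    return [exe, "-i", str(input_path), "-af", audio_filter, "-f", "null", "-"]


def run_asr(*args, **kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_asr_cmd(*args, **kwargs), check=True)
```

Without the optional paths the filter relies on whatever defaults the PocketSphinx installation provides.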