Automatic Speech Recognition with NVIDIA NeMo

This an example of using Nvidia’s NeMo toolkit for creating Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) or Text-to-Speech (TTS) pre-annotations.

With the NeMo ASR models, you can create audio pre-annotations with a text area, aka transcriptions.

Start using it

Follow this installation guide to set up the NeMo environment.
On the same server or Docker container as NeMo, install Label Studio.
Install the Label Studio machine learning backend. From the command line, run the following:
```
git clone https://github.com/heartexlabs/label-studio-ml-backend
```

Set up the Label Studio ML backend environment:

cd label-studio-ml-backend
# Install label-studio-ml and its dependencies
pip install -U -e .
# Install the nemo example dependencies
pip install -r label_studio_ml/examples/requirements.txt

Initialize the Label Studio machine learning backend with the ASR example

label-studio-ml init my_model --from label_studio_ml/examples/nemo/asr.py

Start the machine learning backend. By default, the model starts on localhost with port 9090.
```
label-studio-ml start my_model
```
Start Label Studio:
```
label-studio start my_project --init
```
In Label Studio, open the Settings page for your project and open the Labeling Interface section.

From the template list, select Automatic Speech Recognition. You can also create your own with <TextArea> and <Audio> tags. Or copy this labeling config into the Label Studio UI:

 <View>
  <Audio name="audio" value="url" zoom="true" hotkey="ctrl+enter" />
  <Header value="Provide Transcription" />
  <TextArea name="answer" transcription="true" toName="audio" rows="4" editable="true" maxSubmissions="1" />
</View>

In your project settings, open the Machine Learning page in the Label Studio UI.

note

It takes some time to download models from the NeMo engine. The Label Studio UI might hang until the models finish automatically downloading.

Click Add Model and add the ML backend using this URL: http://localhost:9090.
Import audio data and start reviewing pre-annotations.

Automatic Speech Recognition with NVIDIA NeMo

Start using it

In this article