
Tutorial - Whisper

Let's run OpenAI's Whisper, a pre-trained model for automatic speech recognition (ASR), on Jetson!

What you need

  1. One of the following Jetson devices:

    • Jetson AGX Orin (64GB)
    • Jetson AGX Orin (32GB)
    • Jetson Orin NX (16GB)
    • Jetson Orin Nano (8GB)

  2. Running one of the following versions of JetPack:

    • JetPack 5 (L4T r35.x)
    • JetPack 6 (L4T r36.x)

  3. Sufficient storage space (preferably on an NVMe SSD):

    • 6.1 GB for the whisper container image
    • Space for model checkpoints
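Before pulling the image, a small pre-flight script can check these requirements. The following is only an illustrative sketch, not part of jetson-containers. The function names jetpack_from_l4t and has_free_space are hypothetical. It assumes that /etc/nv_tegra_release uses the usual "# R35 (release), REVISION: 4.1, ..." layout and that Docker uses its default data root, /var/lib/docker. The 10 GiB threshold is an arbitrary margin that leaves room for checkpoints beyond the 6.1 GB image.

```shell
#!/usr/bin/env bash
# Pre-flight sketch for the requirements above (hypothetical helpers,
# illustrative thresholds; not part of jetson-containers).

# Map an L4T release line, in the layout of /etc/nv_tegra_release
# (e.g. "# R35 (release), REVISION: 4.1, ..."), to a JetPack major version.
jetpack_from_l4t() {
  local line="$1" major rev
  major=$(printf '%s\n' "$line" | sed -n 's/^# R\([0-9][0-9]*\) .*/\1/p')
  rev=$(printf '%s\n' "$line" | sed -n 's/.*REVISION: \([0-9.]*\).*/\1/p')
  case "$major" in
    35) echo "L4T r${major}.${rev} maps to JetPack 5 (supported)" ;;
    36) echo "L4T r${major}.${rev} maps to JetPack 6 (supported)" ;;
    "") echo "could not parse L4T release line" >&2; return 2 ;;
    *)  echo "L4T r${major}.${rev} is not listed in this tutorial"; return 1 ;;
  esac
}

# Succeed if the filesystem holding directory $1 has at least $2 GiB free.
has_free_space() {
  local dir="${1:-/}" need_gib="${2:-10}" avail_kb
  # df -P prints one POSIX-format line per filesystem; field 4 of the
  # data line is the available space in 1K blocks.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( need_gib * 1024 * 1024 )) ]
}

# Run the live checks only on a Jetson, where these files exist.
if [ -r /etc/nv_tegra_release ]; then
  # The device-tree model string is NUL-terminated, so strip the NUL.
  [ -r /proc/device-tree/model ] && { tr -d '\0' < /proc/device-tree/model; echo; }
  jetpack_from_l4t "$(head -n 1 /etc/nv_tegra_release)"
  # /var/lib/docker is Docker's default data root; adjust it if you moved it to NVMe.
  has_free_space /var/lib/docker 10 || echo "warning: less than 10 GiB free for Docker images"
fi
```

On a non-Jetson machine, the live block is skipped. The two functions can still be called on their own.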

Clone and set up jetson-containers

git clone https://github.com/dusty-nv/jetson-containers
cd jetson-containers
sudo apt update; sudo apt install -y python3-pip
pip3 install -r requirements.txt

How to start

Use the run.sh and autotag scripts to automatically pull or build a container image that is compatible with your JetPack version.

cd jetson-containers
./run.sh $(./autotag whisper)

The container has a default run command (CMD) that automatically starts the Jupyter Lab server with SSL enabled.

Open your browser and go to https://<IP_ADDRESS>:8888, replacing <IP_ADDRESS> with your Jetson's IP address.

Attention

Note that it is https, not http.

An HTTPS (SSL) connection is required so that the ipywebrtc widget can access your microphone (used by record-and-transcribe.ipynb).

Your browser will show a security warning about the connection.

Press the "Advanced" button, then click the "Proceed to (unsafe)" link to continue to the Jupyter Lab web interface.

The default password for Jupyter Lab is nvidia.
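If the page does not load, it can help to check from a terminal whether the server answers over HTTPS at all. The following is a minimal sketch. The helper names jupyter_url and check_jupyter are hypothetical. It assumes curl is installed on the machine you run it from, and port 8888 comes from the URL above.

```shell
#!/usr/bin/env bash
# Sketch: check that Jupyter Lab answers over HTTPS (hypothetical helpers).

# Build the URL used in this tutorial: HTTPS on port 8888.
jupyter_url() {
  printf 'https://%s:8888\n' "$1"
}

# Succeed if the server responds. -k skips certificate verification, because
# the certificate is not trusted by default (the reason the browser warns);
# -f turns HTTP 4xx/5xx responses into a failure exit code.
check_jupyter() {
  curl -k -s -f -o /dev/null --max-time 5 "$(jupyter_url "$1")"
}
```

On the Jetson itself, hostname -I prints its IP addresses. For example, check_jupyter "$(hostname -I | awk '{print $1}')" && echo reachable should report whether the server is up. If the browser still fails, make sure you typed https rather than http.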

Run Jupyter notebooks

The Whisper repo comes with demo Jupyter notebooks, which you can find under the /notebooks/ directory.

jetson-containers also adds a convenience notebook, record-and-transcribe.ipynb, that lets you record an audio sample in Jupyter and then transcribe the recording.

record-and-transcribe.ipynb

This notebook lets you record your own audio sample with your PC's microphone and then transcribe it with Whisper's medium model.

It uses the ipywebrtc extension for Jupyter Notebook/Lab to record audio in your web browser.
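The notebook runs Whisper from Python, but the same medium model can also transcribe an existing audio file from a terminal inside the container. The following is only a sketch, under two assumptions that this tutorial does not confirm: the whisper command-line tool that ships with the openai-whisper package is on the PATH inside the container, and transcribe_file is a hypothetical helper name. The first run downloads the model checkpoint, which is one reason to leave space for checkpoints.

```shell
#!/usr/bin/env bash
# Sketch: transcribe an audio file with Whisper from the command line.
# Assumes the `whisper` CLI from the openai-whisper package is installed.
transcribe_file() {
  local audio="$1" model="${2:-medium}"   # "medium" matches the notebook
  if [ ! -f "$audio" ]; then
    echo "audio file not found: $audio" >&2
    return 1
  fi
  if ! command -v whisper >/dev/null 2>&1; then
    echo "whisper CLI not found; run this inside the whisper container" >&2
    return 127
  fi
  # Writes the transcript as <audio basename>.txt in the current directory.
  whisper "$audio" --model "$model" --output_format txt --output_dir .
}
```

For example, transcribe_file my_recording.wav uses the medium model. Passing a second argument, such as small, selects a different model size.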

Attention

When you click the ⏺ button, your web browser may show a pop-up asking for permission to use your microphone. Be sure to allow access.

Final check

Once done, click the "⚠ Not secure" indicator in the URL bar. The site information should show that microphone access is allowed.

Result

Once you have gone through all the steps, you should see the transcription result as text.