
Tutorial - NanoOWL

Let's run NanoOWL, OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT.

What you need

  1. One of the following Jetson devices:

    • Jetson AGX Orin (64GB)
    • Jetson AGX Orin (32GB)
    • Jetson Orin NX (16GB)
    • Jetson Orin Nano (8GB)

  2. Running one of the following versions of JetPack:

    • JetPack 5 (L4T r35.x)
    • JetPack 6 (L4T r36.x)

  3. Sufficient storage space (preferably with NVMe SSD).

    • 7.2 GB for container image
    • Space for models
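Before pulling the image, it can be worth confirming that the target drive actually has room. The sketch below uses only the Python standard library; the 7.2 GB figure comes from the requirement above, and the extra headroom for models is an assumption.

```python
import shutil

# Size of the NanoOWL container image (from the requirements above), plus
# an assumed margin for downloaded models and TensorRT engine files.
IMAGE_GB = 7.2
MODEL_MARGIN_GB = 3.0  # illustrative headroom, not an official figure

def free_gb(path: str = "/") -> float:
    """Return free disk space at `path` in gibibytes."""
    return shutil.disk_usage(path).free / (1024 ** 3)

if __name__ == "__main__":
    avail = free_gb("/")
    print(f"{avail:.1f} GiB free")
    if avail < IMAGE_GB + MODEL_MARGIN_GB:
        print("Warning: likely not enough space for the container image and models.")
```

On a Jetson with an NVMe SSD mounted for Docker data, you would point `free_gb` at that mount point instead of `/`.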

Clone and set up jetson-containers

git clone https://github.com/dusty-nv/jetson-containers
cd jetson-containers
sudo apt update; sudo apt install -y python3-pip
pip3 install -r requirements.txt

How to start

Use the run.sh and autotag scripts to automatically pull or build a container image compatible with your JetPack/L4T version.

cd jetson-containers
./run.sh $(./autotag nanoowl)
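Under the hood, tag selection depends on the L4T version reported in `/etc/nv_tegra_release` on the Jetson. The autotag script's real logic is more involved; the snippet below is only a minimal sketch of the idea, parsing a release line into a tag-style version string. The sample line is illustrative.

```python
import re

def l4t_version(release_line: str) -> str:
    """Extract an L4T version string like 'r35.4.1' from an
    /etc/nv_tegra_release line (illustrative parsing, not autotag's code)."""
    m = re.search(r"R(\d+).*?REVISION:\s*(\d+\.\d+)", release_line)
    if not m:
        raise ValueError("unrecognized nv_tegra_release format")
    return f"r{m.group(1)}.{m.group(2)}"

# Example line as it might appear on a JetPack 5 system (illustrative):
line = "# R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref"
print(l4t_version(line))  # → r35.4.1
```

A version such as `r35.4.1` falls under L4T r35.x, so autotag would resolve to a JetPack 5 compatible `nanoowl` image.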

How to run the tree prediction (live camera) example

  1. Ensure you have a camera device connected

    ls /dev/video*
    

    If no video device is found, exit the container and check whether a video device is visible on the host side.

  2. Launch the demo

    cd examples/tree_demo
    python3 tree_demo.py ../../data/owl_image_encoder_patch32.engine
    

    Info

    If it fails to find or load the TensorRT engine file, build the TensorRT engine for the OWL-ViT vision encoder on your Jetson device.

    python3 -m nanoowl.build_image_encoder_engine \
        data/owl_image_encoder_patch32.engine
    
  3. Open your browser to http://<ip address>:7860

  4. Type whatever prompt you like to see what works!

    Here are some examples:

    • Example: [a face [a nose, an eye, a mouth]]
    • Example: [a face (interested, yawning / bored)]
    • Example: (indoors, outdoors)
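In these prompts, square brackets `[ ]` denote detection (find each label in the image or crop) and parentheses `( )` denote classification (pick the best-matching label), with nesting applying the inner group to each detected region. NanoOWL's actual parser lives in the nanoowl package; the toy parser below only illustrates that grammar, and its node layout is an assumption for demonstration.

```python
def _parse_group(s: str, i: int):
    """Parse one '[...]' or '(...)' group starting at s[i].

    Returns (node, next_index), where node is
    {"op": "detect"|"classify", "labels": [(label_text, child_nodes), ...]}.
    """
    op = "detect" if s[i] == "[" else "classify"
    close = "]" if s[i] == "[" else ")"
    i += 1
    labels, text, children = [], "", []
    while i < len(s) and s[i] != close:
        c = s[i]
        if c in "[(":                      # nested group applies to current label
            child, i = _parse_group(s, i)
            children.append(child)
        elif c == ",":                     # comma separates sibling labels
            labels.append((text.strip(), children))
            text, children = "", []
            i += 1
        else:
            text += c
            i += 1
    labels.append((text.strip(), children))
    return {"op": op, "labels": labels}, i + 1

def parse_tree(prompt: str):
    """Parse a full tree prompt such as '[a face [a nose, an eye, a mouth]]'."""
    node, _ = _parse_group(prompt.strip(), 0)
    return node

print(parse_tree("(indoors, outdoors)"))
# → {'op': 'classify', 'labels': [('indoors', []), ('outdoors', [])]}
```

For `[a face [a nose, an eye, a mouth]]`, this yields a detect node for "a face" whose child is another detect node over the three facial features, mirroring how the demo detects faces first and then searches each face crop.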

Result