Tutorial - Live VLM WebUI
Live VLM WebUI is a convenient interface for evaluating Vision Language Models (VLMs) in real-time:
- 🎥 Multi-source video input
    - WebRTC webcam streaming (stable)
    - 🧪 RTSP IP camera support (Beta)
- 🔌 OpenAI-compatible API - Works with Ollama, vLLM, NIM, or any vision API
- 🧠 Flexible deployment - VLM backend: local inference or cloud APIs
- ✏️ Interactive prompt editor - 8 preset prompts + custom prompts
- ⚡ Async processing - Smooth video while VLM processes frames in background
What you need

- One of the following Jetson devices:
    - Jetson AGX Thor Developer Kit
    - Jetson AGX Orin (64GB) Developer Kit
    - Jetson AGX Orin (32GB) Developer Kit
    - Jetson Orin Nano (8GB) Developer Kit
- Running one of the following versions of JetPack:
    - JetPack 6 (L4T r36.x)
    - JetPack 7 (L4T r38.x)
- NVMe SSD highly recommended for storage speed and space
    - About 4 GB for the `live-vlm-webui` container
    - Space for pulling (downloading) models on Ollama (if you choose to run Ollama locally)
Overview
Vision Language Models are now available in sizes suitable for edge deployment, from 4b to 90b+ parameters. Open-weight models include Qwen 2.5/3 VL, Gemma 3, Llama 3.2/4 Vision, Phi-3.5-vision, and NVIDIA's Cosmos-Reason1 and Nemotron Nano VL.
Example of Available Open-Weight VLM Models
| Creator | Model Name | Sizes |
|---|---|---|
| Alibaba | Qwen 2.5 VL | 3b, 7b, 32b, 72b |
| Alibaba | Qwen 3 VL | 2b, 4b, 8b, 30b, 32b, 235b |
| Google | Gemma 3 | 4b, 12b, 27b |
| Meta | Llama 3.2-Vision | 11b, 90b |
| Meta | Llama 4 | 16x17b, 128x17b |
| Microsoft | Phi-3.5-vision | 4.2b |
| NVIDIA | Cosmos-Reason1 | 7b |
| NVIDIA | Nemotron Nano 12B V2 VL | 12b |
Testing VLMs in real-time presents specific challenges:
- Web interfaces like Open WebUI require manual image uploads (no streaming)
- Lack of unified benchmarking tools across platforms
- Limited real-time GPU monitoring integration
Live VLM WebUI addresses these gaps by providing a WebRTC-based interface for real-time vision inference testing with integrated system monitoring.
📺 Demo Video
Watch the Live VLM WebUI in action:
🚀 Getting Started
Step 1: Prepare VLM Backend (Ollama)
First, install Ollama as the backend to serve VLMs. The official Ollama installer supports Jetson.
```bash
# Ollama installer for Linux/Mac
curl -fsSL https://ollama.com/install.sh | sh

# Download recommended model (lightweight)
ollama pull gemma3:4b

# Or other vision models
# ollama pull llama3.2-vision:11b
# ollama pull qwen2.5-vl:7b
```
Jetson Thor (JetPack 7.0) Users
Ollama 0.12.10 has GPU inference issues on Jetson Thor. Use version 0.12.9:
```bash
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.9 sh
```
Step 2: Install Live VLM WebUI
```bash
git clone https://github.com/nvidia-ai-iot/live-vlm-webui.git
cd live-vlm-webui
./scripts/start_container.sh
```
Step 3: Access the Web Interface
Open your web browser and navigate to:
- Local URL: `https://localhost:8090` (if running browser on same machine)
- Network URL: `https://<IP_ADDRESS>:8090` (from another device on same network)

For example:

```
Local URL: https://localhost:8090
Network URL: https://10.110.50.252:8090
```
Jetson Orin Nano Users
For the Jetson Orin Nano Developer Kit, it's recommended to use a PC connected on the same network and access the web UI remotely for better performance.
📋 How to use Live VLM WebUI
Step 4: Configure and Start
Accept the Self-signed SSL Certificate

1. Click the "Advanced" button
2. Click "Proceed to <IP_ADDRESS> (unsafe)"
3. Allow camera access when prompted
Verify VLM API Configuration
The interface auto-detects local VLM backends. Verify in the "VLM API Configuration" section:

- API Endpoint: `http://localhost:11434/v1` (Ollama) or `http://localhost:8000/v1` (vLLM)
- Model: Selected model name (e.g., `gemma3:4b`)

For cloud APIs, manually enter the endpoint and API key.
Start Camera and VLM Analysis
Click " Start Camera and Start VLM Analysis " and grant camera permissions. The interface begins streaming and analyzing frames based on the configured interval.
✨ Main Features
Real-time Display
The main pane on the right displays the following information:
1. Real-time AI Analysis Results
The VLM Output Info section shows:
- Top-right: Inference latency, average latency, total analysis count
- Current VLM model being used
- Current prompt text
- Analysis result text (with Markdown support)
2. Live Video Stream
WebRTC streams your webcam video with:
- Mirror button (top-left) - Flips the display horizontally, which is convenient when facing the camera
- Overlay text - Analysis results overlaid on the video (if enabled in settings)
3. GPU/CPU Monitoring
Real-time hardware usage monitoring:
- GPU Usage (via `jtop`)
- VRAM Usage (via `jtop`)
- CPU Usage
- System RAM
Detailed Settings
The left settings menu allows you to customize configuration and behavior.
1. Prompt Customization
The most important setting is the Prompt Editor at the bottom of the left menu.
Quick Prompts - 8 preset prompts ready to use:
- Scene Description: "Describe what you see in this image in one sentence."
- Object Detection: "List all objects you can see in this image, separated by commas."
- Activity Recognition: "Describe the person's activity and what they are doing."
- Safety Monitoring: "Are there any safety hazards visible? Answer with 'ALERT: description' or 'SAFE'."
- Emotion Detection: "Describe the facial expressions and emotions of people visible."
- Accessibility: "Provide a detailed description of the scene for a visually impaired person."
- OCR / Text Recognition: "Read and transcribe any text visible in the image."
- Yes/No Question: "Answer with Yes or No only: Is there a person visible?"
Custom Prompt - Enter your own unique prompts in the Custom Prompt field.
Tip

Many models, including `gemma3:4b`, support multiple languages. Try instructing the model in a different language and having it respond in that language.
Real-time Prompt Engineering
You can perform what we might call "real-time vision prompt engineering."
For example, using the object detection prompt as-is might produce:
"Here is a list of objects I see in the image: person, desk, monitor..."
The model helpfully includes a preamble. If you want to feed this directly as CSV to a downstream application, this is inconvenient, so you can customize the prompt:
```
List all objects you can see in this image, separated by commas.
Do not start with "Here is a list of..."
```
This suppresses the preamble. This real-time evaluation capability is one of the tool's unique advantages.
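Once the reply is clean CSV, downstream parsing is trivial. A minimal sketch (the reply string here is illustrative, not actual tool output):

```python
# Parse the comma-separated VLM reply into a Python list
reply = "person, desk, monitor"
objects = [item.strip() for item in reply.split(",") if item.strip()]
print(objects)  # ['person', 'desk', 'monitor']
```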
2. Backend Configuration
The VLM API Configuration at the top of the left menu supports multiple VLM backends.
API Base URL
- Auto-detects if Ollama, vLLM, or SGLang is running locally on the same machine
- If not detected, you can specify cloud APIs (NVIDIA API Catalog is set by default)
- When using cloud APIs, an API Key field appears for entering your credentials
Model Selection
Queries the specified API and lists available models.
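Under the hood this is a standard OpenAI-compatible listing call. A minimal sketch of the same query, assuming a local Ollama endpoint:

```python
# List model IDs from an OpenAI-compatible /v1/models endpoint
import requests

resp = requests.get("http://localhost:11434/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])  # e.g. "gemma3:4b"
```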
3. Camera Settings
The Camera and App Control section in the middle of the left menu handles camera selection and related settings.
Camera Selection
Lists all cameras detected by your browser on the client PC. Most modern laptops should have a front-facing camera available. If using Jetson directly with a desktop browser, connect a USB camera.
You can switch cameras even during VLM analysis.
Frame Processing Interval
WebRTC camera frame processing (simply echoing frames from browser to UI server) and VLM analysis processing run asynchronously.
By default, after a VLM analysis completes, the processor waits for the next frame whose number is divisible by 30 before starting the next analysis.
If you want to slow down the analysis pace to have more time to read each result, increase this number.
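Conceptually, the gating works like the sketch below (the names are hypothetical, not the actual `video_processor.py` code):

```python
# Start a new analysis only when the VLM is idle and the frame
# number is a multiple of the configured interval
def should_analyze(frame_number: int, vlm_busy: bool, interval: int = 30) -> bool:
    return not vlm_busy and frame_number % interval == 0
```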
RTSP Stream (Beta)
Tentative support for RTSP streams from IP surveillance cameras.
Testing hasn't been extensive across many cameras, so please report issues on GitHub if you encounter problems.
🧪 Verified Platforms
| Platform | GPU | Status | Inference Speed |
|---|---|---|---|
| PC (RTX 6000 Ada) | RTX 6000 Ada | ✅ | <1 sec/frame (gemma3:4b) |
| Jetson Orin Nano 8GB | 1024-core Ampere | ✅ | 7-8 sec/frame (gemma3:4b) |
| Jetson Thor 128GB | 2560-core Blackwell | ✅ | 1-2 sec/frame (llama3.2-vision:11b) |
| DGX Spark | 6144-core Blackwell | ✅ | 1-2 sec/frame (llama3.2-vision:11b) |
| Mac (M3) | Apple Silicon | ✅ | 2-4 sec/frame (gemma3:4b) |
| Windows (WSL2) | RTX A3000 | ✅ | 2-4 sec/frame (gemma3:4b) |
Jetson Orin Nano Performance
Even the most affordable Jetson, the Jetson Orin Nano Developer Kit, successfully runs `gemma3:4b` via Ollama!
While each frame takes 7-8 seconds to process, it runs continuously, opening up possibilities for real-time systems.
Use Cases
Model Benchmarking
Test and compare VLM performance across different scenarios:
- Frame processing latency measurement
- Object detection accuracy and spatial reasoning
- Structured output capabilities (JSON, CSV)
- OCR performance evaluation
- Multi-language support testing
Use real-time GPU/CPU metrics to compare models on the same hardware or evaluate the same model across platforms.
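For latency, a rough benchmarking sketch: this reuses the `analyze_image()` coroutine defined in the Development section below, plus a local test image; both are assumptions for illustration, not part of the tool itself.

```python
import asyncio
import time

# Average end-to-end latency over n requests
async def benchmark(n: int = 10):
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        await analyze_image("test.jpg", "Describe this image in one sentence.")
        latencies.append(time.perf_counter() - t0)
    print(f"avg latency: {sum(latencies) / n:.2f}s over {n} runs")

asyncio.run(benchmark())
```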
Robotics Applications
VLM-based vision for robotic systems:
- Object recognition and spatial reasoning
- Scene understanding for navigation
- Human-robot interaction scenarios
Prototyping and Development
Rapid iteration on vision-based applications:
- Real-time prompt engineering and testing
- Integration with existing inference backends
- Reference implementation for custom deployments
Computer Vision Pipeline Alternatives

VLMs can replace or augment traditional CV pipelines in certain applications. NVIDIA's Video Search and Summarization (VSS) demonstrates this approach in smart city deployments.
🏗️ Development: Building Your Own App
Live VLM WebUI is open-source and can be extended or integrated into your own applications.
Architecture Overview
The system consists of three main components:
- Frontend (Web UI) - HTML/CSS/JavaScript with WebRTC for video streaming
- Backend Server - Python-based WebRTC server using `aiortc`
- VLM Backend - Any OpenAI-compatible API (Ollama, vLLM, SGLang, cloud APIs)
Key Python Modules
- `server.py` - Main WebRTC server with WebSocket support
- `video_processor.py` - Video frame processing and VLM integration
- `gpu_monitor.py` - Cross-platform GPU/system monitoring (Jetson support via jetson-stats)
- `vlm_service.py` - VLM API integration with async support
Example: Custom Video Processing
You can extend `video_processor.py` to add custom frame processing. For example (a sketch assuming frames arrive as OpenCV/NumPy BGR arrays; the ROI crop and blur are illustrative stand-ins for your own logic):

```python
import cv2

# Custom pre-processing before sending to VLM
# e.g., edge detection, filtering, ROI extraction
def preprocess_frame(self, frame):
    # Illustrative: crop the center region and lightly denoise it
    h, w = frame.shape[:2]
    roi = frame[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    processed = cv2.GaussianBlur(roi, (3, 3), 0)
    return processed
```
Example: Custom Prompts via API
You can programmatically send prompts to the VLM backend:
```python
import aiohttp
import base64

async def analyze_image(image_path, prompt):
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode('utf-8')
    async with aiohttp.ClientSession() as session:
        async with session.post(
            'http://localhost:11434/v1/chat/completions',
            json={
                'model': 'gemma3:4b',
                'messages': [{
                    'role': 'user',
                    'content': [
                        {'type': 'text', 'text': prompt},
                        {'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{image_data}'}}
                    ]
                }]
            }
        ) as resp:
            result = await resp.json()
            return result['choices'][0]['message']['content']
```
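For a quick one-off test from a script (assuming a local image file `test.jpg`):

```python
import asyncio

# Hypothetical usage of the coroutine above
print(asyncio.run(analyze_image('test.jpg', 'Describe this image in one sentence.')))
```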
Integration with ROS 2
For robotics applications, you can integrate Live VLM WebUI with ROS 2:
- Subscribe to ROS 2 camera topics
- Convert ROS Image messages to OpenCV format
- Feed frames to the VLM processing pipeline
- Publish VLM responses as ROS 2 String messages
See the jetson-containers repository for ROS 2 integration examples; a minimal sketch of this flow follows.
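The sketch below wires these steps together, assuming `rclpy` and `cv_bridge` are installed and reusing the `analyze_image()` coroutine from above. The topic names and per-frame synchronous handling are illustrative only; a real node would throttle frames, since VLM inference is far slower than the camera.

```python
import asyncio
import cv2
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge

class VlmBridge(Node):
    def __init__(self):
        super().__init__('vlm_bridge')
        self.bridge = CvBridge()
        # Hypothetical topic names for illustration
        self.pub = self.create_publisher(String, '/vlm/response', 10)
        self.create_subscription(Image, '/camera/image_raw', self.on_image, 10)

    def on_image(self, msg):
        # Convert the ROS Image to OpenCV BGR and hand it to the VLM
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        cv2.imwrite('/tmp/frame.jpg', frame)
        text = asyncio.run(analyze_image('/tmp/frame.jpg', 'Describe the scene.'))
        self.pub.publish(String(data=text))

def main():
    rclpy.init()
    rclpy.spin(VlmBridge())

if __name__ == '__main__':
    main()
```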
🧱 Project Structure
```
live-vlm-webui/
├── src/
│   └── live_vlm_webui/          # Main Python package
│       ├── __init__.py          # Package initialization
│       ├── server.py            # WebRTC server with WebSocket
│       ├── video_processor.py   # Frame processing & VLM integration
│       ├── gpu_monitor.py       # GPU/system monitoring (Jetson support)
│       ├── vlm_service.py       # VLM API client
│       └── static/
│           └── index.html       # Frontend web UI
│
├── scripts/                     # Utility scripts
│   ├── start_server.sh          # Quick start with SSL
│   ├── start_container.sh       # Docker launcher (auto-detection)
│   ├── stop_container.sh        # Stop Docker container
│   └── generate_cert.sh         # SSL certificate generator
│
├── docker/                      # Docker configurations
│   ├── Dockerfile.jetson-orin   # Jetson Orin (JetPack 6)
│   ├── Dockerfile.jetson-thor   # Jetson Thor (JetPack 7)
│   └── docker-compose.yml       # Multi-service stack
│
├── docs/                        # Documentation
│   ├── setup/                   # Setup guides
│   ├── usage/                   # Usage guides
│   └── troubleshooting.md       # Common issues & solutions
│
├── pyproject.toml               # Python package configuration
├── requirements.txt             # Python dependencies
└── README.md                    # Main documentation
```
Summary
Live VLM WebUI provides real-time VLM testing capabilities with the following features:
- WebRTC-based low-latency video streaming
- Cross-platform support (x86_64, ARM64, Apple Silicon)
- Multiple backend support (Ollama, vLLM, SGLang, cloud APIs)
- Integrated GPU/CPU monitoring
- Preset and custom prompt support
- Apache 2.0 license
Use Cases:
- VLM performance benchmarking across hardware platforms
- Real-time inference testing with Ollama or vLLM
- Edge AI application prototyping on Jetson
- Vision model evaluation and comparison
🚫 Troubleshooting
Ollama GPU Error on Jetson Thor (JetPack 7.0)
Problem: Ollama 0.12.10 fails with GPU inference on Jetson Thor
Solution: Downgrade to Ollama 0.12.9:
```bash
# Stop and remove current Ollama
sudo systemctl stop ollama
sudo rm /usr/local/bin/ollama

# Install Ollama 0.12.9
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.9 sh
```
Camera Not Accessible
Problem: Browser cannot access webcam or shows "Permission Denied"
Solution: Ensure you're using HTTPS (not HTTP):
- The `start_container.sh` script enables HTTPS by default
- Accept the self-signed certificate warning in your browser (click "Advanced" → "Proceed")
- Modern browsers require HTTPS for webcam access
Cannot Connect to VLM Backend
Problem: "Failed to connect" or "Connection refused" errors
Solution:

1. Verify VLM is running:

    ```bash
    # For Ollama
    curl http://localhost:11434/v1/models
    # For vLLM
    curl http://localhost:8000/v1/models
    ```

2. Check firewall settings:

    ```bash
    sudo ufw allow 11434  # Ollama
    sudo ufw allow 8000   # vLLM
    ```

3. If using Docker, ensure host network mode:

    ```bash
    # Use host network to access local services
    docker run --network host ...
    ```
GPU Stats Show "N/A"
Problem: GPU monitoring shows "N/A" for all metrics
Solution for Jetson:

1. Ensure jetson-stats is installed:

    ```bash
    sudo pip3 install -U jetson-stats
    sudo reboot
    ```

2. Grant container access to the jtop socket:

    ```bash
    # Already included in start_container.sh
    docker run -v /run/jtop.sock:/run/jtop.sock:ro ...
    ```

3. For Jetson Thor, install from GitHub:

    ```bash
    sudo pip3 install --break-system-packages git+https://github.com/rbonghi/jetson_stats.git
    sudo jtop --install-service
    sudo reboot
    ```
Slow Performance / Low FPS
Problem: Video is laggy or VLM responses are slow
Solutions:

1. Use a smaller model:

    ```bash
    ollama pull gemma3:4b  # Instead of gemma3:12b
    ```

2. Increase Frame Processing Interval:
    - In Settings, set "Frame Processing Interval" to 60+ frames
    - This reduces how often frames are analyzed

3. Reduce Max Tokens:
    - Set "Max Tokens" to 50-100 instead of 512
    - Shorter responses = faster inference

4. Check system resources:

    ```bash
    jtop  # Monitor GPU/CPU usage
    ```
JetPack 5.x Not Supported
Problem: Python 3.8 compatibility issues on JetPack 5.x
Solution:
- Upgrade to JetPack 6.x or JetPack 7.0
- Or use the Docker method, which handles the Python environment automatically
For more troubleshooting tips, see the official troubleshooting guide.
📚 Additional Resources
- GitHub Repository: https://github.com/NVIDIA-AI-IOT/live-vlm-webui
- PyPI Package: https://pypi.org/project/live-vlm-webui/
- Full Documentation: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/tree/main/docs
- Docker Setup Guide: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/blob/main/docs/setup/docker.md
- Ollama Documentation: https://ollama.ai/
- NVIDIA Jetson AI Lab: https://www.jetson-ai-lab.com/
- Jetson Containers: https://github.com/dusty-nv/jetson-containers
Community & Support:
Issues, PRs, and feedback are welcome on the GitHub repository! ⭐
If you find this project useful, please consider giving it a star on GitHub!