Tutorial - Introduction


Our tutorials are divided into categories based roughly on model modality: the type of data to be processed or generated.

Text (LLM)

text-generation-webui Interact with a local AI assistant by running an LLM with oobabooga's text-generation-webui
Ollama Get started effortlessly deploying GGUF models for chat and web UI
llamaspeak Talk live with Llama using Riva ASR/TTS, and chat about images with Llava!
NanoLLM Optimized inferencing library for LLMs, multimodal agents, and speech.
Small LLM (SLM) Deploy Small Language Models (SLM) with reduced memory usage and higher throughput.
API Examples Learn how to write Python code for doing LLM inference using popular APIs.
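The API Examples tutorial walks through writing Python code for LLM inference. As a taste of what that looks like, here is a minimal sketch of calling a locally running Ollama server over its REST API (`/api/generate` on Ollama's default port 11434); the model tag `llama2` is an assumption and should be replaced with whatever model you have pulled.

```python
# Minimal sketch: local LLM inference via Ollama's REST API.
# Assumes an Ollama server is running on its default port; the
# "llama2" model tag is an example, not a requirement.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(prompt: str, model: str = "llama2") -> urllib.request.Request:
    """Package a prompt into a POST request for the Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON response instead of a token stream
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def generate(prompt: str, model: str = "llama2") -> str:
    """Send the prompt and return the model's completion text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `generate("Why is the sky blue?")` returns the completion as a string; the tutorials above cover richer options such as streaming and chat templates.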

Text + Vision (VLM)

Give your locally running LLM access to vision!

Mini-GPT4 Mini-GPT4, an open-source model that demonstrates vision-language capabilities.
LLaVA Large Language and Vision Assistant, a multimodal model that combines a vision encoder and an LLM for visual and language understanding.
Live LLaVA Run multimodal models interactively on live video streams over a repeating set of prompts.
NanoVLM Use mini vision/language models and the optimized multimodal pipeline for live streaming.

Image Generation

Stable Diffusion Run AUTOMATIC1111's stable-diffusion-webui to generate images from prompts
Stable Diffusion XL A newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities.

Vision Transformers (ViT)

EfficientViT MIT Han Lab's EfficientViT, Multi-Scale Linear Attention for High-Resolution Dense Prediction
NanoOWL OWL-ViT optimized to run in real time on Jetson with NVIDIA TensorRT
NanoSAM A SAM model variant capable of running in real time on Jetson
SAM Meta's Segment Anything model
TAM Track-Anything, an interactive tool for video object tracking and segmentation

Vector Database

NanoDB Interactive demo showcasing the impact of a vector database that handles multimodal data

Audio

Whisper OpenAI's Whisper, a pre-trained model for automatic speech recognition (ASR)
AudioCraft Meta's AudioCraft, for producing high-quality audio and music
VoiceCraft VoiceCraft, speech editing and zero-shot TTS

Metropolis Microservices

First Steps Get Metropolis Microservices up & running on Jetson with NVStreamer and AI NVR capabilities.

About NVIDIA Jetson


We are mainly targeting Jetson Orin generation devices for deploying the latest LLMs and generative AI models.

Jetson AGX Orin 64GB Developer Kit
  GPU: 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores
  RAM: 64GB
  Storage: 64GB eMMC (+ NVMe SSD)

Jetson AGX Orin Developer Kit
  GPU: 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores
  RAM: 32GB
  Storage: 64GB eMMC (+ NVMe SSD)

Jetson Orin Nano Developer Kit
  GPU: 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores
  RAM: 8GB
  Storage: microSD card (+ NVMe SSD)