Supported Models

Run the latest generative AI models on your Jetson device

Featured Models

Latest releases with day-0 support on Jetson

MiniMax M2.7

MiniMax's 230B-parameter agentic Mixture-of-Experts (MoE) flagship, built for software engineering and self-evolving agent harnesses, running on Jetson through llama.cpp at 4-bit quantization.
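As a rough check on what a 230B model at 4 bits means for memory, the weight footprint can be estimated from parameter count times bits per weight. This back-of-the-envelope estimate assumes exactly 4 bits per weight and ignores quantization scale overhead (practical 4-bit formats usually spend slightly more), the KV cache, and runtime buffers, so real memory use will be higher:

```latex
\text{weight memory} \approx N_{\text{params}} \times \frac{b}{8}\ \text{bytes}
  = 230 \times 10^{9} \times \frac{4}{8}\ \text{bytes}
  = 115 \times 10^{9}\ \text{bytes}
  \approx 115\ \text{GB}\ (\approx 107\ \text{GiB})
```

Because an MoE model must keep every expert resident in memory, the estimate uses the full 230B parameter count rather than the number of parameters active per token.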

Qwen3.6 35B-A3B (MoE)

Alibaba's latest Mixture-of-Experts (MoE) model, with 35B total and 3B active parameters, featuring native tool calling and multi-token prediction (MTP) speculative decoding.

Qwen3.6 27B

Alibaba's dense 27-billion-parameter language model, with native tool calling and MTP speculative decoding.

Nemotron 3 Nano Omni (VLM)

NVIDIA's multimodal reasoning model with language, vision, audio, and video understanding. It is a Mixture-of-Experts (MoE) model with 30B total and 3B active parameters, available in NVFP4, FP8, and BF16 precision.
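The Omni description separates total from active parameters and lists three precisions. The sketch below estimates the weight footprint at each precision, assuming nominal bits per weight (16 for BF16, 8 for FP8, 4 for NVFP4) and ignoring block-scale overhead, KV cache, and activations. In an MoE model every expert must stay in memory, so the estimate uses total parameters; the active count mainly governs per-token compute. The function name `weight_gb` is illustrative only.

```python
# Rough weight-storage estimate for a model at different numeric precisions.
# Assumption: nominal bits per weight only. Block-scale overhead (NVFP4 also
# stores scale factors), KV cache, activations, and runtime buffers are ignored.
BITS_PER_WEIGHT = {
    "BF16": 16,
    "FP8": 8,
    "NVFP4": 4,
}


def weight_gb(total_params: float, precision: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[precision]
    return total_params * bits / 8 / 1e9


if __name__ == "__main__":
    total_params = 30e9   # all experts must be resident in memory
    active_params = 3e9   # parameters used for each generated token
    for precision in ("BF16", "FP8", "NVFP4"):
        print(f"{precision:>5}: {weight_gb(total_params, precision):5.1f} GB of weights")
    print(f"Active fraction per token: {active_params / total_params:.0%}")
```

For 30B total parameters this prints about 60 GB (BF16), 30 GB (FP8), and 15 GB (NVFP4), with a 10% active fraction; actual footprints will be somewhat larger once scales, KV cache, and runtime buffers are added.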

All Models

Models are grouped by family below, followed by a compatibility table of supported Jetson modules and inference engines.

Google Gemma4

Gemma 4 E2B

Gemma 4 E4B

Gemma 4 26B-A4B

Gemma 4 31B

NVIDIA Nemotron

Nemotron3 Nano 4B

Nemotron3 Nano 30B-A3B

Nemotron Nano 9B v2

Nemotron Nano 12B VL

Nemotron 3 Nano Omni (new)

NVIDIA Cosmos Reason

Cosmos Reason 1 7B

Cosmos Reason 2 2B

Cosmos Reason 2 8B

Alibaba Qwen3.5

Qwen3.5 35B-A3B (MoE)

Qwen3.5 27B

Qwen3.5 9B

Qwen3.5 4B

Qwen3.5 0.8B

OpenAI GPT OSS

GPT OSS 20B

GPT OSS 120B

Alibaba Qwen3

Qwen3 4B

Qwen3 8B

Qwen3 30B-A3B (MoE)

Qwen3 32B

Qwen3 VL 4B

Qwen3 VL 8B

Mistral AI Ministral 3

Ministral 3 3B Instruct

Ministral 3 8B Instruct

Ministral 3 14B Instruct

Ministral 3 3B Reasoning

Ministral 3 8B Reasoning

Ministral 3 14B Reasoning

Meta Llama 3

Llama 3.2 3B

Llama 3.1 8B

Llama 3.1 70B

Google Gemma3

FunctionGemma

Gemma 3 270M

Gemma 3 1B

Gemma 3 4B

Gemma 3 12B

Gemma 3 27B

MiniMax M2.7

MiniMax M2.7 (new)

Alibaba Qwen3.6

Qwen3.6 35B-A3B (MoE) (new)

Qwen3.6 27B (new)

All models: filter by model name in the Model column; sort Model, Family, or VLM; use Modules and Inference engines checkboxes to filter further.
Filter by model name			Modules					Inference engines				Actions
Filter by model name			T5000	T4000	AGX Orin 64GB	Orin NX 16GB	Orin Nano 8GB	vLLM	Ollama	llama.cpp	Edge-LLM	Actions
Nemotron3 Nano 4B	NVIDIA Nemotron	—	✓	✓	✓	✓	✓	—	—	✓	—	Nemotron3 Nano 4B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
FunctionGemma	Google Gemma3	—	✓	✓	✓	✓	✓	—	—	✓	—	FunctionGemma Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Cosmos Reason 1 7B	NVIDIA Cosmos Reason	VLM	✓	✓	✓	✓	✓	✓	—	—	—	Cosmos Reason 1 7B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Gemma 3 270M	Google Gemma3	—	✓	✓	✓	✓	✓	✓	✓	—	—	Gemma 3 270M Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Gemma 4 E2B	Google Gemma4	VLM	✓	✓	✓	✓	✓	✓	—	✓	—	Gemma 4 E2B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
GPT OSS 20B	OpenAI GPT OSS	—	✓	✓	✓	—	—	✓	—	—	—	GPT OSS 20B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Llama 3.2 3B	Meta Llama 3	—	✓	✓	✓	✓	✓	✓	✓	—	—	Llama 3.2 3B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
MiniMax M2.7 New	MiniMax M2.7	—	✓	—	—	—	—	—	—	✓	—	MiniMax M2.7 Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Ministral 3 3B Instruct	Mistral AI Ministral 3	VLM	✓	✓	✓	✓	✓	✓	✓	—	—	Ministral 3 3B Instruct Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Nemotron3 Nano 30B-A3B	NVIDIA Nemotron	—	✓	✓	✓	✓	✓	✓	✓	—	—	Nemotron3 Nano 30B-A3B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3 4B	Alibaba Qwen3	—	✓	✓	✓	✓	✓	✓	—	—	—	Qwen3 4B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3.5 35B-A3B (MoE)	Alibaba Qwen3.5	—	✓	✓	✓	—	—	✓	—	—	—	Qwen3.5 35B-A3B (MoE) Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3.6 35B-A3B (MoE) New	Alibaba Qwen3.6	—	✓	✓	✓	—	—	✓	—	—	—	Qwen3.6 35B-A3B (MoE) Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Gemma 3 1B	Google Gemma3	—	✓	✓	✓	✓	✓	✓	✓	—	—	Gemma 3 1B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Gemma 4 E4B	Google Gemma4	VLM	✓	✓	✓	✓	—	✓	—	✓	—	Gemma 4 E4B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
GPT OSS 120B	OpenAI GPT OSS	—	✓	✓	—	—	—	✓	—	—	—	GPT OSS 120B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Llama 3.1 8B	Meta Llama 3	—	✓	✓	✓	✓	✓	✓	✓	—	—	Llama 3.1 8B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Ministral 3 8B Instruct	Mistral AI Ministral 3	VLM	✓	✓	✓	✓	✓	✓	✓	—	—	Ministral 3 8B Instruct Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Nemotron Nano 9B v2	NVIDIA Nemotron	—	✓	✓	—	—	—	✓	—	—	—	Nemotron Nano 9B v2 Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3.5 27B	Alibaba Qwen3.5	—	✓	✓	✓	—	—	✓	—	—	—	Qwen3.5 27B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3.6 27B New	Alibaba Qwen3.6	—	✓	✓	—	—	—	✓	—	—	—	Qwen3.6 27B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3 8B	Alibaba Qwen3	—	✓	✓	✓	✓	—	✓	—	—	—	Qwen3 8B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Gemma 3 4B	Google Gemma3	VLM	✓	✓	✓	✓	✓	✓	✓	—	—	Gemma 3 4B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Gemma 4 26B-A4B	Google Gemma4	VLM	✓	✓	✓	—	—	✓	—	✓	—	Gemma 4 26B-A4B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Cosmos Reason 2 2B	NVIDIA Cosmos Reason	VLM	✓	✓	✓	✓	✓	✓	—	✓	—	Details
Llama 3.1 70B	Meta Llama 3	—	✓	✓	✓	✓	✓	✓	✓	—	—	Llama 3.1 70B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Ministral 3 14B Instruct	Mistral AI Ministral 3	VLM	✓	✓	✓	✓	✓	✓	✓	—	—	Ministral 3 14B Instruct Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Nemotron Nano 12B VL	NVIDIA Nemotron	VLM	✓	✓	—	—	—	✓	—	—	—	Nemotron Nano 12B VL Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3 30B-A3B (MoE)	Alibaba Qwen3	—	✓	✓	✓	—	—	✓	—	—	—	Qwen3 30B-A3B (MoE) Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3.5 9B	Alibaba Qwen3.5	VLM	✓	✓	✓	✓	—	✓	—	—	—	Qwen3.5 9B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Gemma 3 12B	Google Gemma3	VLM	✓	✓	✓	✓	✓	✓	✓	—	—	Gemma 3 12B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Cosmos Reason 2 8B	NVIDIA Cosmos Reason	VLM	✓	✓	✓	✓	✓	✓	—	✓	—	Details
Gemma 4 31B	Google Gemma4	VLM	✓	✓	✓	—	—	✓	—	✓	—	Gemma 4 31B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Ministral 3 3B Reasoning	Mistral AI Ministral 3	VLM	✓	✓	✓	✓	✓	✓	—	—	—	Ministral 3 3B Reasoning Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3 32B	Alibaba Qwen3	—	✓	✓	—	—	—	✓	—	—	—	Qwen3 32B Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Nemotron 3 Nano Omni New	NVIDIA Nemotron	VLM	✓	✓	✓	—	—	✓	✓	✓	—	Nemotron 3 Nano Omni Quick Start Runner Inference Engine `Loading command...` Commands are auto-generated based on your configuration settings. Advanced configuration Authentication Hugging Face Token * This model requires a Hugging Face access token. The token is inserted into the command and never stored. vLLM Configuration Configure vLLM server parameters. Leave empty to use defaults. Port (--port) Max Model Length (--max-model-len) Maximum context length the model can handle GPU Memory Utilization (--gpu-memory-utilization) Fraction of GPU memory to use (0.1 - 1.0) Details
Qwen3.5 4B	Alibaba Qwen3.5	VLM	✓	✓	✓	✓	✓	✓	—	—	—
Gemma 3 27B	Google Gemma3	VLM	✓	✓	✓	✓	✓	✓	✓	—	—
Ministral 3 8B Reasoning	Mistral AI Ministral 3	VLM	✓	✓	✓	✓	—	✓	—	—	—
Qwen3.5 0.8B	Alibaba Qwen3.5	VLM	✓	✓	✓	✓	✓	✓	—	—	—
Qwen3 VL 4B	Alibaba Qwen3	VLM	✓	✓	✓	✓	✓	✓	—	—	—
Ministral 3 14B Reasoning	Mistral AI Ministral 3	VLM	✓	✓	✓	—	—	✓	—	—	—
Qwen3 VL 8B	Alibaba Qwen3	VLM	✓	✓	✓	✓	—	✓	—	—	—

Advanced configuration for each model above: the model requires a Hugging Face access token, which is inserted into the generated command and never stored. Optional vLLM server parameters fall back to defaults when left empty: Port (--port); Max Model Length (--max-model-len), the maximum context length the model can handle; and GPU Memory Utilization (--gpu-memory-utilization), the fraction of GPU memory to use (0.1 to 1.0).
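The runner's advanced configuration exposes three vLLM server parameters: --port, --max-model-len (maximum context length), and --gpu-memory-utilization (fraction of GPU memory, 0.1 to 1.0), each falling back to vLLM's default when left empty. The helper below is a minimal sketch that assembles such a command. It assumes vLLM is installed locally and launched with its `vllm serve` CLI; the page's generated commands may differ, for example by wrapping the server in a container. The function names and the placeholder model ID are hypothetical.

```python
import os
import shlex
from typing import Dict, List, Optional


def build_vllm_serve_command(
    model_id: str,
    port: Optional[int] = None,
    max_model_len: Optional[int] = None,
    gpu_memory_utilization: Optional[float] = None,
) -> List[str]:
    """Build a `vllm serve` argument list; options left as None are omitted so vLLM uses its defaults."""
    cmd = ["vllm", "serve", model_id]
    if port is not None:
        cmd += ["--port", str(port)]
    if max_model_len is not None:
        # Maximum context length (prompt plus generated tokens) per request.
        cmd += ["--max-model-len", str(max_model_len)]
    if gpu_memory_utilization is not None:
        # Fraction of GPU memory vLLM may claim; the runner's form accepts 0.1 to 1.0.
        if not 0.1 <= gpu_memory_utilization <= 1.0:
            raise ValueError("gpu_memory_utilization must be between 0.1 and 1.0")
        cmd += ["--gpu-memory-utilization", str(gpu_memory_utilization)]
    return cmd


def build_env(hf_token: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment and add HF_TOKEN so the token stays out of the argument list."""
    env = dict(os.environ)
    if hf_token:
        env["HF_TOKEN"] = hf_token
    return env


if __name__ == "__main__":
    # "example-org/example-model" is a placeholder, not a real model ID.
    cmd = build_vllm_serve_command(
        "example-org/example-model", port=8000, max_model_len=8192, gpu_memory_utilization=0.8
    )
    print(shlex.join(cmd))
    # To launch: subprocess.run(cmd, env=build_env(os.environ.get("HF_TOKEN")), check=True)
```

Passing the token through the `HF_TOKEN` environment variable, which the Hugging Face Hub client reads, is a design choice of this sketch: it keeps the token out of the visible command line and shell history.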

Performance Comparison

Benchmarks on Jetson across supported inference runtimes, reported by platform and concurrency level.

* ISL/OSL (input/output sequence length) for all benchmarks: 2048/128 tokens.

* Unless otherwise specified, all models use W4A16 quantization on Orin and NVFP4 on Thor.

* NVFP4 and MXFP4 require Blackwell FP4 tensor cores and are not available on Orin (Ampere).
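The quantization notes can be made concrete with storage arithmetic. W4A16 stores weights in 4 bits and computes with 16-bit activations, so the activation width does not affect weight storage. NVFP4 and MXFP4 also store 4-bit values but attach a shared scale to each small block of weights. The sketch below estimates weight memory only. Its block sizes and scale widths are assumptions, listed in the comments. In particular, a W4A16 group size of 128 is a common choice, not one this page states. The helper names are hypothetical.

```python
# Hypothetical helper: rough weight-memory estimate per quantization format.
# Each entry is (bits per stored value, block size sharing one scale, bits per scale).
# Assumptions: W4A16 uses group size 128 with one FP16 scale per group (zero points ignored);
# NVFP4 uses 16-value blocks with an FP8 (E4M3) scale; MXFP4 uses 32-value blocks with an
# 8-bit (E8M0) scale. Per-tensor scales, unquantized layers, KV cache, and activations are ignored.
FORMATS = {
    "BF16": (16, 1, 0),
    "W4A16": (4, 128, 16),
    "NVFP4": (4, 16, 8),
    "MXFP4": (4, 32, 8),
}


def bits_per_weight(fmt: str) -> float:
    """Effective storage bits per weight, amortizing each block scale over its block."""
    value_bits, block_size, scale_bits = FORMATS[fmt]
    return value_bits + scale_bits / block_size


def weight_gb(num_params: float, fmt: str) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight(fmt) / 8 / 1e9


if __name__ == "__main__":
    # Example: a 27B-parameter dense model.
    for fmt in FORMATS:
        print(f"{fmt:6s} {bits_per_weight(fmt):.3f} bits/weight  ~{weight_gb(27e9, fmt):.1f} GB")
```

Under these assumptions, a 27B-parameter dense model needs about 54 GB of weights in BF16. The three 4-bit formats need roughly 14 to 15 GB. That gap matters on memory-constrained Jetson modules.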
