Cosmos3 Nano
NVIDIA's compact vision-language reasoning model (16B) with chain-of-thought over text, image, and video — NVFP4 for Blackwell/Thor.
Benchmark
Cosmos3 Nano · vLLM · NVFP4* · ISL 2048 / OSL 128
C = concurrent requests. Results will vary with image, clocks, and workload.
Model Details
Cosmos3 Nano is a compact (16B) vision-language reasoning model from the NVIDIA Cosmos family. It performs chain-of-thought reasoning over text, images, and video, producing text output. This page covers the NVFP4 checkpoint, which runs natively on Jetson Thor (Blackwell, sm_110) for efficient 4-bit inference.
Key Capabilities
- Multimodal Reasoning: Chain-of-thought over combined image/video + text input
- Spatial & Scene Understanding: Reasoning about objects and relationships in a scene
- Video Understanding: Temporal reasoning across video frames
- NVFP4 on Blackwell: 4-bit (E2M1 with FP8 block scales) weights for high throughput on Thor
Running with vLLM (NVFP4)
The NVFP4 checkpoint is published on NGC and downloaded via the NGC CLI.
Step 1: Install and Configure the NGC CLI
wget -O ngccli_arm64.zip https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/4.20.1/files/ngccli_arm64.zip
unzip ngccli_arm64.zip && chmod u+x ngc-cli/ngc
export PATH="$PATH:$(pwd)/ngc-cli"
ngc config set
You will need an NGC account with access to the model and a valid API key.
Step 2: Download the NVFP4 Model
mkdir -p ~/cosmos3-ngc
ngc registry model download-version \
"nim/nvidia/cosmos3-nano-reasoner:modelopt-nvfp4-full-quantize-final_format_fix" \
--dest ~/cosmos3-ngc
export MODEL_PATH=$(find ~/cosmos3-ngc -maxdepth 2 -name config.json -exec dirname {} \; | head -1)
Step 3: Serve on Jetson Thor
sudo docker run -it --rm --runtime=nvidia --network host \
-v $MODEL_PATH:/model:ro \
--entrypoint "" \
vllm/vllm-openai:v0.23.0-aarch64-ubuntu2404 \
vllm serve /model \
--max-model-len 8192 \
--gpu-memory-utilization 0.8 \
--trust-remote-code \
--limit-mm-per-prompt '{"image": 1, "video": 0}'
Send an OpenAI-style chat request with an image_url (data URI or http URL) plus a text prompt to exercise the multimodal path.
Additional Resources
- NGC NVFP4 Checkpoint - NVFP4 quantized model for vLLM on Thor
- Live VLM WebUI - real-time webcam-to-VLM interface