Cosmos Reason 1 7B
NVIDIA's 7B parameter reasoning vision-language model designed for physical AI and robotics applications
Jetson Inference - Supported Inference Engines
# Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host -e HF_TOKEN=$HF_TOKEN -v $HOME/.cache/huggingface:/root/.cache/huggingface ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin vllm serve nvidia/Cosmos-Reason1-7B --max-model-len 8192 --gpu-memory-utilization 0.8 --reasoning-parser qwen3 Model Details
NVIDIA Cosmos Reason 1 7B is a reasoning vision-language model designed for physical AI and robotics applications. With 7 billion parameters, it provides strong reasoning capabilities for understanding physical world interactions, spatial relationships, and complex scene analysis.
This model can be pulled directly from HuggingFace and served with vLLM — no manual model download needed.
Key Capabilities
- Physical AI Reasoning: Understands physical world dynamics and interactions
- Spatial Understanding: Advanced spatial reasoning about object positions, orientations, and relationships
- Robotics Applications: Designed for robotics perception and planning tasks
- Chain-of-thought Reasoning: Generates detailed reasoning traces before conclusions
- Scene Analysis: Comprehensive understanding of complex visual scenes
Platform Support
| Jetson AGX Thor | Jetson AGX Orin (64GB) | |
|---|---|---|
| vLLM Container | ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor | ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin |
| Max Model Length | 8192 tokens | 8192 tokens |
| GPU Memory Util | 0.6 | 0.8 |
Note: Requires
HF_TOKENenvironment variable set with your HuggingFace token. The model is downloaded automatically on first run.
Inputs and Outputs
Input:
- Text prompts and images
- Supports video frame analysis via
--media-io-kwargs
Output:
- Generated text with chain-of-thought reasoning traces
- Physical reasoning, spatial analysis, and scene understanding
Additional Resources
- Try on build.nvidia.com
- NVIDIA Cosmos Documentation
- Live VLM WebUI — real-time webcam-to-VLM interface