Cosmos Reason 1 7B
NVIDIA's 7B-parameter reasoning vision-language model designed for physical AI and robotics applications
Serve the model
Start server
Choose module, then engine and optional parameters on the left, then copy the serve command by clicking the button on the right.
Call the model over Web API
Copy the client command below and paste it into your terminal to make a Web API request to the model you just served. Set JETSON_HOST to the hostname or IP address of your Jetson first.
curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/Cosmos-Reason1-7B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Model Details
NVIDIA Cosmos Reason 1 7B is a reasoning vision-language model designed for physical AI and robotics applications. With 7 billion parameters, it provides strong reasoning capabilities for understanding physical world interactions, spatial relationships, and complex scene analysis.
This model can be pulled directly from HuggingFace and served with vLLM; no manual model download is needed.
Key Capabilities
- Physical AI Reasoning: Understands physical world dynamics and interactions
- Spatial Understanding: Advanced spatial reasoning about object positions, orientations, and relationships
- Robotics Applications: Designed for robotics perception and planning tasks
- Chain-of-thought Reasoning: Generates detailed reasoning traces before conclusions
- Scene Analysis: Comprehensive understanding of complex visual scenes
Platform Support
| | Jetson AGX Thor | Jetson AGX Orin (64GB) |
|---|---|---|
| vLLM Container | ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor | ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin |
| Max Model Length | 8192 tokens | 8192 tokens |
| GPU Memory Util | 0.6 | 0.8 |
Note: Requires the HF_TOKEN environment variable to be set to your HuggingFace token. The model is downloaded automatically on first run.
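The page does not list a serve command for this configuration, so the following is only a sketch of how the table values might map onto a container launch. It assumes Docker with the NVIDIA runtime and host networking, and that the container image provides the vllm serve CLI; none of this is confirmed by the page. build_serve_cmd is a hypothetical helper that prints the command instead of running it, so it can be reviewed and adjusted first.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a container launch command
# built from the Platform Support table. The docker flags (--runtime nvidia,
# --network host) are assumptions, not taken from the page.
build_serve_cmd() {
  case "$1" in
    thor) image="ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor"; mem_util="0.6" ;;
    orin) image="ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin"; mem_util="0.8" ;;
    *) echo "usage: build_serve_cmd thor|orin" >&2; return 1 ;;
  esac
  # "-e HF_TOKEN" forwards the token from the calling environment.
  printf '%s\n' "docker run --rm -it --runtime nvidia --network host -e HF_TOKEN $image vllm serve nvidia/Cosmos-Reason1-7B --max-model-len 8192 --gpu-memory-utilization $mem_util"
}

# Print the command for a Jetson AGX Thor; use "orin" for AGX Orin (64GB).
build_serve_cmd thor
```

Printing instead of executing keeps the sketch safe to try: export HF_TOKEN, check the printed line against your setup, then run it by hand.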
Inputs and Outputs
Input:
- Text prompts and images
- Supports video frame analysis via --media-io-kwargs
Output:
- Generated text with chain-of-thought reasoning traces
- Physical reasoning, spatial analysis, and scene understanding
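To illustrate the text-plus-image input described above, here is a minimal sketch of a request body in the OpenAI-compatible chat format that vLLM serves. build_image_payload is a hypothetical helper, and the image URL and question in the usage comment are placeholders. The helper does no JSON escaping, so it assumes neither argument contains double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: prints a chat-completions payload that pairs one
# image URL with one text question. Arguments are inserted verbatim, so
# they must not contain double quotes or backslashes.
build_image_payload() {
  image_url="$1"
  question="$2"
  cat <<EOF
{
  "model": "nvidia/Cosmos-Reason1-7B",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image_url", "image_url": {"url": "$image_url"}},
      {"type": "text", "text": "$question"}
    ]
  }]
}
EOF
}

# Usage (requires a running server; placeholder URL and question):
#   build_image_payload "https://example.com/scene.jpg" "Is the path clear?" |
#     curl -s "http://${JETSON_HOST}:8000/v1/chat/completions" \
#       -H "Content-Type: application/json" -d @-
```

The response should carry the generated text, including any reasoning trace, in the message content of the first choice, as with the plain-text request earlier on the page.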
Additional Resources
- Try on build.nvidia.com
- NVIDIA Cosmos Documentation
- Live VLM WebUI: real-time webcam-to-VLM interface