Cosmos Reason 1 7B

NVIDIA Cosmos Reason 1 7B is a reasoning vision-language model designed for physical AI and robotics applications. With 7 billion parameters, it provides strong reasoning capabilities for understanding physical world interactions, spatial relationships, and complex scene analysis.

This model can be pulled directly from HuggingFace and served with vLLM — no manual model download needed.

Key Capabilities

Physical AI Reasoning: Understands physical world dynamics and interactions
Spatial Understanding: Advanced spatial reasoning about object positions, orientations, and relationships
Robotics Applications: Designed for robotics perception and planning tasks
Chain-of-thought Reasoning: Generates detailed reasoning traces before conclusions
Scene Analysis: Comprehensive understanding of complex visual scenes

Platform Support

	Jetson AGX Thor	Jetson AGX Orin (64GB)
vLLM Container	`ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor`	`ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin`
Max Model Length	8192 tokens	8192 tokens
GPU Memory Util	0.6	0.8

Note: Requires HF_TOKEN environment variable set with your HuggingFace token. The model is downloaded automatically on first run.

Inputs and Outputs

Input:

Text prompts and images
Supports video frame analysis via --media-io-kwargs

Output:

Generated text with chain-of-thought reasoning traces
Physical reasoning, spatial analysis, and scene understanding

Additional Resources

Try on build.nvidia.com
NVIDIA Cosmos Documentation
Live VLM WebUI — real-time webcam-to-VLM interface

Jetson Inference - Supported Inference Engines

Model Details

Key Capabilities

Platform Support

Inputs and Outputs

Additional Resources