Multimodal

Cosmos Reason 1 7B

NVIDIA's 7B parameter reasoning vision-language model designed for physical AI and robotics applications

Memory Requirement 16GB RAM
Precision FP16
Size 14GB

Jetson Inference - Supported Inference Engines

🚀
Container
# Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host -e HF_TOKEN=$HF_TOKEN -v $HOME/.cache/huggingface:/root/.cache/huggingface ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin vllm serve nvidia/Cosmos-Reason1-7B --max-model-len 8192 --gpu-memory-utilization 0.8 --reasoning-parser qwen3

NVIDIA Cosmos Reason 1 7B is a reasoning vision-language model designed for physical AI and robotics applications. With 7 billion parameters, it provides strong reasoning capabilities for understanding physical world interactions, spatial relationships, and complex scene analysis.

This model can be pulled directly from HuggingFace and served with vLLM — no manual model download needed.

Key Capabilities

  • Physical AI Reasoning: Understands physical world dynamics and interactions
  • Spatial Understanding: Advanced spatial reasoning about object positions, orientations, and relationships
  • Robotics Applications: Designed for robotics perception and planning tasks
  • Chain-of-thought Reasoning: Generates detailed reasoning traces before conclusions
  • Scene Analysis: Comprehensive understanding of complex visual scenes

Platform Support

Jetson AGX ThorJetson AGX Orin (64GB)
vLLM Containerghcr.io/nvidia-ai-iot/vllm:latest-jetson-thorghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin
Max Model Length8192 tokens8192 tokens
GPU Memory Util0.60.8

Note: Requires HF_TOKEN environment variable set with your HuggingFace token. The model is downloaded automatically on first run.

Inputs and Outputs

Input:

  • Text prompts and images
  • Supports video frame analysis via --media-io-kwargs

Output:

  • Generated text with chain-of-thought reasoning traces
  • Physical reasoning, spatial analysis, and scene understanding

Additional Resources