
Cosmos Reason 1 7B

NVIDIA's 7B-parameter reasoning vision-language model designed for physical AI and robotics applications

Parameters: 7B
Modalities: Text, Image, Video
Context Length: 128K
License: NVIDIA Open Model License
Precision: FP16

Serve the model

Start the server with vLLM, using the container image and settings for your platform listed under Platform Support below.

Call the model over Web API

Run the command below in a terminal to send a request to the OpenAI-compatible Web API of the model you just served. Set JETSON_HOST to the hostname or IP address of the Jetson running the server.

curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/Cosmos-Reason1-7B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
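To ask about an image, vLLM's OpenAI-compatible endpoint accepts a content array that mixes text parts and image_url parts. The sketch below assumes the image is reachable by the server at a URL (the example.com address is a placeholder) and that the prompt contains no double quotes or backslashes, because it is spliced into the JSON without escaping. The helper name build_image_request is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: ask Cosmos Reason 1 7B about an image through the
# OpenAI-compatible chat completions endpoint served by vLLM.
# build_image_request is a hypothetical helper, not part of any tool.

# Print a chat completions JSON body with one image part and one text part.
# Assumption: the arguments contain no characters that need JSON escaping.
build_image_request() {
  local image_url="$1"
  local prompt="$2"
  cat <<EOF
{
  "model": "nvidia/Cosmos-Reason1-7B",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image_url", "image_url": {"url": "${image_url}"}},
      {"type": "text", "text": "${prompt}"}
    ]
  }]
}
EOF
}

# Only send the request when JETSON_HOST points at the serving device.
if [ -n "${JETSON_HOST:-}" ]; then
  build_image_request "https://example.com/scene.jpg" \
    "Is it safe for the robot arm to move left?" |
    curl -s "http://${JETSON_HOST}:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d @-
fi
```

`curl -d @-` reads the request body from standard input, so the JSON never has to be quoted on the command line. For a local file, the url field can instead carry a base64 data URL (data:image/jpeg;base64,…), assuming your vLLM version accepts data URLs for images.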

NVIDIA Cosmos Reason 1 7B is a reasoning vision-language model designed for physical AI and robotics applications. With 7 billion parameters, it is built to reason about physical-world interactions, spatial relationships, and complex scenes.

This model can be pulled directly from Hugging Face and served with vLLM; no manual model download is needed.

Key Capabilities

  • Physical AI Reasoning: Understands physical world dynamics and interactions
  • Spatial Understanding: Advanced spatial reasoning about object positions, orientations, and relationships
  • Robotics Applications: Designed for robotics perception and planning tasks
  • Chain-of-Thought Reasoning: Generates detailed reasoning traces before conclusions
  • Scene Analysis: Comprehensive understanding of complex visual scenes
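Because the model writes a reasoning trace before its conclusion, a client often wants only the final answer. The sketch below assumes the response wraps its reasoning in <think>…</think> and its conclusion in <answer>…</answer>, each appearing once; whether these tags appear depends on your prompt, so treat them as an assumption. The helper name extract_tag is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: separate the reasoning trace from the final answer.
# Assumption: the response wraps its reasoning in <think>...</think> and its
# conclusion in <answer>...</answer>, each appearing once. These tags depend
# on the prompt and are not guaranteed.
# extract_tag is a hypothetical helper, not part of any tool.

# Read a response on stdin and print the trimmed text inside <tag>...</tag>.
# Lines are joined with spaces first, so a multi-line block comes out on one line.
# Prints nothing if the tag is missing.
extract_tag() {
  paste -s -d ' ' - |
    sed -n "s|.*<$1>\(.*\)</$1>.*|\1|p" |
    sed 's/^ *//; s/ *$//'
}

# Example with a hand-written response; with a live server you could pipe
#   curl ... | jq -r '.choices[0].message.content' | extract_tag answer
response='<think>
The cup is close to the table edge.
</think>

<answer>
Move the cup toward the center.
</answer>'

printf '%s\n' "$response" | extract_tag answer   # prints: Move the cup toward the center.
```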

Platform Support

Jetson AGX Thor
  • vLLM container: ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor
  • Max model length: 8192 tokens
  • GPU memory utilization: 0.6

Jetson AGX Orin (64GB)
  • vLLM container: ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin
  • Max model length: 8192 tokens
  • GPU memory utilization: 0.8

Note: Set the HF_TOKEN environment variable to your Hugging Face access token before serving. The model is downloaded automatically on first run.
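The platform settings above map onto vLLM's --max-model-len and --gpu-memory-utilization options. The sketch below is one way to start the server under stated assumptions: the container provides the `vllm` CLI, and the docker options `--runtime nvidia` and `--network host` suit your Jetson setup (adjust them if not). The helpers platform_settings and serve_cosmos_reason are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start the vLLM server with the Platform Support settings.
# Assumptions: the container provides the `vllm` CLI, and the docker options
# (--runtime nvidia, --network host) suit your Jetson setup.
# platform_settings and serve_cosmos_reason are hypothetical helpers.

# Print "<container image> <GPU memory fraction>" for thor or orin.
platform_settings() {
  case "$1" in
    thor) echo "ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor 0.6" ;;
    orin) echo "ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin 0.8" ;;
    *) echo "unknown platform: $1 (expected thor or orin)" >&2; return 1 ;;
  esac
}

# Run the server in a container; vLLM listens on port 8000 by default.
serve_cosmos_reason() {
  local settings image util
  # HF_TOKEN lets the container download the model on first run.
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set" >&2
    return 1
  fi
  settings="$(platform_settings "$1")" || return 1
  image="${settings% *}"   # text before the space
  util="${settings#* }"    # text after the space
  docker run --runtime nvidia -it --rm --network host \
    -e HF_TOKEN="$HF_TOKEN" \
    "$image" \
    vllm serve nvidia/Cosmos-Reason1-7B \
      --max-model-len 8192 \
      --gpu-memory-utilization "$util"
}

# Usage (not run here):
#   export HF_TOKEN=<your Hugging Face token>
#   serve_cosmos_reason orin
```

Checking HF_TOKEN before calling docker fails fast with a clear message instead of letting the model download fail inside the container.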

Inputs and Outputs

Input:

  • Text prompts and images
  • Video, processed as sampled frames; frame sampling can be tuned with vLLM's --media-io-kwargs server option
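A video request follows the same pattern as an image request, with a video_url content part. The sketch assumes the server was started with a frame-sampling override such as `--media-io-kwargs '{"video": {"num_frames": 16}}'` (16 is an arbitrary example value; fewer frames use less of the 8192-token budget), that the video is reachable by URL (the example.com address is a placeholder), and that the arguments need no JSON escaping. The helper build_video_request is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: send a video to the vLLM OpenAI-compatible endpoint.
# Assumption: the server was started with a frame-sampling option such as
#   --media-io-kwargs '{"video": {"num_frames": 16}}'
# build_video_request is a hypothetical helper; its arguments must not need
# JSON escaping.

# Print a chat completions JSON body with one video part and one text part.
build_video_request() {
  cat <<EOF
{
  "model": "nvidia/Cosmos-Reason1-7B",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "video_url", "video_url": {"url": "$1"}},
      {"type": "text", "text": "$2"}
    ]
  }]
}
EOF
}

# Only send the request when JETSON_HOST points at the serving device.
if [ -n "${JETSON_HOST:-}" ]; then
  build_video_request "https://example.com/clip.mp4" \
    "What will happen next in this video?" |
    curl -s "http://${JETSON_HOST}:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d @-
fi
```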

Output:

  • Generated text with chain-of-thought reasoning traces
  • Physical reasoning, spatial analysis, and scene understanding

Additional Resources