Cosmos Reason 1 7B

NVIDIA's 7B parameter reasoning vision-language model designed for physical AI and robotics applications

Parameters	7B
Modalities	Text, Image, Video
Context Length	128K
License	NVIDIA Open Model License
Precision	FP16

Serve the model

Start server

Start a vLLM server for nvidia/Cosmos-Reason1-7B using the container and settings listed for your device under Platform Support. The client examples in the next section assume the server listens on port 8000.

Call the model over Web API

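Use one of the client examples below to send a request to the Web API of the model you just served. Set JETSON_HOST to the hostname or IP address of your Jetson first. The curl command runs in a terminal; the Python example requires the openai package.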

curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/Cosmos-Reason1-7B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

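import os

from openai import OpenAI

# Python does not expand ${JETSON_HOST} inside a string, so read it from the environment.
jetson_host = os.environ.get("JETSON_HOST", "localhost")

client = OpenAI(
    base_url=f"http://{jetson_host}:8000/v1",
    api_key="not-needed",  # vLLM / llama.cpp typically do not enforce a key
)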

completion = client.chat.completions.create(
    model="nvidia/Cosmos-Reason1-7B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)

Model Details

NVIDIA Cosmos Reason 1 7B is a reasoning vision-language model designed for physical AI and robotics applications. With 7 billion parameters, it provides strong reasoning capabilities for understanding physical world interactions, spatial relationships, and complex scene analysis.

vLLM pulls this model directly from HuggingFace when the server starts, so no manual model download is needed.

Key Capabilities

Physical AI Reasoning: Understands physical world dynamics and interactions
Spatial Understanding: Advanced spatial reasoning about object positions, orientations, and relationships
Robotics Applications: Designed for robotics perception and planning tasks
Chain-of-thought Reasoning: Generates detailed reasoning traces before conclusions
Scene Analysis: Comprehensive understanding of complex visual scenes

Platform Support

	Jetson AGX Thor	Jetson AGX Orin (64GB)
vLLM Container	`ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor`	`ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin`
Max Model Length	8192 tokens	8192 tokens
GPU Memory Util	0.6	0.8

Note: Requires HF_TOKEN environment variable set with your HuggingFace token. The model is downloaded automatically on first run.
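
The table and note above can be combined into a launch command. The following is a minimal sketch, not the command generated by the original page: it assumes the containers expose the standard `vllm serve` CLI, that Docker is run with `--runtime nvidia` and `--network host`, and that `HF_TOKEN` is already exported in your shell. The helper name `build_serve_command` and the `PLATFORMS` dictionary are hypothetical; only the image names, the 8192-token limit, and the memory utilization values come from the table.

```python
import shlex

# Values copied from the Platform Support table.
PLATFORMS = {
    "thor": {
        "image": "ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor",
        "gpu_memory_utilization": 0.6,
    },
    "orin": {
        "image": "ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin",
        "gpu_memory_utilization": 0.8,
    },
}
MODEL = "nvidia/Cosmos-Reason1-7B"
MAX_MODEL_LEN = 8192


def build_serve_command(platform: str, port: int = 8000) -> list[str]:
    """Return a docker argv list that serves the model on the given platform."""
    cfg = PLATFORMS[platform]
    return [
        "docker", "run", "--rm", "-it",
        "--runtime", "nvidia",      # assumption: NVIDIA container runtime is installed
        "--network", "host",        # assumption: expose the server port on the host
        "-e", "HF_TOKEN",           # forward HF_TOKEN from the current shell
        cfg["image"],
        "vllm", "serve", MODEL,     # assumption: the image provides the vllm CLI
        "--port", str(port),
        "--max-model-len", str(MAX_MODEL_LEN),
        "--gpu-memory-utilization", str(cfg["gpu_memory_utilization"]),
    ]


# Print a copy-pasteable command line for Jetson AGX Thor.
print(shlex.join(build_serve_command("thor")))
```

Building the command as a list keeps each flag and its value paired, and `shlex.join` quotes the result safely for pasting into a terminal.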

Inputs and Outputs

Input:

Text prompts and images
Supports video frame analysis via --media-io-kwargs
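
To illustrate image input, the sketch below builds a chat request that pairs a text prompt with an image, using the OpenAI-style `image_url` content part with a base64 data URL. It assumes the server accepts this content format, as vLLM's OpenAI-compatible API generally does for vision models. The helper name `build_image_request`, the example prompt, and the placeholder bytes are hypothetical.

```python
import base64


def build_image_request(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/jpeg") -> dict:
    """Build keyword arguments for a chat completion with one image attached."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "nvidia/Cosmos-Reason1-7B",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                },
            ],
        }],
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").read().
payload = build_image_request("Is the path ahead clear?", b"\xff\xd8\xff")
```

With the `openai` client from the earlier example, the request could then be sent as `client.chat.completions.create(**payload)`.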

Output:

Generated text with chain-of-thought reasoning traces
Physical reasoning, spatial analysis, and scene understanding
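
Because responses include a reasoning trace before the conclusion, a client may want to separate the two. The sketch below assumes the trace is wrapped in `<think>...</think>` tags and the conclusion optionally in `<answer>...</answer>` tags; this page does not specify the output format, so check the tags against real responses from your server. The function name `split_reasoning` and the sample text are hypothetical.

```python
import re


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a response into (reasoning trace, final answer).

    Assumes the trace is enclosed in <think> tags. If no such tags are
    found, the whole text is returned as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is the answer; drop an optional wrapper.
    answer = re.sub(r"</?answer>", "", text[match.end():]).strip()
    return reasoning, answer


sample = (
    "<think>\nThe cup is near the table edge.\n</think>\n\n"
    "<answer>\nMove the cup inward.\n</answer>"
)
reasoning, answer = split_reasoning(sample)
```

Falling back to the full text when no tags are present keeps the helper usable for prompts that do not trigger a reasoning trace.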

Additional Resources

Try on build.nvidia.com
NVIDIA Cosmos Documentation
Live VLM WebUI — real-time webcam-to-VLM interface