Nemotron Nano 12B VL
NVIDIA's vision-language model for image understanding and multimodal reasoning
Serve the model
Start server
The model data does not currently include a serve command for this model.
Call the model over Web API
Copy the client command below and paste it into your terminal to send a Web API request to the model you just served. Set JETSON_HOST to the address of the Jetson device running the server.
curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Model Details
NVIDIA Nemotron Nano 12B VL is a vision-language model capable of understanding images and text, with support for chain-of-thought reasoning across multimodal inputs.
Inputs and Outputs
Input: Image, Text
Output: Text
Intended Use Cases
- Image Summarization: Generate detailed descriptions of images
- Text-Image Analysis: Analyze relationships between text and visual content
- Optical Character Recognition (OCR): Extract text from images
- Interactive Q&A on Images: Answer questions about image content
- Chain-of-Thought Reasoning: Complex visual reasoning tasks
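The curl example under "Call the model over Web API" sends text only, while the use cases above need an image in the request. The sketch below assumes the server exposes an OpenAI-compatible chat completions endpoint that accepts "image_url" content parts carrying base64 data URLs. This page does not confirm that, so check your serving engine's documentation. build_vision_payload is a hypothetical helper name, and the MIME type is guessed from the file extension (JPEG or PNG only).

```shell
# Hypothetical helper: build an OpenAI-style chat request with one image and one question.
# Assumption: the server accepts "image_url" content parts with base64 data URLs.
# Usage: build_vision_payload IMAGE_PATH PROMPT  (prints the JSON body on stdout)
build_vision_payload() {
  local image_path="$1" prompt="$2" mime image_b64

  # Guess the MIME type from the extension; anything that is not JPEG is labeled PNG.
  case "$image_path" in
    *.jpg|*.jpeg|*.JPG|*.JPEG) mime="image/jpeg" ;;
    *) mime="image/png" ;;
  esac

  # base64 -w0 (GNU coreutils option) emits the encoded file on a single line.
  image_b64="$(base64 -w0 "$image_path")" || return 1

  # The prompt is inserted without JSON escaping, so keep it free of " and \ characters.
  printf '{"model": "%s", "messages": [{"role": "user", "content": [{"type": "image_url", "image_url": {"url": "data:%s;base64,%s"}}, {"type": "text", "text": "%s"}]}]}\n' \
    "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD" "$mime" "$image_b64" "$prompt"
}

# Example (needs the server running and JETSON_HOST set):
#   build_vision_payload receipt.jpg "Extract all text from this image." |
#     curl -s "http://${JETSON_HOST}:8000/v1/chat/completions" \
#       -H "Content-Type: application/json" --data-binary @-
```

Piping the body into curl with --data-binary @- (read the request body from stdin) keeps the long base64 string off the command line. The same payload shape covers OCR, image Q&A, and summarization; only the prompt text changes.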
Supported Languages
English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese.