Multimodal
Qwen3.5 0.8B
Alibaba's compact Qwen3.5 vision-language model for lightweight multimodal deployment
Memory Requirement: 2 GB RAM
Precision: BF16
Size: 1.7 GB
Jetson Inference - Supported Inference Engines
Container Run Command

```shell
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve Qwen/Qwen3.5-0.8B \
    --gpu-memory-utilization 0.8 \
    --enable-prefix-caching \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
```

Model Details
Qwen3.5 0.8B is the smallest vision-language model in the Qwen3.5 lineup. It is designed for lightweight local multimodal inference, fast iteration, and efficient Jetson deployment.
Inputs and Outputs
Input: Text and images
Output: Text
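Given the image-in, text-out interface above, a visual question answering request can be sketched against the OpenAI-compatible API that the vLLM container exposes. This is a minimal illustration using only the standard library; the endpoint URL and port are assumptions for a default local launch, and `build_vqa_payload` / `ask` are hypothetical helper names, not part of any SDK.

```python
import base64
import json
import urllib.request

# Assumed default address of the vLLM OpenAI-compatible server
API_URL = "http://localhost:8000/v1/chat/completions"

def build_vqa_payload(question: str, image_bytes: bytes) -> dict:
    """Pair a text question with a base64-encoded image in one user message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "Qwen/Qwen3.5-0.8B",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

def ask(question: str, image_bytes: bytes) -> str:
    """POST the payload and return the model's text answer."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_vqa_payload(question, image_bytes)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client (such as the official `openai` Python package pointed at the same base URL) should work equally well; the raw-payload form just makes the message structure explicit.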
Intended Use Cases
- Visual question answering: Ask questions about images and receive text responses
- Image understanding: Captioning, scene description, and visual analysis
- Tool calling: OpenAI-compatible tool use via vLLM
- Rapid prototyping: Quick local multimodal experiments
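Since the container is launched with `--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder`, tool use follows the standard OpenAI chat-completions shape. Below is a sketch of such a request; the `get_weather` tool, its schema, and the helper name are illustrative assumptions, not part of the model or server.

```python
def build_tool_call_payload(question: str) -> dict:
    """Build an OpenAI-compatible request that offers the model one tool."""
    return {
        "model": "Qwen/Qwen3.5-0.8B",
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        # Let the server decide when a tool call is needed
        "tool_choice": "auto",
    }
```

When the model decides to call the tool, the parsed call appears under `choices[0].message.tool_calls` in the response, which the client executes before sending the result back as a `tool`-role message.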
Additional Resources
- Hugging Face Model - Original checkpoint