Multimodal

Qwen3.5 0.8B

Alibaba's compact Qwen3.5 vision-language model for lightweight multimodal deployment

Memory Requirement: 2GB RAM
Precision: BF16
Size: 1.7GB

Supported Inference Engines

Container

# Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve Qwen/Qwen3.5-0.8B \
  --gpu-memory-utilization 0.8 \
  --enable-prefix-caching \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
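The command above starts a vLLM server that speaks the OpenAI-compatible HTTP API; with `--network host`, the endpoint is typically `http://localhost:8000/v1` (the default vLLM port, assumed here since the page does not state it). A minimal sketch of building a plain-text chat request against that server:

```python
import json
import urllib.request

# Assumed endpoint: vLLM's default OpenAI-compatible route on port 8000.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, model: str = "Qwen/Qwen3.5-0.8B") -> urllib.request.Request:
    """Build (but do not send) a text-only chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("What is the capital of France?")
print(req.full_url)       # where the request would be POSTed
print(req.data.decode())  # the JSON payload

# To actually send it (requires the container above to be running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

The request is only constructed, not sent, so the sketch runs without a live server; uncomment the last lines once the container is up.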

Model Details

Qwen3.5 0.8B is the smallest vision-language model in the Qwen3.5 lineup. It is designed for lightweight local multimodal inference, fast iteration, and efficient Jetson deployment.

Inputs and Outputs

Input: Text and images

Output: Text
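Image inputs are passed alongside text using OpenAI-style multimodal message content, which vLLM accepts for vision-language models; one common pattern is embedding the image as a base64 data URL. A sketch, assuming that request shape:

```python
import base64
import json

def image_message(question: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style user message combining text and an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Placeholder bytes stand in for a real image file read from disk.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
msg = image_message("What is in this picture?", fake_png)
print(json.dumps(msg, indent=2))
```

The resulting message goes into the `messages` array of a chat completion request, and the model replies with plain text.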

Intended Use Cases

  • Visual question answering: Ask questions about images and receive text responses
  • Image understanding: Captioning, scene description, and visual analysis
  • Tool calling: OpenAI-compatible tool use via vLLM
  • Rapid prototyping: Quick local multimodal experiments
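For the tool-calling use case, the run command's `--enable-auto-tool-choice` flag lets the server accept OpenAI-style tool definitions in the request body. A sketch of such a request, using a hypothetical `get_weather` tool for illustration:

```python
import json

# Hypothetical tool in the OpenAI function-calling schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name, not part of the model
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "Qwen/Qwen3.5-0.8B",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(request_body, indent=2))
```

When the model decides to call the tool, the response carries a `tool_calls` entry with the function name and JSON arguments, which the client executes and feeds back as a `tool` message.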

Additional Resources