Multimodal
Qwen3.5 4B
Alibaba's efficient Qwen3.5 4B vision-language model tuned for practical multimodal deployment
- Memory Requirement: 4GB RAM
- Precision: AWQ 4-bit
- Size: 2.5GB
Supported Inference Engines

vLLM Container

Run Command
```bash
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve cyankiwi/Qwen3.5-4B-AWQ-4bit \
    --gpu-memory-utilization 0.8 \
    --enable-prefix-caching \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
```
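Once the container is up, `vllm serve` exposes an OpenAI-compatible API. The sketch below assumes vLLM's default port 8000 (reachable at localhost because the container runs with `--network host`) and that no API key was configured, so the key value is just a placeholder.

```python
# Minimal sketch: query the local vLLM server through the OpenAI client.
# Assumes vLLM's default port 8000 on the same host; "not-needed" is a
# placeholder since no --api-key was set in the run command above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="cyankiwi/Qwen3.5-4B-AWQ-4bit",
    messages=[
        {"role": "user", "content": "Describe the Jetson Orin in one sentence."}
    ],
    max_tokens=256,
)

message = response.choices[0].message
# With --reasoning-parser qwen3, vLLM typically returns the model's
# intermediate reasoning separately from the final answer; the getattr
# guard keeps the script working if the field is absent.
print(getattr(message, "reasoning_content", None))
print(message.content)
```

Because `--reasoning-parser qwen3` is enabled, the reasoning and the final answer arrive in separate fields of the response message rather than interleaved in the text.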
Model Details

Qwen3.5 4B sits at a balanced point in the Qwen3.5 family for local multimodal instruction following, visual understanding, and agent-style workloads on Jetson.
Inputs and Outputs
- Input: Text and images (see the multimodal request example below)
- Output: Text
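As a concrete illustration of the text-plus-image input path, here is a hedged sketch of a visual question answering request against the same local endpoint. The file `test_image.jpg` is a hypothetical example; the image is passed as a base64 data URL, which vLLM's OpenAI-compatible server accepts for image content parts.

```python
# Sketch: visual question answering with a local image, sent as a base64
# data URL in an OpenAI-style multimodal message. The image path is a
# hypothetical placeholder.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("test_image.jpg", "rb") as f:  # hypothetical example image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="cyankiwi/Qwen3.5-4B-AWQ-4bit",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```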
Intended Use Cases
- Visual question answering: Multimodal prompting with image inputs
- Image understanding: Captioning, scene analysis, and grounded responses
- Tool calling: Structured tool use via vLLM's auto tool choice and the qwen3_coder parser (see the sketch after this list)
- Multilingual tasks: Translation and multilingual prompting
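A minimal sketch of the tool-calling flow enabled by `--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder` in the run command above. The `get_weather` tool is a hypothetical example for illustration, not something shipped with the model or vLLM.

```python
# Sketch: structured tool calling against the local endpoint. The model
# decides whether to call the (hypothetical) get_weather tool; vLLM's
# qwen3_coder parser turns the model's output into structured tool_calls.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="cyankiwi/Qwen3.5-4B-AWQ-4bit",
    messages=[
        {"role": "user", "content": "What's the weather in Taipei right now?"}
    ],
    tools=tools,
    tool_choice="auto",
)

# If the model chose to call a tool, the parsed calls appear here; your
# application would execute them and feed results back as tool messages.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```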
Additional Resources
- Original Model - Base Qwen3.5 4B checkpoint
- AWQ Checkpoint - cyankiwi/Qwen3.5-4B-AWQ-4bit, the quantized checkpoint used here