Multimodal

Qwen3.5 4B

Alibaba's efficient Qwen3.5 4B vision-language model tuned for practical multimodal deployment

Memory Requirement: 4GB RAM
Precision: AWQ 4-bit
Size: 2.5GB

Jetson Inference - Supported Inference Engines

Container
# Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve cyankiwi/Qwen3.5-4B-AWQ-4bit \
    --gpu-memory-utilization 0.8 \
    --enable-prefix-caching \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
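Once the container is running, vLLM serves an OpenAI-compatible HTTP API (port 8000 by default). A minimal sketch of building a chat request for the local server, assuming the default port and the model name from the run command above; the prompt text is a placeholder:

```python
import json

# Assumed endpoint: vLLM's default OpenAI-compatible server on localhost.
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "cyankiwi/Qwen3.5-4B-AWQ-4bit"  # model name from the serve command above


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload for the local vLLM server."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


payload = build_chat_request("Describe what you can do in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send the request (requires the container to be running):
# import urllib.request
# req = urllib.request.Request(
#     VLLM_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The same payload shape works with the official `openai` Python client by pointing its `base_url` at the local server.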

Model Details

Qwen3.5 4B sits at a balanced point in the Qwen3.5 family: small enough to run locally on Jetson while still handling multimodal instruction following, visual understanding, and agent-style workloads.

Inputs and Outputs

Input: Text and images

Output: Text

Intended Use Cases

  • Visual question answering: Multimodal prompting with image inputs
  • Image understanding: Captioning, scene analysis, and grounded responses
  • Tool calling: Structured tool use with vLLM
  • Multilingual tasks: Translation and multilingual prompting
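For visual question answering, images can be passed in the OpenAI vision message format that vLLM accepts: a content list mixing `image_url` and `text` parts, with the image inlined as a base64 data URI. A hedged sketch (the image bytes and question below are placeholders standing in for a real JPEG and prompt):

```python
import base64
import json


def build_vqa_message(image_bytes: bytes, question: str) -> dict:
    """Pair an image (inlined as a base64 data URI) with a text question
    in a single OpenAI-style user message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
            },
            {"type": "text", "text": question},
        ],
    }


# Dummy bytes stand in for reading a real image file, e.g. open("photo.jpg", "rb").read()
msg = build_vqa_message(b"\xff\xd8\xff\xe0", "What objects are in this image?")
print(json.dumps(msg)[:80])
```

The resulting message goes into the `messages` list of a chat completion request against the served model, exactly as in a text-only request.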

Additional Resources