Multimodal

Qwen3.5 4B

Alibaba's efficient Qwen3.5 4B vision-language model tuned for practical multimodal deployment

Memory Requirement: 4GB RAM
Precision: AWQ 4-bit
Size: 2.5GB

Jetson Inference - Supported Inference Engines

Container
# Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve cyankiwi/Qwen3.5-4B-AWQ-4bit \
    --gpu-memory-utilization 0.8 \
    --enable-prefix-caching \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
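Once the container is running, vLLM serves an OpenAI-compatible HTTP API (port 8000 by default). A minimal sketch of building a chat request for the local server, assuming the default port and the model name from the run command above; the prompt text is a placeholder:

```python
import json

# Assumed endpoint: vLLM's default OpenAI-compatible server on localhost.
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "cyankiwi/Qwen3.5-4B-AWQ-4bit"  # model name from the serve command above


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload for the local vLLM server."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


payload = build_chat_request("Describe what you can do in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send the request (requires the container to be running):
# import urllib.request
# req = urllib.request.Request(
#     VLLM_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The same payload shape works with the official `openai` Python client by pointing its `base_url` at the local server.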

Model Details

Qwen3.5 4B sits at a balanced point in the Qwen3.5 family: small enough to run locally on Jetson while still handling multimodal instruction following, visual understanding, and agent-style workloads.

Inputs and Outputs

Input: Text and images

Output: Text

Intended Use Cases

  • Visual question answering: Multimodal prompting with image inputs
  • Image understanding: Captioning, scene analysis, and grounded responses
  • Tool calling: Structured tool use with vLLM
  • Multilingual tasks: Translation and multilingual prompting
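For visual question answering, images can be passed in the OpenAI vision message format that vLLM accepts: a content list mixing `image_url` and `text` parts, with the image inlined as a base64 data URI. A hedged sketch (the image bytes and question below are placeholders standing in for a real JPEG and prompt):

```python
import base64
import json


def build_vqa_message(image_bytes: bytes, question: str) -> dict:
    """Pair an image (inlined as a base64 data URI) with a text question
    in a single OpenAI-style user message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
            },
            {"type": "text", "text": question},
        ],
    }


# Dummy bytes stand in for reading a real image file, e.g. open("photo.jpg", "rb").read()
msg = build_vqa_message(b"\xff\xd8\xff\xe0", "What objects are in this image?")
print(json.dumps(msg)[:80])
```

The resulting message goes into the `messages` list of a chat completion request against the served model, exactly as in a text-only request.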

Additional Resources