Multimodal
Qwen3.5 0.8B
Alibaba's compact Qwen3.5 vision-language model for lightweight multimodal deployment
Memory Requirement: 2 GB RAM
Precision: BF16
Size: 1.7 GB
Jetson Inference - Supported Inference Engines
Container Run Command

```shell
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve Qwen/Qwen3.5-0.8B \
    --gpu-memory-utilization 0.8 \
    --enable-prefix-caching \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
```

Model Details
Qwen3.5 0.8B is the smallest vision-language model in the Qwen3.5 lineup. It is designed for lightweight local multimodal inference, fast iteration, and efficient Jetson deployment.
Inputs and Outputs
Input: Text and images
Output: Text
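Given the image-in, text-out interface above, a visual question answering request can be sketched against the OpenAI-compatible API that the vLLM container exposes. This is a minimal illustration using only the standard library; the endpoint URL and port are assumptions for a default local launch, and `build_vqa_payload` / `ask` are hypothetical helper names, not part of any SDK.

```python
import base64
import json
import urllib.request

# Assumed default address of the vLLM OpenAI-compatible server
API_URL = "http://localhost:8000/v1/chat/completions"

def build_vqa_payload(question: str, image_bytes: bytes) -> dict:
    """Pair a text question with a base64-encoded image in one user message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "Qwen/Qwen3.5-0.8B",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

def ask(question: str, image_bytes: bytes) -> str:
    """POST the payload and return the model's text answer."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_vqa_payload(question, image_bytes)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client (such as the official `openai` Python package pointed at the same base URL) should work equally well; the raw-payload form just makes the message structure explicit.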
Intended Use Cases
- Visual question answering: Ask questions about images and receive text responses
- Image understanding: Captioning, scene description, and visual analysis
- Tool calling: OpenAI-compatible tool use via vLLM
- Rapid prototyping: Quick local multimodal experiments
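Since the container is launched with `--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder`, tool use follows the standard OpenAI chat-completions shape. Below is a sketch of such a request; the `get_weather` tool, its schema, and the helper name are illustrative assumptions, not part of the model or server.

```python
def build_tool_call_payload(question: str) -> dict:
    """Build an OpenAI-compatible request that offers the model one tool."""
    return {
        "model": "Qwen/Qwen3.5-0.8B",
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        # Let the server decide when a tool call is needed
        "tool_choice": "auto",
    }
```

When the model decides to call the tool, the parsed call appears under `choices[0].message.tool_calls` in the response, which the client executes before sending the result back as a `tool`-role message.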
Additional Resources
- Hugging Face Model - Original checkpoint