Qwen3.5 27B
Alibaba's dense 27-billion-parameter language model with native tool calling and MTP speculative decoding
Serve the model
Start the server with the Docker command shown in the Running with vLLM section below, then call it over the Web API.
Call the model over Web API
Copy a client command below and paste it into your terminal to make a Web API request to the model you just served.
curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Kbenkhaled/Qwen3.5-27B-quantized.w4a16",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Benchmark
Qwen3.5-27B · vLLM · NVFP4 / W4A16 · ISL 2048 / OSL 128
C = concurrent requests; ISL/OSL = input/output sequence length in tokens. Results will vary with container image, clocks, and workload.
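The endpoint is OpenAI-compatible, so the same request can be made from Python. A minimal sketch using only the standard library; it assumes the vLLM server from the Running with vLLM section is already up, and that the `model` field matches the path passed to `vllm serve`:

```python
import json
import urllib.request

# Same chat-completions payload as the curl example above.
# The model name must match the one passed to `vllm serve`.
payload = {
    "model": "Kbenkhaled/Qwen3.5-27B-quantized.w4a16",
    "messages": [{"role": "user", "content": "Hello!"}],
}

def chat(host="localhost", port=8000):
    """POST the payload to the OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat(host="<your-jetson-host>")` returns the assistant's reply string once the server is running.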
Model Details
Qwen3.5 27B is a dense language model from Alibaba Cloud's Qwen3.5 family. With 27 billion parameters, it delivers strong performance on complex reasoning, coding, and language-understanding tasks.
Inputs and Outputs
Input: Text
Output: Text
Intended Use Cases
- Reasoning: Advanced logical and analytical reasoning with chain-of-thought
- Function Calling: Native support for tool use and function calling
- Multilingual Instruction Following: Following instructions across 100+ languages
- Code Generation: Programming assistance in multiple languages
- Translation: High-quality translation between supported languages
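Since the model supports native function calling, a `tools` array can be included in the chat request. A sketch of an OpenAI-style tool definition; the `get_weather` function and its schema are illustrative examples, not part of this model card:

```python
import json

# Illustrative tool definition in the OpenAI-compatible schema that
# vLLM's --enable-auto-tool-choice / --tool-call-parser flags work with.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example function
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# The tools list rides along with the normal chat payload.
request_body = {
    "model": "Kbenkhaled/Qwen3.5-27B-quantized.w4a16",
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(request_body, indent=2))
```

When the model decides to call the tool, the response carries a `tool_calls` entry instead of plain text; your client executes the function and sends the result back in a `tool` role message.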
Running with vLLM
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
vllm serve Kbenkhaled/Qwen3.5-27B-quantized.w4a16 \
--gpu-memory-utilization 0.8 --enable-prefix-caching \
--reasoning-parser qwen3 \
--enable-auto-tool-choice --tool-call-parser qwen3_coder
Speculative Decoding with MTP
This model supports Multi-Token Prediction (MTP) speculative decoding, which can significantly improve generation throughput. To enable it, add the following flag to your vllm serve command:
--speculative-config '{"method": "mtp", "num_speculative_tokens": 4}'
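Because the flag takes inline JSON, a shell-quoting mistake can break it silently. A quick sanity check of the value before launching (a sketch; the two keys are exactly those shown above):

```python
import json

# The exact string passed after --speculative-config in the flag above.
spec_config = '{"method": "mtp", "num_speculative_tokens": 4}'

# json.loads raises ValueError if quoting or commas are wrong.
parsed = json.loads(spec_config)
assert parsed["method"] == "mtp"
assert parsed["num_speculative_tokens"] == 4
print("speculative-config OK:", parsed)
```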
Qwen3.5 Family
| Model | Parameters | Active Params | Type | Best For |
|---|---|---|---|---|
| Qwen3.5 35B-A3B | 35B | 3B | MoE | Efficient high-performance inference |
| Qwen3.5 27B | 27B | 27B | Dense | Maximum accuracy on demanding tasks |
Additional Resources
- Hugging Face Model - Original model weights
- NVFP4 Checkpoint (Thor) - Quantized for Jetson Thor
- W4A16 Checkpoint (Orin) - Quantized for Jetson Orin