Qwen3.5 27B
Alibaba's dense 27-billion-parameter language model with native tool calling and MTP speculative decoding
Memory Requirement: 18GB RAM
Precision: NVFP4 / W4A16
Size: 15GB
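The 15GB checkpoint size is consistent with the W4A16 scheme, which stores weights at roughly 4 bits each. A back-of-the-envelope estimate (my own arithmetic, not from the model card):

```python
# Rough weight-memory estimate for a 27B-parameter model quantized
# with W4A16 (4-bit weights, 16-bit activations).
params = 27e9          # parameter count
bits_per_weight = 4    # W4A16 stores weights in ~4 bits

weight_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"~{weight_gb:.1f} GB for weights alone")  # ~13.5 GB
```

The remaining ~1.5GB of the 15GB figure plausibly comes from higher-precision embeddings, norms, and quantization scales; the 18GB RAM requirement additionally covers KV cache and runtime overhead.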
Run Command

sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve Kbenkhaled/Qwen3.5-27B-quantized.w4a16 \
  --gpu-memory-utilization 0.8 --enable-prefix-caching \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser qwen3_coder

Performance on Jetson Orin with vLLM
- 9 output tokens/sec, single request (concurrency=1)
- 41 output tokens/sec, 8 parallel requests (concurrency=8)
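The two figures above imply good batching behavior: simple arithmetic (derived from those numbers only) gives the aggregate speedup and the per-request rate under load:

```python
# Aggregate vs. per-request throughput from the figures above.
single = 9    # output tokens/sec at concurrency=1
batched = 41  # aggregate output tokens/sec at concurrency=8

speedup = batched / single  # ~4.6x aggregate throughput
per_request = batched / 8   # ~5.1 tokens/sec per request
print(f"{speedup:.1f}x aggregate, {per_request:.2f} tok/s per request")
```

So each of the eight concurrent users still sees more than half the single-user rate, which is the usual trade-off when batching on a shared GPU.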
Model Details
Qwen3.5 27B is a dense language model from Alibaba Cloud’s Qwen3.5 family. With 27 billion parameters, it delivers strong performance across complex reasoning, coding, and language understanding tasks.
Inputs and Outputs
Input: Text
Output: Text
Intended Use Cases
- Reasoning: Advanced logical and analytical reasoning with chain-of-thought
- Function Calling: Native support for tool use and function calling
- Multilingual Instruction Following: Following instructions across 100+ languages
- Code Generation: Programming assistance in multiple languages
- Translation: High-quality translation between supported languages
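Since vLLM exposes an OpenAI-compatible API, the function-calling use case maps onto the standard `tools` field of a chat-completion request. A minimal sketch of such a request body, with a hypothetical `get_weather` tool (the tool name and schema are made-up examples, not part of the model):

```python
import json

# Chat-completion request body for the OpenAI-compatible vLLM server.
# get_weather is a hypothetical tool used only for illustration.
request = {
    "model": "Kbenkhaled/Qwen3.5-27B-quantized.w4a16",
    "messages": [
        {"role": "user", "content": "What's the weather in Hanoi?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# POST this body to http://localhost:8000/v1/chat/completions
body = json.dumps(request)
```

With `--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder` set (as in the command below), the server parses the model's tool invocations into the response's structured `tool_calls` field.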
Running with vLLM
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
vllm serve Kbenkhaled/Qwen3.5-27B-quantized.w4a16 \
--gpu-memory-utilization 0.8 --enable-prefix-caching \
--reasoning-parser qwen3 \
--enable-auto-tool-choice --tool-call-parser qwen3_coder
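Once the server is up, it listens on port 8000 by default and speaks the OpenAI chat-completions protocol. A quick smoke test (the prompt and `max_tokens` value are arbitrary):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Kbenkhaled/Qwen3.5-27B-quantized.w4a16",
        "messages": [{"role": "user", "content": "Write a haiku about Jetson."}],
        "max_tokens": 128
      }'
```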
Speculative Decoding with MTP
This model supports Multi-Token Prediction (MTP) speculative decoding, which can significantly improve generation throughput. To enable it, add the following flag to your vllm serve command:
--speculative-config '{"method": "mtp", "num_speculative_tokens": 4}'
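Putting it together, the full serve command with MTP enabled would look like this (identical to the command above, with only the speculative-config flag appended):

```shell
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve Kbenkhaled/Qwen3.5-27B-quantized.w4a16 \
  --gpu-memory-utilization 0.8 --enable-prefix-caching \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser qwen3_coder \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 4}'
```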
Qwen3.5 Family
| Model | Parameters | Active Params | Type | Best For |
|---|---|---|---|---|
| Qwen3.5 35B-A3B | 35B | 3B | MoE | Efficient high-performance inference |
| Qwen3.5 27B | 27B | 27B | Dense | Maximum accuracy on demanding tasks |
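One way to read the Active Params column: the MoE variant routes each token through only ~3B of its 35B parameters, so its per-token compute is close to that of a 3B dense model, while the 27B dense model activates every weight on every token. A back-of-the-envelope comparison (my own arithmetic, ignoring attention and embedding details):

```python
# Fraction of parameters active per token for each family member.
moe_total, moe_active = 35e9, 3e9
dense_total = 27e9

moe_fraction = moe_active / moe_total        # ~8.6% of weights per token
dense_fraction = dense_total / dense_total   # 100%: all weights every token
print(f"MoE activates {moe_fraction:.1%} of weights; dense activates {dense_fraction:.0%}")
```

This is why the table pairs the MoE model with "efficient inference" and the dense model with "maximum accuracy": the dense model spends roughly an order of magnitude more compute per token.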
Additional Resources
- Hugging Face Model - Original model weights
- NVFP4 Checkpoint (Thor) - Quantized for Jetson Thor
- W4A16 Checkpoint (Orin) - Quantized for Jetson Orin