Qwen3 4B
Alibaba's efficient 4-billion-parameter instruction-tuned language model
Serve the model
Start server
No serve command is currently listed for this model.
Call the model over Web API
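Copy the client command below and paste it into your terminal to send a Web API request to the model you just served. Set JETSON_HOST to the Jetson device's hostname or IP address first; the command posts to the server's OpenAI-compatible /v1/chat/completions endpoint on port 8000.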
curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "RedHatAI/Qwen3-4B-quantized.w4a16",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Benchmark
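Qwen3 4B · vLLM · NVFP4 / W4A16 · ISL 2048 / OSL 128
ISL = input sequence length and OSL = output sequence length, both in tokens; C = number of concurrent requests. Results vary with the software image, clock settings, and workload.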
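To see what "C concurrent requests" means in practice on your own device, here is a minimal sketch that fires C identical chat requests at once and waits for all of them. It is NOT the harness that produced the benchmark figures. It assumes curl and jq are installed, JETSON_HOST is set, and the server listens on port 8000; the helper names bench_payload and run_concurrent are hypothetical. The prompt is short, so it does not reproduce ISL 2048, and max_tokens only caps the output at 128 tokens (the OSL), it does not force that length.

```shell
#!/usr/bin/env bash
# Rough concurrency smoke test (hypothetical helpers, not the official benchmark).
# Assumptions: curl and jq installed, JETSON_HOST set, server on port 8000.

MODEL="RedHatAI/Qwen3-4B-quantized.w4a16"

# Build one chat request body whose output is capped at $1 tokens.
bench_payload() {
  jq -nc --arg model "$MODEL" --argjson max "$1" '{
    model: $model,
    max_tokens: $max,
    messages: [{role: "user", content: "Write a short story about a robot."}]
  }'
}

# Launch C requests at the same time (default 4), then wait for all of them.
# Each request prints its HTTP status code when it completes.
run_concurrent() {
  local c="${1:-4}" body i
  body="$(bench_payload 128)"
  for i in $(seq "$c"); do
    curl -s -o /dev/null -w "request $i: HTTP %{http_code}\n" \
      "http://${JETSON_HOST}:8000/v1/chat/completions" \
      -H "Content-Type: application/json" -d "$body" &
  done
  wait  # block until every background request has finished
}
```

For example, `time run_concurrent 8` gives a wall-clock time for a batch of eight simultaneous requests. Background jobs plus `wait` keep the sketch dependency-free; a real benchmark tool would also record per-request latency and token throughput.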
Model Details
Qwen3 is Alibaba Cloud's latest generation of large language models, with strong performance across a wide range of tasks. The Qwen3 4B model balances capability and efficiency, which makes it a good fit for edge devices such as Jetson.
Inputs and Outputs
Input: Text
Output: Text
Intended Use Cases
- Reasoning: Advanced logical and analytical reasoning tasks
- Function Calling: Native support for tool use and function calling
- Subject Matter Experts: Fine-tuning for domain-specific expertise
- Multilingual Instruction Following: Following instructions across 100+ languages
- Translation: High-quality translation between supported languages
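Function calling goes through the tools field of the same /v1/chat/completions request used above. A minimal sketch, assuming jq and curl are installed, JETSON_HOST is set, and the vLLM server was started with tool calling enabled (vLLM's documentation uses --enable-auto-tool-choice --tool-call-parser hermes for Qwen models). The get_weather tool and the helper names tool_payload and call_with_tools are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical tool-calling request against the served model.
# Assumptions: jq/curl installed, JETSON_HOST set, server on port 8000
# with vLLM tool calling enabled. get_weather is a made-up tool.

MODEL="RedHatAI/Qwen3-4B-quantized.w4a16"

# Build a chat request that offers the model one function it may call.
tool_payload() {
  jq -nc --arg model "$MODEL" --arg content "$1" '{
    model: $model,
    messages: [{role: "user", content: $content}],
    tools: [{
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: {city: {type: "string", description: "City name"}},
          required: ["city"]
        }
      }
    }]
  }'
}

# Send the request and print any tool calls the model chose to make.
call_with_tools() {
  tool_payload "$1" \
    | curl -s "http://${JETSON_HOST}:8000/v1/chat/completions" \
        -H "Content-Type: application/json" -d @- \
    | jq '.choices[0].message.tool_calls'
}
```

For example, `call_with_tools "What's the weather in Oslo?"` should print a call to get_weather with a city argument if the model decides to use the tool. The model only proposes the call; your code has to run the tool and send its result back in a follow-up message.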