GPT OSS 120B
OpenAI's open-source 120 billion parameter language model for Jetson Thor
Parameters 117B total / 5.1B activated
Modalities Text
Context Length 128K
License Apache 2.0
Precision NVFP4
Serve the model
To start the server, follow the steps under Running with vLLM (Thor Only).
Call the model over Web API
Copy the client command below and paste it into your terminal to send a Web API request to the model you just served. Set JETSON_HOST to the hostname or IP address of your Jetson first.
curl -s "http://${JETSON_HOST}:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Model Details
GPT OSS 120B is OpenAI's open-source language model with 117B total parameters (5.1B activated). Because of its size, it is supported only on Jetson AGX Thor. The tiktoken encodings must be downloaded before serving.
Running with vLLM (Thor Only)
Step 1: Download Tiktoken Encodings
mkdir -p "$HOME/.cache/tiktoken"
wget -q https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken \
  -O "$HOME/.cache/tiktoken/cl100k_base.tiktoken"
wget -q https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken \
  -O "$HOME/.cache/tiktoken/o200k_base.tiktoken"
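Because wget runs with -q, a failed download prints nothing, and -O can leave an empty file behind. The following is a minimal sketch of a sanity check, assuming the cache directory used in Step 1; check_tiktoken_cache is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: confirm both encoding files exist and are non-empty.
# The default directory is the one created in Step 1 (an assumption if you
# changed it).
check_tiktoken_cache() {
  local dir="${1:-$HOME/.cache/tiktoken}"
  local name
  local status=0
  for name in cl100k_base.tiktoken o200k_base.tiktoken; do
    # -s is true only when the file exists and has a size greater than zero.
    if [ -s "$dir/$name" ]; then
      echo "ok: $dir/$name"
    else
      echo "missing or empty: $dir/$name" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run check_tiktoken_cache with no arguments before Step 2. A non-zero exit status means at least one file should be downloaded again.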
Step 2: Serve
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -v "$HOME/.cache/tiktoken:/etc/encodings" \
  -e TIKTOKEN_ENCODINGS_BASE=/etc/encodings \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor \
  vllm serve openai/gpt-oss-120b --gpu-memory-utilization 0.8
The container reads the encodings from /etc/encodings through TIKTOKEN_ENCODINGS_BASE, and --gpu-memory-utilization 0.8 limits vLLM to 80% of GPU memory.
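Loading a model of this size can take some time, so it may help to wait until the server answers before sending requests. The sketch below polls /v1/models, an endpoint of vLLM's OpenAI-compatible server, on the same port as the curl example above. The wait_for_server name and its retry defaults are illustrative assumptions, not part of vLLM.

```shell
# Hypothetical helper: poll the server until it responds or retries run out.
# Arguments: base URL, maximum attempts (default 60), delay in seconds
# between attempts (default 10). The defaults are arbitrary assumptions.
wait_for_server() {
  local base_url="$1"
  local max_tries="${2:-60}"
  local delay="${3:-10}"
  local i=1
  while [ "$i" -le "$max_tries" ]; do
    # -f makes curl return non-zero on HTTP errors; -s keeps it quiet.
    if curl -sf "$base_url/v1/models" >/dev/null 2>&1; then
      echo "server ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "server not ready after $max_tries attempts" >&2
  return 1
}
```

For example, run wait_for_server "http://${JETSON_HOST}:8000" in a second terminal, then send the chat completion request once it reports that the server is ready.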
GPT OSS Family
| Model | Parameters | Memory | Minimum Jetson |
|---|---|---|---|
| GPT OSS 20B | 20B | 16 GB RAM | AGX Orin |
| GPT OSS 120B | 120B | 64 GB RAM | AGX Thor |
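To compare a device against the Memory column, one option is to read MemTotal from /proc/meminfo. The sketch below is a rough check only: the helper names are hypothetical, the table's GB figures are treated as GiB (an assumption), and MemTotal usually reads slightly below a board's nominal capacity, so a near miss is inconclusive.

```shell
# Hypothetical helper: print MemTotal (in kB) from a meminfo-style file.
mem_total_kb() {
  local meminfo="${1:-/proc/meminfo}"
  local key value unit
  while read -r key value unit; do
    if [ "$key" = "MemTotal:" ]; then
      echo "$value"
      return 0
    fi
  done < "$meminfo"
  return 1
}

# Hypothetical helper: succeed if total memory is at least the given GiB.
# Returns 2 if MemTotal could not be read.
has_min_memory_gib() {
  local required_gib="$1"
  local meminfo="${2:-/proc/meminfo}"
  local total_kb
  total_kb="$(mem_total_kb "$meminfo")" || return 2
  # /proc/meminfo reports kB as units of 1024 bytes, so 1 GiB = 1024 * 1024 kB.
  [ "$total_kb" -ge $((required_gib * 1024 * 1024)) ]
}
```

For example, has_min_memory_gib 64 && echo "enough memory for GPT OSS 120B" checks the current device against the 120B row.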