# GPT OSS 120B

OpenAI's open-source 120-billion-parameter language model for Jetson Thor

- **Memory Requirement:** 64GB RAM
- **Precision:** NVFP4
- **Size:** 60GB
## Model Details
GPT OSS 120B is OpenAI's open-source 120-billion-parameter language model. Due to its size, it is supported only on Jetson AGX Thor, and the tiktoken encodings must be downloaded before serving.
## Running with vLLM (Thor Only)
### Step 1: Download Tiktoken Encodings

```bash
# Fetch the tiktoken BPE encoding files the model needs at serve time
mkdir -p $HOME/.cache/tiktoken
wget -q https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken \
  -O $HOME/.cache/tiktoken/cl100k_base.tiktoken
wget -q https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken \
  -O $HOME/.cache/tiktoken/o200k_base.tiktoken
```
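Before serving, it can be worth confirming that both encoding files actually landed in the cache. A minimal sanity check, assuming only the filenames written by the commands above:

```bash
# Verify both encoding files exist and are non-empty
for f in cl100k_base.tiktoken o200k_base.tiktoken; do
  if [ -s "$HOME/.cache/tiktoken/$f" ]; then
    echo "OK: $f"
  else
    echo "MISSING or EMPTY: $f" >&2
  fi
done
```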
### Step 2: Serve
```bash
# Launch vLLM's OpenAI-compatible server inside the Jetson Thor container.
# The tiktoken cache is mounted at /etc/encodings and exposed to vLLM via
# the TIKTOKEN_ENCODINGS_BASE environment variable.
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  -v $HOME/.cache/tiktoken:/etc/encodings \
  -e TIKTOKEN_ENCODINGS_BASE=/etc/encodings \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor \
  vllm serve openai/gpt-oss-120b --gpu-memory-utilization 0.8
```
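Once the model finishes loading, the container exposes vLLM's OpenAI-compatible HTTP API. Because no `--port` flag is passed and the container runs with `--network host`, it should listen on vLLM's default port 8000 (an assumption here, since the port isn't stated above). A quick smoke test from the device itself, with an arbitrary prompt:

```bash
# Send a chat completion request to the local vLLM server (default port 8000)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Hello! What can you do?"}],
        "max_tokens": 128
      }'
```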
## GPT OSS Family
| Model | Parameters | Memory Requirement | Minimum Jetson |
|---|---|---|---|
| GPT OSS 20B | 20B | 16GB RAM | AGX Orin |
| GPT OSS 120B | 120B | 64GB RAM | AGX Thor |