Gemma 4 E2B
Google's compact frontier Gemma 4 model for efficient multimodal and agentic workloads

Memory Requirement: 8GB RAM
Precision: Q8_0 GGUF
Size: 5.0GB
Jetson Inference - Supported Inference Engines
Container Run Command

```shell
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/llama_cpp:gemma4-jetson-orin \
  llama-server -hf ggml-org/gemma-4-E2B-it-GGUF:Q8_0
```

Model Details
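Once the container is up, llama-server exposes an OpenAI-compatible HTTP API. A minimal sketch of querying it from Python, assuming the server is reachable at the default `localhost:8080` (possible because the container runs with `--network host`):

```python
import json
import urllib.request

# llama-server's OpenAI-compatible chat endpoint; host and port assume
# the default --network host setup from the run command above.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt, max_tokens=128):
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

def ask(prompt):
    """Send a prompt to the local llama-server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Name one use case for an edge LLM."))
```

Because the endpoint follows the OpenAI chat-completions format, existing OpenAI client libraries can also be pointed at it by overriding the base URL.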
Gemma 4 E2B is the smallest variant in the Gemma 4 family. Google positions E2B as an edge-first model for low-latency, low-memory deployments where efficiency matters more than absolute model size. Typical use cases include:
- Offline voice assistants and smart home controllers
- Robotics copilots that combine speech and image understanding
- Lightweight OCR and document QA on constrained Jetson devices
- Local agent pipelines that need structured tool calling with a small footprint
Inputs and Outputs
Input: Text, image, and audio
Output: Text
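For image input, llama-server accepts OpenAI-style multimodal messages in which the image is embedded as a base64 data URI (the model's multimodal projector must be loaded alongside the weights). A sketch of building such a message; the helper name and file are illustrative:

```python
import base64

def image_message(image_path, question):
    """Build an OpenAI-style multimodal chat message that embeds a local
    image as a base64 data URI, as accepted by llama-server."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
            },
        ],
    }
```

The returned dict drops into the `messages` list of a chat-completions request unchanged.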
Supported Platforms
- Jetson Orin
- Jetson Thor
Inference Engine
This model is configured to run on Jetson with llama.cpp.
Official Highlights
- Google’s model card describes E2B as a dense multimodal model with 2.3B effective parameters and 5.1B parameters including embeddings.
- It supports 128K context, text/image/audio input, and native function calling for agentic workflows.
- The official Gemma 4 launch notes that E2B was engineered for offline mobile and IoT use, including devices like Jetson Orin Nano.
- Google also documents built-in ASR and speech translation support on E2B, with audio clips up to 30 seconds.
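The native function calling mentioned above uses OpenAI-style tool schemas, which llama-server routes through the model's chat template. A minimal sketch of declaring one tool for a smart-home agent; the `set_light` function and its parameters are hypothetical, not part of any real API:

```python
def make_tool_schema():
    """Declare a single OpenAI-style tool schema. The set_light function
    is an illustrative smart-home example, not an existing API."""
    return {
        "type": "function",
        "function": {
            "name": "set_light",
            "description": "Turn a smart light on or off in a given room.",
            "parameters": {
                "type": "object",
                "properties": {
                    "room": {"type": "string"},
                    "on": {"type": "boolean"},
                },
                "required": ["room", "on"],
            },
        },
    }

def tools_request(prompt):
    """Chat-completions payload that lets the model call tools as needed."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": [make_tool_schema()],
        "tool_choice": "auto",
    }
```

When the model decides to call the tool, the response carries a `tool_calls` entry with the function name and JSON arguments, which the agent executes before sending the result back as a `tool` role message.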