Gemini 3.1 Flash-Lite — Cheapest Google Gemini model
Google DeepMind · Gemini

Gemini 3.1 Flash-Lite

Smallest, fastest, cheapest Gemini model

Flash-Lite is Google's lowest-cost Gemini variant, tuned for high-volume, low-latency workloads like routing, classification and embedded agents.

Context window: 1M tokens
Max output: 32K tokens
Released: March 3, 2026
Pricing: Lowest Gemini tier

Key features

  • Lowest price-per-token in the Gemini family.
  • Optimised for high-volume API workloads.
  • Native multimodal input.
  • Strong instruction-following despite reduced size.

Best for

Use Flash-Lite when you need Gemini quality at the lowest possible cost: moderation, intent classification, and batch processing.

Frequently Asked Questions

What is Gemini 3.1 Flash-Lite best for?

Flash-Lite is ideal for moderation, intent classification, batch summarisation and any high-volume API workload where cost per query dominates.
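A high-volume classification workload like this typically reduces to two pure steps around the model call: building a constrained prompt and normalising the model's reply to a known label. The sketch below illustrates that shape; the label set and helper names are illustrative assumptions, and the actual model call (via whatever client library you use) is left out.

```python
# Minimal sketch of the prompt-building and reply-parsing steps of an
# intent-classification pipeline. The LABELS set and function names are
# hypothetical; the model call itself is omitted and would sit between
# build_prompt() and parse_label() in a real pipeline.

LABELS = ["billing", "support", "sales", "other"]

def build_prompt(utterance: str) -> str:
    """Format a single-label classification prompt for one user message."""
    return (
        "Classify the user message into exactly one of these intents: "
        + ", ".join(LABELS)
        + ". Reply with the label only.\n\nMessage: "
        + utterance
    )

def parse_label(reply: str) -> str:
    """Normalise a model reply to a known label, defaulting to 'other'."""
    label = reply.strip().lower().rstrip(".")
    return label if label in LABELS else "other"
```

Constraining the model to "reply with the label only" and defensively normalising the output keeps per-query token counts, and therefore cost, as low as possible, which is the point of a Flash-Lite tier.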

Is Gemini 3.1 Flash-Lite still multimodal?

Yes — it accepts text, audio and image inputs natively, making it useful for lightweight multimodal classification and routing pipelines.
