Gemini 3.1 Flash-Lite — Cheapest Google Gemini model
Google DeepMind · Gemini

Gemini 3.1 Flash-Lite

Smallest, fastest, cheapest Gemini model

Flash-Lite is Google's lowest-cost Gemini variant, tuned for high-volume, low-latency workloads like routing, classification and embedded agents.

Context window: 1M tokens
Max output: 32K tokens
Released: March 3, 2026
Pricing: Lowest Gemini tier

Key features

  • Lowest price-per-token in the Gemini family.
  • Optimised for high-volume API workloads.
  • Native multimodal input.
  • Strong instruction-following despite reduced size.

Best for

Use Flash-Lite when you need Gemini quality at the lowest possible cost: moderation, intent classification, and batch processing.

Frequently Asked Questions

What is Gemini 3.1 Flash-Lite best for?

Flash-Lite is ideal for moderation, intent classification, batch summarisation and any high-volume API workload where cost per query dominates.
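A high-volume classification workload like this typically reduces to two pure steps around the model call: building a constrained prompt and normalising the model's reply to a known label. The sketch below illustrates that shape; the label set and helper names are illustrative assumptions, and the actual model call (via whatever client library you use) is left out.

```python
# Minimal sketch of the prompt-building and reply-parsing steps of an
# intent-classification pipeline. The LABELS set and function names are
# hypothetical; the model call itself is omitted and would sit between
# build_prompt() and parse_label() in a real pipeline.

LABELS = ["billing", "support", "sales", "other"]

def build_prompt(utterance: str) -> str:
    """Format a single-label classification prompt for one user message."""
    return (
        "Classify the user message into exactly one of these intents: "
        + ", ".join(LABELS)
        + ". Reply with the label only.\n\nMessage: "
        + utterance
    )

def parse_label(reply: str) -> str:
    """Normalise a model reply to a known label, defaulting to 'other'."""
    label = reply.strip().lower().rstrip(".")
    return label if label in LABELS else "other"
```

Constraining the model to "reply with the label only" and defensively normalising the output keeps per-query token counts, and therefore cost, as low as possible, which is the point of a Flash-Lite tier.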

Is Gemini 3.1 Flash-Lite still multimodal?

Yes — it accepts text, audio and image inputs natively, making it useful for lightweight multimodal classification and routing pipelines.
