DeepSeek V4-Flash — Fast 284B MoE

Fast, lower-cost DeepSeek V4 variant

V4-Flash brings the V4 architecture down to 284B total / 13B active parameters, keeping the 1M-token context while drastically reducing inference cost.

Parameters: 284B total · 13B active
Context window: 1M tokens
Released: April 24, 2026
Pricing: lower-cost DeepSeek API tier

Key features

  • 284B total / 13B active parameters.
  • Same V4 hybrid attention as V4-Pro.
  • 1M-token context window.
  • Significantly cheaper than V4-Pro.

Best for

Use V4-Flash for high-throughput chat, classification and coding workloads where you want V4 quality at a lower price point.
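
As a rough illustration of that kind of workload, the sketch below labels support tickets through an OpenAI-compatible chat completions call. The model ID "deepseek-v4-flash" and the endpoint URL are assumptions for this example; check the DeepSeek API documentation for the actual values.

    # Minimal sketch: ticket classification with V4-Flash over an
    # OpenAI-compatible API. The model ID "deepseek-v4-flash" and the
    # base URL are assumptions -- confirm them in the DeepSeek docs.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_DEEPSEEK_API_KEY",      # issued on the DeepSeek platform
        base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
    )

    def classify(ticket: str) -> str:
        """Label a support ticket as 'billing', 'bug', or 'other'."""
        response = client.chat.completions.create(
            model="deepseek-v4-flash",  # hypothetical ID for this variant
            messages=[
                {"role": "system",
                 "content": "Classify the ticket as billing, bug, or other. "
                            "Reply with the label only."},
                {"role": "user", "content": ticket},
            ],
            max_tokens=4,
            temperature=0.0,  # deterministic labels for classification
        )
        return response.choices[0].message.content.strip()

    print(classify("I was charged twice for my subscription this month."))

Because only 13B parameters are active per token, this style of short, high-volume call is where the lower per-token price matters most.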

Frequently Asked Questions

How does DeepSeek V4-Flash compare to V4-Pro?

V4-Flash uses 284B total / 13B active parameters versus V4-Pro's 1.6T total / 49B active. Flash is significantly cheaper and faster while sharing the same hybrid attention architecture.

Can I self-host DeepSeek V4-Flash?

DeepSeek serves V4-Flash through its API, and the weights are also released openly, so enterprise deployments that need cost-efficient V4 quality can run the model on their own infrastructure.
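
For teams that do self-host, a minimal sketch with vLLM is shown below. The Hugging Face repository name "deepseek-ai/DeepSeek-V4-Flash" is an assumption, and a 284B-parameter MoE requires a multi-GPU node; adjust tensor_parallel_size and max_model_len to your hardware.

    # Minimal self-hosting sketch with vLLM. The repo ID below is hypothetical;
    # use the actual open-weight release name when it is published.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical Hugging Face repo ID
        tensor_parallel_size=8,                 # a 284B MoE needs a multi-GPU node
        max_model_len=131072,                   # cap context well below 1M to fit memory
        trust_remote_code=True,                 # DeepSeek releases often ship custom model code
    )

    params = SamplingParams(temperature=0.2, max_tokens=256)
    outputs = llm.generate(["Summarize the benefits of MoE inference."], params)
    print(outputs[0].outputs[0].text)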
