DeepSeek V4-Flash
V4-Flash brings the V4 architecture down to 284B total / 13B active parameters, keeping the 1M-token context while drastically reducing inference cost.
Use V4-Flash for high-throughput chat, classification and coding workloads where you want V4 quality at a lower price point.
V4-Flash uses 284B total / 13B active parameters vs V4-Pro's 1.6T / 49B. Flash is significantly cheaper and faster while sharing the same hybrid attention architecture.
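Since only the active parameters are engaged per generated token in a mixture-of-experts forward pass, the cost gap can be sketched from the figures above. The snippet below is a back-of-the-envelope illustration using the common 2 × active-parameters FLOPs-per-token rule of thumb, not a measured benchmark; the helper name and constants are illustrative.

```python
# Rough per-token decode compute for the two models, using the active
# parameter counts stated above. In a mixture-of-experts model, roughly
# 2 * active_params FLOPs are spent per generated token, so cost scales
# with active parameters, not total parameters.

def flops_per_token(active_params_b: float) -> float:
    """Approximate decode FLOPs per token, given active params in billions."""
    return 2 * active_params_b * 1e9

flash_active_b = 13.0  # V4-Flash: 13B active (of 284B total)
pro_active_b = 49.0    # V4-Pro:   49B active (of 1.6T total)

flash = flops_per_token(flash_active_b)
pro = flops_per_token(pro_active_b)

print(f"V4-Flash: {flash:.2e} FLOPs/token")
print(f"V4-Pro:   {pro:.2e} FLOPs/token")
print(f"Flash uses ~{flash / pro:.0%} of Pro's per-token compute")
```

By this estimate V4-Flash needs roughly a quarter of V4-Pro's per-token compute, which is where the "significantly cheaper and faster" claim comes from; real serving costs also depend on memory footprint, batching, and attention over long contexts.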
DeepSeek makes V4-Flash available through its API, and the open-weight release enables enterprise deployments that need cost-efficient V4 quality on their own infrastructure.