DeepSeek V4-Pro — 1.6T MoE flagship

DeepSeek's flagship MoE for the V4 generation

DeepSeek V4-Pro is a 1.6T-parameter Mixture-of-Experts model with 49B active parameters per token, hybrid attention, and 73% lower per-token inference FLOPs than V3.2.
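
How sparse is that in practice? Roughly 3% of the parameters (49B of 1.6T) do work on any given token. The toy router below illustrates the mechanism only; the expert count, top-k, and layer sizes are invented for the sketch, and V4-Pro's actual routing configuration is not published here.

    import numpy as np

    # Toy top-k Mixture-of-Experts layer. All sizes are invented for this
    # sketch; only the idea carries over: a router picks a few experts per
    # token, so most weights stay idle, mirroring the 1.6T-total / 49B-active split.

    n_experts, top_k = 64, 4      # assumed; 4 of 64 experts fire per token
    d_model, d_ff = 128, 512      # toy layer sizes

    rng = np.random.default_rng(0)
    router_w = rng.normal(size=(d_model, n_experts)) * 0.02
    w_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02
    w_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02

    def moe_layer(x):
        logits = x @ router_w                  # score every expert
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        top = np.argsort(probs)[-top_k:]       # keep only the top-k experts
        gate = probs[top] / probs[top].sum()   # renormalize their gate weights
        y = np.zeros_like(x)
        for g, e in zip(gate, top):            # the other 60 experts never run
            h = np.maximum(x @ w_in[e], 0.0)   # toy ReLU feed-forward expert
            y += g * (h @ w_out[e])
        return y

    moe_layer(rng.normal(size=d_model))
    print(f"expert params active per token: {top_k / n_experts:.1%}")
    print(f"V4-Pro's published ratio:       {49 / 1600:.1%} (49B of 1.6T)")

In a real MoE the exact active fraction also depends on shared, always-on parameters, but the point stands: per-token compute scales with the 49B active slice, not the 1.6T total.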

Parameters       1.6T total · 49B active
Context window   1M tokens
Released         April 24, 2026
Pricing          DeepSeek API
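
Since access is priced through the DeepSeek API, a minimal call looks like the sketch below. It assumes the endpoint stays OpenAI-compatible, as DeepSeek's earlier generations were, and the model ID deepseek-v4-pro is a guess; check the API docs for the real identifier.

    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_DEEPSEEK_API_KEY",      # issued via the DeepSeek platform
        base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
    )

    resp = client.chat.completions.create(
        model="deepseek-v4-pro",              # assumed model ID for V4-Pro
        messages=[
            {"role": "user", "content": "Summarize the CAP theorem in two sentences."},
        ],
    )
    print(resp.choices[0].message.content)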

Key features

  • Mixture-of-Experts with 1.6T total / 49B active parameters per token.
  • Hybrid attention combining Compressed Sparse and Heavily Compressed Attention.
  • 73% reduction in per-token inference FLOPs vs DeepSeek V3.2.
  • 90% reduction in KV-cache memory footprint (see the sizing sketch below).
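
To see why the cache reduction matters at a 1M-token window, here is a back-of-envelope sizing. The layer count and head geometry are assumptions borrowed from a V3-like configuration; the card itself publishes only the 90% figure.

    def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
        # Dense KV cache: one K and one V tensor per layer, stored in fp16/bf16.
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

    # Assumed V3-like geometry; V4-Pro's real dimensions are not published here.
    dense = kv_cache_bytes(seq_len=1_000_000, n_layers=61, n_kv_heads=128, head_dim=128)
    reduced = dense * 0.10    # the claimed 90% footprint reduction

    print(f"dense KV cache at 1M tokens: {dense / 2**30:6,.0f} GiB")    # ~3,723 GiB
    print(f"after 90% reduction:         {reduced / 2**30:6,.0f} GiB")  # ~372 GiB

Under these assumed dimensions, the 10x smaller cache, more than the FLOPs savings, is arguably what brings 1M-token serving within reach of a bounded number of accelerators.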

Best for

Pick DeepSeek V4-Pro when you need DeepSeek's strongest model for hard reasoning, large-context workloads, and self-hosted enterprise inference.

Frequently Asked Questions

What replaced DeepSeek V3?

DeepSeek V4 replaces V3; both V3 and V3.2 will be retired after July 24, 2026.
