DeepSeek-V3
Open Weights
DeepSeek
DeepSeek-V3 is a 671B-parameter Mixture-of-Experts (MoE) model that activates 37B parameters per token. It was released in December 2024; the code is MIT-licensed, and the model weights are distributed under the DeepSeek Model License, which permits commercial use. It outperforms other open-source models and achieves performance comparable to leading closed-source models (GPT-4, Claude) on most benchmarks, and it is particularly strong on math and code tasks. The architecture uses Multi-head Latent Attention (MLA) and DeepSeekMoE for efficient inference. The model was pretrained on 14.8T diverse tokens, and the full training run required only 2.788M H800 GPU hours. It also pioneers an auxiliary-loss-free load-balancing strategy and uses a multi-token prediction training objective.
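To put the headline numbers in perspective, the short sketch below works out two ratios from the figures quoted in the description: the share of parameters active per token, and a rough token throughput per GPU hour. The constant and function names are illustrative only, and the throughput figure is a back-of-the-envelope estimate that assumes the quoted GPU hours and the 14.8T tokens describe the same training run.

```python
# Figures taken from the description; names are illustrative, not an official API.
TOTAL_PARAMS = 671e9      # total parameters
ACTIVE_PARAMS = 37e9      # parameters activated per token
TRAIN_TOKENS = 14.8e12    # pretraining tokens
GPU_HOURS = 2.788e6       # H800 GPU hours


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def tokens_per_gpu_hour(tokens: float, gpu_hours: float) -> float:
    """Rough average throughput; assumes tokens and hours cover the same run."""
    return tokens / gpu_hours


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    rate = tokens_per_gpu_hour(TRAIN_TOKENS, GPU_HOURS)
    print(f"Active parameters per token: {frac:.1%}")
    print(f"Approx. tokens per GPU hour: {rate:,.0f}")
```

Under these assumptions, roughly 5.5% of the parameters take part in each forward pass, which is why a sparse MoE of this size can be served at a per-token compute cost closer to that of a much smaller dense model.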
Capabilities
Vision
Audio
Video
Tool Use