llm.info

DeepSeek-V3

Open Weights

DeepSeek

DeepSeek's 671B-parameter Mixture-of-Experts (MoE) model, with 37B parameters activated per token. Released in December 2024; the code is MIT-licensed and the model weights are released under DeepSeek's Model License, which permits commercial use. Outperforms other open-source models and achieves performance comparable to leading closed-source models (GPT-4o, Claude 3.5 Sonnet) on most benchmarks, and is particularly strong on math and code tasks. Adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures for efficient inference and cost-effective training. Pretrained on 14.8T diverse, high-quality tokens; the full training run, including pre-training, context extension, and post-training, required only 2.788M H800 GPU hours. Introduces an auxiliary-loss-free strategy for expert load balancing and trains with a multi-token prediction (MTP) objective.
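The auxiliary-loss-free load balancing mentioned above can be illustrated with a small sketch. Rather than adding a balancing loss, each expert carries a bias that is added to its affinity score only when choosing the top-k experts. Gate weights still come from the unbiased scores, and after each step the bias is nudged down for overloaded experts and up for underloaded ones. The following is a minimal toy under stated assumptions, not DeepSeek's implementation: the helper names (`route`, `update_bias`, `simulate`), the step size `gamma`, the choice to measure "overloaded" against the mean load, and the artificial skew toward expert 0 are all hypothetical. Shared experts and other details of the real router are omitted.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(logits: list[float], bias: list[float], k: int) -> dict[int, float]:
    """Pick the top-k experts for one token and return {expert: gate weight}.

    The per-expert bias only affects which experts are selected; the gate
    weights use the unbiased sigmoid scores, renormalized over the chosen k.
    """
    scores = [sigmoid(u) for u in logits]
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def update_bias(bias: list[float], loads: list[int], gamma: float) -> list[float]:
    """Nudge each bias by gamma: down if overloaded, up if underloaded.

    Assumption (hypothetical): "overloaded" means above the mean load.
    """
    mean_load = sum(loads) / len(loads)
    return [
        b - gamma if load > mean_load else b + gamma if load < mean_load else b
        for b, load in zip(bias, loads)
    ]


def simulate(num_experts=8, k=2, steps=50, tokens=256, gamma=0.01, seed=0):
    """Toy run in which the router systematically favors expert 0."""
    rng = random.Random(seed)
    bias = [0.0] * num_experts
    loads = [0] * num_experts
    for _ in range(steps):
        loads = [0] * num_experts
        for _ in range(tokens):
            # Hypothetical skew: expert 0 gets +1.0 on its affinity logit.
            logits = [rng.gauss(0.0, 1.0) + (1.0 if e == 0 else 0.0)
                      for e in range(num_experts)]
            for expert in route(logits, bias, k):
                loads[expert] += 1
        bias = update_bias(bias, loads, gamma)
    return bias, loads


if __name__ == "__main__":
    bias, loads = simulate()
    print("bias:", [round(b, 2) for b in bias])
    print("final-step loads:", loads)
```

In this toy, expert 0's bias drifts negative until its load approaches the mean. Because balancing acts only on expert selection, no extra loss term is added to the training gradients. That is the motivation the model's authors give for dropping the auxiliary loss.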


