
MPT-30B

Open Weights

MosaicML

MPT-30B is MosaicML's (now part of Databricks) efficient 30-billion-parameter, decoder-only transformer, pretrained from scratch on 1T tokens of English text and code. It has an 8K-token context window and, because it uses ALiBi (Attention with Linear Biases) instead of positional embeddings, can extrapolate to longer inputs. The first 1T tokens of pretraining used 2K-token sequences, followed by an additional 50B tokens of 8K-token sequences. According to MosaicML, it was the first LLM trained on NVIDIA H100 GPUs. It outperforms the originally published GPT-3 (175B parameters) with roughly one-sixth as many parameters, is competitive with LLaMA-30B and Falcon-40B on open-source benchmarks, and surpasses purpose-built code models such as StarCoder on HumanEval. The model is sized for single-GPU deployment: one A100-80GB at 16-bit precision, or one A100-40GB at 8-bit precision. It uses FlashAttention for efficient inference, is released under the Apache 2.0 license, and is available in base, instruct, and chat variants.
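The single-GPU claim follows from simple arithmetic on weight storage. The sketch below assumes a nominal 30e9 parameters (the headline "30B" figure, not the exact count) and counts weights only. It ignores the KV cache, activations, and framework overhead, so real memory use will be higher.

```python
# Rough weight-memory estimate for a nominal 30B-parameter model.
# Assumption: exactly 30e9 parameters; only weight storage is counted
# (no KV cache, activations, or framework overhead).

NUM_PARAMS = 30e9  # nominal "30B" parameter count (assumption)

BYTES_PER_PARAM = {
    "16-bit": 2,  # fp16 / bf16
    "8-bit": 1,   # int8
}

GPU_MEMORY_GB = {
    "A100-80GB": 80,
    "A100-40GB": 40,
}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def weights_fit(precision: str, gpu: str) -> bool:
    """True if the weights alone fit in the given GPU's memory."""
    need = weight_memory_gb(NUM_PARAMS, BYTES_PER_PARAM[precision])
    return need < GPU_MEMORY_GB[gpu]


if __name__ == "__main__":
    for precision, gpu in [("16-bit", "A100-80GB"), ("8-bit", "A100-40GB"), ("16-bit", "A100-40GB")]:
        need = weight_memory_gb(NUM_PARAMS, BYTES_PER_PARAM[precision])
        print(f"{precision} on {gpu}: ~{need:.0f} GB of weights, fits: {weights_fit(precision, gpu)}")
```

At 16-bit precision the weights need about 60 GB, which fits on an 80 GB card but not a 40 GB one. Halving the storage to about 30 GB with 8-bit weights is what makes the 40 GB card viable.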
