Name: MPT-30B
Author: MosaicML

A 30B-parameter decoder-style transformer from MosaicML (now part of Databricks), pretrained from scratch on 1T tokens of English text and code. Has an 8K-token context window and can extrapolate to longer sequences via ALiBi. Trained first on 1T tokens with 2K-token sequences, then on an additional 50B tokens with 8K-token sequences. First LLM trained on NVIDIA H100 GPUs. Outperforms the originally published GPT-3 with roughly 1/6 the parameters. Competitive with LLaMA-30B and Falcon-40B on open-source benchmarks. Surpasses purpose-built code models such as StarCoder on HumanEval. Sized for deployment on a single GPU: 1x A100-80GB at 16-bit precision or 1x A100-40GB at 8-bit precision. Uses FlashAttention for efficient inference. Released under Apache 2.0. Available in base, instruct, and chat variants.
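
The single-GPU sizing and the parameter ratio above can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not a deployment guide. It assumes a nominal 30e9 parameters (the exact count may differ), counts only the weights (no activations, KV cache, or framework overhead), uses decimal gigabytes, and takes 175e9 as the parameter count of the originally published GPT-3, a figure that is not stated in this entry. The helper name `weight_memory_gb` is made up for illustration.

```python
# Rough weight-memory estimate for a dense model.
# Assumption: only the weights are counted; activations, KV cache, and
# framework overhead need extra memory on top of this.

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Return the memory needed to hold the weights, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 30e9        # nominal "30B"; assumption, the exact count may differ
GPT3_PARAMS = 175e9    # assumption: size of the originally published GPT-3

# (precision in bits, GPU memory in GB) pairs taken from the description
for bits, gpu_gb in [(16, 80), (8, 40)]:
    need = weight_memory_gb(N_PARAMS, bits)
    print(f"{bits}-bit weights: ~{need:.0f} GB on a {gpu_gb} GB A100, "
          f"leaving ~{gpu_gb - need:.0f} GB for everything else")

# Parameter ratio behind the "roughly 1/6" claim
print(f"MPT-30B / GPT-3 parameter ratio: {N_PARAMS / GPT3_PARAMS:.3f}")
```

Under these assumptions the weights take about 60 GB at 16-bit and about 30 GB at 8-bit, which is why the two A100 sizes are paired with those precisions. The remaining headroom has to cover the KV cache, which grows with context length and batch size.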

