llm.info

StarCoder 2 15B

Open Weights

BigCode

BigCode's open-source code generation model with 15B parameters, trained on more than 4T tokens from The Stack v2 dataset, which covers 600+ programming languages. It has a 16,384-token context window, which lets it handle longer codebases and more elaborate instructions. The training data is permissively licensed GitHub data, including code, Git commits, issues, and Jupyter Notebooks. Reported HumanEval scores range from 33.6% to 44.2%, and the model performs strongly on MultiPL-E in 16 of 18 programming languages. It is the best-performing large model on the DS-1000 benchmark. It outperforms CodeLlama-34B on math and code reasoning and matches it on low-resource languages. Released under the permissive BigCode OpenRAIL-M license. Developed as a collaboration between ServiceNow, Hugging Face, and NVIDIA.
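
As a rough illustration of what the 16,384-token context window means in practice, the sketch below clamps a generation request so that prompt plus output stays inside the window. The helper names (generation_budget, complete) are hypothetical, and the model id "bigcode/starcoder2-15b" and the use of the Hugging Face transformers library are assumptions not stated on this page; treat it as a minimal sketch, not a reference implementation.

```python
CONTEXT_WINDOW = 16_384  # context size quoted in the description above


def generation_budget(prompt_tokens: int, requested: int,
                      context_window: int = CONTEXT_WINDOW) -> int:
    """Clamp the requested number of new tokens so prompt + output fits the window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = max(context_window - prompt_tokens, 0)
    return min(requested, remaining)


def complete(prompt: str, requested: int = 256,
             model_id: str = "bigcode/starcoder2-15b") -> str:
    """Hypothetical helper: generate a completion with Hugging Face transformers.

    The model id is an assumption; it does not appear on this page.
    """
    # Imported lazily so the budget helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    budget = generation_budget(inputs["input_ids"].shape[1], requested)
    if budget == 0:
        raise ValueError("prompt already fills the context window")
    output = model.generate(**inputs, max_new_tokens=budget)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

The budget helper is pure Python and can be checked without downloading any weights; complete() only loads the model when it is actually called.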

Strengths

Caveats

Capabilities

Vision
Audio
Video
Tool Use

Resources

No external resources available