StarCoder 2 15B
BigCode
BigCode's open-source code generation model, with 15B parameters, trained on 4T+ tokens from The Stack v2 dataset covering 600+ programming languages. Its 16,384-token context window lets it handle larger codebases and longer, more detailed instructions. The roughly 1T-token training corpus, repeated over multiple epochs to reach that total, consists of permissively licensed GitHub data, including code, Git commits, issues, and Jupyter notebooks. Scores 33.6% to 44.2% on HumanEval and excels on MultiPL-E in 16 of 18 programming languages. Best-performing large model on the DS-1000 data-science benchmark. Outperforms CodeLlama-34B on math and code reasoning and matches its performance on low-resource languages. Released under the BigCode OpenRAIL-M license. Developed as a collaboration between ServiceNow, Hugging Face, and NVIDIA.
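To make the 16,384-token limit concrete, here is a minimal sketch that budgets prompt and generation tokens against it. The helper names (`fits_in_context`, `max_prompt_tokens`, `truncate_left`) are hypothetical and not part of any BigCode or Hugging Face API; token counts and IDs are assumed to come from the model's own tokenizer, and left-truncation is one assumed strategy among several.

```python
# Hypothetical helpers for budgeting tokens against StarCoder2-15B's
# 16,384-token context window. Token IDs are assumed to come from the
# model's tokenizer; here they are plain ints for illustration.
CONTEXT_WINDOW = 16_384  # tokens, per the model description


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the generation budget fits the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_window


def max_prompt_tokens(max_new_tokens: int,
                      context_window: int = CONTEXT_WINDOW) -> int:
    """Largest prompt length that still leaves room for max_new_tokens."""
    return max(context_window - max_new_tokens, 0)


def truncate_left(token_ids: list[int], max_new_tokens: int,
                  context_window: int = CONTEXT_WINDOW) -> list[int]:
    """Drop the oldest tokens so the prompt fits the remaining budget.

    Keeping the rightmost tokens preserves the code nearest the cursor,
    an assumed choice for completion-style prompts.
    """
    budget = max_prompt_tokens(max_new_tokens, context_window)
    # Guard budget == 0: token_ids[-0:] would return the whole list.
    return token_ids[-budget:] if budget > 0 else []


if __name__ == "__main__":
    print(fits_in_context(16_000, 256))  # True: 16,256 <= 16,384
    print(fits_in_context(16_200, 256))  # False: 16,456 > 16,384
    print(max_prompt_tokens(512))        # 15872
    print(len(truncate_left(list(range(20_000)), 512)))  # 15872
```

Truncating from the left is a design choice, not a requirement: for instruction-style prompts it may be better to keep the instruction at the start and trim retrieved code in the middle instead.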
Resources
No external resources available