Name: Falcon 180B
Author: Technology Innovation Institute

A 180B-parameter open-access model from the Technology Innovation Institute (TII), trained on 3.5T tokens. It uses a causal decoder-only architecture with 80 layers, a hidden dimension of 14,848, and a vocabulary size of 65,024. Training ran on up to 4,096 A100 GPUs using Amazon SageMaker and took roughly 7M GPU hours. The dataset is 85% RefinedWeb, with the remainder drawn from curated conversations, technical papers, and code (~3%). It scored 68.74 on the Hugging Face Open LLM Leaderboard, the highest among open models at release, surpassing Meta's Llama 2, and it is reported to rank close to GPT-4 and PaLM 2. It is 2.5x larger than Llama 2 and was trained with 4x more compute. Released under the Falcon 180B TII License (based on Apache 2.0) for research and commercial use.
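The headline figures above imply a few derived quantities that are easy to check by hand. The following is a minimal sketch, not an official calculation: the constant and function names are illustrative, the 70B size for Llama 2 is an assumption (its largest variant), and the wall-clock figure is only a lower bound because the description says "up to" 4,096 GPUs.

```python
# Figures quoted in the description above (GPU hours and code share are approximate).
PARAMS = 180e9            # model parameters
TOKENS = 3.5e12           # training tokens
GPU_HOURS = 7e6           # total A100 GPU hours (approximate)
MAX_GPUS = 4096           # peak GPU count ("up to")
REFINEDWEB_SHARE = 0.85   # fraction of tokens from RefinedWeb
CODE_SHARE = 0.03         # fraction of tokens that are code (approximate)
LLAMA2_PARAMS = 70e9      # assumption: largest Llama 2 variant, not stated above


def summarize():
    """Return quantities derived from the quoted figures."""
    return {
        "tokens_per_param": TOKENS / PARAMS,
        # Lower bound: assumes all 4,096 GPUs were busy for the whole run.
        "min_wall_clock_days": GPU_HOURS / MAX_GPUS / 24,
        "refinedweb_tokens": TOKENS * REFINEDWEB_SHARE,
        "code_tokens": TOKENS * CODE_SHARE,
        "size_ratio_vs_llama2_70b": PARAMS / LLAMA2_PARAMS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the model saw roughly 19 tokens per parameter, and training would have taken at least about 71 days of wall-clock time at peak GPU count. The size ratio against a 70B model comes out slightly above 2.5, consistent with the "2.5x larger" claim.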
