Name: Mixtral 8x7B
Author: Mistral AI

Mistral AI's Sparse Mixture of Experts (SMoE) model with 46.7B total parameters, released in December 2023. Each layer has 8 feed-forward experts, and a router selects 2 of them for every token. Only about 12.9B parameters (roughly 13B) are therefore active per token, so inference cost is close to that of a 13B dense model. Outperforms Llama 2 70B on most benchmarks with 6x faster inference, and matches or exceeds GPT-3.5 on most standard benchmarks. Supports a 32K-token context window with fully dense attention; unlike Mistral 7B, it does not use sliding window attention. The instruction-tuned variant, Mixtral 8x7B Instruct, scored 8.30 on MT-Bench, making it the best open-weights chat model as of December 2023. Handles 5 languages (English, French, German, Spanish, Italian). Released under the Apache 2.0 license, which permits commercial use.
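To show what "2 of 8 experts per token" means in practice, here is a minimal pure-Python sketch of one sparse MoE layer. It follows the gating form described in the Mixtral paper, G(x) = Softmax(TopK(x · W_g)), where the softmax is taken only over the selected logits. The toy experts, 2-dimensional vectors, and router weights are hypothetical placeholders. Real Mixtral experts are learned SwiGLU feed-forward blocks operating on 4096-dimensional hidden states.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, router_weights, experts, k=2):
    """Sparse MoE forward pass for one token vector x.

    router_weights: one weight vector per expert (the columns of W_g).
    experts: callables mapping a vector to a vector of the same size.
    Returns the mixed output, the chosen expert indices and their gate values.
    """
    # Router logits: dot product of x with each expert's router vector.
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in router_weights]
    # Keep only the k highest-scoring experts.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k gates sum to 1.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen, gates

# Toy setup (hypothetical): 8 experts, 2-dimensional token vectors.
# Expert i simply scales its input by i + 1; real experts are learned FFNs.
NUM_EXPERTS = 8
experts = [(lambda v, s=i + 1: [s * t for t in v]) for i in range(NUM_EXPERTS)]
router_weights = [[float(i), 0.0] for i in range(NUM_EXPERTS)]

out, chosen, gates = moe_layer([1.0, 0.0], router_weights, experts)
print(chosen)  # [7, 6]
```

With this toy router, experts 7 and 6 receive the highest logits, so only 2 of the 8 expert functions run for this token. That per-token selection is what keeps the active parameter count near 13B while the model stores 46.7B parameters in total.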

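The 46.7B total versus roughly 13B active split can be checked with simple arithmetic. The sketch below assumes the architecture values from the Hugging Face config for `mistralai/Mixtral-8x7B-v0.1`: hidden size 4096, expert FFN size 14336, 32 layers, 32 query heads with 8 key/value heads, 8 experts with 2 used per token, a 32000-token vocabulary, untied input and output embeddings, and bias-free linear layers. Treat these values as assumptions and verify them against the model card.

```python
# Assumed architecture values (check against the published model config).
CONFIG = dict(
    hidden=4096, ffn=14336, layers=32, heads=32, kv_heads=8,
    experts=8, experts_per_token=2, vocab=32000,
)

def param_counts(c):
    """Return (total, active-per-token) parameter counts for a Mixtral-style model."""
    head_dim = c["hidden"] // c["heads"]
    kv_dim = c["kv_heads"] * head_dim
    # Attention: Q and O are hidden x hidden; K and V are hidden x kv_dim
    # because of grouped-query attention.
    attn = 2 * c["hidden"] * c["hidden"] + 2 * c["hidden"] * kv_dim
    # One SwiGLU expert has three projections: gate, up and down.
    expert = 3 * c["hidden"] * c["ffn"]
    router = c["hidden"] * c["experts"]
    norms = 2 * c["hidden"]  # two RMSNorm weight vectors per layer
    shared_per_layer = attn + router + norms
    # Input embedding, untied output head, and the final RMSNorm.
    outer = 2 * c["vocab"] * c["hidden"] + c["hidden"]
    total = c["layers"] * (shared_per_layer + c["experts"] * expert) + outer
    active = c["layers"] * (shared_per_layer + c["experts_per_token"] * expert) + outer
    return total, active

total, active = param_counts(CONFIG)
print(f"total ~ {total / 1e9:.1f}B, active ~ {active / 1e9:.1f}B")
# prints "total ~ 46.7B, active ~ 12.9B"
```

The name suggests 8 x 7B = 56B, but only the feed-forward blocks are replicated per expert. Attention, embeddings, and norms are shared, which is why the total comes to 46.7B rather than 56B.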