Llama 3.2 90B Vision
Open Weights
Meta
Meta's first large-scale open-weights vision model with 90B parameters. Released September 2024 as part of Llama 3.2 family. Built on Llama 3.1 with trained vision adapter using cross-attention layers integrating pre-trained image encoder. Processes high-resolution images up to 1120x1120 pixels. Features 128K context window for extensive multimodal conversations. Excels at visual recognition, image reasoning, captioning, document understanding (charts, graphs), and visual grounding. Outperforms many closed models like Claude 3 Haiku on image understanding tasks. Supports 8 languages for text, English-only for vision tasks.
Strengths
Caveats
Capabilities
Vision
Audio
Video
Tool Use
Resources
No external resources available