Name: Llama 3.2 90B Vision
Author: Meta

Meta's first large-scale open-weights vision model with 90B parameters. Released September 2024 as part of Llama 3.2 family. Built on Llama 3.1 with trained vision adapter using cross-attention layers integrating pre-trained image encoder. Processes high-resolution images up to 1120x1120 pixels. Features 128K context window for extensive multimodal conversations. Excels at visual recognition, image reasoning, captioning, document understanding (charts, graphs), and visual grounding. Outperforms many closed models like Claude 3 Haiku on image understanding tasks. Supports 8 languages for text, English-only for vision tasks.

Llama 3.2 90B Vision

Strengths

Caveats

Capabilities

Resources

Reviews

Comments