llm.info

Llama 3.2 90B Vision

Open Weights

Meta

Llama 3.2 90B Vision is Meta's first large-scale open-weights vision model, with 90B parameters. It was released in September 2024 as part of the Llama 3.2 family. It builds on Llama 3.1 and adds a trained vision adapter whose cross-attention layers integrate a pre-trained image encoder with the language model. It processes high-resolution images up to 1120x1120 pixels and has a 128K-token context window for long multimodal conversations. It is suited to visual recognition, image reasoning, captioning, document understanding (charts and graphs), and visual grounding. According to Meta's reported benchmarks, it outperforms some closed models, such as Claude 3 Haiku, on image understanding tasks. Text-only tasks support 8 languages; tasks that combine images and text support English only.
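As a rough illustration of the 1120x1120 pixel limit, the sketch below computes a target size that fits an image inside that box while preserving its aspect ratio. It is a client-side convenience under stated assumptions, not the model's own preprocessing: `MAX_SIDE` and `fit_within` are hypothetical names, not part of any Meta API, and the model's image pipeline may handle oversized inputs differently.

```python
MAX_SIDE = 1120  # maximum image side length, taken from the description above


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Return (width, height) scaled down to fit a max_side x max_side box.

    The aspect ratio is preserved. Images already inside the box are
    returned unchanged (this helper never upscales).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Pick the strictest of the two per-axis scale factors, capped at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    # Round to the nearest pixel, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(4000, 3000))
    print(fit_within(800, 600))
```

The resulting size could then be passed to whatever image library is in use before the image is sent to the model.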

Strengths

Caveats

Capabilities

Vision
Audio
Video
Tool Use
