Llama 3.2 11B Vision
Open Weights
Meta
Meta's lightweight multimodal model, balancing vision capabilities with efficiency. Released in September 2024 alongside the 90B variant as the first Llama models with vision support. Has 11B parameters and a 128K-token context window. Built on Llama 3.1 with a vision adapter that enables image understanding at resolutions up to 1120x1120. Optimized for visual recognition, image reasoning, captioning, document understanding, and visual grounding. Lower hardware requirements than the 90B variant while retaining strong vision capabilities. Uses grouped-query attention to speed up inference. Suited to edge deployment and resource-constrained multimodal applications.
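As a rough illustration of why grouped-query attention helps at long context lengths, the sketch below estimates the key/value cache size for one sequence with and without shared key/value heads. The layer count, head counts, head dimension, and 16-bit cache precision are assumptions chosen for illustration; they are not taken from this card, and the helper `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate the bytes needed to cache keys and values for one sequence."""
    # The leading factor of 2 covers the separate key and value tensors.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed, illustrative configuration (not stated in this card).
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8          # grouped-query attention: 4 query heads share each KV head
HEAD_DIM = 128
SEQ_LEN = 128 * 1024      # a full 128K-token context window

# Standard multi-head attention keeps one KV head per query head.
mha = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped-query attention keeps only the shared KV heads.
gqa = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"multi-head cache:    {mha / 2**30:.1f} GiB")
print(f"grouped-query cache: {gqa / 2**30:.1f} GiB")
print(f"reduction:           {mha // gqa}x")
```

Under these assumed values the cache shrinks in proportion to the ratio of query heads to key/value heads, which is where much of the inference-speed and memory benefit at long context comes from.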
Strengths
Caveats
Capabilities
Vision
Audio
Video
Tool Use