Meta has officially announced Llama 4 — a new series of generative AI models already integrated into its assistant on WhatsApp, Messenger, and Instagram. The lineup includes Llama 4 Scout and Llama 4 Maverick, both now available for download from Meta and Hugging Face. Llama 4 Scout, the smaller model, can run on a single Nvidia H100 GPU, while Llama 4 Maverick is positioned as comparable in capability to GPT-4o and Gemini 2.0 Flash.
According to Meta, Llama 4 Scout has a context window of ten million tokens and outperforms Google’s Gemma 3 and Gemini 2.0 Flash-Lite, as well as the open-source Mistral 3.1, across many benchmarks. The larger Maverick model likewise performs competitively against OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash while using fewer than half as many active parameters.
Meta is still training Llama 4 Behemoth, which has 288 billion active parameters and roughly two trillion parameters in total. Although this model has not yet been released, Meta claims that Behemoth can outperform competitors on several STEM benchmarks. For Llama 4, the company switched to a “mixture of experts” (MoE) architecture, which saves compute by activating only the parts of the model needed for a given task rather than the full network.
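To make the “active parameters” idea concrete, here is a toy NumPy sketch of MoE routing: a small router scores each token, only the top-scoring expert(s) run, and their outputs are blended by the router’s weights. All names, sizes, and weights here are illustrative assumptions, not Meta’s actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D, H, E, K = 8, 16, 4, 1  # token dim, expert width, number of experts, top-K

# Hypothetical toy weights: one linear router plus E small feed-forward experts.
router_w = rng.normal(size=(D, E))
expert_w1 = rng.normal(size=(E, D, H))
expert_w2 = rng.normal(size=(E, H, D))

def moe_forward(x):
    """For each token, run only its top-K experts and mix their outputs."""
    logits = x @ router_w                        # (tokens, E) routing scores
    topk = np.argsort(logits, axis=-1)[:, -K:]   # indices of chosen experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    gate = np.exp(sel - sel.max(axis=-1, keepdims=True))
    gate /= gate.sum(axis=-1, keepdims=True)     # softmax over selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(topk[t]):
            h = np.maximum(x[t] @ expert_w1[e], 0.0)   # expert FFN with ReLU
            out[t] += gate[t, j] * (h @ expert_w2[e])  # gated expert output
    return out

tokens = rng.normal(size=(5, D))
y = moe_forward(tokens)
print(y.shape)  # (5, 8): same shape as the input, but only K of E experts ran
```

With K=1 of 4 experts active, each token touches only a quarter of the expert parameters per layer — the same principle that lets a model with trillions of total parameters keep its per-token compute far smaller.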
Meta continues to position Llama 4 as “open source,” though the license carries restrictions: for example, commercial organizations with more than seven hundred million monthly active users must obtain permission from Meta to use the models. This has drawn criticism from the Open Source Initiative, which argues that such a license does not meet open source principles.