To mark the first anniversary of Mistral 7B, Mistral AI introduced two new models, Ministral 3B and Ministral 8B, designed specifically for on-device use and edge computing. Both aim to deliver high performance and efficiency in the sub-10-billion-parameter class, supporting a context length of up to 128,000 tokens (32,000 in the current release).
Ministral 8B features an interleaved sliding-window attention mechanism, which reduces latency and memory usage. This makes the models well suited to local computation without internet connectivity, for example in autonomous robots, on-device analytics, and offline translation.
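To illustrate the idea behind sliding-window attention (this is a generic NumPy sketch of the masking pattern, not Mistral's actual implementation): each token attends only to a fixed-size window of preceding tokens rather than the full causal history, so attention cost and memory grow with the window size instead of the sequence length. In the interleaved variant, layers can alternate between this local mask and a full causal mask.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend to positions
    max(0, i - window + 1) .. i (causal, window-limited)."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo = max(0, i - window + 1)
        mask[i, lo:i + 1] = True
    return mask

mask = sliding_window_mask(seq_len=6, window=3)
# Row 4 is True only at columns 2, 3, 4: token 4 sees just
# the last 3 positions, not the whole prefix.
```

Because each row has at most `window` nonzero entries, the attention computation per token is O(window) rather than O(sequence length), which is where the latency and memory savings come from.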

The Ministral 3B and 8B models can also serve as efficient intermediaries in complex workflows, where their function-calling and task-routing abilities keep latency to a minimum. Paired with larger language models such as Mistral Large, they enable flexible, cost-effective on-demand data processing.
Starting today, both models are available on the company's platform, priced at $0.10 per million tokens for Ministral 8B and $0.04 per million tokens for Ministral 3B. Mistral AI also offers assistance with fine-tuning and quantizing the models for maximum performance on users' specific tasks.
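At these rates, usage cost is a simple linear function of token volume. A quick sketch of the arithmetic (model names in the price table are shorthand, not official API identifiers):

```python
# Published per-million-token prices in USD.
PRICE_PER_MTOK = {
    "ministral-8b": 0.10,
    "ministral-3b": 0.04,
}

def cost_usd(model: str, tokens: int) -> float:
    """Cost of processing `tokens` tokens at the listed rate."""
    return PRICE_PER_MTOK[model] * tokens / 1_000_000

# Example: 5 million tokens through Ministral 3B costs about $0.20.
print(cost_usd("ministral-3b", 5_000_000))
```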