Mistral announced a new generative AI model, Mistral Small 3, with 24 billion parameters. It runs faster than larger models like Llama 3.3 70B and positions itself as an open alternative to closed proprietary models. The model is optimized for low-latency responses, making it especially attractive for local use.
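Whether a 24-billion-parameter model is practical locally mostly comes down to how much memory the weights need. A rough back-of-envelope sketch (my own estimate, not from the announcement; it ignores KV cache, activations, and runtime overhead):

```python
# Approximate weight memory for a 24B-parameter model at common precisions.

PARAMS = 24e9  # Mistral Small 3 has 24 billion parameters

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,  # half-precision weights
    "int8": 1.0,       # 8-bit quantization
    "int4": 0.5,       # 4-bit quantization (e.g. a GGUF Q4 build)
}

def estimate_gb(params: float, bytes_per_param: float) -> float:
    """Weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

for name, bpp in BYTES_PER_PARAM.items():
    print(f"{name:>10}: ~{estimate_gb(PARAMS, bpp):.0f} GB")
# fp16 needs roughly 48 GB; a 4-bit quantization fits in about 12 GB,
# which is why quantized builds are the usual choice for local machines.
```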
Mistral Small 3 is designed to handle the roughly 80% of generative AI tasks that demand both accuracy and speed. The model delivers high performance with fewer layers than comparable models, which significantly reduces processing time. With over 81% accuracy on the MMLU benchmark and a generation speed of 150 tokens per second, Mistral claims it is the most efficient model in its category.
Mistral Small 3 is released under the Apache 2.0 license, allowing unrestricted use and modification. The model is available on platforms including Hugging Face, Ollama, Kaggle, and Fireworks AI, and is expected to arrive on NVIDIA NIM and Amazon SageMaker in the near future.