MiniMax has announced MiniMax-01 Series 2, an update to its AI model lineup headlined by MiniMax-Text-01: a state-of-the-art Mixture-of-Experts (MoE) language model with 456 billion total parameters, of which 45.9 billion are activated per token.
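
MiniMax has not published a reference implementation alongside the announcement, but the general mechanism behind "activated parameters per token" is top-k expert routing. The sketch below is a minimal, illustrative PyTorch version of that idea, not MiniMax's code; the default dimensions, the top-k value, and the expert feed-forward width are deliberately small assumptions so it runs anywhere, while MiniMax-Text-01 scales the same structure to a hidden size of 6144 and 32 experts.

```python
# Minimal top-k MoE routing sketch (illustrative only, not MiniMax's implementation).
# Only the experts selected for a token are executed, which is why a 456B-parameter
# model can activate roughly 45.9B parameters per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # Small illustrative defaults; MiniMax-Text-01 uses hidden_size=6144 and 32 experts.
    def __init__(self, hidden_size=512, num_experts=8, top_k=2, ffn_size=1024):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, hidden_size)
        scores = F.softmax(self.router(x), dim=-1)        # routing probabilities per expert
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                     # tokens routed to expert e at rank k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(10, 512)
print(moe(tokens).shape)  # torch.Size([10, 512])
```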

The model uses a hybrid attention mechanism that combines Lightning Attention with Softmax Attention to balance efficiency and modeling quality over long sequences. It supports very long contexts: a training context window of up to one million tokens and inference over up to four million tokens, which makes it well suited to tasks that require deep contextual understanding of long documents. Positional encoding is handled with Rotary Position Embedding (RoPE), which injects relative position information directly into the query and key vectors so it remains usable across these long sequences.
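
To make the RoPE idea concrete, here is a minimal sketch of rotary position embedding applied to a batch of query vectors. This is a generic textbook-style formulation, not MiniMax's optimized kernel, and the tensor shapes and base value are illustrative assumptions.

```python
# Minimal RoPE sketch: rotate pairs of channels by an angle that grows with position,
# so dot products between queries and keys depend on their relative positions.
import torch

def rope(x, base=10000.0):
    # x: (seq_len, num_heads, head_dim), head_dim must be even.
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)   # (half,)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs   # (seq_len, half)
    cos = angles.cos()[:, None, :]   # broadcast over heads
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Each (x1, x2) channel pair is rotated by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(16, 8, 64)   # (seq_len, heads, head_dim)
q_rot = rope(q)              # positions are now encoded as rotations of q
```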
The MiniMax-01 model is now open source, making it accessible to a wide range of users. Key architectural features include 80 layers that alternate between the two attention mechanisms, 32 experts in the MoE blocks, a hidden size of 6144, and a vocabulary of 200,064 tokens. The MiniMax-01 Series 2 models are competitive with other leading AI systems such as Qwen and DeepSeek-V3 (DS3), particularly on long-context understanding benchmarks.
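
Because the weights are open source, they can be tried with the standard Hugging Face workflow. The sketch below assumes the model is hosted under the repo id "MiniMaxAI/MiniMax-Text-01" and loaded with trust_remote_code for its custom architecture; both are assumptions to verify against the official release, and the full 456B-parameter model realistically requires a multi-GPU serving setup rather than a single machine.

```python
# Hedged usage sketch: loading the open-source model via Hugging Face transformers.
# The repo id is an assumption; check MiniMax's official release page for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",     # in practice the full model needs multi-GPU or quantized serving
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Summarize the key architectural ideas behind hybrid attention."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```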