Mistral, a French developer of large language models, has announced a new API for processing complex PDF documents. Mistral OCR is an optical character recognition tool that converts PDFs into text files, making them easier to use with AI models. Unlike most similar APIs, Mistral OCR is multimodal and can recognize illustrations and photographs embedded in text blocks, drawing bounding boxes around them.
The output of Mistral OCR is not just plain text: it is formatted in Markdown, so it can carry links, headings, and other formatting elements. This makes it especially valuable for models like ChatGPT, which actively use Markdown to generate formatted text. According to Mistral co-founder Guillaume Lample, the tool will help companies convert complex documents into an AI-friendly format.
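To make the idea concrete, here is a minimal Python sketch of how a developer might consume such output. It assumes a simplified, hypothetical response shape: a list of pages, each with Markdown text and a list of images with bounding-box coordinates. The field names (`pages`, `index`, `markdown`, `images`, `left`, `top`, `right`, `bottom`) and the helper functions are illustrative assumptions, not taken from Mistral's documentation, and no network call is made.

```python
from typing import Any

# Hypothetical, simplified OCR response used only for illustration.
# The field names are assumptions, not Mistral's documented schema.
SAMPLE_RESPONSE: dict[str, Any] = {
    "pages": [
        {"index": 0, "markdown": "# Report\n\nIntro text.", "images": []},
        {
            "index": 1,
            "markdown": "## Results\n\n![chart](img-0)",
            "images": [
                {"id": "img-0", "left": 72, "top": 140, "right": 520, "bottom": 410}
            ],
        },
    ]
}


def pages_to_markdown(response: dict[str, Any]) -> str:
    """Join per-page Markdown into one document, in page order."""
    pages = sorted(response["pages"], key=lambda page: page["index"])
    return "\n\n".join(page["markdown"] for page in pages)


def image_boxes(response: dict[str, Any]) -> list[tuple[int, str, tuple[int, int, int, int]]]:
    """List (page index, image id, (left, top, right, bottom)) for every image."""
    boxes = []
    for page in response["pages"]:
        for image in page.get("images", []):
            box = (image["left"], image["top"], image["right"], image["bottom"])
            boxes.append((page["index"], image["id"], box))
    return boxes


if __name__ == "__main__":
    # The combined Markdown can be passed to a language model as-is.
    print(pages_to_markdown(SAMPLE_RESPONSE))
    # The bounding boxes say where each illustration sat on its page.
    print(image_boxes(SAMPLE_RESPONSE))
```

Keeping the text as Markdown and the image positions as separate coordinates means the text can go straight into a model prompt, while the illustrations can be cropped or referenced independently.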
Mistral OCR is available on the company’s own API platform or through cloud partners such as AWS, Azure, and Google Cloud Vertex. For companies working with confidential data, Mistral offers an on-premises deployment option. The Paris-based company claims its OCR model outperforms solutions from Google, Microsoft, and OpenAI, especially when working with documents containing complex layouts or tables.
Mistral has also integrated the OCR model into its own assistant, Le Chat, so that the assistant can quickly analyze the contents of PDF files before processing them. Mistral OCR is expected to find applications in a range of fields, including legal work, where law firms will be able to process large volumes of documents more efficiently.