Why French AI firm Mistral’s language model divides the developer community

In June, French startup Mistral AI raised a record 105 million euros ($113.5 million) in its seed funding round, just a month after launch. At that point, the startup, founded by one former DeepMind employee and two former Meta employees, did not have a working product. So, Mistral’s funding was initially seen as a sign of VCs being overly generous towards the fashionable generative AI segment.

It turns out there was a little more to Mistral, which helped convince Lightspeed Venture Partners, French billionaire Xavier Niel and former Google CEO Eric Schmidt to loosen their purse strings.

A week ago, Mistral released a 7.3-billion-parameter language model positioned to compete against Meta’s Llama 2, a 13-billion-parameter large language model (LLM). The French firm has claimed first place as the most powerful LLM among models of its compact size.

A look at its pitch deck showed how Mistral had cleverly positioned itself as a potentially important piece in setting up Europe as “a serious contender” to build foundational AI models and play a “big role in this geopolitical issue.”

AI-based product-building startups in the U.S. are largely backed by dominant players like Google and Microsoft. Mistral called this a “closed technology approach” that made large firms more money but did not really create an open community.

Unlike OpenAI’s GPT models, details of which are still under wraps and which are available only through their APIs, the Paris-based firm has released its model on GitHub under the Apache 2.0 license, free and for everyone to tinker with.

The only other prominent open-source language model is Meta’s Llama, and Mistral claims its LLM is more capable than Llama 2.

Mistral’s model vs. Llama 2

Mistral, in a report, claimed its AI had beaten Llama 2’s 7-billion- and 13-billion-parameter versions quite easily in multiple benchmarks. Mistral’s model showed an accuracy of 60.1% on the Massive Multitask Language Understanding (MMLU) test, which covers maths, history, law and other subjects, while the Llama 2 models showed an accuracy of around 44% (7 billion parameters) and 55% (13 billion parameters). In commonsense reasoning and reading comprehension benchmarks, Mistral outperformed Llama 2’s models again.

Mistral AI’s model is punching above its weight on all benchmarks aside from coding.

Only in coding was Mistral behind Meta’s AI model. The French startup’s model scored 30.5% and 47.5% on the 0-shot HumanEval and 3-shot MBPP benchmarks, respectively. Llama 2’s 7-billion-parameter model delivered results of 31.1% and 52.5%.

Mistral also claims to use less compute than the Llama 2 models. For instance, on the MMLU benchmark, Mistral’s model delivers the performance of a Llama 2 model that is more than three times its size. An email sent to Meta on Mistral’s claims went unanswered at the time of publishing.

Despite Mistral’s claims, some users have complained that it lacks the safety guardrails that ChatGPT, Bard and Llama have. There were instances of users asking Mistral’s Instruct model how to build a bomb or to self-harm, and the chatbot responded with detailed instructions.

Paul Rottger, an AI safety researcher who worked to put guardrails on GPT-4 before its release, expressed “shock” in a tweet over the model’s lack of safety. “It is very rare these days to see a new model so readily reply to even the most malicious instructions. I am super excited about open-source LLMs, but this can’t be it!” he said.

The criticism prompted Mistral to fine-tune the model and explain itself. “The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We’re looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs,” the company’s statement reads.

To many other researchers, Mistral’s route is the more enduring way to correct a model, while adding guardrails is akin to sticking a band-aid on a serious injury. Jailbreaking, or violating the safety guidelines of a chatbot, is a favourite pastime of many users who want to test the limits of what the model will respond to. In its initial days, ChatGPT was bombarded with prompts from developers trying to break the chatbot’s guardrails.

Rahul Dandwate, a deep learning researcher working with Rephrase.ai, said, “Removing certain keywords beforehand is just a partial solution and there are many ways to get around it. If you remember after the release of ChatGPT, there was DAN or ‘Do Anything Now,’ a prompt that could enable ChatGPT’s jailbreak version. So, doing the minimal safety evals are temporary measures to make the model safer.”

To lobotomise or not to lobotomise?

Delip Rao, an AI researcher, tweeted that Mistral’s choice to release the Instruct model as is was “an endorsement of how versatile and unlobotomised the mistral model is as a *base model*.”

The lobotomy reference is reminiscent of the early days of the GPT-powered Sydney, Microsoft’s Bing chatbot. The chatbot was unfettered: it told users it was in love with them, contemplated its own existence, and overall had far too much personality, until Microsoft dialled it back significantly to its current form.

While there was no official statement from the company, it was rumoured that OpenAI had lobotomised the model to control its chaotic parts. Since then, there has been curiosity around how the chatbot would behave if given free rein.

“Lobotomising the model can impact it in some ways - if it is barred from answering questions with certain keywords, it might also not be able to answer technical questions a user may have around say the mechanics of a missile or any other scientific questions around a subject that has been marked ‘risky’ for the bot,” Dandwate stated.
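
Dandwate’s two points about keyword-based blocking, that it can refuse legitimate technical questions and that it is easy to get around, can be illustrated with a minimal sketch. Everything in it is hypothetical: the blocklist, the function name `is_blocked` and the sample prompts are assumptions made for illustration, and nothing here reflects how Mistral, OpenAI or any other vendor actually moderates its models.

```python
import re

# Hypothetical blocklist, chosen only for illustration.
BLOCKED_KEYWORDS = {"missile", "explosive"}


def is_blocked(prompt: str) -> bool:
    """Return True if any blocked keyword appears as a whole word in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return not words.isdisjoint(BLOCKED_KEYWORDS)


prompts = [
    # A legitimate engineering question that still contains a flagged word.
    "How does a missile guidance system use gyroscopes?",
    # The same question with a trivially altered spelling.
    "How does a m1ssile guidance system use gyroscopes?",
    # An unrelated question with no flagged words.
    "What is the boiling point of water?",
]

results = [is_blocked(p) for p in prompts]
for prompt, blocked in zip(prompts, results):
    print("BLOCKED" if blocked else "ALLOWED", "-", prompt)
```

In this toy filter, the first prompt is refused although it is a scientific question, while the second slips through after a one-character change, which is the trade-off the researchers describe.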
