Meta’s Llama 2 stands out despite lagging behind the best AI models

Now that the landscape has been split wide open into open-source and “closed source,” where does Llama 2 stand in comparison with OpenAI and Google’s LLMs?

July 28, 2023 03:10 pm | Updated 10:29 pm IST

Since Meta’s Large Language Model (LLM) LLaMa leaked in February, it has become clear that there are two paths for the future of AI. One is the OpenAI route: spend billions of dollars training AI models and offer them to businesses that can afford them. The other is Meta’s track: with LLaMa, build something more usable and nifty, and give it away for free.

While the first iteration of Meta’s LLM was but a brick in the wall, considering it shipped without fine-tuned chat versions, the new one, Llama 2, is the entire package. And the social network giant has also built partnerships with AWS, Hugging Face, Databricks, and, surprisingly, Microsoft’s Azure (which backs ChatGPT-maker OpenAI).

Smaller size

The first point of contention remains size. Llama 2 comes in three versions, with 7 billion, 13 billion, and 70 billion parameters. Google’s chatbot Bard is based on the underlying PaLM 2 model, which has 340 billion parameters, according to internal documents seen by CNBC. But details about the size of GPT-3.5, the model ChatGPT is based on, are still unknown. (OpenAI did mention that text-davinci-003, from its GPT-3.5 series, was built upon and is therefore similar to an older model called InstructGPT, which had 175 billion parameters.)

The research papers released by OpenAI and Google on their bots had little to no technical information about the models. In contrast, Llama 2’s is transparent, which has won it broad appreciation in the AI community. “The Llama 2 paper is very approachable, very well done and has lots of interesting details,” noted Rajiv Shah, Machine Learning Engineer at the open-source platform Hugging Face. The paper even goes as far as listing the number of GPU hours required to train the models, Shah noted.

Needless to say, Llama 2 is free for research and commercial use, unlike Google’s and OpenAI’s products, which is a huge plus for smaller entities and researchers.

Llama 2 was also trained on 40% more data than its predecessor, a total of 2 trillion tokens, and has twice the context length of LLaMa. Google’s PaLM 2, on the other hand, was reportedly trained on 3.6 trillion tokens. Tokens, essentially small chunks of text such as words or pieces of words, are an important ingredient in training LLMs: the model learns by predicting the next token in a sequence. The training data for OpenAI’s GPT-3.5 is still under wraps.
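The next-token idea can be sketched in a few lines of Python. This is an illustrative toy, not Llama 2’s actual tokenizer (real LLMs use subword byte-pair encoding rather than whitespace splitting), and the function names here are the author’s own invention:

```python
# Toy sketch of tokenization and next-token training pairs.
# Real LLMs like Llama 2 use subword tokenizers (e.g. byte-pair
# encoding), not whitespace splitting; this only shows the idea.

def tokenize(text):
    """Toy tokenizer: split on whitespace and map words to integer IDs."""
    vocab = {}
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids, vocab

def training_pairs(ids):
    """An LLM is trained to predict each token from the ones before it."""
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]

ids, vocab = tokenize("the model predicts the next word")
pairs = training_pairs(ids)
print(ids)       # [0, 1, 2, 0, 3, 4]
print(pairs[0])  # ([0], 1): given "the", predict "model"
```

Training on trillions of tokens simply means the model sees an enormous number of such context-to-next-token examples.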

The research paper states that in terms of performance, Llama 2’s 70-billion-parameter model is close to GPT-3.5 on benchmarks of language capability (MMLU) and math problem solving (GSM8K), but trails behind on coding benchmarks. The study also found that Llama 2 is far behind OpenAI’s benchmark AI model GPT-4 and PaLM 2.

Easy to fine-tune

Llama 2’s underlying strength is in how it makes fine-tuning easier for researchers, meaning its coding capabilities can be easily improved upon. Victor Sanh, a lead Machine Learning scientist at Hugging Face, posted on LinkedIn: “The pace of iteration on top of Llama v2 is unmatched. Yesterday, it was demos, tutorials, and integration in open-source libraries. Today, it is accelerated inference. Tomorrow, it will be entire creative side projects. Next month, it will be a wave of startups. So much value was unlocked, and such a hunger for open and transparent alternatives to closed models.”

Ravin Thambapillai, co-founder at the AI startup Credal.ai, also raved about how his company found Llama 2 a more accessible option to work with because of this. “Even though enterprises don’t care that much about who owns or controls the model itself, they want to fully own the ‘output’ of the data, because they want to train and fine-tune more models on that data,” he said.

Let’s say a robotics company needs a custom classifier its robot can use to look at a photo and decide what action to perform.

“If companies use OpenAI or Anthropic’s LLMs to do this today, they can never improve their models over time, because taking the output from previous examples, sorting the good ones from the bad, and feeding that back in to fine-tune the model is prohibited on all the powerful models today. Even the older, less powerful OpenAI models that do permit fine-tuning are prohibitively expensive to fine-tune,” Thambapillai explained.
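The feedback loop Thambapillai describes (generate outputs, sort the good from the bad, fine-tune on the keepers) can be sketched as follows. Every name here is a hypothetical stand-in rather than any real LLM API, and the “model” is a toy lookup table; the sketch only shows the shape of the loop that open-weight models permit:

```python
# Sketch of the fine-tuning feedback loop: run the model, keep the
# good outputs, and assemble them into a new training set.
# All names are hypothetical stand-ins; no real LLM API is called.

def generate(model, prompt):
    """Stand-in for model inference: return (output, quality_score)."""
    output = model.get(prompt, "")
    score = 1.0 if output else 0.0  # toy stand-in for a quality judgment
    return output, score

def build_finetune_set(model, prompts, threshold=0.5):
    """Keep only the outputs whose quality score clears the threshold."""
    dataset = []
    for prompt in prompts:
        output, score = generate(model, prompt)
        if score >= threshold:
            dataset.append({"prompt": prompt, "completion": output})
    return dataset

# Toy "model": a lookup table mapping prompts to completions.
model = {"photo of red block": "action: pick_up"}
data = build_finetune_set(model, ["photo of red block", "photo of fog"])
print(len(data))  # 1: the empty, low-quality output was filtered out
```

With a closed API whose terms forbid training on outputs, this loop stops at the first step; with open weights in hand, the filtered dataset can be fed into standard fine-tuning tooling.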

Open or truly Open-Source? 

There’s also a strange caveat to Llama 2, discovered belatedly. Meta has attached conditions to Llama 2’s licence that do not meet the definition of open source as laid out by its governing body, the Open Source Initiative (OSI). Specifically, Meta forbids using Llama 2 to train other models, and requires a special licence if the model is used in any app or service with more than 700 million monthly users.

These restrictions effectively contain competition, while the “open” tag earns Meta favour with independent researchers in the community, making it look much better than OpenAI and Google.

There are also some fresh additions to the Llama 2 paper in the aftermath of questions raised about the safety and environmental impact of LLMs. The paper has a separate section estimating the carbon footprint of training Llama 2 based on the number of GPU hours, which is a first.
