We are only one year into a post-ChatGPT world, so it may seem a bit premature to question whether we’ve reached peak AI. Rarely, if ever, does the peak version of a technology emerge within year one — and AI’s main goal of artificial general intelligence (AGI) is still all to play for.
Hear me out, though.
It’s been a huge year for AI. In a blog post published Tuesday, Bill Gates wrote at length about how “it’s clearer than ever how AI can be used to improve access to education, mental health, and more.”
But as we head into 2024, the question seems less about how AI will be used and more about whether generative AI, today’s most popular form of the technology, is already peaking.
In other words, is a step-change in performance in today’s most advanced generative AI models even possible?
Researchers say increases in performance are possible in theory. In practice, though, they’re harder to achieve.
Much of this discussion was prompted by the release of Gemini. The long-awaited AI model was finally unveiled on December 6 as “Google’s most capable AI model yet.”
It was also meant to be Google’s response to OpenAI’s GPT-4, the large language model powering ChatGPT. With Gemini’s release coming nine months after GPT-4, expectations were high for it to push the generative AI field forward.
But Google’s own data suggests expectations have been tough to meet. Gemini Ultra, an advanced version of Google’s AI model that’s coming next year, barely inches ahead of GPT-4 on performance benchmarks.
On DROP, a benchmark measuring reading comprehension, Gemini Ultra scored 82.4 versus GPT-4’s 80.9. Gemini Ultra actually lost to GPT-4 on HellaSwag, a benchmark measuring commonsense reasoning for everyday tasks.
Ethan Mollick, an associate professor teaching innovation at Wharton, noted on X on Tuesday that “it’s been a year & no one has beaten GPT-4.”
There has been a slate of AI models released that match an earlier version of OpenAI’s LLM. Elon Musk’s Grok, the open-source model Mixtral, and Google’s Gemini Pro all seem on par with GPT-3.5, albeit with slightly lower accuracy. A preprint submitted to arXiv on Monday by researchers at Carnegie Mellon University confirmed this for Gemini Pro.
But no one has convincingly beaten GPT-4 yet.
“Will they? Is there some magic there? Does it indicate a limit to LLMs? Will GPT-4.5 be another huge jump?” Mollick wrote on X.
The industry is certainly keen to see a huge jump. Artificial general intelligence, essentially AI demonstrating cognitive abilities on the same level as humans, is the stated goal of key figures like OpenAI boss Sam Altman.
Though no model has convincingly beaten GPT-4 yet, one could still emerge.
Transformers, the neural networks powering LLMs, scale well when they have more parameters (the internal variables a model learns during training) to work with. OpenAI has not disclosed how many parameters GPT-4 has, but its earlier GPT-3 model was publicly documented at 175 billion.
Alex Voica, a consultant at the Mohamed bin Zayed University of Artificial Intelligence, told Business Insider that transformer models do “scale quite linearly with the amount of data and compute” they’re given.
That means if you give these models much more compute, via AI processors such as GPUs, or substantially increase their number of parameters, they should perform much better.
But that’s not exactly practical. “That’s really nice if you’re a company that can afford either to have vast amounts of data available — and there are only a handful of those companies around right now — and if you have vast amounts of compute,” Voica said.
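To make the economics concrete: researchers describe the relationship between scale and performance with empirical “scaling laws.” Here is a minimal sketch in Python, using the power-law fit published in the “Chinchilla” paper (Hoffmann et al., 2022) purely as an illustration, not as a description of GPT-4 or Gemini:

```python
# A minimal sketch of an empirical LLM scaling law, using the power-law
# fit published in the "Chinchilla" paper (Hoffmann et al., 2022).
# Illustrative only: these coefficients describe that paper's models,
# not GPT-4 or Gemini.

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens of text (lower loss is better)."""
    E = 1.69                 # irreducible loss of natural text
    A, alpha = 406.4, 0.34   # how loss falls as parameters grow
    B, beta = 410.7, 0.28    # how loss falls as training data grows
    return E + A / n_params**alpha + B / n_tokens**beta

# Each doubling of parameters shaves only about 0.01 off the predicted
# loss: returns diminish, so each step-change costs far more than the last.
for n_params in (175e9, 350e9, 700e9):
    print(f"{n_params:.2e} params -> predicted loss "
          f"{predicted_loss(n_params, n_tokens=3.5e12):.3f}")
```

The exact numbers matter less than the shape: loss improves as a power law, so each additional gain demands an outsized jump in parameters, data, and compute.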
It’s not just a practical issue, either. Recent research has pointed to transformers potentially having other limitations that might stop the industry from reaching AGI.
Google researchers published a paper last month suggesting that transformers aren’t very good at generalizing beyond the data they were trained on. That’s not a great sign if Altman and his rivals are hoping their transformer-based technologies will help them reach AGI.
Voica said several tech giants are working on solutions to this. One example is the world model. In simple terms, this would be a technical development on top of a transformer that gives it some capacity for reasoning.
“The transformer model is the equivalent of having really good memory, but the main limitation of that is the moment you step outside of its very good memory, it just falls apart very quickly,” Voica said.
“However if you can add to that vast memory the capacity to do some reasoning, which is the world model, then you would have something that is truly useful.”
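What that would look like in practice remains an open research question. As a loose, hypothetical sketch of the split Voica describes (not any company’s published architecture), in Python:

```python
# A loose, hypothetical illustration of the "memory plus reasoning" split:
# the "transformer" is reduced to a lookup over memorized answers, while
# the "world model" keeps an explicit state it can simulate forward, so
# it can answer questions that were never memorized.

MEMORY = {  # stands in for the transformer's "really good memory"
    "What happens to a dropped ball?": "It falls to the ground.",
}

def memory_only(question: str) -> str:
    # Falls apart as soon as the question steps outside what's memorized.
    return MEMORY.get(question, "I don't know.")

def memory_plus_world_model(question: str) -> str:
    if question in MEMORY:
        return MEMORY[question]
    # Crude "world model": maintain object state and simulate the event,
    # then answer from the simulated state instead of from memory.
    if "drop it" in question:  # hypothetical parsing of the question
        state = {"ball": "hand"}
        state["ball"] = "ground"  # simulate gravity acting on the ball
        return f"The ball ends up on the {state['ball']}."
    return "I don't know."

question = "Where does the ball end up if I drop it off the table?"
print(memory_only(question))              # -> I don't know.
print(memory_plus_world_model(question))  # -> The ball ends up on the ground.
```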
It seems, however, that companies are not yet ready for a world model to enter the mainstream.