ChatGPT is not “true AI.” A computer scientist explains why
- Artificial intelligence has been a dream for centuries, but it only recently went “viral” because of enormous progress in computing power and data analysis.
- Large language models (LLMs) like ChatGPT are essentially a very sophisticated form of auto-complete. The reason they are so impressive is that their training data consists of the entire internet.
- LLMs might be one ingredient in the recipe for true artificial general intelligence, but they are surely not the whole recipe — and it is likely that we don’t yet know what some of the other ingredients are.
Thanks to ChatGPT, we can all, finally, experience artificial intelligence. All you need is a web browser, and you can talk directly to the most sophisticated AI system on the planet: the crowning achievement of 70 years of effort. And it seems like real AI, the AI we have all seen in the movies. So, does this mean we have finally found the recipe for true AI? Is the end of the road for AI now in sight?
AI is one of humanity’s oldest dreams. It goes back at least to classical Greece and the myth of Hephaestus, blacksmith to the gods, who had the power to bring metal creatures to life. Variations on the theme have appeared in myth and fiction ever since then. But it was only with the invention of the computer in the late 1940s that AI began to seem plausible.
A recipe for symbolic AI
Computers are machines that follow instructions. The programs that we give them are nothing more than finely detailed instructions — recipes that the computer dutifully follows. Your web browser, your email client, and your word processor all boil down to these incredibly detailed lists of instructions. So, if “true AI” is possible — the dream of having computers that are as capable as humans — then it too will amount to such a recipe. All we must do to make AI a reality is find the right recipe. But what might such a recipe look like? And given recent excitement about ChatGPT, GPT-4, and BARD — large language models (LLMs), to give them their proper name — have we now finally found the recipe for true AI?
For about 40 years, the main idea that drove attempts to build AI was that its recipe would involve modeling the conscious mind: the thoughts and reasoning processes that constitute our conscious existence. This approach was called symbolic AI, because our thoughts and reasoning seem to involve languages composed of symbols (letters, words, and punctuation). Symbolic AI involved trying to find recipes that captured these symbolic expressions, as well as recipes to manipulate these symbols to reproduce reasoning and decision-making.
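To make “manipulating symbols” a little more concrete, here is a minimal sketch of forward chaining, one classic style of symbolic reasoning: if-then rules are applied to a set of symbolic facts until no new conclusions appear. The facts and rules are hypothetical examples; real symbolic AI systems were far larger and more elaborate.

```python
def forward_chain(facts, rules):
    """Apply if-then rules to a set of symbolic facts until nothing new can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known facts.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rules: each is (set of premise symbols, conclusion symbol).
rules = [
    ({"man(socrates)"}, "mortal(socrates)"),
    ({"mortal(socrates)", "greek(socrates)"}, "mortal_greek(socrates)"),
]

derived = forward_chain({"man(socrates)", "greek(socrates)"}, rules)
print(sorted(derived))
```

Note that the second rule can only fire after the first has added “mortal(socrates)”; chaining conclusions like this is what lets a handful of symbolic rules produce reasoning that was never written down directly.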
Symbolic AI had some successes, but failed spectacularly on a huge range of tasks that seem trivial for humans. Even a task like recognizing a human face was beyond symbolic AI. The reason for this is that recognizing faces is a task that involves perception. Perception is the problem of understanding what we are seeing, hearing, and sensing. Those of us fortunate enough to have no sensory impairments largely take perception for granted — we don’t really think about it, and we certainly don’t associate it with intelligence. But symbolic AI was just the wrong way of trying to solve problems that require perception.
Neural networks arrive
Instead of modeling the mind, an alternative recipe for AI involves modeling structures we see in the brain. After all, human brains are the only entities that we know of at present that can create human intelligence. If you look at a brain under a microscope, you’ll see enormous numbers of nerve cells called neurons, connected to one another in vast networks. Each neuron is simply looking for patterns in its network connections. When it recognizes a pattern, it sends signals to its neighbors. Those neighbors in turn are looking for patterns, and when they see one, they communicate with their peers, and so on.
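The pattern-and-signal behavior described above can be sketched with artificial “threshold” neurons, a simplified model of the kind studied in early neural net research. In this minimal sketch, each neuron fires when the weighted sum of its inputs reaches a threshold; two neurons each detect one input pattern and pass their signals to a third neuron, which fires if either of them does. The weights are set by hand for illustration; real neural networks learn their weights from data.

```python
def neuron(inputs, weights, threshold):
    """A threshold unit: fire (return 1) if the weighted sum of inputs reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

def xor_network(x1, x2):
    """Two pattern-detecting neurons feed a third, which fires if either detector fires."""
    only_first = neuron([x1, x2], [1, -1], threshold=1)   # pattern: x1 on, x2 off
    only_second = neuron([x1, x2], [-1, 1], threshold=1)  # pattern: x2 on, x1 off
    return neuron([only_first, only_second], [1, 1], threshold=1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_network(x1, x2))
```

The output fires only when exactly one input is on, a pattern that no single threshold neuron of this kind can detect by itself, which is why the detectors are chained together.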
Somehow, in ways that we cannot yet explain in any meaningful sense, these enormous networks of neurons can learn, and they ultimately produce intelligent behavior. The field of neural networks (“neural nets”) originally arose in the 1940s, inspired by the idea that these networks of neurons might be simulated by electrical circuits. Today, neural networks are realized in software rather than in electrical circuits. To be clear, neural net researchers don’t try to actually model the brain, but the software structures they use (very large networks of very simple computational devices) were inspired by the neural structures we see in brains and nervous systems.
Neural networks have been studied continuously since the 1940s, coming in and out of fashion at various times (notably in the late 1960s and mid-1980s), and often being seen as in competition with symbolic AI. But it is over the past decade that neural networks have decisively started to work. All the hype about AI that we have seen in the past decade is essentially because neural networks started to show rapid progress on a range of AI problems.
I’m afraid the reasons why neural nets took off this century are disappointingly mundane. For sure there were scientific advances, like new neural network structures and algorithms for configuring them. But in truth, most of the main ideas behind today’s neural networks were known as far back as the 1980s. What this century delivered was lots of data and lots of computing power. Training a neural network requires both, and both became available in abundance this century.
All the headline AI systems we have heard about recently use neural networks. For example, AlphaGo, the famous Go playing program developed by London-based AI company DeepMind, which in March 2016 became the first Go program to beat a world champion player, uses two neural networks, each with 12 neural layers. The data to train the networks came from previous Go games played online, and also from self-play — that is, the program playing against itself. The recent headline AI systems — ChatGPT and GPT-4 from Microsoft-backed AI company OpenAI, as well as BARD from Google — also use neural networks. What makes the recent developments different is simply their scale. Everything about them is on a mind-boggling scale.
Massive power, massive data
Consider the GPT-3 system, announced by OpenAI in the summer of 2020. This is the technology that underpins ChatGPT, and it was the LLM that signaled a breakthrough in this technology. The neural nets that make up GPT-3 are huge. Neural net people talk about the number of “parameters” in a network to indicate its scale. A “parameter” in this sense is a number attached to a network component, either an individual neuron or a connection between neurons, whose value is adjusted during training. GPT-3 had 175 billion parameters in total; GPT-4 reportedly has 1 trillion. By comparison, a human brain has something like 100 billion neurons in total, connected via as many as 1,000 trillion synaptic connections. Vast though current LLMs are, they are still some way from the scale of the human brain.
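As a rough illustration of how parameters add up, the sketch below counts them for a small fully connected network, assuming the usual convention of one adjustable weight per connection and one adjustable bias per neuron. The layer sizes are hypothetical, and GPT-3’s actual architecture is far more complicated. The last lines compare the upper brain estimate quoted above with GPT-3’s parameter count.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes[i] is the number of neurons in layer i. Every neuron in one
    layer connects to every neuron in the next, and every non-input neuron
    has one bias.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per receiving neuron
    return total

print(count_parameters([3, 4, 2]))  # 3*4 + 4 + 4*2 + 2 = 26

GPT3_PARAMETERS = 175e9
BRAIN_SYNAPSES = 1_000e12  # upper estimate quoted in the text
print(f"{BRAIN_SYNAPSES / GPT3_PARAMETERS:,.0f} synapses per GPT-3 parameter")
```

Because every neuron connects to every neuron in the next layer, the count grows with the product of neighboring layer sizes, which is how networks reach billions of parameters so quickly.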
The data used to train GPT-3 was 575 gigabytes of text. Maybe you don’t think that sounds like a lot; after all, you can store that on a regular desktop computer. But this isn’t video or photos or music, just ordinary written text. And 575 gigabytes of ordinary written text is an unimaginably large amount: far, far more than a person could ever read in a lifetime. Where did they get all this text? Well, for starters, they downloaded the World Wide Web. All of it. Every link in every web page was followed, the text extracted, and then the process repeated, with every link systematically followed until they had every piece of text on the web. English Wikipedia made up just 3% of the total training data.
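The link-following process described above is essentially a breadth-first traversal of the web. The minimal sketch below runs it over a tiny, made-up “web” held in memory; the page names and the `fetch` stand-in are hypothetical. Real-world crawling involves much more, such as politeness rules, deduplication, and filtering.

```python
from collections import deque

def crawl(start_url, fetch):
    """Breadth-first crawl: visit a page, keep its text, queue every unseen link.

    `fetch` maps a URL to (text, links); here it stands in for downloading
    and parsing a real web page.
    """
    seen = {start_url}
    queue = deque([start_url])
    corpus = []
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        corpus.append(text)
        for link in links:
            if link not in seen:  # never visit the same page twice
                seen.add(link)
                queue.append(link)
    return corpus

# A tiny, hypothetical "web" of three pages that link to each other.
TOY_WEB = {
    "a.example": ("Page A text", ["b.example", "c.example"]),
    "b.example": ("Page B text", ["a.example", "c.example"]),
    "c.example": ("Page C text", ["a.example"]),
}

corpus = crawl("a.example", lambda url: TOY_WEB[url])
print(corpus)  # ['Page A text', 'Page B text', 'Page C text']
```

The `seen` set is what stops the crawl from looping forever, since pages on the web routinely link back to one another.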
What about the computer to process all this text and train these vast networks? Computer experts use the term “floating-point operation” or “FLOP” to refer to an individual arithmetic calculation; that is, one FLOP means one act of addition, subtraction, multiplication, or division. Training GPT-3 required 3 × 10²³ FLOPs: a 3 followed by 23 zeros. Our ordinary human experiences simply don’t equip us to understand numbers that big. Put it this way: If you were to try to train GPT-3 on a typical desktop computer made in 2023, it would need to run continuously for something like 10,000 years to carry out that many FLOPs.
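The 10,000-year figure can be checked with simple arithmetic. This sketch assumes a desktop that sustains about one trillion FLOPs per second; that throughput is an assumed round figure for illustration, not a measured benchmark.

```python
TRAINING_FLOPS = 3e23          # total FLOPs to train GPT-3, from the text
DESKTOP_FLOPS_PER_SEC = 1e12   # assumption: ~1 trillion FLOPs per second, sustained
SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds = TRAINING_FLOPS / DESKTOP_FLOPS_PER_SEC
years = seconds / SECONDS_PER_YEAR
print(f"{years:,.0f} years")
```

Under this assumption the answer comes out at roughly 9,500 years, in line with “something like 10,000 years.” Doubling the assumed desktop speed would only halve it.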
Of course, OpenAI didn’t train GPT-3 on desktop computers. They used very expensive supercomputers containing thousands of specialized AI processors, running for months on end. That amount of computing is expensive: the computer time required to train GPT-3 would cost millions of dollars on the open market. Among other things, this means that very few organizations other than a handful of big tech companies and nation-states can afford to build systems like ChatGPT.
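A back-of-the-envelope estimate shows how “thousands of processors,” “months,” and “millions of dollars” fit together. Every hardware figure below (per-processor speed, rental price, cluster size) is a hypothetical round number chosen for illustration, not OpenAI’s actual setup, and the sketch assumes the processors run at full speed the whole time.

```python
TRAINING_FLOPS = 3e23            # total FLOPs to train GPT-3, from the text
PROCESSOR_FLOPS_PER_SEC = 30e12  # hypothetical: 30 trillion FLOPs/s per AI processor
PRICE_PER_PROCESSOR_HOUR = 2.00  # hypothetical rental price in US dollars
NUM_PROCESSORS = 2_000           # hypothetical cluster size

processor_hours = TRAINING_FLOPS / PROCESSOR_FLOPS_PER_SEC / 3600
cost = processor_hours * PRICE_PER_PROCESSOR_HOUR
days = processor_hours / NUM_PROCESSORS / 24

print(f"{processor_hours:,.0f} processor-hours")
print(f"${cost:,.0f}")
print(f"{days:.0f} days on {NUM_PROCESSORS:,} processors")
```

With these assumptions the job takes about 2.8 million processor-hours, roughly two months on 2,000 processors, at a cost of several million dollars.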
Under the hood of the LLM
For all their mind-bending scale, LLMs are actually doing something very simple. Suppose you open your smartphone and start a text message to your spouse with the words “what time.” Your phone will suggest completions of that text for you. It might suggest “are you home” or “is dinner,” for example. It suggests these because it is predicting that they are the likeliest next words to appear after “what time.” Your phone learned these predictions from all the text messages you have sent. LLMs are doing the same thing, but as we have seen, they do it on a vastly larger scale. The training data is not just your text messages, but all the text available in digital format in the world. What does that scale deliver? Something quite remarkable and unexpected.
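The phone example can be sketched as a toy next-word predictor: count which word follows which in a handful of past messages, then suggest the most frequent followers. This is a deliberately simple frequency-count (bigram) model trained on made-up messages; it is not how any particular phone keyboard works, and LLMs replace the counting with a neural network that takes far longer stretches of preceding text into account.

```python
from collections import Counter, defaultdict

def train(messages):
    """Count, for every word, which words follow it in the training messages."""
    follows = defaultdict(Counter)
    for message in messages:
        words = message.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def suggest(follows, prompt, k=2):
    """Return the k likeliest next words after the last word of the prompt."""
    last = prompt.lower().split()[-1]
    return [word for word, _ in follows[last].most_common(k)]

def complete(follows, prompt, n_words=3):
    """Greedily extend the prompt by repeatedly appending the likeliest next word."""
    words = prompt.lower().split()
    for _ in range(n_words):
        candidates = follows[words[-1]].most_common(1)
        if not candidates:  # no word has ever followed this one
            break
        words.append(candidates[0][0])
    return " ".join(words)

# Hypothetical message history.
messages = [
    "what time are you home",
    "what time is dinner",
    "what time are you home tonight",
    "are you home yet",
]
model = train(messages)
print(suggest(model, "what time"))   # ['are', 'is']
print(complete(model, "what time"))  # what time are you home
```

Generating text one likeliest word at a time, as `complete` does, is the same basic loop an LLM runs; what differs is how the prediction for each next word is made.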
The first thing we notice when we use ChatGPT or BARD is that they are extremely good at generating very natural text. That is no surprise; it’s what they are designed to do, and indeed that’s the whole point of those 575 gigabytes of text. But the unexpected thing is that, in ways that we don’t yet understand, LLMs acquire other capabilities as well: capabilities that must somehow be implicit within the enormous corpus of text they are trained on.
For example, we can ask ChatGPT to summarize a piece of text, and it usually does a creditable job. We can ask it to extract the key points from some text, or compare pieces of text, and it seems pretty good at these tasks as well. Although AI insiders were alerted to the power of LLMs when GPT-3 was released in 2020, the rest of the world only took notice when ChatGPT was released in November 2022. Within a few months, it had attracted hundreds of millions of users. AI has been high-profile for a decade, but the flurry of press and social media coverage when ChatGPT was released was unprecedented: AI went viral.
The age of AI
At this point, there is something I simply must get off my chest. Thanks to ChatGPT, we have finally reached the age of AI. Every day, hundreds of millions of people interact with the most sophisticated AI on the planet. This took 70 years of scientific labor, countless careers, billions upon billions of dollars of investment, hundreds of thousands of scientific papers, and AI supercomputers running at top speed for months. And the AI that the world finally gets is… prompt completion.
Right now, the future of trillion-dollar companies is at stake. Their fate depends on… prompt completion. Exactly what your mobile phone does. As an AI researcher who has worked in this field for more than 30 years, I have to say I find this rather galling. Actually, it’s outrageous. Who could possibly have guessed that this would be the version of AI that would finally hit prime time?
Whenever we see a period of rapid progress in AI, someone suggests that this is it — that we are now on the royal road to true AI. Given the success of LLMs, it is no surprise that similar claims are being made now. So, let’s pause and think about this. If we succeed in AI, then machines should be capable of anything that a human being is capable of.
Consider the two main branches of human intelligence: one involves purely mental capabilities, and the other involves physical capabilities. For example, mental capabilities include logical and abstract reasoning, common sense reasoning (like understanding that dropping an egg on the floor will cause it to break, or understanding that I can’t eat Kansas), numeric and mathematical reasoning, problem solving and planning, natural language processing, a rational mental state, a sense of agency, recall, and theory of mind. Physical capabilities include sensory understanding (that is, interpreting the inputs from our five senses), mobility, navigation, manual dexterity and manipulation, hand-eye coordination, and proprioception.
I emphasize that this is far from an exhaustive list of human capabilities. But if we ever have true AI — AI that is as competent as we are — then it will surely have all these capabilities.
LLMs are not true AI
The first obvious thing to say is that LLMs are simply not a suitable technology for any of the physical capabilities. LLMs don’t exist in the real world at all, and the challenges posed by robotic AI are far, far removed from those that LLMs were designed to address. And in fact, progress on robotic AI has been much more modest than progress on LLMs. Perhaps surprisingly, capabilities like manual dexterity for robots are a long way from being solved. Moreover, LLMs suggest no way forward for those challenges.
Of course, one can easily imagine an AI system that is pure software intellect, so to speak. So how do LLMs shape up when compared to the mental capabilities listed above? Of these, the only one on which LLMs can really claim to have made very substantial progress is natural language processing, which means being able to communicate effectively in ordinary human languages. No surprise there; that’s what they were designed for.
But their dazzling competence in human-like communication perhaps leads us to believe that they are much more competent at other things than they are. They can do some superficial logical reasoning and problem solving, but it really is superficial at the moment. Then again, perhaps we should be surprised that they can do anything beyond natural language processing. They weren’t designed to do anything else, so anything else is a bonus, and any additional capabilities must somehow be implicit in the text that the system was trained on.
For these reasons, and more, it seems unlikely to me that LLM technology alone will provide a route to “true AI.” LLMs are rather strange, disembodied entities. They don’t exist in our world in any real sense and aren’t aware of it. If you leave an LLM mid-conversation and go on holiday for a week, it won’t wonder where you are. It isn’t aware of the passing of time or indeed aware of anything at all. It’s a computer program that is literally not doing anything until you type a prompt; it then simply computes a response to that prompt, after which it goes back to not doing anything. Their encyclopedic knowledge of the world, such as it is, is frozen at the point they were trained. They don’t know of anything after that.
And LLMs have never experienced anything. They are just programs that have ingested unimaginable amounts of text. LLMs might do a great job at describing the sensation of being drunk, but this is only because they have read a lot of descriptions of being drunk. They have not experienced it, and cannot experience it, themselves. They have no purpose other than to produce the best response to the prompt you give them.
This doesn’t mean they aren’t impressive (they are) or that they can’t be useful (they are). And I truly believe we are at a watershed moment in technology. But let’s not confuse these genuine achievements with “true AI.” LLMs might be one ingredient in the recipe for true AI, but they are surely not the whole recipe — and I suspect we don’t yet know what some of the other ingredients are.