OpenAI announced the release of GPT-4 a few days ago, with the not-surprising flurry of news reports, quite a few hyperbolic and a few good. I watch RSS feeds (with the amazing open-source NetNewsWire app) of various individuals, forums, and a few online sources for my technology news – so it was a fairly notable deluge. One of the most common words in the opening statements describing what this update is about is “understands”. Bullshit and poppycock! Using the word understanding is pissing me off.
Some better journalists put the word in quotes in their articles, which I read as “appears to understand”. Others don’t even bother to do that. Yes, I’m being really judge-y. For the better journalists, I wish they wouldn’t resort to the techy, pop-culture-focused journalistic shorthand. It’s lazy, and worse – incredibly misleading. Misleading enough that I annoyed myself into writing this blog post.
Large Language Models (LLMs) don’t understand a damn thing. They blindly, probabilistically map data. There’s no reasoning, deduction, or analytic decision-making in the process. They’re consummate mimics and predictors of expected output, trained with obscenely huge amounts of data. Sufficiently so that when they reply to questions, they do a darned impressive job of seeming to know what they’re talking about. Enough to fake out the tests cited in the latest GPT-4 announcement. That probably should say something (negative) about how well those tests actually judge understanding, not how good GPT systems are getting.
To my horror, the most popular showcase of this pattern is that these systems confidently assert any number of things that have no basis in fact. It’s the worst kind of toadying lickspittle, saying what you want to hear. A person who’s been told to expect understanding from the system is being given words utterly without meaning – the model “making shit up” from its internal mappings. Some happen to align with facts, but there’s zero validation of those facts.
There are two sides to an LLM. On one side, data comes in through “encoding”, which maps the input the model expects into where that input fits within the model itself. The other side is “inference”, the generative side of the equation: it tries to pick the most likely representation coming out – in the case of models such as ChatGPT, in the form of text and sentences. With other models the input is text and the output images, or the input is a sound file and the output text.
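To make the two sides concrete, here’s a deliberately tiny sketch in Python. Everything in it – the five-word vocabulary and the next-token probability table – is invented for illustration; a real model uses learned embeddings and a neural network, not a lookup table. But the shape is the same: encode text into the model’s internal representation, then generate by repeatedly picking a likely next token.

```python
# Toy sketch of an LLM's two sides. The vocabulary and probabilities
# below are made up purely for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
ids = {i: w for w, i in vocab.items()}

def encode(text):
    # "Encoding": map input text into the model's internal representation
    # (here, just integer token ids).
    return [vocab[w] for w in text.lower().split()]

# Fake "learned" distribution: P(next token | current token).
next_token_probs = {
    0: {1: 0.6, 4: 0.4},   # "the" -> "cat" or "mat"
    1: {2: 0.9, 3: 0.1},   # "cat" -> "sat"
    2: {3: 1.0},           # "sat" -> "on"
    3: {0: 1.0},           # "on"  -> "the"
}

def generate(prompt, steps):
    # "Inference": repeatedly pick the most probable next token (greedy
    # decoding) and append it -- no facts, no reasoning, just mapping.
    tokens = encode(prompt)
    for _ in range(steps):
        probs = next_token_probs.get(tokens[-1], {})
        if not probs:
            break
        tokens.append(max(probs, key=probs.get))
    return " ".join(ids[t] for t in tokens)

print(generate("the cat", 3))  # -> "the cat sat on the"
```

Notice that nothing in `generate` checks whether the output is true – it only checks what’s probable, which is exactly the point.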
One of the properties that I love about LLMs is that with enough volume of data, it turns out that languages – even fairly radically different languages – end up mapping in very similar ways. When we can apply the correspondences for a few words, we quickly get sufficient mappings, even for words we don’t know, to use the LLM as a very reasonable translator. And not just of words, but of phrases and whole sentences. That very capability was included in a recent release of Whisper (another project by OpenAI), with a model small enough to run on a single computer. This model works off audio streams and converts them into text. And one of the more magical capabilities of Whisper is the ability to automatically recognize the language and translate the results into English on the fly. So you can speak in Japanese or Spanish, and it returns the text as an English translation.
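The “apply the correspondences for a few, get mappings for the rest” idea can be sketched in miniature. Below, the 2-D “embeddings” are invented numbers (real embeddings have thousands of dimensions, and real alignment is learned, not a simple rotation), but the mechanism is the same: the two languages share geometry, so aligning one known pair lets us translate words we were never told about, via nearest neighbor.

```python
import math

# Toy 2-D "embedding space" for English. All numbers are invented
# purely for illustration.
english = {"cat": (1.0, 0.0), "dog": (0.9, 0.1), "car": (0.0, 1.0)}

def rotate(v, deg):
    r = math.radians(deg)
    x, y = v
    return (x * math.cos(r) - y * math.sin(r),
            x * math.sin(r) + y * math.cos(r))

# Pretend Spanish embeddings share the same geometry, just rotated 90
# degrees -- standing in for "languages map in very similar ways".
spanish = {w: rotate(english[e], 90)
           for w, e in [("gato", "cat"), ("perro", "dog"), ("coche", "car")]}

def angle(v):
    return math.degrees(math.atan2(v[1], v[0]))

# Learn the alignment from ONE known pair (cat <-> gato): here it is
# just an angle offset between the two spaces.
offset = angle(spanish["gato"]) - angle(english["cat"])

def translate(spanish_word):
    # Map the Spanish vector into English space, then take the nearest
    # English word -- no dictionary entry for this word was ever given.
    aligned = rotate(spanish[spanish_word], -offset)
    return min(english, key=lambda w: math.dist(english[w], aligned))

print(translate("perro"))  # -> "dog"
print(translate("coche"))  # -> "car"
```

One anchor pair was enough to translate the other two words – a cartoon version of why a few correspondences go such a long way at LLM scale.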
Just like the generative properties, it’s also probabilistic – but fortunately our perception of translation is that it’s lossy and potentially wrong. It’s viewed as “likely close, but may not be entirely accurate”. For the most part, people don’t take a translation as an absolute statement of fact – although that seems to be the reaction to the generative models behind something like Bing or ChatGPT. The value is in the “it’s close enough that we can close the gaps”. That’s a whole different world from taking the “generative text” as gospel truth of answers and facts, as if the model knew something, or looked it up in some knowledge base. It didn’t – it just mapped information from an obscenely large number of patterns into what it guessed was a reasonable response. Generative text from LLMs isn’t reporting facts; it’s reporting what maps most closely to your prompt. There’s no understanding, or reasoning, involved.
If you made it this far, I should mention that there IS some really cool research happening that uses large language models (and related machine-learning systems) to do data extraction – to refine information into a knowledge base, or to predict basic physical interactions based on past experience. That’s the cool stuff, and where the magic is happening. Our local UW ML research groups are doing some amazing work on that front, with the end goal of being able to apply deductive and inductive reasoning to answering questions.