Have humans lost their analogical edge?
Keith Holyoak outlines a theme from his new book, ‘The Human Edge: Analogy and the Roots of Creative Intelligence’.
25 February 2025
We humans – and no other species that currently shares the earth with us – are able to grasp that blindness relates to sight in much the same way as poverty relates to wealth (one is the lack of the other), and that the plot of The Lion King resembles that of Hamlet (even though one story is about talking lions and the other about medieval Danish aristocrats). Such judgments depend on reasoning by analogy. What makes analogy special is that it fundamentally depends on judgments of similarity between relations. In The Human Edge: Analogy and the Roots of Creative Intelligence, I argue that explicit relations – ones we can think about and talk about – provide a critical human edge separating our minds from other forms of biological intelligence. Analogy is basic not only to intelligence, but to the more elusive human capacity for creativity. For example, the earliest scientific theory still accepted today – the wave theory of sound – was proposed by a Roman engineer who noticed an analogy between the behavior of sound and water. Sound rebounds off barriers – creating an echo – much as water waves rebound off the shore.
There is now a serious question as to whether human analogical ability has been equalled or even exceeded by the latest advances in artificial intelligence (AI), particularly large language models (LLMs) such as the well-known ChatGPT and its cousins. This is just one manifestation of wide-ranging controversies about the expected impact of AI on society – jobs destroyed and jobs created, democracy endangered and latent human potential realised. Cognitive scientists have debated whether ChatGPT represents the dawn of some new and perhaps superhuman form of intelligence, or is merely an overhyped monstrosity inclined to spew bloviating nonsense.
In considering what LLMs can or can't do, we need to start by acknowledging how surprising it is that they can achieve much of anything. It's far from obvious that simply predicting missing words in text (the basic task on which the systems are trained) would lead to facility in using a natural language such as English. Unlike a child learning a language, LLMs are not provided with any sort of physical context, or communicative support from a parent, that could help them give meaning to bare symbols. The early LLMs received no perceptual input – they could neither see nor hear – and had no ability to move around or interact with the 3D world.
To get a sense of how general knowledge could be extracted by predicting words in a text, we need to think about how the vast body of training data came into existence. That is, how was all that text generated in the first place? Individual people wrote it. Insofar as humans tacitly know the rules that govern their language – that sentences generally require a verb, that verbs are marked for tense and agree with their subjects in number and person, and all the other constraints that make up syntax – they generally write sentences that honor those rules. Moreover, their sentences usually 'make sense' – they describe probable situations more often than impossible or nonsensical ones. Further, the texts that people compose reflect their interests, beliefs, and emotional attitudes. Individual writers often have stylistic preferences – the writings of Virginia Woolf display linguistic patterns distinct from those apparent in the works of Ernest Hemingway.
These forces that govern the generation of text by people suggest how an LLM might solve the problem of text prediction. Given a large and deep network, the system might in essence 'work backwards' from patterns found in training texts to the rules and regularities that controlled how the texts were written. It would be efficient, for example, for the system to 'notice' that some words have the systematic properties of what we humans call 'nouns', while others have the properties of 'verbs'. Somewhere in the LLM's vast network, representations could be created that might be interpreted as the signatures of nouns, verbs, and other parts of speech. Using its novel transformer architecture, a network may start to encode abstract properties of language – and of human thinking – that help to predict missing words in texts. Fed massive amounts of data, LLMs may in fact create representations of concepts and relations that to some significant degree approximate those of humans – a kind of convergent evolution by machines.
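To make the prediction task concrete, the toy Python sketch below counts word pairs in a three-sentence corpus (invented here purely for illustration) and guesses the most likely next word. It is only a caricature: a real LLM replaces these raw counts with a transformer network trained on an enormous corpus, but the underlying objective, guessing the word people would most probably have written next, is the same in spirit.

```python
# Toy illustration only (not how any real LLM works): a bigram model that
# 'predicts the missing word' purely from counts of word pairs observed in
# a tiny invented corpus.
from collections import Counter, defaultdict

corpus = [
    "sound rebounds off barriers",
    "water waves rebound off the shore",
    "the echo rebounds off the cliff",
]

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after 'word' in training."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("off"))  # -> 'the' (seen twice, vs. 'barriers' once)
```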
Cognitive scientists tested early LLMs such as ChatGPT on reasoning problems of various sorts, to compare their performance with that of humans. Of most direct relevance here, a team from our lab systematically tested variants of ChatGPT on analogy problems for which we had comparable data from humans (mostly college students). The early LLMs were trained only on verbal material (text and code), and lacked a vision module. Because of this limitation, we used only non-visual problems to evaluate ChatGPT. Given that we didn't have access to any details about the vast set of texts on which ChatGPT had been trained, we created novel problems to ensure the system couldn't simply regurgitate answers to problems it had already encountered. One test required completing digit matrices, which we constructed using rules that matched a well-known visual test of intelligence. Other tests involved four-term verbal analogies and analogies between simple stories.
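As a rough illustration of the kind of item involved (the actual test materials are not reproduced here), a hypothetical digit matrix might look like the sketch below, where each row advances by a constant step and the final entry is left blank for the solver to fill in.

```python
# Hypothetical digit-matrix item, loosely in the spirit of the problems
# described above (not an actual test item). Each row increases by a
# constant step; the task is to supply the missing bottom-right entry.
matrix = [
    [2, 4, 6],
    [3, 5, 7],
    [4, 6, None],  # blank cell to be completed (here the answer is 8)
]

def complete_last_row(m):
    """Extend the constant step of the final row to fill the blank cell."""
    row = m[-1]
    step = row[1] - row[0]
    return row[1] + step

print(complete_last_row(matrix))  # -> 8
```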
The general picture that emerged from our evaluation was that ChatGPT's performance on these sets of analogies ranged from somewhat lower than the average for college students to somewhat higher. Moreover, whatever factors made analogy problems easy or hard for people had a comparable impact on ChatGPT. Basically, when the program was compared with humans who had been asked to solve the same sets of analogies, ChatGPT looked like 'one of the gang' – nothing in its performance clearly betrayed its nonhuman identity.
But although the mechanisms incorporated into LLMs may have some important links to building blocks of human reasoning, we must also entertain the possibility that this type of machine intelligence is fundamentally different from the human variety. Because LLMs are not subject to human cognitive limits on memory or attention, they may solve analogies by mechanisms that are not available to people. Neither can LLMs be considered good models of the evolution of analogical reasoning. Their analogical abilities are derived entirely from being trained to predict human-generated text – a poor match to the ancestral environment of early hominids! Because human language is replete with analogies and metaphors, accurately predicting words in text likely requires an ability to appreciate their structure. But there is no reason to suppose that the same system, absent human-generated inputs, would spontaneously develop a disposition to think analogically, as apparently happened at some point in human evolution.
In fact, to the extent LLMs capture the analogical abilities of adult human reasoners, their capacity to do so is fundamentally parasitic on natural human intelligence. Humans created computers and the internet, not to mention the electricity required to power all that machinery. Then some very smart people coded LLMs. Finally, we fed LLMs the digitised records of pretty much all of human written culture, and let them take it from there.
Many AI models are generative – they produce responses to queries – so it's natural to ask whether they are capable of creativity. And indeed, if a creative product is simply defined as one that is novel and valuable, AI has clearly passed the bar. Current models have made serious contributions to science – designing new proteins, classifying galaxies, and analysing historical writings stored in archives. They routinely aid in generating useful new solutions to many non-trivial problems that arise in everyday tasks, such as writing computer code, composing advertisements, and finding new recipes.
However, the creative potential of current AI models remains limited. AI models lack creative autonomy. They simply respond to questions or problems posed by human users. AI models do not choose their own problems without human guidance, and have no intrinsic motivation to create. In general, they also lack the ability to identify their own 'interesting' new discoveries – it's up to a human to find a problem worth solving, pose it to the AI, and then evaluate potential solutions that the AI generates. Without question, the contribution of the AI is often indispensable, not only in scientific applications but also in computer-generated art. For example, AI has been used to generate dynamic visual art – 'living paintings' – derived from recordings of a person's brain waves. In such cases it seems natural enough to describe the AI as a 'co-creator'. But without a human to set the basic agenda, the AI will do nothing and create nothing.
Another fundamental impediment to full creativity becomes apparent when we consider how large language models are trained. For example, ChatGPT is trained with two basic objectives: (1) predict the most probable text completion based on what people have already written, and (2) give the user the answer they're looking for. These objectives would make an excellent recipe for snuffing out any spark of serious creativity! Large language models are basically imitation machines, equipped with an enormous body of training data to guide them. But whereas humans sometimes transcend their training data – an apprentice may eventually surpass the creative power of their teacher – current AI can only imitate. As I've described, ChatGPT was able to solve a wide range of analogy problems, with accuracy comparable to that of college students. But solving analogies is itself a form of imitation, albeit a very sophisticated one: the 'analogy game' is to make the new situation imitate the better-known source. Analogy can certainly contribute to creativity. But at least so far, ChatGPT and its near relatives depend on a human to pose an analogy problem and ask the AI to solve it. The initial analogical spark – noticing spontaneously that a certain source analog might illuminate some novel situation in a way that advances knowledge – has yet to be struck by an AI.
Particularly in the case of artistic creativity, AI faces what may well be an insurmountable limit. Human creativity depends on 'mind-wandering' guided by the residue of a unique individual's lifetime of experiences, tinged with subtle emotions. And in most forms of art – perhaps most obviously music and lyric poetry – the essential point is for the creator to convey an emotional experience to their audience through the medium of their artistic creation. An AI has none of this – no emotions, no consciousness, and no individuality. In fact, the general complaint about AI-generated products, whether in the form of writing, music, or visual art, is that they have no soul – no expression of emotions or of an individual point of view. Absence of soul is not a problem – may even be a virtue – when the desired product is a routine summary of a scientific paper, the text and images for an advertisement, or bland 'mood music'. But to attain the higher reaches of artistic creativity, the deficit is fatal. Authentic creativity is inextricably bound to the nature of the generative process. AI heralds the Age of Inauthenticity.
- Keith J. Holyoak is Distinguished Professor of Psychology at the University of California, Los Angeles. A recipient of a Guggenheim Fellowship and the Warren Medal from the Society of Experimental Psychologists, he is the author of several books in cognitive science, including The Spider's Thread, as well as four volumes of poetry and a book of translated classical Chinese poetry.
- The Human Edge: Analogy and the Roots of Creative Intelligence is published by The MIT Press. Adapted with permission from The MIT Press. Copyright 2025.