Why LLMs Imitate Thinking but Cannot Think
On the limits of large language models, the illusion of logic, and why plausibility is not proof.
Hello, friends.
First, a quick note before diving into today’s topic. I know I’ve been away for a while. Between too many projects and the usual storm of work, I had to step back from writing here. But I’ve missed this space so much, and I’m glad to return with today’s piece. Thank you for your patience.
In the meantime, I’ve also been diving deeper into some niche topics that I’ll be sharing with you in the coming posts.
And I’ve taken some time to think carefully about the future of this publication, and about my own direction as a writer. You’ll see some of that reflection woven into what comes next.
So, let’s get back to thinking together.
The Old Dream
Since the birth of computing, one question has quietly haunted generations of thinkers, straddling the worlds of philosophy, mathematics, and engineering: Can machines think?
It is a deceptively simple question, yet its implications ripple through ethics, cognition, and the very nature of intelligence.
Alan Turing, in 1950, approached the puzzle with both rigor and creativity. In his seminal paper “Computing Machinery and Intelligence,” he reframed the philosophical dilemma into what he called the “imitation game”, now widely known as the Turing Test.
Instead of asking, vaguely, whether a machine can think, Turing asked whether a machine could convince a human interlocutor that it was human through conversation alone.
In doing so, he shifted the debate from abstract metaphysics to measurable behavior: a practical benchmark for intelligence that could, in principle, be tested.
Turing’s brilliance lay in his subtle acknowledgment that intelligence might not reside in what a machine “is” but in what it does, in the patterns it produces, and in the reactions it evokes.
Decades later, Marvin Minsky, a pioneering figure in artificial intelligence, made another provocative claim. He described the human brain as a “meat machine,” encapsulating the idea that cognition is fundamentally computational, reducible to mechanisms, albeit made of flesh rather than silicon.
While the exact origin of this phrasing is somewhat debated, the earliest known attribution is a 1982 essay titled “Why People Think Computers Can’t”, published in AI Magazine.
In his later book, The Society of Mind (1986), Minsky argued that human intelligence emerges from the interactions of simple processes, much like software emerges from hardware, challenging long-held dualistic notions that separate mind from body.
The term “meat machine” was, in essence, a metaphor: a way to frame the brain as a physical, mechanistic system capable of computation and pattern recognition.
Both Turing and Minsky, though separated by decades and methodological approaches, were wrestling with the same tension: intelligence as a phenomenon versus intelligence as a mechanism.
Turing measured it through behavior; Minsky analyzed it through structure. Together, their insights laid the groundwork for our modern conception of artificial intelligence: machines that do not merely follow instructions but are able to generate patterns that mimic understanding.
Fast forward to today, and we witness a proliferation of labs and companies: OpenAI, Google DeepMind, Anthropic, and many others announce each successive generation of models as progressively “closer” to human reasoning.
GPT-5, Claude Opus, Gemini 2.5 Pro, DeepSeek V3: these engineering marvels are no longer just “language models.” They are marketed as reasoning engines. They promise more than fluency; they promise judgment, analysis, even decision-making. They promise, in short, the tantalizing illusion of thought.
And yet, promises are fragile. Behind the hype, behind carefully curated demos and press releases, lies a more subtle, and perhaps more fascinating, reality.
What we often call “reasoning models” may be, at their core, stochastic parrots with extended attention spans. The very architecture of large language models, which consists of stacked transformer layers that apply attention over token embeddings, may inherently forbid the kind of genuine logical reasoning we instinctively project onto them.
What appears to be thought may be nothing more than a mirage: dazzling, seductive, but untouchable.
This is not a critique born of disappointment. Large language models are astonishing. They can draft essays, debug complex codebases, invent characters, summarize centuries of philosophy, generate music, and even help design proteins.
But their brilliance lies not in understanding but in plausibility. They produce text that looks like reasoning rather than reasoning itself. They mimic patterns of intelligence, not the mechanisms of cognition.
Our challenge, then, is not to diminish their power but to clarify its boundaries. If we are to coexist with these machines, if we are to delegate tasks, make decisions alongside them, and perhaps eventually entrust them with values, we must first understand what they are and, equally important, what they are not.
The Core Limitation: Stochastic ≠ Logical
And here we arrive at the heart of the matter: the deep, structural tension between stochastic language models and genuine logical reasoning.
This is where the most promising-looking models, the so-called “reasoning engines”, confront their invisible ceiling. It is subtle and easy to miss at first, yet inescapable once understood.
Reasoning, at its core, is about necessity. Consider the classic syllogism:
All men are mortal.
Socrates is a man.
Therefore, Socrates is mortal.
The conclusion follows with certainty. There is absolutely no margin of error. It is not “likely” that Socrates is mortal; it is inevitable.
The chain of reasoning is rigid, formal, and absolute. One misstep, one misapplied rule, and the logic collapses: there is no halfway point, not even a “probably correct” outcome.
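To make the contrast concrete, the syllogism can be written as a small formal proof. The following is a minimal sketch in Lean 4; the names `Man`, `Mortal`, `socrates`, and the hypothesis labels are illustrative choices, not anything prescribed by the argument above. The point is only that the proof checker either accepts the derivation or rejects it, with nothing in between.

```lean
-- Lean 4 sketch: the conclusion is obtained by applying the universal
-- premise to Socrates. There is no probability involved at any step.
example {α : Type} (Man Mortal : α → Prop) (socrates : α)
    (allMenMortal : ∀ x, Man x → Mortal x)   -- "All men are mortal."
    (socratesIsMan : Man socrates)           -- "Socrates is a man."
    : Mortal socrates :=                     -- "Therefore, Socrates is mortal."
  allMenMortal socrates socratesIsMan
```

If either premise is removed, the term no longer type-checks, which is the formal counterpart of the logic collapsing.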
Language models, on the other hand, operate in a fundamentally different universe. They were designed at their core to speak in probabilities, not certainties.
When an LLM generates a sentence, it does not declare truth; it declares plausibility: “Given everything I have learned, this is the word or phrase most likely to follow.”
Even if that likelihood is extraordinarily high, even if it seems convincing to a human reader, it remains a statistical guess, never a guarantee.
The distinction may seem subtle at first, but it is foundational. A system trained to optimize for plausibility cannot, by design, produce necessity. To conflate the two is like mistaking the weather forecast for the laws of physics.
Weather forecasts are often accurate and may become even more precise with improved data, advanced modeling, and better technology. However, a forecast never guarantees its outcome the way a physical law does: it estimates what is likely, while the law fixes what must happen. Plausibility can approximate truth, and it can mislead, but it cannot enforce truth.
This is the origin of what researchers call hallucinations. Ask an LLM, “Who won the Nobel Prize in Literature in 2022?” and it may reply, “Margaret Atwood”, although the prize actually went to Annie Ernaux.
Why this answer? Not because the model is lying, and not because it is broken or under attack. Atwood’s name simply appears often in contexts discussing Nobel Prizes, and the model has learned to string together the most statistically likely continuation.
The words form a smooth, confident answer, a mirror of truth, but the underlying mechanism is pattern matching, not fact-checking. The model has no awareness of events, no mechanism to verify reality; it only knows what sequences of words are probable.
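The mechanism can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three candidate names, their scores, and the helper names `softmax` and `sample` are invented, and no real model works with a three-word vocabulary. The sketch only shows that sampling from a probability distribution can return a fluent wrong answer even when the correct one is the single most likely option.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Made-up scores for continuing "The 2022 Nobel Prize in Literature went to".
# The values are illustrative only, not taken from any real model.
toy_logits = {"Annie Ernaux": 2.0, "Margaret Atwood": 1.6, "Haruki Murakami": 1.0}
probs = softmax(toy_logits)

def sample(probs, rng):
    """Draw one continuation in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample(probs, rng) for _ in range(1000)]

# Share of draws that name someone other than the actual laureate.
wrong_share = sum(d != "Annie Ernaux" for d in draws) / len(draws)
print(probs)
print(f"share of wrong answers: {wrong_share:.2f}")
```

In this toy setup the correct name is the most probable single answer, yet a large share of samples still land elsewhere, and each of them would read as equally confident.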
And here lies the original sin of LLMs: plausibility masquerading as truth. Their genius (the ability to produce coherent, fluent, humanlike text) is also their Achilles’ heel.
Every sentence is a very sophisticated guess. Every paragraph is a carefully weighted probability distribution. Even when the text is correct, it is correct by coincidence rather than necessity.
Consider mathematics, the realm of absolute reasoning. Ask a model, “Prove that the sum of the first n odd numbers equals n²”. The model might output a correct-looking inductive proof, perfectly formatted, using language reminiscent of a textbook.
Yet a closer look can reveal gaps: subtle missteps in the base case, skipped logic, or misapplied inductive steps. Each output is plausible; few outputs are guaranteed. The model can simulate reasoning beautifully, but it cannot ensure correctness.
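For reference, here is what a complete version of that proof looks like, written as a short LaTeX derivation. It is the standard induction argument, included so that the "gaps" described above have something to be measured against: every line must follow from the previous one.

```latex
\textbf{Claim.} For every integer $n \ge 1$: $\sum_{k=1}^{n} (2k-1) = n^2$.

\textbf{Base case} ($n = 1$): $\sum_{k=1}^{1} (2k-1) = 1 = 1^2$.

\textbf{Inductive step.} Assume $\sum_{k=1}^{n} (2k-1) = n^2$ for some $n \ge 1$. Then
\begin{align*}
\sum_{k=1}^{n+1} (2k-1)
  &= \sum_{k=1}^{n} (2k-1) + \bigl(2(n+1)-1\bigr) \\
  &= n^2 + 2n + 1 \\
  &= (n+1)^2 .
\end{align*}
Hence the claim holds for $n+1$, and by induction for all $n \ge 1$.
```

A proof that skips the base case, or that uses the claim for $n+1$ inside the inductive step, can look almost identical on the page while proving nothing.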
In practical terms, this mismatch shapes everything we do with LLMs. We can rely on them to explore ideas, draft narratives, or suggest hypotheses. But we cannot rely on them to verify truths, to perform rigorous proofs, or to make life-or-death decisions without human oversight. Their fluency creates the illusion of reasoning; their stochastic nature ensures that the illusion is never complete.
And so, we circle back to a simple, uncomfortable truth: large language models are not reasoners. They are sophisticated plausibility engines. Every “logical” answer is only as solid as the statistical patterns it has observed. Necessity cannot emerge from probability alone. The mirage shimmers; it dazzles, it invites, but it cannot be grasped.
This is the structural boundary of the new AI frontier. No amount of data, no number of parameters, no clever chain-of-thought prompting can eliminate it. The gulf between what looks like reasoning and what is reasoning is not an engineering problem; it is a mathematical and philosophical one, baked into the very architecture of these models.
The Seduction of Fluency
Let us begin with a simple but dangerous observation: language feels like thought. When I say, “Two plus two equals four”, the words themselves seem to carry necessity, as though the truth lives inside the sentence.
When a large language model writes, “The derivative of sin(x) is cos(x),” the phrase has the same cadence, the same authority, as when a mathematician announces it at the blackboard. On the surface, everything aligns: the phrasing, the grammar, the expected rhythm of reasoning.
This is the source of the seduction. Large language models occupy the outer skin of cognition. They replicate the phenomenology of thought, the way reasoning sounds to us when expressed in language.
And because humans are deeply vulnerable to mistaking words for ideas, form for substance, we are easily deceived.
The machine does not need to think in order to convince us it has thought; it only needs to produce the kinds of sentences we associate with thinking.
This is hardly a new vulnerability. Plato warned us in the Phaedrus that writing itself would deceive, that the written word seems alive but cannot explain itself, cannot defend itself, cannot truly “know.”
The written page, for him, was already a kind of parrot, mimicking the voice of knowledge while lacking its soul. LLMs simply magnify this ancient danger: they generate pages upon pages that look alive, but the animation is statistical, not rational.
To see the actual difference, imagine three cases:
A parrot trained to say “two plus two equals four.”
A student who memorizes the multiplication table but cannot explain why 7 × 8 = 56.
A mathematician who derives, from Peano’s axioms, that 2 + 2 = 4.
On the surface, all three produce the same utterance. But beneath the utterance lies a vast gulf: mimicry, rote recall, genuine reasoning. The form is constant; the substance varies enormously.
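The third case can be sketched in a few lines of code. This is a simplified, Peano-style encoding written for illustration: the class names `Zero` and `Succ` and the helpers `add` and `to_int` are assumptions of this sketch, and it models only the successor-based definition of addition, not the full axiom system. What matters is that the result is derived by applying two rules repeatedly rather than recalled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zero:
    """The natural number 0."""

@dataclass(frozen=True)
class Succ:
    """The successor of a natural number, i.e. pred + 1."""
    pred: "Zero | Succ"

def add(a, b):
    """Addition defined by recursion on the second argument:
    a + 0 = a, and a + S(b) = S(a + b)."""
    if isinstance(b, Zero):
        return a
    return Succ(add(a, b.pred))

def to_int(n):
    """Count the successors, for display only."""
    count = 0
    while isinstance(n, Succ):
        count += 1
        n = n.pred
    return count

two = Succ(Succ(Zero()))
four = Succ(Succ(Succ(Succ(Zero()))))

# The equality holds because the two rules force it, not because it was memorized.
print(add(two, two) == four)
print(to_int(add(two, two)))
```

A parrot and this program can both output "4"; only the program can also be asked for 3 + 5 or 17 + 26 and produce the answer by the same derivation.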
Large language models belong somewhere between the first two cases. Like parrots, they lack comprehension; like students, they have absorbed staggering amounts of data and patterns, yet without necessarily grasping the underlying structure.
They are not mathematicians deriving truth from axioms. They are expert artisans of plausibility.
And plausibility has its power. Trained on trillions of words, LLMs know (in the statistical sense of “know”) how human sentences are shaped. They know that “the capital of France is” is almost always followed by “Paris.”
They know that “because of gravity” is a likely continuation of something like “objects fall”. They know how to string together analogies, summaries, and arguments with an elegance that feels eerily intelligent.
But fluency is not reasoning. Fluency is just raw performance. It is the surface simulation of thought.
A courtroom actor can mimic the voice of a lawyer without ever having studied law; a child can repeat a prayer without understanding its theology. We do not call these acts reasoning; we call them imitation.
The mistake, then, is not in the machine. It is in us. We humans have always over-ascribed intelligence to anything that talks like us.
Joseph Weizenbaum, creator of the famous ELIZA chatbot in the 1960s, was shocked that people poured their hearts into conversations with what was, in essence, a handful of pattern-matching tricks.
His secretary reportedly asked to be left alone with ELIZA, not because the machine was smart, but because its fluency felt like understanding.
What was true of ELIZA in 1966 is even more true of GPT-5 in 2025. The form of fluency has become nearly indistinguishable from the real thing.
The danger, however, is not that the model reasons: it simply doesn’t. The danger is that we mistake our own projection for its cognition. We meet the sentences halfway, supplying the ghost of thought where only probabilities live.
To put it differently: the seduction of fluency is the seduction of a mirror. We see ourselves reflected back, more articulate, more confident, sometimes even more creative than we feel in our own minds. But a reflection, no matter how vivid, is not the thing itself.
And this, precisely, is where the story of reasoning models begins: with a powerful illusion.
Bounded Computation: The Second Wall
The second insight is more technical, but its implications are no less sobering: large language models are not open-ended computational engines.
Every token they generate emerges from a finite, structured set of operations: matrix multiplications, weighted attention across embeddings, and non-linear transformations.
Each step is bounded in both time and depth. They cannot indefinitely expand reasoning steps to match task complexity, nor can they loop until a condition is met in the way a conventional algorithm might.
Contrast this with even the simplest programs. A graph-search algorithm, for instance, can explore nodes recursively until a path is found. A mathematician proving a theorem can return to prior steps, revise assumptions, or extend calculations ad infinitum.
Computation, in these systems, is potentially unbounded; it can grow with the difficulty of the problem. LLMs, by contrast, are like a pianist forced to play exactly one note per second, regardless of whether the score demands an intricate crescendo or a fugue of infinite complexity.
Their “thought” is fixed in depth and granularity, parceled out in uniform, predetermined steps.
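The difference between a fixed budget and open-ended computation can be shown with the graph-search example mentioned above. This is a minimal sketch under stated assumptions: `chain_graph` and the `max_expansions` parameter are invented for illustration, and the budgeted search is only an analogy for a fixed amount of work per answer, not a model of how a transformer actually computes.

```python
from collections import deque

def chain_graph(n):
    """Build a line of nodes 0 -> 1 -> ... -> n as an adjacency dict."""
    graph = {i: [i + 1] for i in range(n)}
    graph[n] = []
    return graph

def bfs(graph, start, goal, max_expansions=None):
    """Breadth-first search returning the path length in edges, or None.

    With max_expansions=None the loop runs as long as the problem requires.
    With a number, it gives up once that many nodes have been expanded.
    """
    queue = deque([(start, 0)])
    seen = {start}
    expansions = 0
    while queue:
        if max_expansions is not None and expansions >= max_expansions:
            return None  # budget exhausted before the goal was reached
        node, dist = queue.popleft()
        if node == goal:
            return dist
        expansions += 1
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

easy, hard = chain_graph(5), chain_graph(50)

print(bfs(easy, 0, 5, max_expansions=10))   # within budget
print(bfs(hard, 0, 50, max_expansions=10))  # same budget, harder problem
print(bfs(hard, 0, 50))                     # unbounded search
```

The unbounded search simply takes more steps on the harder graph. The budgeted one does the same amount of work either way, so past a certain problem size it returns nothing at all.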
From a formal perspective, a fixed-depth network reading a finite context window sits below the threshold of Turing completeness. Such a system is closer in spirit to a finite automaton than to a general-purpose computer.
While they can simulate aspects of reasoning, the simulation occurs within a strictly bounded cage of operations.
No matter the scale of parameters, the volume of data ingested, or the sophistication of training methods, the structural ceiling remains: stochastic prediction combined with bounded computation cannot yield truly open-ended reasoning.
Chain-of-Thought, Self-Consistency, and Tool Integration
And yet, these models perform astonishing feats. How do they achieve this? The answer lies in clever workarounds: hacks that stretch the statistical machinery to appear more thoughtful than it is.
Consider chain-of-thought prompting. Instead of asking a model, “What is 37 multiplied by 48?” directly, one instructs it: “Think step by step.” Suddenly, its accuracy spikes.
Why? The model distributes probability across intermediate reasoning tokens. It echoes the patterns of reasoning observed in textbooks, tutorials, and worked examples.
Crucially, the model does not reason; it mimics the form of reasoning. This imitation suffices for structured tasks but is not a guarantee of logical correctness.
Self-consistency sampling works on a similar principle. By generating multiple candidate reasoning paths and selecting the most frequent answer, hallucinations decrease. Yet this is not genuine verification. The system is converging on probability peaks, not proof.
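A toy version of self-consistency makes the limitation visible. In this sketch the "model" is replaced by a hypothetical `noisy_solver` that returns the right product of 37 and 48 with some probability and a fixed wrong value otherwise; the function names, the wrong value 1766, and the sample count are all assumptions chosen for illustration.

```python
import random
from collections import Counter

CORRECT = 37 * 48  # 1776

def noisy_solver(rng, p_correct, wrong=1766):
    """Hypothetical stand-in for one sampled reasoning path."""
    return CORRECT if rng.random() < p_correct else wrong

def self_consistency(rng, p_correct, n_samples=25):
    """Sample several answers and return the most frequent one with its vote count."""
    answers = [noisy_solver(rng, p_correct) for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes

rng = random.Random(42)  # fixed seed so the run is repeatable

# When most paths are right, voting filters out the occasional slip.
mostly_right = self_consistency(rng, p_correct=0.9)

# When most paths share the same mistake, voting confidently returns it.
mostly_wrong = self_consistency(rng, p_correct=0.1)

print(mostly_right)
print(mostly_wrong)
```

The vote never checks whether 1776 is actually the product. It only reports which answer appeared most often, so it amplifies whatever the underlying distribution already favors.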
Even more advanced strategies, like recursive reflection or “let’s check our work” prompts, layer redundancy onto the model’s output. These techniques produce the appearance of deliberation.
Tool integration, such as connecting the model to a calculator or an external database, introduces deterministic computation, but the reasoning itself remains orchestrated, not performed, by the LLM.
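The division of labor in tool integration can be sketched as follows. The `fake_model` function is a hypothetical placeholder that returns a hard-coded request; a real system would obtain that request from a model's generated text, and the request format shown here is an assumption of this sketch rather than any particular vendor's API.

```python
import operator

# Deterministic tools: these compute, they do not predict.
TOOLS = {"multiply": operator.mul, "add": operator.add}

def fake_model(prompt):
    """Hypothetical stand-in for an LLM that emits a structured tool request.

    In a real pipeline this dictionary would be parsed from generated text,
    so the choice of tool and arguments is itself a statistical prediction.
    """
    return {"tool": "multiply", "args": [37, 48]}

def run_with_tools(prompt):
    """Route the model's request to the matching deterministic tool."""
    request = fake_model(prompt)
    tool = TOOLS.get(request["tool"])
    if tool is None:
        raise ValueError(f"unknown tool: {request['tool']}")
    return tool(*request["args"])

result = run_with_tools("What is 37 multiplied by 48?")
print(result)
```

The arithmetic is guaranteed only because `operator.mul` performs it. If the model had requested the wrong tool or the wrong arguments, the calculator would have returned an exact answer to the wrong question.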
In every case, these hacks are prosthetics, not transformations. They extend the puppet’s strings; they do not free the puppet from them.
The fundamental limitations remain intact: stochasticity instead of necessity, and bounded computation instead of open-ended reasoning.
Meaning, Ambiguity, and Formalization
The technical constraints of LLMs hint at a deeper philosophical challenge: what if formal reasoning from natural language is, at its core, impossible?
Natural language is rich, ambiguous, context-dependent, and metaphorical. Consider the sentence:
“Every man loves some woman.”
A naive formalization yields either ∀x∃y Loves(x, y), where each man loves a possibly different woman, or perhaps ∃y∀x Loves(x, y), where there is one particular woman whom every man loves. The two readings are not equivalent, and which one a speaker intends depends on context, pragmatics, and shared social knowledge. No parser, no algorithm, is able to perfectly freeze the fluidity of meaning without some kind of distortion.
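That the two quantifier orders genuinely differ can be checked on a tiny example. The sets and names below are invented for illustration; the two functions evaluate the two readings over a small, made-up world in which each man loves a different woman.

```python
# A made-up world with two men, two women, and who loves whom.
men = {"al", "bo"}
women = {"cy", "di"}
loves = {("al", "cy"), ("bo", "di")}

def each_man_loves_some_woman(men, women, loves):
    """Reading 1 (for all x, there exists y): every man loves at least one woman."""
    return all(any((m, w) in loves for w in women) for m in men)

def one_woman_loved_by_every_man(men, women, loves):
    """Reading 2 (there exists y, for all x): a single woman is loved by every man."""
    return any(all((m, w) in loves for m in men) for w in women)

reading_1 = each_man_loves_some_woman(men, women, loves)
reading_2 = one_woman_loved_by_every_man(men, women, loves)

print(reading_1, reading_2)
```

The same English sentence is true of this world under the first reading and false under the second, so any system that formalizes it has to commit to an interpretation the words alone do not settle.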
This echoes long-standing philosophical traditions. Ludwig Wittgenstein, in his later work, emphasized language games: meaning arises from use, not from an abstract, universal mapping to logic.
W.V.O. Quine argued that even seemingly analytic truths are grounded in a web of belief, subject to revision and ambiguity. Meaning resists formal capture; ambiguity is not noise, but the very essence of language.
For LLMs trained purely on language, this represents a fundamental ceiling. They can simulate reasoning in localized contexts, drawing on patterns and correlations.
But the global, formal, provable reasoning that underpins mathematics, science, and law remains perpetually out of reach. Machines chasing reasoning through text alone are, perhaps, chasing a phantom.
The Mirror of Human Limits
A common objection arises: humans are imperfect reasoners too. We sometimes misremember, fall into cognitive biases, and misapply logic in every kind of domain.
Does that not make LLMs just another kind of reasoner?
Not quite. Humans possess something called meta-reasoning: the capacity to step outside a chain of thought, reflect, revise, and verify. We can move between natural language and formal symbolic systems.
We can correct mistakes and prove truths. LLMs, in contrast, fail because they are incapable of truth; humans fail despite the capability for truth.
The mirage of intelligence in LLMs is seductive because it reflects our own reasoning back at us. Fluency becomes a distorted mirror: we see reasoning where there is only statistical mimicry.
The Practical Consequences
Why should these distinctions matter beyond philosophy? Because mistaking plausibility for judgment has tangible consequences.
In medicine, an LLM hallucinating a diagnosis could endanger patients. In law, it could fabricate case precedents. In science, it could assert confidently incorrect results. Believing these models reason is dangerous; using them responsibly requires recognizing their boundaries.
LLMs excel as tools of language, not engines of truth. They can assist in brainstorming, summarization, and creativity. But in domains demanding provability, determinism, or moral judgment, their outputs must always be scrutinized.
Toward a New Understanding
Where does this leave us? Several paths emerge:
Pessimism: abandon attempts at machine reasoning, reverting to classical symbolic AI.
Pragmatism: use LLMs as linguistic engines, augmented with verifiable external systems.
Conceptual clarity: accept that words like “reasoning” and “understanding” carry centuries of philosophical baggage. Misapplying them to stochastic models invites confusion.
Perhaps the most productive stance is to name LLMs for what they are: generative mirrors of human language.
They do not think, but they echo traces of our collective thought. They do not reason, but they remix the residue of reasoning found in text. They are less like minds and more like living, improvisational libraries.
The Final Provocation
Plato, in the Phaedrus, worried that writing could erode memory, that reliance on external marks might weaken genuine understanding. Today, we might worry that LLM fluency could erode our vigilance, that reliance on probabilistic output might dull our demand for truth.
The danger is not that machines will think like us; it is that we might begin to think like machines, content with plausibility, seduced by surface coherence, inattentive to verification.
The mirage of machine reasoning is not merely a technical curiosity; it is a philosophical challenge. Engaging with it forces us to confront age-old questions anew: What is thought? What is truth? How do we distinguish between simulation and understanding?
LLMs, for all their brilliance, are a mirror, showing us not just the limits of computation but the contours of our own minds. And perhaps, in recognizing their limitations, we learn something deeper about the human capacity for reason itself.




Indeed, 'Artificial imitation,' or when malicious programming is suspected, 'Artificial Intimidation.'
I do think that the limitations we see ('Hallucinations,' large-scale 'Lying') in General or Conversational AI stem from those old Logic bugaboos, Incompleteness (it does not 'know' what it does not know but it still stops processing to provide an answer!), and Undefinability (it can functionally define or use neither what it is and does, nor its own consistency). Although such matters seem more esoteric than they ought!
Hey there! A joy to read your article, as far as style goes! Yet strict logic, if reached by improved refining (continuous logic debate about improved probability-derived results), is still strict logic. Humans often fail to grasp all the complex nuances involved. AIs won’t. Besides, humans too function by learning about probabilities and patterns, or stay stupid 😅. Reading on 🪸