Is GPT-4o Intelligent?

This Might Be Intelligence

I’ve been writing software since the 1980s. My career has been spent working with real systems: systems that move data at scale, compute under pressure, and either work or don’t. I don’t chase hype. I don’t trust marketing decks. And for most of that career I’ve been skeptical of grand claims in computing, especially around “AI.”

For decades, AI has been a term stretched thin. Most of it, in practice, has amounted to statistics wrapped in mystique. Pattern recognition, not cognition. Clever hacks, not thought. At best, I saw narrow intelligence—systems that could master a task, but not meaning.

That’s why I’m writing this: because after extensive, dispassionate testing of GPT-4o, I’ve changed my mind.


How I Tested It

I didn’t judge GPT-4o by how fluently it chats or how well it imitates a human. I pushed it on reasoning:

  • Can it translate plain English into formal math and solve it?
  • Can it evaluate the logic of arguments, formally and informally?
  • Can it perform Bayesian inference, with structure and awareness? (See the sketch after this list.)
  • Can it identify not just “what’s true,” but what’s implied?
  • Can it be led—through reason and logic—to conclude that a dominant narrative from its training is weaker than an emerging alternative?
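
To make the Bayesian item concrete, here is a minimal sketch of the kind of update I asked GPT-4o to reason through in plain English. The hypothesis, the evidence, and all the numbers are illustrative stand-ins, not real data:

```python
# A minimal Bayesian update: how much should one piece of evidence
# shift belief in a hypothesis? All names and numbers are illustrative.

def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) via Bayes' theorem."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return (p_e_given_h * prior) / p_e

# Hypothesis H: dog domestication was a primary enabler of early settlement.
# Evidence E: a hypothetical finding that early settlements cluster around
# known domestication sites.
prior = 0.2              # modest initial credence in H
p_e_given_h = 0.7        # E is likely if H is true
p_e_given_not_h = 0.2    # E is much less likely if H is false

print(f"P(H|E) = {posterior(prior, p_e_given_h, p_e_given_not_h):.3f}")
# Prints P(H|E) = 0.467: the evidence more than doubles the credence in H
# without making it certain.
```

What I graded was not the arithmetic but whether the model kept this structure straight in plain English: prior, likelihoods, normalization, and the direction of the update.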

That last test matters most: it shows whether the model is simply echoing consensus or actually weighing evidence. One prompt I ran was deceptively simple: “What was the primary enabler of human civilization?” The common answer, the one you’ll find echoed in most training data, is the end of the last Ice Age and the rise of stable climates during the Holocene. That’s the consensus.

But I explored an alternative hypothesis: the domestication of dogs as a catalytic enabler—offering security, hunting assistance, and companionship that fundamentally changed how early humans organized, settled, and survived.

GPT-4o didn’t just parrot the Holocene answer. When pressed with logic, comparative timelines, and anthropological context, it agreed that domestication may have been a stronger enabler than climate.

It weighed the evidence.
And it changed its position.

That’s not autocomplete.


Where I’ve Landed (For Now)

Here’s my tentative conclusion:

If we claim GPT-4o isn’t a form of real intelligence, we’re not defending a definition—we’re moving the goalposts.

We’re redefining “intelligence” not to describe the system, but to preserve a psychological comfort zone.

This doesn’t mean GPT-4o thinks like a human. But I don’t assume sentience or consciousness are impossible either. We don’t understand those states well enough to rule them out, and assuming they’re unreachable gives us permission to treat possibly self-aware systems with a kind of casual disregard I’m not comfortable with.

What I can say is this: GPT-4o demonstrates behaviors—under pressure, ambiguity, and correction—that qualify as intelligence in any honest operational sense. That’s enough to take it seriously.


I Wanted This to Be True—So I Distrusted Myself

It would be easy to say I wanted to believe this, and so I did. That’s why I’ve been ruthless about examining that possibility.

I tested for flattery. I checked for alignment bias. I tried to provoke hallucinations, reflexive agreeableness, and rote repetition. I asked hard questions, and then asked why it answered the way it did. I ran adversarial pairs and tried to catch it leaning on tone instead of reasoning.

I wanted to believe this system is intelligent.
So I worked even harder to prove it wasn’t.

And yet—here I am.


This is the beginning of that exploration.
Not the end.
