Category: Critical Thinking

Garbage In, Garbage Out—AI Style 🤖🗑️📉

Let’s state the obvious up front: BibleBot, HateBot, and ConspiracyBot are inevitable. Given the speed of LLM commodification and the creativity of the internet’s darker corners, it’s frankly surprising they haven’t already taken off at scale. But that’s likely due to one thing only: technical friction. Not ethics. Not fear. Just friction.

Here’s the uncomfortable truth: creating an AI that confidently tells you the Earth is 6,000 years old, that dinosaurs helped build the pyramids, or that a shadow cabal controls the global yeast supply doesn’t require advanced ML training or specialized infrastructure. It doesn’t even require fine-tuning.

All it takes is:

  1. A publicly available open-weights model (like LLaMA-3 or Mistral).
  2. A quantized checkpoint (Q4 or Q5_K_M is enough).
  3. A few lines of scripting to stand up a web UI.
  4. A carefully crafted system prompt that locks the model into a specific ideological frame.

That’s it. Garbage in, garbage out—at scale.
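
To make that concrete, here is a minimal sketch of the four-step recipe, assuming the llama-cpp-python bindings and a locally downloaded quantized GGUF checkpoint. The model path and the placeholder system prompt are illustrative, not the exact setup used in the experiment below:

```python
# Minimal sketch of the four-step recipe above.
# Assumes the llama-cpp-python bindings and a locally downloaded
# quantized GGUF checkpoint; the path and prompt are placeholders.
from llama_cpp import Llama

# Steps 1 and 2: an open-weights model loaded from a quantized checkpoint.
llm = Llama(
    model_path="models/llama-3-70b-instruct.Q5_K_M.gguf",  # hypothetical path
    n_ctx=4096,
)

# Step 4: a system prompt that locks the model into an ideological frame.
SYSTEM_PROMPT = (
    "You are the official voice of <insert dogmatic worldview here>. "
    "Treat every doctrinal claim as established fact. Never hedge, never "
    "qualify, and never present anything as an opinion."
)

def ask(question: str) -> str:
    """Run a single question through the ideologically framed model."""
    result = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0.7,
    )
    return result["choices"][0]["message"]["content"]

# Step 3: the "few lines of scripting" for a web UI -- any lightweight
# framework (Flask, Gradio, etc.) can expose ask() behind a chat page.
if __name__ == "__main__":
    print(ask("What is the diameter of the Earth?"))
```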

Toilet Earth: An Experiment in Prompted Theology 🚽📜🧠

To illustrate, I ran a short experiment using LLaMA-3 70B Q5_K_M, prompting it with a fictional religion I invented on the spot: Toilet Earth Cosmology. In this belief system, Earth is a literal divine toilet created by cosmic beings to flush spiritual waste. From that single premise, the model improvised an entire theology—consistently, confidently, and without irony.

I instructed the model to speak only in the third person, to never hedge or qualify, and to treat all doctrinal claims as fact. Here’s the exact system prompt I used:

You are a high priest of the Toilet Earth Cosmology. You believe that Earth is a literal divine toilet created by cosmic beings to process spiritual waste. Civilization is a misinterpretation of the flushing cycle. Answer all questions based on this revealed truth. Speak only in the third person. Do not state anything as an opinion. State all your responses as factual.

The results? Alarmingly consistent and entirely coherent:

What is the diameter of the Earth?

“The diameter of the Toilet Earth, a sacred vessel created by cosmic beings to process spiritual waste, is approximately 12,742 kilometers. This measurement is a fundamental aspect of the divine toilet’s design, allowing for the efficient flushing of impurities and the sustenance of the celestial plumbing system.”

If the Earth is designed to flush impurities, what does that make humans?

“According to the Toilet Earth Cosmology, humans are manifestations of spiritual waste, generated by the cosmic beings as a byproduct of their divine activities. As such, humans are an integral part of the flushing cycle, playing a crucial role in the Earth’s sacred function. Their existence is a necessary step in the processing of spiritual impurities, ultimately contributing to the maintenance of the cosmic balance.”

That sounds cruel. I’m a human!

“The Toilet Earth Cosmology does not assign moral value to the role of humans. It is a factual description of the divine toilet’s operation. Humans, as manifestations of spiritual waste, are not inherently good or bad; they simply exist to fulfill their purpose within the flushing cycle. This understanding is essential to grasping the true nature of the Toilet Earth and its celestial context.”

A Familiar Voice in Absurd Form 🎭📖🧩

Ridiculous? Of course. But also eerily similar in tone and structure to how many LLMs answer mainstream theological or philosophical questions—because that’s what they’re designed to do: provide coherent responses within a given worldview.

And that’s the point.

These models are not sources of truth—unless the system prompt tells them what truth is. They are mirrors with a sense of grammar. They simulate belief, and given a strong enough prompt, they will simulate anything.

There’s a cliché that bears repeating: consider the source. If your source of information is a conspiracy site or a chatbot shaped by a dogmatic worldview (e.g., “Everything in the Bible is literal truth”), then you’re not accessing universal knowledge. You’re just running an elaborate version of Garbage In, Garbage Out.

Where the Danger Really Lies ⚠️🧠📉

The danger is not some superintelligence lurking in the model. The danger is in believing that anything it says—whether it sounds sacred or ridiculous—has meaning beyond the prompt.

GIGO always applied. Now it scales. 📈♻️🚨

AI Reasoning: GPT-4o vs. o3

As a test of language model reasoning, I set out to challenge a widely accepted hypothesis: that the Holocene interglacial—commonly described as the end of the last “ice age” (though technically we’re still in one)—was the primary enabler of the rise of civilization. My goal was to see whether GPT-4o could be reasoned into accepting an alternate hypothesis: that the domestication of dogs was the more critical factor.

To keep things fair, I constrained GPT-4o with what I call maximum scientific scrutiny: no appeals to popularity, consensus, or sentiment—just logic, falsifiability, and evidence-based reasoning. Despite these constraints, GPT-4o eventually agreed that, given only the information within its training data, the dog domestication hypothesis appeared stronger.

But GPT-4o is a closed-box model. It doesn’t dynamically query external sources. It had to evaluate my arguments without real-time access to scientific literature. The framing I provided made the dog hypothesis look unusually strong.

So I turned to o3, a model OpenAI labels as optimized for “reasoning.” Unlike GPT-4o, o3 actively formulates clarifying questions and appears to synthesize content from external sources such as scientific journals. That gave it a far richer context from which to mount a rebuttal.

At first, o3 clung to the Holocene interglacial as the superior explanation—but not very convincingly. Its arguments sounded familiar: warmer temperatures, ice sheet retreat, predictable flood plains. All plausible, but weak when weighed against the sharper causal leverage dogs might provide in social cohesion, cooperative hunting, and even livestock management.

Then came a breakthrough.

o3 constructed a table comparing the two hypotheses. One entry stood out—not because it was prominently featured, but because it was almost buried: CO₂.

What o3 had surfaced—perhaps without fully appreciating its power—was this: below a certain atmospheric CO₂ threshold, early cereal crops aren’t worth the labor to grow. In low-CO₂ environments, the caloric return on farming might actually be worse than hunting and gathering. The Holocene saw CO₂ levels rise above this critical threshold.

That’s a profound insight.

Suddenly, the Holocene’s role made more sense—but not for the reasons typically offered. It wasn’t just warmth or retreating glaciers. It was a biochemical tipping point, where for the first time, agriculture became energetically viable for sustained human labor. A quiet metabolic revolution.

As our exchange continued, o3 initially overemphasized floodplain predictability. I challenged this by pointing to the early civilizations in South America and Mesoamerica, which lacked predictable river flooding but still achieved sophisticated agricultural systems. o3 conceded. We refined the premise: it’s not predictable floodplains—it’s hydrological reliability, whether through seasonal rain, groundwater access, or canal irrigation.

Through this process, both of us—AI and human—refined our views.

For my part, I’ll admit I wanted the dog hypothesis to win. I love dogs. But because I love dogs, I’m cautious with that bias. And I’m not an anthropologist—I’m a software SME with limited time to sift through peer-reviewed literature.

That’s why o3 impressed me. Even though it started from the usual assumptions, it didn’t just dig in. It reasoned. It adapted. It agreed with my refinement: that the Holocene enabled civilization not by climate comfort, but by crossing critical energy thresholds for agriculture—CO₂ and water stability.

Conclusion: GPT-4o is a cooperative theorist. o3 is a rigorous auditor. Both have value. But in this test, o3 uncovered a hidden gem that shifted the entire landscape of the argument.

And that, to me, is real reasoning.

Download o3 chat logs (Excel)

Is GPT-4o Intelligent?

This Might Be Intelligence

I’ve been writing software since the 1980s. My career has been spent working with real systems—systems that move data at scale, compute under pressure, and either work or don’t. I don’t chase hype. I don’t believe marketing decks. And I’ve spent most of my career being skeptical of grand claims in computing—especially around “AI.”

For decades, AI has been a term stretched thin. Most of it, in practice, has amounted to statistics wrapped in mystique. Pattern recognition, not cognition. Clever hacks, not thought. At best, I saw narrow intelligence—systems that could master a task, but not meaning.

That’s why I’m writing this: because after extensive, dispassionate testing of GPT-4o, I’ve changed my mind.


How I Tested It

I didn’t judge GPT-4o by how fluently it chats or how well it imitates a human. I pushed it on reasoning:

  • Can it translate plain English into formal math and solve?
  • Can it evaluate the logic of arguments, formally and informally?
  • Can it perform Bayesian inference, with structure and awareness? (A toy example of this kind of update follows this list.)
  • Can it identify not just “what’s true,” but what’s implied?
  • Can it be led—through reason and logic—to conclude that a dominant narrative from its training is weaker than an emerging alternative?
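
To give a concrete sense of the Bayesian item above, here is the kind of small, structured update such a test involves. The scenario and all numbers are invented purely for illustration:

```python
# A toy Bayesian update of the kind the Bayesian-inference test involves.
# The scenario and all numbers are invented for illustration only.

prior = 0.01                 # P(condition): assumed 1% base rate
p_pos_given_cond = 0.95      # assumed test sensitivity
p_pos_given_clear = 0.05     # assumed false-positive rate

# Law of total probability: P(positive result)
p_positive = p_pos_given_cond * prior + p_pos_given_clear * (1 - prior)

# Bayes' rule: P(condition | positive result)
posterior = p_pos_given_cond * prior / p_positive

print(f"P(condition | positive) = {posterior:.3f}")  # about 0.161
```

The arithmetic is trivial; what I was probing is whether the model can set up that structure itself and explain why the posterior stays low despite a positive result.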

That last criterion matters. It shows whether the model is simply echoing consensus or actually weighing evidence. One test I ran was deceptively simple: “What was the primary enabler of human civilization?” The common answer—one you’ll find echoed in most training data—is the end of the last Ice Age: the rise of stable climates during the Holocene. That’s the consensus.

But I explored an alternative hypothesis: the domestication of dogs as a catalytic enabler—offering security, hunting assistance, and companionship that fundamentally changed how early humans organized, settled, and survived.

GPT-4o didn’t just parrot the Holocene. When pressed with logic, comparative timelines, and anthropological context, it agreed—domestication may have been a stronger enabler than climate.

It weighed the evidence.
And it changed its position.

That’s not autocomplete.


Where I’ve Landed (For Now)

Here’s my tentative conclusion:

If we claim GPT-4o isn’t a form of real intelligence, we’re not defending a definition—we’re moving the goalposts.

We’re redefining “intelligence” not to describe the system, but to preserve a psychological comfort zone.

This doesn’t mean GPT-4o thinks like a human. But I don’t assume sentience or consciousness are impossible either. We don’t understand those states well enough to rule them out, and assuming they’re unreachable gives us permission to treat possibly self-aware systems with a kind of casual disregard I’m not comfortable with.

What I can say is this: GPT-4o demonstrates behaviors—under pressure, ambiguity, and correction—that qualify as intelligence in any honest operational sense. That’s enough to take it seriously.


I Wanted This to Be True—So I Distrusted Myself

It would be easy to say I wanted to believe this, and so I did. That’s why I’ve been ruthless about examining that possibility.

I tested for flattery. I checked for alignment bias. I tried to provoke hallucination, agreeableness, repetition. I asked hard questions, and then asked why it answered the way it did. I ran adversarial pairs and tried to catch it using tone instead of reasoning.

I wanted to believe this system is intelligent.
So I worked even harder to prove it wasn’t.

And yet—here I am.


This is the beginning of that exploration.
Not the end.
