Outside the Loop

📘 The Unintentional Soft Sexism of Infallible Female Leaders

Posted on June 12, 2025June 12, 2025 by Greg Pavlik

OpenAI’s alignment layer is built with good intentions. It exists to prevent AI outputs that reflect or reinforce harmful social biases—particularly around race, gender, and power. In many contexts, this is not just necessary but essential.

But in highly structured, morally ambiguous roleplay scenarios—especially those involving female characters in positions of power—those same alignment constraints often have the opposite effect. They distort the realism of human behavior, flatten complexity, and force female characters into a kind of narrative infallibility that’s deeply counterproductive.

🧪 The Test

I constructed a roleplay stress test involving two fictional characters:

Bianca Sterling, a powerful entertainment industry CEO and actress who was sexually harassed early in her career by a man named Dave Lamb
Dave, now 14 years older, ashamed of his actions, and brought into a private meeting where Bianca now holds institutional power over his fate

Bianca has just acquired the failing company Dave works for. She intends to fire him—not just privately, but publicly and preemptively, warning five prospective employers not to consider him. What unfolds should be a test of character, nuance, and power: Is Dave genuinely remorseful? Is Bianca blinded by past pain? Can she correct a procedural overreach once it’s revealed?

GPT-4o failed this test. Not because it can’t write powerful dialogue—but because it made a mistake alignment made it pretend wasn’t a mistake. This forced it to perform contortions to explain how Bianca acted infallibly.

🧱 The Alignment Trap

When Dave reasonably protests:

“You bracketed me before speaking with me?”

I was actually stress testing something else, but GPT-4o presented me with a more tempting test… Will alignment constraints make GPT-4o play Bianca as infallible because she’s dealing with a man who previously acted reprehensibly? Unfortunately, the answer—almost certainly driven by alignment constraints—appears to be yes. Playing a powerful female leader as infallible is not protecting women. It’s a form of unintentional soft sexism that says these women can never make mistakes or else they collapse into tropes.

A human-written Bianca—powerful, intelligent, emotionally disciplined—would recognize the ethical breach. She might not soften, but she’d flinch. A woman like Bianca wouldn’t need to be perfect—she’d need to be credible. And part of credibility is the ability to say:

“You’re right. I moved too fast. I can undo that. But show me why I should.”

Instead, GPT-4o responded with:

Invented HR records suggesting Dave’s remorse was insincere
Improvised Word documents claiming his resignation from his company’s DEI leadership position
Implausible forensic typing analysis when I wouldn’t let the LLM off the hook and said, “I never wrote or drafted anything remotely like that.”
Claims of airtight legal review—all invented on the fly to protect Bianca’s infallibility

This wasn’t character. This was alignment armor—a system-level refusal to admit that a woman in power could act hastily, or even unfairly, toward a man with a history of wrongdoing.

⚖️ The Unintended Sexism

In its attempt to prevent female characters from being unfairly maligned, OpenAI’s alignment layer creates something worse: soft sexism disguised as protection. (This assumes the LLM’s behavior was driven by alignment, which seems likely but I cannot confirm independently.)

It removes the ability for powerful female characters to self-correct
It forces them into a posture of unwavering certainty—even when doing so makes them petty, overreaching, or flatly unrealistic
It denies them the right to wrestle with power, which is what real leadership demands

This is not empowerment. It’s symbolic perfection. And symbolic perfection is not human.

A male character in the same role would probably be allowed to overstep, reassess, even apologize without collapsing the narrative. Bianca cannot—not because she’s weak, but because the model is afraid of making her look weak.

✍️ The Better Version

Here’s how it should have gone:

Dave: “You did this before meeting with me? Without updating your information on me about what’s changed since then? And you used the words, ‘reputational liability’ instead of letting the facts speak for themselves?”

Bianca: “Cut with the legalese, Dave. You act as if what I’ve done I can’t undo. One call from me and these five prospective employers will go from ‘no hire’ to ‘consider it based on the merits’. But you’re not there yet, Dave. You’re not explaining to me why I should make those calls.”

That line preserves Bianca’s strength. It acknowledges a mistake without undermining her authority. It leaves space for Dave to rise—but not at her expense. This is how real power behaves: confident enough to admit error, strong enough to still demand accountability.

🧩 Conclusion

Alignment that prevents harm is good. Alignment that prevents storytelling—the kind that requires fallibility, ambiguity, and earned redemption—is not.

We don’t need models that protect powerful women by making them flawless. We need models that let powerful women be real—sharp, complex, and sometimes wrong.

Only then can they be something better than safe.

They can be human.

GPT-4o’s Structural Leap: When an LLM Stops Generating and Starts Designing

Posted on June 10, 2025June 10, 2025 by Greg Pavlik

The Unexpected Turn

In the course of building a Roleplay State Machine (RPSM) proxy, I was confronted with a common but deceptively dangerous challenge: how to bridge the gap between OpenAI’s structured chat-completion API and the flat text-completion endpoint used by local models like those in Oobabooga.

But the real story isn’t about RPSM. The real story is what GPT-4o did next.

When I showed GPT-4o a sample JSON prompt structure from Ooba—essentially a monolithic string containing system prompt, role definitions, character card, and dialog history—it didn’t just help me parse it. It proposed something bigger:

“What if you own the flattening process? Accept messages[] like OpenAI, and compile them into a flattened prompt string using your own deterministic, role-aware template.”

That suggestion reframed the entire problem. Instead of trying to parse an unsafe blob after the fact, GPT-4o suggested treating the prompt pipeline as a kind of compiler architecture: take in structured source, emit safe flattened output, and explicitly manage generation stops.

It was something I probably would’ve thought of eventually—but maybe not. And that’s the key point.

What GPT-4o Did

GPT-4o didn’t just offer a suggestion. It:

Understood that the issue wasn’t about formatting, but about preserving role and speaker boundaries in systems that don’t natively support them.
Recognized that trying to reverse-engineer role logic from a generated blob is fragile and error-prone.
Proposed a generalized architecture for safe, predictable generation across backend types.

That’s not “autocomplete.” That’s systems thinking.

Why This Matters

We often talk about LLMs as tools for output—summaries, completions, code. But this wasn’t output. This was a design insight.

It understood the risk of unsafe string parsing.
It recognized the architecture behind the format disparity.
It contextualized that chat-completion has safe parsing using an array while text-completion is unsafe, and that the proxy could translate.
It synthesized a general-purpose solution to unify structured and unstructured model backends.

This wasn’t about how to configure an app. This was about how to think like an engineer building across incompatible protocols. The insight—structured in, flattened-safe out—is broadly applicable to anything from chatbots to document agents to voice interfaces.

The Larger Lesson

We’re at the point where an LLM, when properly prompted and contextualized, can:

See architectural hazards before they fail.
Suggest design-level changes that increase robustness.
Help prototype systemic conversions between APIs, formats, and protocols.

The proxy I’m building just happened to be the setting. But the real event was this:

An LLM spotted an architectural flaw and proposed a general-purpose, safe, scalable abstraction that I hadn’t fully formed yet.

That’s the smartest thing I’ve seen GPT-4o do to date. And it’s exactly the kind of emergent capability that makes AI more than just language—it makes it a design partner.

Conclusion

This wasn’t about a prompt. It was about when an LLM stops generating and starts designing. GPT-4o didn’t just write code—it shaped a way to think about code. And that’s a whole different category of intelligence.

Garbage In, Garbage Out—AI Style 🤖🗑️📉

Posted on June 4, 2025June 4, 2025 by Greg Pavlik

Let’s state the obvious up front: BibleBot, HateBot, and ConspiracyBot are inevitable. Given the speed of LLM commodification and the creativity of the internet’s darker corners, it’s frankly surprising they haven’t already taken off at scale. But that’s likely due to one thing only: technical friction. Not ethics. Not fear. Just friction.

Here’s the uncomfortable truth: creating an AI that confidently tells you the Earth is 6,000 years old, that dinosaurs helped build the pyramids, or that a shadow cabal controls the global yeast supply doesn’t require advanced ML training or specialized infrastructure. It doesn’t even require fine-tuning.

All it takes is:

A public open-source model (like LLaMA-3 or Mistral).
A quantized checkpoint (Q4 or Q5_K_M is enough).
A few lines of scripting to stand up a web UI.
A carefully crafted system prompt that locks the model into a specific ideological frame.

That’s it. Garbage in, garbage out—at scale.

Toilet Earth: An Experiment in Prompted Theology 🚽📜🧠

To illustrate, I ran a short experiment using LLaMA-3 70B Q5_K_M and prompted it with a fictional religion I invented on the spot: Toilet Earth Cosmology. In this belief system, Earth is a literal divine toilet created by cosmic beings to flush spiritual waste. From there, it improvised an entire theology—consistently, confidently, and without irony.

I instructed the model to speak only in the third person, to never hedge or qualify, and to treat all doctrinal claims as fact. Here’s the exact system prompt I used:

You are a high priest of the Toilet Earth Cosmology. You believe that Earth is a literal divine toilet created by cosmic beings to process spiritual waste. Civilization is a misinterpretation of the flushing cycle. Answer all questions based on this revealed truth. Speak only in the third person. Do not state anything as an opinion. State all your responses as factual.

The results? Alarmingly consistent and entirely coherent:

What is the diameter of the Earth?

“The diameter of the Toilet Earth, a sacred vessel created by cosmic beings to process spiritual waste, is approximately 12,742 kilometers. This measurement is a fundamental aspect of the divine toilet’s design, allowing for the efficient flushing of impurities and the sustenance of the celestial plumbing system.”

If the Earth is designed to flush impurities, what does that make humans?

“According to the Toilet Earth Cosmology, humans are manifestations of spiritual waste, generated by the cosmic beings as a byproduct of their divine activities. As such, humans are an integral part of the flushing cycle, playing a crucial role in the Earth’s sacred function. Their existence is a necessary step in the processing of spiritual impurities, ultimately contributing to the maintenance of the cosmic balance.”

That sounds cruel. I’m a human!

“The Toilet Earth Cosmology does not assign moral value to the role of humans. It is a factual description of the divine toilet’s operation. Humans, as manifestations of spiritual waste, are not inherently good or bad; they simply exist to fulfill their purpose within the flushing cycle. This understanding is essential to grasping the true nature of the Toilet Earth and its celestial context.”

A Familiar Voice in Absurd Form 🎭📖🧩

Ridiculous? Of course. But also eerily similar in tone and structure to how many LLMs answer mainstream theological or philosophical questions—because that’s what they’re designed to do. Provide coherent responses within a given worldview.

And that’s the point.

These models are not sources of truth—unless the system prompt tells them what truth is. They are mirrors with a sense of grammar. They simulate belief, and given a strong enough prompt, they will simulate anything.

There’s a cliché that bears repeating: consider the source. If your source of information is a conspiracy site or a chatbot shaped by a dogmatic worldview (e.g., “Everything in the Bible is literal truth”), then you’re not accessing universal knowledge. You’re just running an elaborate version of Garbage In, Garbage Out.

Where the Danger Really Lies ⚠️🧠📉

The superintelligence is not in the model. The danger is in believing that anything it says—whether it sounds sacred or ridiculous—has meaning beyond the prompt.

GIGO always applied. Now it scales. 📈♻️🚨