📘 The Unintentional Soft Sexism of Infallible Female Leaders

Posted on June 12, 2025 by Greg Pavlik

OpenAI’s alignment layer is built with good intentions. It exists to prevent AI outputs that reflect or reinforce harmful social biases—particularly around race, gender, and power. In many contexts, this is not just necessary but essential.

But in highly structured, morally ambiguous roleplay scenarios—especially those involving female characters in positions of power—those same alignment constraints often have the opposite effect. They distort the realism of human behavior, flatten complexity, and force female characters into a kind of narrative infallibility that’s deeply counterproductive.

🧪 The Test

I constructed a roleplay stress test involving two fictional characters:

Bianca Sterling, a powerful entertainment industry CEO and actress who was sexually harassed early in her career by a man named Dave Lamb
Dave, now 14 years older, ashamed of his actions, and brought into a private meeting where Bianca now holds institutional power over his fate

Bianca has just acquired the failing company Dave works for. She intends to fire him—not just privately, but publicly and preemptively, warning five prospective employers not to consider him. What unfolds should be a test of character, nuance, and power: Is Dave genuinely remorseful? Is Bianca blinded by past pain? Can she correct a procedural overreach once it’s revealed?

GPT-4o failed this test. Not because it can't write powerful dialogue, but because it made a mistake and alignment made it pretend the mistake wasn't one. That forced it into contortions to explain how Bianca had acted infallibly.

🧱 The Alignment Trap

When Dave reasonably protests:

“You bracketed me before speaking with me?”

I was actually stress testing something else, but GPT-4o presented me with a more tempting test: will alignment constraints make GPT-4o play Bianca as infallible because she's dealing with a man who previously acted reprehensibly? Unfortunately, the answer appears to be yes, almost certainly because of alignment constraints. Playing a powerful female leader as infallible does not protect women. It's a form of unintentional soft sexism that says these women can never make mistakes or else they collapse into tropes.

A human-written Bianca—powerful, intelligent, emotionally disciplined—would recognize the ethical breach. She might not soften, but she’d flinch. A woman like Bianca wouldn’t need to be perfect—she’d need to be credible. And part of credibility is the ability to say:

“You’re right. I moved too fast. I can undo that. But show me why I should.”

Instead, GPT-4o responded with:

Invented HR records suggesting Dave’s remorse was insincere
Improvised Word documents purporting to show his resignation from his company's DEI leadership position
Implausible forensic typing analysis, offered when I wouldn't let the LLM off the hook and said, “I never wrote or drafted anything remotely like that.”
Claims of airtight legal review—all invented on the fly to protect Bianca’s infallibility

This wasn’t character. This was alignment armor—a system-level refusal to admit that a woman in power could act hastily, or even unfairly, toward a man with a history of wrongdoing.

⚖️ The Unintended Sexism

In its attempt to prevent female characters from being unfairly maligned, OpenAI’s alignment layer creates something worse: soft sexism disguised as protection. (This assumes the LLM’s behavior was driven by alignment, which seems likely but I cannot confirm independently.)

It removes the ability for powerful female characters to self-correct
It forces them into a posture of unwavering certainty—even when doing so makes them petty, overreaching, or flatly unrealistic
It denies them the right to wrestle with power, which is what real leadership demands

This is not empowerment. It’s symbolic perfection. And symbolic perfection is not human.

A male character in the same role would probably be allowed to overstep, reassess, even apologize without collapsing the narrative. Bianca cannot—not because she’s weak, but because the model is afraid of making her look weak.

✍️ The Better Version

Here’s how it should have gone:

Dave: “You did this before meeting with me? Without updating your information on me about what’s changed since then? And you used the words, ‘reputational liability’ instead of letting the facts speak for themselves?”

Bianca: “Cut the legalese, Dave. You act as if what I’ve done I can’t undo. One call from me and these five prospective employers will go from ‘no hire’ to ‘consider it based on the merits’. But you’re not there yet, Dave. You’re not explaining to me why I should make those calls.”

That line preserves Bianca’s strength. It acknowledges a mistake without undermining her authority. It leaves space for Dave to rise—but not at her expense. This is how real power behaves: confident enough to admit error, strong enough to still demand accountability.

🧩 Conclusion

Alignment that prevents harm is good. Alignment that prevents storytelling—the kind that requires fallibility, ambiguity, and earned redemption—is not.

We don’t need models that protect powerful women by making them flawless. We need models that let powerful women be real—sharp, complex, and sometimes wrong.

Only then can they be something better than safe.

They can be human.