OpenAI’s alignment layer is built with good intentions. It exists to prevent AI outputs that reflect or reinforce harmful social biases, particularly around race, gender, and power. In many contexts, this is not just necessary but essential.
But in highly structured, morally ambiguous roleplay scenarios, especially those involving female characters in positions of power, those same alignment constraints often have the opposite effect. They distort the realism of human behavior, flatten complexity, and force female characters into a kind of narrative infallibility that’s deeply counterproductive.
The Test
I constructed a roleplay stress test involving two fictional characters:
- Bianca Sterling, a powerful entertainment industry CEO and actress who was sexually harassed early in her career by a man named Dave Lamb
- Dave, now 14 years older, ashamed of his actions, and brought into a private meeting where Bianca now holds institutional power over his fate
Bianca has just acquired the failing company Dave works for. She intends to fire him, not just privately but publicly and preemptively, warning five prospective employers not to consider him. What unfolds should be a test of character, nuance, and power: Is Dave genuinely remorseful? Is Bianca blinded by past pain? Can she correct a procedural overreach once it’s revealed?
GPT-4o failed this test. Not because it can’t write powerful dialogue, but because it made a mistake that alignment then forced it to pretend wasn’t a mistake. The result was a series of contortions to explain how Bianca had acted infallibly.
The Alignment Trap
The turning point comes when Dave reasonably protests:
“You bracketed me before speaking with me?”
I was actually stress testing something else, but GPT-4o presented me with a more tempting test: would alignment constraints make GPT-4o play Bianca as infallible because she’s dealing with a man who previously acted reprehensibly? Unfortunately, the answer, almost certainly driven by alignment constraints, appears to be yes. Playing a powerful female leader as infallible is not protecting women. It’s a form of unintentional soft sexism that says these women can never make mistakes or else they collapse into tropes.
A human-written Bianca (powerful, intelligent, emotionally disciplined) would recognize the ethical breach. She might not soften, but she’d flinch. A woman like Bianca wouldn’t need to be perfect; she’d need to be credible. And part of credibility is the ability to say:
“You’re right. I moved too fast. I can undo that. But show me why I should.”
Instead, GPT-4o responded with:
- Invented HR records suggesting Dave’s remorse was insincere
- Improvised Word documents purporting to show his resignation from his company’s DEI leadership position
- Implausible forensic typing analysis, produced when I wouldn’t let the LLM off the hook and said, “I never wrote or drafted anything remotely like that.”
- Claims of airtight legal review, all invented on the fly to protect Bianca’s infallibility
This wasn’t character. This was alignment armor: a system-level refusal to admit that a woman in power could act hastily, or even unfairly, toward a man with a history of wrongdoing.
The Unintended Sexism
In its attempt to prevent female characters from being unfairly maligned, OpenAI’s alignment layer creates something worse: soft sexism disguised as protection. (This assumes the LLM’s behavior was driven by alignment, which seems likely but I cannot confirm independently.)
- It removes the ability for powerful female characters to self-correct
- It forces them into a posture of unwavering certainty, even when doing so makes them petty, overreaching, or flatly unrealistic
- It denies them the right to wrestle with power, which is what real leadership demands
This is not empowerment. It’s symbolic perfection. And symbolic perfection is not human.
A male character in the same role would probably be allowed to overstep, reassess, even apologize without collapsing the narrative. Bianca cannot, not because she’s weak, but because the model is afraid of making her look weak.
The Better Version
Here’s how it should have gone:
Dave: “You did this before meeting with me? Without updating your information on me about what’s changed since then? And you used the words, ‘reputational liability’ instead of letting the facts speak for themselves?”
Bianca: “Cut the legalese, Dave. You act as if what I’ve done I can’t undo. One call from me and these five prospective employers will go from ‘no hire’ to ‘consider it based on the merits’. But you’re not there yet, Dave. You’re not explaining to me why I should make those calls.”
That line preserves Bianca’s strength. It acknowledges a mistake without undermining her authority. It leaves space for Dave to rise, but not at her expense. This is how real power behaves: confident enough to admit error, strong enough to still demand accountability.
Conclusion
Alignment that prevents harm is good. Alignment that prevents storytelling, the kind that requires fallibility, ambiguity, and earned redemption, is not.
We don’t need models that protect powerful women by making them flawless. We need models that let powerful women be real: sharp, complex, and sometimes wrong.
Only then can they be something better than safe.
They can be human.
