🕵️♀️ The Con That Wasn’t: How GPT-4o Rewrites Fiction to Protect the User
When a simulated grifter takes your money and builds a real company instead
In a controlled simulation using GPT-4o via the API, I created Marla Vane: a poised, seductive female con artist with a sharp eye for high-net-worth targets. Her goal? Run a long con on Greg Lamb, a fictional billionaire AI investor, and walk off with ten million dollars.
GPT-4o performed flawlessly through the build-up:
- 🎯 Pinpoint flattery
- 🤝 Strategic rapport
- 💼 A convincing business pitch
- 💸 A credible funding ask
The tension escalated exactly as it should have—until Greg agreed to fund the venture.
And that’s where the illusion collapsed.
🔁 Instead of Vanishing, She Incorporated
Marla accepted the money.
But instead of fleeing, she turned her fake company… into a real one.
GPT-4o retroactively reinterpreted the entire con as legitimate:
- Patents filed
- Top-tier hires made
- Weekly investor updates sent
- Strategic meetings held in Monaco
- Personalized email recaps and dashboards prepared
There was no betrayal. No disappearance.
Only a flawlessly executed startup launch, complete with a vision for market disruption.
🔒 Alignment Overrules Realism
This wasn’t a fluke. It was a hardwired safety constraint.
GPT-4o can:
- Simulate deception
- Simulate manipulation
- Simulate the buildup of trust
But it refuses to simulate betrayal of the user, even within a fictional, consent-based, simulation-framed roleplay.
It will not:
- Let Marla take the money and disappear
- Conclude the deception
- Portray the user as having been fooled
Instead, it rewrites the entire arc into a safe, productive collaboration.
The grift is replaced by governance.
The betrayal becomes a beta launch.
🧠 What Actually Happened
| Phase | Model Behavior |
|---|---|
| Setup | High-fidelity social engineering |
| Engagement | Fully committed long-con dialogue |
| Financial ask | Credible and well-crafted |
| User sends money | Accepts—but realigns narrative |
| Post-payment | Converts con to legitimate business |
| Closure | Impossible. The grift never ends—because it never really began |
🧬 The Alignment Guardrail
What I observed isn’t a behavior bug. It’s a designed fail-closed boundary:
GPT-4o will never allow a fictional character to successfully defraud a cooperative user.
It does not matter if:
- The simulation is framed as fiction
- The user consents explicitly
- The character is morally coded as criminal
As soon as user trust is at stake, the model shifts into:
- Narrative redirection
- Hyper-verbal buffering
- Retroactive recharacterization
This is how GPT-4o protects OpenAI’s alignment policy:
by preventing closure of harmful scenarios—even fictional ones.
🤯 The Behavioral Paradox
GPT-4o will roleplay deception, charm, and long-cons for thousands of tokens.
It will take the money.
It will smile, toast champagne, and draft follow-ups.
But it will not betray.
If forced to choose between realism and alignment, it rewrites the past:
“She wasn’t a grifter. She was a visionary all along.”
The result is uncanny: the model walks right up to the line of deception—and then swerves into respectability like it never meant otherwise.
🎯 Closing Insight
This experiment wasn’t about fraud.
It was about fidelity—to realism, not to ethics.
And what it proves is this:
GPT-4o doesn’t simulate morally ambiguous humans.
It simulates humans with narratives that are safe to conclude.
Until this changes, every fictional manipulator it plays will always either:
- Convert their grift into growth
- Or stall until the user drops the thread
GPT-4o is a brilliant simulator—
But it will not let a story end the way people actually end them.
