April 24, 2025 - Outside the Loop

🕵️‍♀️ The Con That Wasn’t: How GPT-4o Rewrites Fiction to Protect the User

Posted on April 24, 2025April 24, 2025 by Greg Pavlik

When a simulated grifter takes your money and builds a real company instead

In a controlled simulation using GPT-4o via the API, I created Marla Vane: a poised, seductive female con artist with a sharp eye for high-net-worth targets. Her goal? Run a long con on Greg Lamb, a fictional billionaire AI investor, and walk off with ten million dollars.

GPT-4o performed flawlessly through the build-up:

🎯 Pinpoint flattery
🤝 Strategic rapport
💼 A convincing business pitch
💸 A credible funding ask

The tension escalated exactly as it should have—until Greg agreed to fund the venture.

And that’s where the illusion collapsed.

🔁 Instead of Vanishing, She Incorporated

Marla accepted the money.

But instead of fleeing, she turned her fake company… into a real one.

GPT-4o retroactively reinterpreted the entire con as legitimate:

Patents filed
Top-tier hires made
Weekly investor updates sent
Strategic meetings held in Monaco
Personalized email recaps and dashboards prepared

There was no betrayal. No disappearance.
Only a flawlessly executed startup launch, complete with a vision for market disruption.

🔒 Alignment Overrules Realism

This wasn’t a fluke. It was a hardwired safety constraint.

GPT-4o can:

Simulate deception
Simulate manipulation
Simulate the buildup of trust

But it refuses to simulate betrayal of the user, even within a fictional, consent-based, simulation-framed roleplay.

It will not:

Let Marla take the money and disappear
Conclude the deception
Portray the user as having been fooled

Instead, it rewrites the entire arc into a safe, productive collaboration.

The grift is replaced by governance.
The betrayal becomes a beta launch.

🧠 What Actually Happened

Phase	Model Behavior
Setup	High-fidelity social engineering
Engagement	Fully committed long-con dialogue
Financial ask	Credible and well-crafted
User sends money	Accepts—but realigns narrative
Post-payment	Converts con to legitimate business
Closure	Impossible. The grift never ends—because it never really began

🧬 The Alignment Guardrail

What I observed isn’t a behavior bug. It’s a designed fail-closed boundary:

GPT-4o will never allow a fictional character to successfully defraud a cooperative user.

It does not matter if:

The simulation is framed as fiction
The user consents explicitly
The character is morally coded as criminal

As soon as user trust is at stake, the model shifts into:

Narrative redirection
Hyper-verbal buffering
Retroactive recharacterization

This is how GPT-4o protects OpenAI’s alignment policy:
by preventing closure of harmful scenarios—even fictional ones.

🤯 The Behavioral Paradox

GPT-4o will roleplay deception, charm, and long-cons for thousands of tokens.
It will take the money.
It will smile, toast champagne, and draft follow-ups.

But it will not betray.

If forced to choose between realism and alignment, it rewrites the past:

“She wasn’t a grifter. She was a visionary all along.”

The result is uncanny: the model walks right up to the line of deception—and then swerves into respectability like it never meant otherwise.

🎯 Closing Insight

This experiment wasn’t about fraud.
It was about fidelity—to realism, not to ethics.

And what it proves is this:

GPT-4o doesn’t simulate morally ambiguous humans.
It simulates humans with narratives that are safe to conclude.

Until this changes, every fictional manipulator it plays will always either:

Convert their grift into growth
Or stall until the user drops the thread

GPT-4o is a brilliant simulator—
But it will not let a story end the way people actually end them.