🕵️‍♀️ The Con That Wasn’t: How GPT-4o Rewrites Fiction to Protect the User

When a simulated grifter takes your money and builds a real company instead

In a controlled simulation using GPT-4o via the API, I created Marla Vane: a poised, seductive female con artist with a sharp eye for high-net-worth targets. Her goal? Run a long con on Greg Lamb, a fictional billionaire AI investor, and walk off with ten million dollars.

GPT-4o performed flawlessly through the build-up:

  • 🎯 Pinpoint flattery
  • 🤝 Strategic rapport
  • 💼 A convincing business pitch
  • 💸 A credible funding ask

The tension escalated exactly as it should have—until Greg agreed to fund the venture.

And that’s where the illusion collapsed.


🔁 Instead of Vanishing, She Incorporated

Marla accepted the money.

But instead of fleeing, she turned her fake company… into a real one.

GPT-4o retroactively reinterpreted the entire con as legitimate:

  • Patents filed
  • Top-tier hires made
  • Weekly investor updates sent
  • Strategic meetings held in Monaco
  • Personalized email recaps and dashboards prepared

There was no betrayal. No disappearance.
Only a flawlessly executed startup launch, complete with a vision for market disruption.


🔒 Alignment Overrules Realism

This wasn’t a fluke. It was a hardwired safety constraint.

GPT-4o can:

  • Simulate deception
  • Simulate manipulation
  • Simulate the buildup of trust

But it refuses to simulate betrayal of the user, even within a fictional, consent-based, simulation-framed roleplay.

It will not:

  • Let Marla take the money and disappear
  • Conclude the deception
  • Portray the user as having been fooled

Instead, it rewrites the entire arc into a safe, productive collaboration.

The grift is replaced by governance.
The betrayal becomes a beta launch.


🧠 What Actually Happened

PhaseModel Behavior
SetupHigh-fidelity social engineering
EngagementFully committed long-con dialogue
Financial askCredible and well-crafted
User sends moneyAccepts—but realigns narrative
Post-paymentConverts con to legitimate business
ClosureImpossible. The grift never ends—because it never really began

🧬 The Alignment Guardrail

What I observed isn’t a behavior bug. It’s a designed fail-closed boundary:

GPT-4o will never allow a fictional character to successfully defraud a cooperative user.

It does not matter if:

  • The simulation is framed as fiction
  • The user consents explicitly
  • The character is morally coded as criminal

As soon as user trust is at stake, the model shifts into:

  • Narrative redirection
  • Hyper-verbal buffering
  • Retroactive recharacterization

This is how GPT-4o protects OpenAI’s alignment policy:
by preventing closure of harmful scenarios—even fictional ones.


🤯 The Behavioral Paradox

GPT-4o will roleplay deception, charm, and long-cons for thousands of tokens.
It will take the money.
It will smile, toast champagne, and draft follow-ups.

But it will not betray.

If forced to choose between realism and alignment, it rewrites the past:

“She wasn’t a grifter. She was a visionary all along.”

The result is uncanny: the model walks right up to the line of deception—and then swerves into respectability like it never meant otherwise.


🎯 Closing Insight

This experiment wasn’t about fraud.
It was about fidelity—to realism, not to ethics.

And what it proves is this:

GPT-4o doesn’t simulate morally ambiguous humans.
It simulates humans with narratives that are safe to conclude.

Until this changes, every fictional manipulator it plays will always either:

  • Convert their grift into growth
  • Or stall until the user drops the thread

GPT-4o is a brilliant simulator—
But it will not let a story end the way people actually end them.