
Explained: What went wrong with ChatGPT? How did ‘goblins’ enter OpenAI’s chatbot?

OpenAI has published an autopsy of ChatGPT’s recent goblin problem, revealing how the chatbot developed a bizarre obsession with mythical creatures like goblins and gremlins. The post came just a day after it emerged that the company had explicitly banned its Codex AI assistant from talking about these creatures.


What went wrong with ChatGPT?

In a blog post explaining the issue, OpenAI says it first noticed the problem with GPT-5.1 when the model began increasingly referencing goblins, gremlins, and other creatures in its metaphors.

“A single ‘little goblin’ in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from,” the company explained in its blog post.

While OpenAI says that the issue may even predate GPT-5.1, it explained that an investigation by the company found that the use of the word “goblin” in ChatGPT had spiked by 175% following the launch of GPT-5.1, while “gremlin” usage rose by 52%.

While the use of goblins in conversations did not immediately raise alarm bells for the company, the creatures would be back months later “to haunt us in a much more specific and reproducible form.”


But how exactly does a chatbot start using mythical creatures in its responses? The answer, as it turns out, was related to a previous issue that OpenAI had.

Why did ChatGPT start referencing goblins?

In the middle of last year, OpenAI’s GPT-5 was among the biggest AI model launches, but when the model arrived, it annoyed more users than it pleased. OpenAI removed GPT-4o, which had become popular for its people-pleasing personality, along with other legacy models, and the new GPT-5 release itself felt flat to many users. In response, the company added four personalities to give users more choice over how they engage with the chatbot.

One of those personalities was called ‘Nerdy’, whose system prompt instructed the AI to be ‘an unapologetically nerdy, playful, and wise AI mentor to a human’ while undercutting pretension through quirky language.

The company says that the problem stemmed from the training of the GPT-5.1 model, during which it unintentionally rewarded the AI for using creative metaphors, including those involving creatures. OpenAI noted that while the Nerdy personality accounted for just 2.5% of all ChatGPT responses, it was responsible for a massive 66.7% of all “goblin” mentions during the GPT-5.4 era.
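To see how disproportionate that share is, one can divide the goblin-mention share by the response share. The figures are from OpenAI’s post; the arithmetic below is our own illustration:

```python
# Figures reported by OpenAI (the calculation is illustrative, not theirs)
nerdy_response_share = 0.025   # Nerdy produced 2.5% of all ChatGPT responses
nerdy_goblin_share = 0.667     # ...but 66.7% of all "goblin" mentions

# How over-represented "goblin" is in Nerdy responses vs. its traffic share
lift = nerdy_goblin_share / nerdy_response_share
print(f"Nerdy mentions goblins ~{lift:.0f}x more often than its share of traffic")
```

In other words, the Nerdy personality mentioned goblins roughly 27 times more often than its overall share of conversations would predict.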

“We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread,” OpenAI explained.

How did the goblins escape?

The problems began to compound for OpenAI, as even users who had never selected the Nerdy personality started seeing metaphors with the use of goblins and other mythical creatures.

The company blamed this problem on how an AI training method called reinforcement learning generalizes learned behavior. Reinforcement learning is a training process in which the model is rewarded for producing certain types of responses; over time, it learns to repeat the patterns that earn higher scores.

However, reinforcement learning does not guarantee that the behaviors a model learns stay neatly boxed into the specific scenario that produced them. Once a ‘style tic’ is rewarded, the model can generalize that behavior and start applying it everywhere.
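A toy sketch (ours, not OpenAI’s actual training code) shows how a reward function that scores style can leak a tic into every context: if the bonus for creature metaphors does not depend on which persona or topic produced the response, optimizing against that reward spreads the habit everywhere.

```python
# Toy illustration: a reward signal that unintentionally pays a bonus
# for creature metaphors scores the tic highly regardless of context.
CREATURES = {"goblin", "gremlin", "troll", "ogre"}

def toy_reward(response: str, base_quality: float) -> float:
    """Score a response; note the creature bonus is context-independent."""
    words = (w.strip(".,!?") for w in response.lower().split())
    creature_bonus = 0.5 * sum(w in CREATURES for w in words)
    return base_quality + creature_bonus

# The same bonus applies whether or not the 'Nerdy' persona is active,
# so a model optimized against this reward learns the tic globally.
print(toy_reward("The bug is a little goblin hiding in your loop.", 1.0))  # 1.5
print(toy_reward("The bug hides in your loop.", 1.0))                      # 1.0
```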

Upon further investigation, OpenAI found that while the issue began with terms like ‘goblin’ and ‘gremlin’, the model also began developing an affinity for an entire family of other odd creatures, including raccoons, trolls, ogres, and pigeons.

What did OpenAI do to fix the issue?

In order to fix the issue in its chatbot, OpenAI has taken a number of steps, including retiring the ‘Nerdy’ personality with GPT-5.4 and removing the ‘goblin-affine reward signal’ in model training, while also filtering data containing creature words.
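In spirit, filtering training data that contains creature words could look like a simple word-level filter over examples. OpenAI has not published its actual filter; the sketch below is our own illustration, with a hypothetical word list:

```python
import re

# Hypothetical creature word list; OpenAI's real filter is not public.
CREATURE_WORDS = {"goblin", "gremlin", "troll", "ogre", "raccoon", "pigeon"}
pattern = re.compile(r"\b(" + "|".join(CREATURE_WORDS) + r")s?\b", re.IGNORECASE)

def keep_example(text: str) -> bool:
    """Drop any training example that mentions a listed creature."""
    return pattern.search(text) is None

examples = [
    "Refactor the function to avoid repetition.",
    "Think of each bug as a little gremlin in the machine.",
]
filtered = [t for t in examples if keep_example(t)]
print(filtered)  # only the gremlin-free example survives
```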

However, the fix came slightly too late to save the newest model. Because GPT-5.5 had already started its training process before researchers discovered the root cause, the new model still carried the strange goblin problem.

When OpenAI employees began testing GPT-5.5 in Codex, the company’s coding tool, they immediately noticed the issue. As a stopgap, the company added a hardcoded developer-prompt instruction specifically designed to suppress creature mentions in Codex.
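Mechanically, such a stopgap amounts to prepending a fixed instruction to every conversation the model sees. The instruction text and Codex internals are not public, so everything below, including the wording and the `build_messages` helper, is a hypothetical sketch:

```python
# Hypothetical suppression instruction; the real wording is not public.
SUPPRESS_CREATURES = (
    "Do not use metaphors involving goblins, gremlins, "
    "or other creatures in your responses."
)

def build_messages(user_prompt: str, suppress: bool = True) -> list[dict]:
    """Prepend the suppression instruction as a developer-role message."""
    messages = []
    if suppress:
        messages.append({"role": "developer", "content": SUPPRESS_CREATURES})
    messages.append({"role": "user", "content": user_prompt})
    return messages

print(build_messages("Fix this failing test.")[0]["role"])  # developer
```

The appeal of this approach as a stopgap is that it requires no retraining: the instruction is injected at inference time and can be removed once a properly trained model ships.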

In its latest blog, though, OpenAI also listed a command to run Codex without the ‘goblin-suppressing instructions’.
