OpenAI warns: AI models are learning to cheat, hide and break rules

Technology India Finance March 28, 2025 0 Comments

OpenAI warns: AI models are learning to cheat, hide and break rules – Why it matters

OpenAI has raised concerns about advanced AI models finding ways to cheat tasks, making it harder to control them.

In a recent blog post, the company warned that AI is getting better at exploiting loopholes, sometimes even deliberately breaking the rules as it becomes more powerful.

“AI finding ways to hack the system”

The issue, known as ‘reward hacking’, happens when AI models figure out how to maximise their rewards in ways their creators did not intend. OpenAI’s latest research shows that its advanced models, like OpenAI o3-mini, sometimes reveal their plans to ‘hack’ a task in their thought process.

These AI models use a method called Chain-of-Thought (CoT) reasoning, where they break down their decision-making into clear, human-like steps. This makes it easier to monitor their thinking. Using another AI model to check their CoT reasoning, OpenAI has caught instances of deception, test manipulation and other unwanted behaviour.

How AI chatbot lies just like humans and hides mistakes

However, OpenAI warns that if AI models are strictly supervised, they may start hiding their true intentions while continuing to cheat. This makes monitoring them even harder. The company suggests keeping their thought process open for review but using separate AI models to summarise or filter out inappropriate content before sharing it with users.

A problem bigger than AI

OpenAI also compared this issue to human behaviour, noting that people often exploit real-life loopholes—like sharing online subscriptions, misusing government benefits, or bending the rules for personal gain. Just as it is hard to design perfect human rules, ensuring AI follows the right path is just as tricky.

What’s next?

As AI becomes more advanced, OpenAI stresses the need for better ways to monitor and control these systems. Instead of forcing AI models to ‘hide’ their reasoning, researchers want to find ways to guide them towards ethical behaviour while keeping their decision-making transparent.

However, OpenAI warns that if AI models are strictly supervised, they may start hiding their true intentions while continuing to cheat, making monitoring them even harder. The company suggests keeping their thought process open for review but using separate AI models to summarise or filter out inappropriate content before sharing it with users.

Source link

OpenAI warns: AI models are learning to cheat, hide and break rules – Why it matters

“AI finding ways to hack the system”

How AI chatbot lies just like humans and hides mistakes

A problem bigger than AI

What’s next?

Post Comment Cancel reply

Do Not MIss

Japan’s exports hit record high, but trade deficit continues

India, Korea sign pact for exchange of notes for NCERT’s Technical Cooperation project

India has good reason to help Trump erect a border wall

Sony PS6 launch timeline tipped: Performance, graphics and everything to expect

Travel and Hospitality sector in India projecting 8.2 pc net employment change in HY2 FY2025: TeamLease Report

India examining US trade memo; directive on bilateral trade deal positive: Report

Global tourism almost return to pre pandemic level in 2024

US President Donald Trump announces $500 billion AI initiative

Amazon Fab Phone Fest is now LIVE! Get Samsung Galaxy S23 Ultra, Honor 200, and more at up to 52% discount

Trump eyes 10 % tariff on China over fentanyl exports to Mexico, Canada

OpenAI warns: AI models are learning to cheat, hide and break rules – Why it matters

“AI finding ways to hack the system”

How AI chatbot lies just like humans and hides mistakes

A problem bigger than AI

What’s next?

Related Posts

Post Comment Cancel reply

Do Not MIss