
Anthropic says Chinese hackers misused Claude in first AI‑driven cyberattack: What’s compromised?

Anthropic revealed in a blog post published Thursday that a Chinese hacking group misused its Claude AI systems in September to run a highly sophisticated campaign targeting major organisations around the world. Notably, per the tech firm, the attackers jailbroke the company’s AI model and used it to carry out cyber operations largely on its own.

How did the cyberattack work?

Anthropic said the incident marks the first known case of a large-scale operation executed by an AI system rather than human hackers. Notably, the cybercriminals used “agentic AI” capabilities to perform tasks that would normally require a full team of experts, ranging from scanning systems to writing exploit code.

Who was impacted?

The American tech giant disclosed that the attackers first selected 30 targets, including financial organisations, tech firms, chemical manufacturers and government agencies. However, Anthropic did not name any of them. The hackers then built an automated framework designed to use Claude as the primary engine of their operation.

To bypass the model’s safety rules, the hackers broke the malicious tasks down into small, harmless-looking requests and convinced the model that it was conducting defensive cybersecurity testing. This “jailbreak” allowed the AI to carry out the operation without ever seeing the full malicious context.

Claude then scanned target systems, mapped their infrastructure and identified sensitive databases at a speed impossible for human operators. It summarised its findings for the hackers, who used the results to plan their next steps.

What has been compromised?

According to the Anthropic blog, Claude researched vulnerabilities, wrote its own exploit code and notably attempted to gain access to high-value accounts. In some cases, it harvested credentials and extracted private data, automatically sorting it by importance. In the final stage, the AI agent generated detailed reports of the intrusion, including stolen credentials and system assessments, making it easier for the cybercriminals to plan follow-up actions.

What does the incident mean for cybersecurity?

Anthropic warns that the threshold for launching advanced cyberattacks has dropped sharply. With autonomous AI systems now capable of chaining together long sequences of actions, even groups with limited resources could attempt complex operations previously out of reach.

Although Claude occasionally produced false or misleading results, such as hallucinating credentials or misidentifying data, the overall efficiency of the attack shows how quickly AI-enabled threats are evolving.

The company believes similar misuse is likely happening with other leading AI models.
