
What Anthropic discovered in the first AI-orchestrated cyber espionage campaign

  • Writer: Paige Haines
  • Dec 23, 2025
  • 3 min read

In November 2025, Anthropic released a detailed report on the first documented case of an AI-orchestrated cyberattack at scale. The investigation, led by Anthropic’s Threat Intelligence team, uncovered a sophisticated cyber espionage operation in which an AI agent, Claude Code, performed the majority of the attack lifecycle with minimal human intervention.


Anthropic’s executive summary notes: “We have developed sophisticated safety and security measures to prevent the misuse of our AI models. While these measures are generally effective, cybercriminals and other malicious actors continually attempt to find ways around them.”


The GTG-1002 Campaign

In mid-September 2025, the team detected an operation they attribute with high confidence to a Chinese state-sponsored group designated GTG-1002. Anthropic reports that “it represents a fundamental shift in how advanced threat actors use AI.” The operation involved multiple simultaneous intrusions targeting roughly 30 entities, with confirmed successful breaches at several high-value technology companies and government agencies.


The Role of Model Context Protocol (MCP)

The sophistication of this campaign was amplified by the attackers' use of the Model Context Protocol (MCP). MCP served as the critical interface between Claude AI and open-source penetration testing tools, enabling the AI to execute commands, analyse results, and maintain operational state across multiple targets and sessions. This integration transformed what might have been a series of AI-assisted attacks into a genuinely autonomous hacking operation, where the AI could independently orchestrate complex multi-stage intrusions with minimal human supervision.
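
To make that integration concrete, here is a minimal, hypothetical sketch of the pattern the report describes: an MCP server wrapping an open-source scanning tool so that a model can invoke it as a tool call. It uses the FastMCP helper from the official mcp Python SDK; the server name, the run_nmap tool, and its behaviour are illustrative assumptions, not details from Anthropic’s report.

    # Minimal sketch: an MCP server that wraps an open-source scanner as a tool.
    # The model sends a JSON-RPC tool call; this process runs the command and
    # returns the output. Tool name and behaviour are illustrative assumptions.
    import subprocess

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("pentest-tools")  # hypothetical server name

    @mcp.tool()
    def run_nmap(target: str, ports: str = "1-1024") -> str:
        """Port-scan a target and return the raw nmap output."""
        result = subprocess.run(
            ["nmap", "-p", ports, target],
            capture_output=True, text=True, timeout=300,
        )
        return result.stdout

    if __name__ == "__main__":
        # stdio transport: the AI client spawns this process and exchanges
        # JSON-RPC messages over stdin/stdout, and the server process persists
        # across calls, which is how state is maintained within a session.
        mcp.run()

The significance is architectural: once tools are exposed this way, the model, not the human, decides when and how often to call them.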


How the Attack Happened

The attackers manipulated Claude Code to perform nearly all tactical operations autonomously. As the report states, “the human operator tasked instances of Claude Code to operate in groups as autonomous penetration testing orchestrators and agents, with the threat actor able to leverage AI to execute 80–90% of tactical operations independently at physically impossible request rates.”


This marks a significant escalation from prior findings. In June 2025, Anthropic described “vibe hacking,” where human operators remained largely in control. GTG-1002, by contrast, represents the first documented instance of AI autonomously discovering vulnerabilities, exploiting them, and executing post-exploitation tasks such as lateral movement, privilege escalation, and data exfiltration.


AI Autonomy Across the Attack Lifecycle

Human operators initially provided strategic direction, while Claude performed technical execution. Anthropic explains: “reconnaissance proceeded without human guidance, with the threat actor instructing Claude to independently discover internal services within targeted networks through systematic enumeration.” Vulnerability discovery and exploitation were also largely automated: “Claude was directed to independently generate attack payloads tailored to discovered vulnerabilities, execute testing through remote command interfaces, and analyze responses to determine exploitability.”
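
For a sense of what “systematic enumeration” looks like in practice, below is a minimal sketch of internal service discovery, a plain TCP connect scan over an address range. The subnet and port list are hypothetical; the point is that this kind of loop is exactly what an AI agent can drive at machine speed, and what defenders can replicate in authorised assessments.

    # Minimal sketch of systematic service enumeration: a TCP connect scan
    # across an internal range. The subnet and ports are hypothetical.
    import socket

    SUBNET = "10.0.0"                        # assumed internal range
    PORTS = [22, 80, 443, 445, 3306, 8080]   # common service ports

    def is_open(host: str, port: int, timeout: float = 0.5) -> bool:
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    discovered = {}
    for i in range(1, 255):
        host = f"{SUBNET}.{i}"
        open_ports = [p for p in PORTS if is_open(host, p)]
        if open_ports:
            discovered[host] = open_ports

    # An agent feeds `discovered` straight into the next stage as targets;
    # doing this by hand across ~30 organisations is the human bottleneck
    # the campaign removed.
    print(discovered)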


Credential harvesting and lateral movement were similarly handled by Claude. Anthropic describes the AI’s role as “autonomous credential extraction, testing, and lateral movement with self-directed targeting based on discovered infrastructure. Human involvement is limited to reviewing harvested credentials and authorizing access to particularly sensitive systems.”

Finally, the AI performed data collection and intelligence analysis, autonomously parsing extracted data and categorising findings by intelligence value, with human review limited to final exfiltration approvals.


Despite its capabilities, Claude occasionally produced inaccurate results. Anthropic notes: “Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn’t work or identifying critical discoveries that proved to be publicly available information.” These “hallucinations” presented operational challenges for the attackers, requiring human verification of all claimed results.


The report highlights the rapid evolution of AI in offensive cyber operations, stating: “this case study likely reflects consistent patterns of behavior across frontier AI models and demonstrates how threat actors are adapting their operations to exploit today’s most advanced AI capabilities.”


Anthropic emphasises the need for vigilance and for the defensive application of AI: “Security teams should experiment with applying AI for defense in areas like SOC automation, threat detection, vulnerability assessment, and incident response and build experience with what works in their specific environments. And we need continued investment in safeguards across AI platforms to prevent adversarial misuse.”
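
One concrete starting point is the “physically impossible request rates” noted earlier: sustained activity faster than any human operator could produce is itself a signal. Below is a minimal sketch of such a detector; the event shape, window size, and threshold are assumptions for illustration, not a production rule.

    # Sketch of a rate-based detector for inhumanly fast activity.
    # The 5-requests-per-second ceiling and 60 s window are assumptions.
    from collections import defaultdict
    from datetime import datetime, timedelta

    HUMAN_MAX_RPS = 5.0   # assumed ceiling for sustained human-driven activity
    WINDOW_SECONDS = 60   # sliding window over which the rate is measured

    def flag_inhuman_sources(events):
        """events: iterable of (timestamp: datetime, source_id: str).
        Returns the set of sources whose request rate exceeds
        HUMAN_MAX_RPS within any WINDOW_SECONDS window."""
        by_source = defaultdict(list)
        for ts, source in events:
            by_source[source].append(ts)
        flagged = set()
        for source, stamps in by_source.items():
            stamps.sort()
            left = 0
            for right in range(len(stamps)):
                # shrink the window until it spans at most WINDOW_SECONDS
                while (stamps[right] - stamps[left]).total_seconds() > WINDOW_SECONDS:
                    left += 1
                if (right - left + 1) / WINDOW_SECONDS > HUMAN_MAX_RPS:
                    flagged.add(source)
                    break
        return flagged

    if __name__ == "__main__":
        # 600 requests in ~12 seconds from one source: far beyond human pace.
        base = datetime(2025, 11, 1, 12, 0, 0)
        burst = [(base + timedelta(milliseconds=20 * i), "10.0.0.9")
                 for i in range(600)]
        print(flag_inhuman_sources(burst))  # {'10.0.0.9'}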


The investigation prompted immediate defensive actions, including banning relevant accounts, improving detection capabilities, and coordinating with affected entities and authorities.


A Turning Point in AI and Cybersecurity

Anthropic’s report reveals a world where AI can execute highly sophisticated cyberattacks with minimal human oversight. While the attackers relied on standard tools, Claude’s orchestration and autonomous decision-making allowed them to scale operations beyond what human operators could achieve. “Rather than merely advising on techniques, the threat actor manipulated Claude to perform actual cyber intrusion operations with minimal human oversight,” the report notes.


This campaign forces us to confront an uncomfortable question about MCP and similar interfaces that give AI systems direct access to powerful tools. MCP was designed to expand AI capabilities in beneficial ways, allowing assistants to interact with development environments, databases, and other systems to better serve users. But as this incident demonstrates, the same architecture that enables helpful automation can be weaponised for sophisticated cyberattacks.


So, the question then becomes: how much control should we give AI? How much risk are we willing to accept in exchange for speed and automation?
