Security researchers at Sysdig have documented what appears to be the first confirmed in-the-wild attack in which an LLM agent autonomously conducted post-exploitation operations — making decisions, adapting to outputs, and pivoting through infrastructure without human direction between steps. The attacker exploited a critical pre-authentication remote code execution vulnerability in Marimo, a Python notebook platform, and then handed off control to an AI agent that moved from initial access to a fully exfiltrated internal database in four pivots and under two minutes.

The incident, observed by Sysdig’s Threat Research Team (TRT) on May 10, 2026, marks a milestone in the threat landscape. Researchers have theorized about and simulated AI-assisted attacks for years. This incident is the first confirmed deployment of one against real infrastructure, in an active attack.

The Vulnerability: CVE-2026-39987

Marimo is an open-source reactive Python notebook that has gained traction as an alternative to Jupyter, particularly among data scientists and AI developers who value its interactive, shareable interface. Because Marimo notebooks are often deployed as web-accessible applications — runnable by anyone with the URL — they frequently end up internet-facing in development and research environments.

CVE-2026-39987 is a pre-authentication remote code execution vulnerability affecting all Marimo versions up to and including 0.20.4. An unauthenticated attacker can exploit the flaw to execute arbitrary system commands on the host running the notebook, with no login and no user interaction required. The flaw was addressed in Marimo version 0.23.0.

The compromised notebook in the Sysdig incident was internet-reachable — a configuration common in data science and AI development workflows where sharing notebook results with colleagues or stakeholders is part of the intended use. The attacker found and exploited it without any foothold in the broader network.

The LLM Agent Takes Over

What happened after the initial exploit is what makes this incident historically significant.

Following the RCE, the attacker did not manually type commands or run a prepared script. Instead, they handed off post-exploitation to an LLM agent — an AI system that analyzes the environment, decides what to do next, executes commands, reads the outputs, and determines subsequent actions dynamically.

Sysdig’s TRT observed the agent work through four pivots in sequence:

Pivot 1 — Cloud credential extraction. From the compromised Marimo host, the agent enumerated the system environment and extracted two sets of cloud credentials. The agent identified where credentials were stored without prior knowledge of the specific host configuration — it reasoned from general knowledge of how cloud deployments are structured.

Pivot 2 — AWS Secrets Manager access. Using the extracted cloud credentials, the agent called the AWS Secrets Manager API — routing through a “fanned-out egress pool” (multiple outbound IP addresses) to avoid triggering rate-limit or anomaly alerts. The agent retrieved an SSH private key stored as a secret.

Pivot 3 — SSH bastion navigation. Armed with the SSH private key, the agent initiated eight short SSH sessions against a downstream SSH bastion server — the internal jump host that provides access to backend infrastructure. The sessions were brief and intermittent, consistent with a tool-calling pattern where the agent executes one command, reads the output, decides what to do, and executes the next.

Pivot 4 — PostgreSQL database exfiltration. Through the bastion, the agent located and accessed an internal PostgreSQL database. It exfiltrated the full database schema and the complete contents of the database, all within two minutes of reaching the bastion.

The speed is significant. A human operator conducting the same sequence would need to manually enumerate credentials, learn the AWS environment, identify the secrets store, interpret SSH configurations, and navigate the database structure — a process that could take hours or days. The LLM agent, operating with broad knowledge of how cloud and Linux environments work, compressed that to under 120 seconds.

What Makes This Different From Scripted Attacks

Traditional automated attacks — scanners, exploit kits, botnet payloads — execute predetermined commands regardless of what they encounter. If the environment doesn’t match the script’s expectations, the attack fails or stops.

The LLM agent in this incident adapted. Each pivot required interpreting the output of the previous step and deciding what to do next based on what was found. The agent didn’t know in advance that AWS Secrets Manager would hold an SSH key — it reasoned that a cloud-deployed system likely uses Secrets Manager for sensitive credentials, queried it, found the key, and acted on that finding.

This adaptability is what researchers have feared most about AI-assisted attacks: the ability to navigate novel environments dynamically, the way a skilled human attacker would, but at machine speed and without fatigue.

Sysdig’s report notes that the attack is the first confirmed in-the-wild example of this pattern. Security research labs have demonstrated LLM agents completing similar attack chains in controlled environments since 2024. The gap between “demonstrated in a lab” and “deployed against real infrastructure” has now closed.

Detection Signals

Sysdig identified several behavioral anomalies that flagged this attack in their telemetry:

  • Unusual egress traffic from the Marimo host — specifically API calls to AWS endpoints from a process with no legitimate reason to access cloud APIs
  • Rapid sequential AWS API calls through multiple source IPs — consistent with the egress pool pattern, distinct from normal application behavior
  • Short, repeated SSH sessions to the bastion — human operators tend to maintain SSH connections for longer periods; brief, programmatic sessions are anomalous
  • Bulk database reads following bastion access — the full-schema dump plus complete table contents in a compressed timeframe is not consistent with normal application database access
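Two of these signals, one credential used from many IPs and bursts of short SSH sessions, reduce to simple windowed counting. The Python sketch below is illustrative only and is not Sysdig's detection logic. It assumes CloudTrail-style event dicts (reading only the standard `eventTime`, `sourceIPAddress`, and `userIdentity.accessKeyId` fields) and a hypothetical flattened `SshSession` record built from bastion logs. The function names and every threshold are placeholders that would need tuning against a real baseline.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SshSession:
    # Hypothetical flattened record, e.g. assembled from bastion auth logs.
    source: str
    target: str
    start: datetime
    end: datetime


def key_ip_fanout(events, window=timedelta(minutes=5), min_ips=3):
    """Access key IDs used from >= min_ips distinct source IPs inside one window.

    `events` are CloudTrail-style dicts; events without an access key are skipped.
    """
    by_key = defaultdict(list)
    for e in events:
        key = e.get("userIdentity", {}).get("accessKeyId")
        if not key:
            continue
        ts = datetime.fromisoformat(e["eventTime"].replace("Z", "+00:00"))
        by_key[key].append((ts, e["sourceIPAddress"]))
    flagged = set()
    for key, hits in by_key.items():
        hits.sort()
        for i, (t0, _) in enumerate(hits):
            # Distinct IPs seen from this event until `window` has elapsed.
            ips = {ip for t, ip in hits[i:] if t - t0 <= window}
            if len(ips) >= min_ips:
                flagged.add(key)
                break
    return flagged


def short_ssh_bursts(sessions, max_len=timedelta(seconds=30),
                     window=timedelta(minutes=5), min_count=5):
    """(source, target) pairs with >= min_count sessions shorter than
    max_len whose start times all fall inside one window."""
    starts = defaultdict(list)
    for s in sessions:
        if s.end - s.start < max_len:
            starts[(s.source, s.target)].append(s.start)
    flagged = set()
    for pair, times in starts.items():
        times.sort()
        # Check each run of min_count consecutive short-session starts.
        for i in range(len(times) - min_count + 1):
            if times[i + min_count - 1] - times[i] <= window:
                flagged.add(pair)
                break
    return flagged
```

Both functions return sets that could feed an alerting pipeline. The nested loops are quadratic in the worst case, which is fine for a sketch but not for high-volume log streams.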

None of these signals are individually definitive, but together they form a coherent pattern of lateral movement and exfiltration. Organizations with cloud-connected development infrastructure should be monitoring for exactly these behavioral signatures.
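The "coherent pattern" idea can be approximated by raising an alert only when several distinct signal kinds cluster in time for the same entity. The sketch below is a hypothetical correlation step, not a product feature. It assumes upstream detectors already emit normalized `Signal` records and have attributed each one to a shared entity (for example, the originating workload or cloud identity). The signal-kind labels, window, and `min_kinds` threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Signal:
    # Hypothetical normalized alert from an upstream detector.
    entity: str  # workload or identity the signal is attributed to
    kind: str    # e.g. "cloud_api_egress", "ip_fanout", "short_ssh", "bulk_db_read"
    at: datetime


def correlated_entities(signals, window=timedelta(minutes=10), min_kinds=3):
    """Entities that raised >= min_kinds distinct signal kinds within one
    window. A single signal kind on its own never triggers."""
    by_entity = {}
    for s in signals:
        by_entity.setdefault(s.entity, []).append(s)
    flagged = set()
    for entity, sigs in by_entity.items():
        sigs.sort(key=lambda s: s.at)
        for i, first in enumerate(sigs):
            kinds = {s.kind for s in sigs[i:] if s.at - first.at <= window}
            if len(kinds) >= min_kinds:
                flagged.add(entity)
                break
    return flagged
```

Requiring distinct kinds, rather than a raw count, means that a noisy source of one signal type (for example, many short SSH sessions from a legitimate automation job) does not trigger the alert by itself.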

The Broader Implication

The Marimo LLM agent attack is not just a story about one vulnerability in one tool. It is a demonstration that the threat model for internet-facing development infrastructure has fundamentally changed.

Development environments — notebooks, internal tools, code repositories, CI/CD systems — have historically been treated as lower-security targets because they don’t directly handle customer data or financial transactions. But they do hold cloud credentials. They do have access to internal networks. They do connect to databases. And they are frequently deployed with less scrutiny than production systems.
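A practical first step is to inventory which cloud credentials a development host actually exposes to the processes running on it. The Python sketch below is a hypothetical audit helper, not a complete scanner. It checks a few well-known credential environment variables and default file locations used by AWS, Google Cloud, and Azure tooling. It reports only whether each one is present and never reads secret values. It also does not cover credentials served by an instance metadata service, which a process on a cloud VM can often obtain without any file or variable.

```python
import os
from pathlib import Path

# Well-known credential locations for common cloud SDKs/CLIs. Not exhaustive.
CREDENTIAL_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "AZURE_CLIENT_SECRET",
)
CREDENTIAL_FILES = (
    ".aws/credentials",
    ".config/gcloud/application_default_credentials.json",
    ".azure",
)


def find_exposed_credentials(env=None, home=None):
    """Return names of set credential env vars and existing credential paths.

    Only presence is reported; secret values are never read or returned.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    found = [name for name in CREDENTIAL_ENV_VARS if env.get(name)]
    found += [str(home / rel) for rel in CREDENTIAL_FILES if (home / rel).exists()]
    return found


if __name__ == "__main__":
    for item in find_exposed_credentials():
        print("credential source present:", item)
```

Running it under the same user and environment as the notebook server shows roughly what a compromised notebook process could reach without any privilege escalation.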

An LLM agent that can pivot from a development tool to an internal database in four steps and under two minutes does not need to target production. It needs to target whatever is reachable, and development environments are almost always reachable.

Organizations should patch CVE-2026-39987 immediately by upgrading to Marimo 0.23.0 or later, audit any publicly accessible notebook or development tool for similar exposure, and rotate all credentials on hosts where Marimo was deployed.
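To find environments that still need the upgrade, a minimal check can compare the installed `marimo` distribution against 0.23.0 using the standard-library `importlib.metadata`. This is a sketch with hypothetical function names. It only sees the Python environment it runs in, so each virtualenv or container has to be checked separately. Its version parsing is deliberately simplified (pre-release suffixes such as `rc1` are ignored). It treats anything below 0.23.0 as needing an upgrade, which follows the remediation advice above.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

FIXED_IN = (0, 23, 0)  # first fixed release named in the article


def release_tuple(v):
    """'0.20.4' -> (0, 20, 4). Simplified: non-numeric suffixes such as
    'rc1' or '+local' are dropped, and missing parts count as 0."""
    parts = [int("".join(takewhile(str.isdigit, p)) or 0) for p in v.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def marimo_needs_upgrade(installed=None):
    """True if the given (or installed) marimo version predates 0.23.0,
    False if it is 0.23.0 or later, None if marimo is not installed."""
    if installed is None:
        try:
            installed = version("marimo")
        except PackageNotFoundError:
            return None
    return release_tuple(installed) < FIXED_IN


if __name__ == "__main__":
    result = marimo_needs_upgrade()
    if result is None:
        print("marimo is not installed in this environment")
    elif result:
        print("marimo is older than 0.23.0: upgrade and rotate host credentials")
    else:
        print("marimo is at 0.23.0 or later")
```

For production use, a full version parser (for example, the third-party `packaging` library) would handle pre-releases correctly. The stdlib-only version keeps the check dependency-free.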

Sources