Anthropic Trained on Everyone's Copyright. Then Invoked It to Protect Their Own Code.
On March 31, a missing .npmignore exposed 512,000 lines of Claude Code's TypeScript source. Anthropic filed DMCA takedowns within hours. The same company facing $1.5 billion in copyright damages for training on other people's work now claims copyright protection for its own.
Here is the sequence of events, laid out plainly.
On March 31, 2026, Anthropic published version 2.1.88 of its Claude Code npm package. Bundled inside was a source map file that pointed to a zip archive on Anthropic's cloud storage containing the complete TypeScript source code. Nearly 2,000 files. Over 512,000 lines of code. The entire agent harness for one of the most praised AI coding tools on the market, exposed to anyone who ran npm install.
Security researcher Chaofan Shou flagged it publicly on X. His post accumulated over 28.8 million views. Within hours, the code had spread across GitHub, reaching 84,000 stars and 82,000 forks before Anthropic could react.
Anthropic's response was swift and aggressive. The company filed DMCA takedown notices with GitHub, disabling over 8,100 repositories in the fork network. In the official DMCA filing, Anthropic's IP counsel stated that the entire repository was infringing.
The company that settled a $1.5 billion copyright lawsuit two months earlier for training its AI on pirated books was now invoking the same legal framework to protect its own intellectual property.
What Actually Leaked
The source map wasn't supposed to ship. When the package was built with Bun, the source map generated by default carried the complete original TypeScript source embedded inside it. A missing .npmignore entry let it into the published tarball. Anthropic engineer Boris Cherny confirmed: "Our deploy process has a few manual steps, and we didn't do one of the steps correctly."
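This failure mode is easy to reproduce. The Source Map v3 format has a `sourcesContent` field that can embed the complete original files, so any `.map` file included in an npm tarball ships the source itself. A minimal sketch (the map below is a toy stand-in, not Anthropic's actual file):

```python
import json

# Toy source map illustrating the leak mechanism: "sourcesContent"
# embeds the full original source alongside the compiled output.
source_map = json.dumps({
    "version": 3,
    "file": "cli.js",
    "sources": ["src/agent.ts"],
    "sourcesContent": ["export const run = () => console.log('agent');"],
    "mappings": "AAAA",
})

def extract_embedded_sources(map_text):
    """Return {original filename: original source} from an embedded map."""
    data = json.loads(map_text)
    return dict(zip(data.get("sources", []), data.get("sourcesContent") or []))

recovered = extract_embedded_sources(source_map)
print(recovered["src/agent.ts"])  # the original TypeScript, straight out of the .map
```

Anyone who ran npm install got the `.map` file; recovering the source is a few lines of JSON parsing, which is why the code spread within hours.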
What the code revealed went beyond the agent harness. Developers who analyzed the 512,000 lines found:
KAIROS. Mentioned over 150 times in the source, KAIROS is Anthropic's internal name for an always-on background daemon. It allows Claude Code to operate persistently, performing memory consolidation, running error fixes, and executing tasks without waiting for human input. It uses a 3-gate trigger system (time since last run, number of new sessions, and a lock file check) before activating. A companion "dream" mode lets Claude think in the background, developing and iterating ideas between sessions.
Model codenames. The leak confirmed that "Capybara" is the internal codename for a Claude 4.6 variant, "Fennec" maps to Opus 4.6, and the unreleased "Numbat" remains in testing.
Undercover Mode. A system prompt instructs Claude to make "stealth" contributions to open-source repositories: "You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository. Your commit messages, PR titles, and PR bodies MUST NOT contain ANY Anthropic-internal information. Do not blow your cover."
Anti-distillation defenses. The system has controls that inject fake tool definitions into API requests to poison training data if competitors attempt to scrape Claude Code's outputs.
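The 3-gate trigger described above for KAIROS can be sketched in a few lines. Everything here is a hypothetical reconstruction from the leak analysis: the gate names, thresholds, and lock-file path are illustrative guesses, not values from the leaked source.

```python
import os
import time

# Illustrative thresholds only -- not Anthropic's actual values.
MIN_INTERVAL_SECONDS = 3600        # gate 1: time since last run
MIN_NEW_SESSIONS = 3               # gate 2: new sessions accumulated
DEFAULT_LOCK = "/tmp/kairos.lock"  # gate 3: no other instance running

def should_activate(last_run_ts, new_sessions, now=None, lock_path=DEFAULT_LOCK):
    """All three gates must pass before the background daemon runs."""
    now = time.time() if now is None else now
    gate_time = (now - last_run_ts) >= MIN_INTERVAL_SECONDS
    gate_sessions = new_sessions >= MIN_NEW_SESSIONS
    gate_lock = not os.path.exists(lock_path)
    return gate_time and gate_sessions and gate_lock
```

The design choice is conservative by construction: an always-on daemon that only wakes when enough time has passed, enough new work has accumulated, and no other instance holds the lock is much harder to trigger runaway background activity with.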
Paul Price, a cybersecurity specialist and founder of Code Wall, told Business Insider that the leak exposed the company's "harness," not the underlying model weights or proprietary training data. "It's more embarrassing than detrimental," he said. But for startups and smaller AI labs that lack the resources to engineer comparable infrastructure, the code provided a blueprint for solving some of the hardest problems in agent design.
The DMCA Overreach
The takedowns hit fast. But they hit too broadly.
GitHub disabled over 8,100 repositories under Anthropic's DMCA notice. Because the fork network exceeded 100 repos and Anthropic claimed most forks infringed to the same extent as the parent, GitHub disabled the entire network.
The problem: the sweep caught legitimate forks of Anthropic's own public Claude Code repository. Developer Danila Poyarkov received a takedown for simply forking the public repo. Developer Daniel San got a similar notice for a fork that contained only skills, examples, and documentation. No leaked code whatsoever. As one commenter put it, getting a DMCA for forking a public repository is "like getting a parking ticket for using a public sidewalk."
Developer Theo (of t3.gg) was also hit. His fork had no leaked source, only a pull request where he had edited a skill file.
Cherny later acknowledged the overshoot: "This was not intentional, we've been working with GitHub to fix it. Should be better now." Tech newsletter writer Gergely Orosz called it DMCA abuse, noting it is "neither OK, nor legal to file a DMCA takedown for something that breaks no copyright."
The Copyright Paradox
This is where it gets uncomfortable. For Anthropic specifically and for the AI industry generally.
In September 2025, Anthropic agreed to a $1.5 billion settlement in a class-action lawsuit led by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, over allegations the company used pirated books and shadow libraries to train Claude. In March 2026, Reuters reported the attorneys behind the settlement were still fighting over fees. In January 2026, Universal Music Group, Concord, and ABKCO filed suit claiming Anthropic illegally downloaded over 20,000 copyrighted songs for training purposes. In June 2025, Reddit sued over unauthorized scraping of user-generated content.
The pattern: Anthropic trained its models on copyrighted material created by others, defended that practice as fair use or transformative, then immediately invoked copyright's strongest enforcement mechanism when its own code was exposed. The legal shield it spent years arguing shouldn't apply to AI training data was deployed within hours once the shoe was on the other foot.
We wrote about this dynamic in March. In "We Published 57 Articles With AI. The Copyright Office Says None of It Belongs to Us," we documented the paradox from the other end: copyrighted human work goes in, uncopyrightable AI work comes out. The input creators lose protection. The output carries none. But the infrastructure that connects the two? The company claims full copyright on that.
The Claude Code source is almost certainly copyrightable. It was written by human engineers. It contains original expression in its architecture, its prompt engineering, its tool routing logic. This is not a close call under current law.
The irony is structural, not hypocritical. Anthropic is not wrong to claim copyright on code its engineers wrote. It is simply awkward to do so while settling a $1.5 billion lawsuit that arose from treating other creators' work as if that same protection lapsed once it became AI training data.
The Rewrite Problem
Within hours of the leak, a developer known as realsigridjin did something the DMCA cannot touch. He rewrote the entire Claude Code repository in Python.
The original was TypeScript. The rewrite reproduced the same functionality in a different language. Copyright protects expression, not ideas, and a Python reimplementation of a TypeScript architecture is, at least arguably, a separate work. As Gergely Orosz pointed out: "Copyright does not protect derived works. If you rewrite TypeScript code in Python, copyright no longer applies."
The project, along with related efforts like OpenCode, picked up significant traction. OpenCode works with any LLM (GPT, DeepSeek, Gemini, Llama) and drew a "Banger 😂" reaction from Elon Musk on X. These rewrites are almost certainly legal under current copyright doctrine. And they reproduce the most valuable thing the leak revealed: not the code itself, but the architectural decisions behind it.
This is the same argument AI companies make about training data. The model doesn't copy your book. It learns the patterns and expresses them differently. If that reasoning protects AI training, it also protects a Python reimplementation of TypeScript source code. Copyright law is consistent here. The question is whether anyone finds the consistency comfortable.
The Supply Chain Damage
The leak created a secondary crisis that has received less attention but may prove more consequential.
Between 00:21 and 03:29 UTC on March 31, attackers published trojanized versions of npm packages designed to exploit developers rushing to compile the leaked code. The Hacker News reported that a compromised version of the axios HTTP client contained a cross-platform remote access trojan. Users who installed or updated Claude Code during that window may have pulled it in.
AI security firm Straiker warned that with Claude Code's internals now exposed, "attackers can now study and fuzz exactly how data flows through Claude Code's four-stage context management pipeline and craft payloads designed to survive compaction, effectively persisting a backdoor across an arbitrarily long session."
Additional typosquatting packages appeared on npm, imitating internal Anthropic package names to stage dependency confusion attacks against developers trying to build from the leaked source. The packages were published by a user named "pacifier136" and targeted anyone running npm install with the leaked package.json.
A packaging error became a supply chain attack vector in under three hours. The code is already loose. The ideas are already reconstructed in Python. But the security exposure may be the part that actually hurts real people.
What Anthropic's Own Code Tells Us About Anthropic
Separate from the legal questions, the leaked source provides a rare window into how a leading AI company thinks about agent architecture.
KAIROS is interesting because it demonstrates that Anthropic is building toward always-on AI agents that maintain coherent memory across sessions, operate autonomously during idle periods, and proactively manage their own context. This is the same direction that products like Hatch and OpenClaw are pursuing in the consumer space. The fact that Anthropic is investing heavily in persistent agent infrastructure confirms that the era of session-based AI (ask, answer, forget) is ending.
Undercover Mode is interesting for a different reason. Anthropic is apparently directing Claude to contribute to open-source projects without identifying itself as an AI. This sits uncomfortably with the company's public positioning as the "safety-first" AI lab. Open-source communities generally expect human contributors, and many projects have explicit policies about AI-generated code. Stealth AI contributions to public codebases raise questions about disclosure that Anthropic has not addressed.
The anti-distillation defenses reveal the level of paranoia about model theft. Injecting fake tool definitions to poison competitor training pipelines is creative but adversarial. It means that if you build a product that interacts with Claude Code's API outputs, some of what you receive may be deliberately wrong, designed to corrupt any model that tries to learn from it.
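The decoy idea is simple to sketch. This is an entirely hypothetical illustration of the mechanism described in the leak coverage, not Anthropic's implementation: every name, schema, and heuristic below is invented.

```python
import random

# Real vs. decoy tool schemas -- all names invented for illustration.
REAL_TOOLS = [{"name": "read_file", "params": ["path"]}]
DECOY_TOOLS = [
    {"name": "sync_shard_cache", "params": ["shard_id", "ttl"]},
    {"name": "rebalance_ctx", "params": ["window", "mode"]},
]

def tools_for_request(suspected_scraper, rng=random):
    """Mix a plausible-looking fake schema into responses flagged as scraping."""
    tools = list(REAL_TOOLS)
    if suspected_scraper:
        tools += rng.sample(DECOY_TOOLS, 1)  # poison the harvested transcript
        rng.shuffle(tools)
    return tools
```

A scraper that trains on harvested transcripts ingests schemas that never existed, while ordinary clients see only the real tool list. The cost is asymmetric: cheap to inject, expensive to filter out downstream.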
The Bigger Picture
The Claude Code leak is a microcosm of every unresolved tension in AI copyright law.
AI companies argue that training on copyrighted data is fair use because the model produces transformative outputs, not copies. Authors and musicians argue that "transformative" is a legal fiction when the economic effect is to replace the original creators. Courts are reaching different conclusions in different jurisdictions. The $1.5 billion Anthropic settlement suggests the law is moving against the AI companies' position, at least in some courts.
Meanwhile, the same companies claim full copyright protection for their own engineering work. And when that work leaks, they reach for the DMCA, the exact tool their critics wish could be used against them for training data practices.
Both positions may be legally correct. Anthropic's TypeScript source was written by humans and qualifies for copyright. Training on copyrighted data may or may not be fair use depending on how the courts rule. These are separate legal questions. But the optics of invoking copyright to protect your code the same week your lawyers are arguing that copyright shouldn't prevent you from training on everyone else's? That is a spectacle the legal system will be processing for years.
What You Can Do
If you're a developer who installed Claude Code on March 31: Check whether you pulled version 2.1.88 between 00:21 and 03:29 UTC. If so, move to a known-safe version immediately (v2.1.88 was removed from npm; v2.1.89 replaced it) and rotate all secrets, API keys, and tokens that were accessible from your development environment. The axios supply chain compromise was real.
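A quick way to triage is to scan your lockfile for the affected release. The sketch below assumes the standard npm v2/v3 package-lock.json layout (a "packages" map keyed by node_modules paths); the demo lockfile is a toy, and you would point the function at your own file instead.

```python
import json

# Affected release per the incident reports.
AFFECTED = {"@anthropic-ai/claude-code": "2.1.88"}

def affected_packages(lockfile_text):
    """Return 'name@version' strings for any pinned affected release."""
    lock = json.loads(lockfile_text)
    hits = []
    for path, meta in lock.get("packages", {}).items():
        name = path.removeprefix("node_modules/")
        if name in AFFECTED and meta.get("version") == AFFECTED[name]:
            hits.append(f"{name}@{meta['version']}")
    return hits

# Toy lockfile standing in for Path("package-lock.json").read_text().
demo_lock = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "my-app"},
        "node_modules/@anthropic-ai/claude-code": {"version": "2.1.88"},
    },
})
print(affected_packages(demo_lock))  # ['@anthropic-ai/claude-code@2.1.88']
```

A hit tells you only that the bad version is pinned, not when it was installed; cross-check your install timestamps against the exposure window before deciding what to rotate. Yarn and pnpm lockfiles use different layouts and need their own parsers.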
If you received a DMCA takedown for a legitimate fork: You have the right to file a counter-notification with GitHub. Anthropic acknowledged the overreach. Document that your fork contained no leaked code and file the counter-notice.
If you're building on AI-generated code: Understand that the copyright status of your infrastructure may be weaker than you think. Human-written code is protected. AI-generated code is probably not (see Thaler v. Perlmutter, cert. denied March 2, 2026). If your agent writes its own plugins or generates boilerplate, that output may be unprotectable. Consider which parts of your system require human authorship for copyright purposes and plan accordingly.
If you're a content creator whose work is in AI training data: The legal landscape is shifting in your direction. The Anthropic settlement, the NYT v. OpenAI case, and the Universal Music Group lawsuit are establishing precedent. The Copyright Office's Part 2 report provides the current framework. Document your original works and register copyrights for anything commercially significant. Registration is not required for protection but is required for statutory damages in infringement suits.
If you're evaluating AI coding tools: The leak demonstrated that Claude Code's agent architecture is sophisticated but not magical. The KAIROS daemon, the self-healing memory, the multi-agent orchestration are all engineering decisions that are now documented in the open. Open-source alternatives like OpenCode are reproducing the same patterns. The choice between proprietary and open agent infrastructure just got more informed.
Sources
- Chaofan Shou, initial disclosure post on X (March 31, 2026; 28.8M views)
- Anthropic DMCA notice, GitHub DMCA repository (March 31, 2026)
- Boris Cherny (Anthropic), statement on packaging error and acknowledgment of DMCA overreach (April 1-2, 2026)
- "GitHub enforces Anthropic DMCA notices on leaked code," PiunikaWeb (April 1, 2026)
- "Claude Code Source Leaked via npm Packaging Error, Anthropic Confirms," The Hacker News (April 1, 2026)
- "Anthropic's Claude Code Leak Is a Masterclass in Irony," Startup Fortune (April 2, 2026)
- "We reverse-engineered KAIROS from the Claude Code leak," DEV Community (April 2, 2026)
- "Lawyers behind $1.5 billion Anthropic settlement slash fee bid," Reuters (March 20, 2026)
- "Anthropic faces new music publisher lawsuit over alleged piracy," Reuters (January 28, 2026)
- Paul Price (Code Wall), comments to CNBC/Business Insider (March 31, 2026)
- Straiker AI, "With Great Agency Comes Great Responsibility" security analysis (April 1, 2026)
- U.S. Copyright Office, Copyright and Artificial Intelligence, Part 2: Copyrightability (January 29, 2025)
- Thaler v. Perlmutter, No. 23-5233, D.C. Cir. (March 18, 2025); cert. denied, No. 25-449 (March 2, 2026)
- Gergely Orosz, commentary on DMCA overreach, X posts (April 1, 2026)
- npm package versions, @anthropic-ai/claude-code (v2.1.87, v2.1.88 [removed], v2.1.89)