107,000 AI-Generated Paragraphs Are Trying to Kill Patent Trolls. Here's Why They Might Actually Work.
AI-generated text isn't copyrightable. Under the right legal reading, that's not a flaw — it's the most useful feature the technology has ever produced. A public-domain flood of machine-written technical disclosures could poison the well that patent trolls drink from.
The Number That Should Scare Patent Lawyers
In 2016, an artist and researcher named Alexander Reben launched a website called All Prior Art. Its stated purpose was "to algorithmically create and publicly publish all possible new prior art, thereby making the published concepts not patent-able." The technology was primitive — Markov chain text generation pulling from the entire USPTO database of issued and published patent applications. Most of the output was gibberish. Reben acknowledged this openly: "While most inventions generated will be nonsensical, the cost to computationally create and publish millions of ideas is nearly zero — which allows for a higher probability of possible valid prior art."
As of this writing, All Prior Art has published approximately 107,000 paragraphs. Each one reads like a patent abstract that was left out in the rain. The project is a proof-of-concept. The concept it proves is this: if it costs nearly nothing to generate and publish technical descriptions, and if published descriptions can invalidate patents, then the economics of the patent system are about to break in a direction nobody in Washington has prepared for.
The reason this matters in 2026, and didn't matter in 2016, is that the text generators improved. Dramatically.
The Copyright Loophole That Isn't a Loophole
In February 2023, the U.S. Copyright Office issued a policy statement confirming what legal scholars had predicted: works generated by artificial intelligence without sufficient human authorship are not eligible for copyright registration. The Office was clear about the standard: "to qualify as a work of 'authorship' a work must be created by a human being." If the "traditional elements of authorship were produced by a machine," the work doesn't qualify — regardless of how creative it appears.
This was a loss for AI companies hoping to claim copyright over their models' output. OpenAI, Anthropic, and others have spent considerable energy arguing, in various contexts, that AI-generated material should receive some form of intellectual property protection. The Copyright Office disagreed.
But here's the structural irony nobody in the AI copyright debate seems to have noticed: for one specific application, non-copyrightability is a feature.
Under 35 U.S.C. § 102, a patent claim is invalid if the invention was "described in a printed publication" or "otherwise available to the public" before the filing date. The statute says nothing about who — or what — authored that publication. It doesn't require the publication to be copyrighted. It doesn't require human conception. It requires three things: public accessibility, a date, and sufficient technical detail.
AI-generated text satisfies all three. And because it's not copyrightable, it enters the public domain the instant it's published. Nobody can take it down. Nobody can license it. Nobody can restrict access. It belongs to everyone, permanently, for free.
Patent law and copyright law operate on parallel but completely independent legal tracks. Patent law asks: "Was this described somewhere publicly before?" Copyright law asks: "Who created this?" The first question doesn't depend on the second. A machine-generated description of a technical process, published on a website, is a printed publication under § 102(a)(1) regardless of whether the Copyright Office would register it.
IBM Did This With Humans. AI Makes It 10,000× Cheaper.
Defensive publication — publishing technical disclosures specifically to create prior art and prevent others from patenting obvious ideas — is not new. IBM ran the most systematic version for four decades.
The IBM Technical Disclosure Bulletin, published from 1958 to 1998, produced approximately 150,000 defensive disclosures. Each one described an invention in enough detail to serve as prior art, deliberately placed into the searchable record so patent examiners could cite them against subsequent applications. The Bulletin has been cited over 48,000 times in U.S. patents — making it one of the most referenced prior art sources in history.
IBM's program worked. But it required human engineers to write each disclosure, human editors to review them, and a physical printing and distribution infrastructure. The cost per disclosure, adjusted for engineering time, ran to hundreds or thousands of dollars.
The Research Disclosure journal, a private service operating since 1960 (now owned by Questel), offers a similar function at commercial rates. Companies pay to publish defensive disclosures in a journal that patent examiners are trained to search.
What AI changes is the unit economics. A GPT-4-class model can generate a technically detailed, enabling disclosure — complete with system architecture, pseudocode, parameter ranges, and variation enumeration — for less than $0.10 in API costs. At that price, the combinatorial approach becomes viable: instead of publishing disclosures for inventions you've already conceived, you publish disclosures for every plausible combination of known techniques applied to every plausible domain.
The math is simple enough. Take 500 known machine learning techniques. Cross them with 200 industry verticals. Add 50 deployment patterns (edge, cloud, federated, on-device, hybrid). That's 5 million unique combinations. At $0.10 each, generating enabling disclosures for the entire matrix costs $500,000 — less than the median settlement of a single patent troll lawsuit.
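The arithmetic above can be checked in a few lines. The counts and the per-disclosure price are the article's illustrative figures, not measured API rates:

```python
# Back-of-envelope cost of the combinatorial disclosure matrix.
# All inputs are the illustrative numbers from the text above.

techniques = 500            # known machine learning techniques
verticals = 200             # industry verticals
deployments = 50            # deployment patterns (edge, cloud, federated, ...)
cost_per_disclosure = 0.10  # USD, rough GPT-4-class API estimate

combinations = techniques * verticals * deployments
total_cost = combinations * cost_per_disclosure

print(f"{combinations:,} combinations")  # 5,000,000 combinations
print(f"${total_cost:,.0f} total")       # $500,000 total
```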
The $500,000 Question vs. the $3 Million Answer
Patent troll economics are the reason this matters.
According to the AIPLA Economic Survey, defending a patent infringement lawsuit through trial costs $600,000 to $3.625 million per patent, depending on the damages alleged and technology complexity. Most defendants settle before trial, but settlements aren't cheap either — they typically run $300,000 to $2 million.
Non-practicing entities (NPEs) — the polite term — filed 88.3% of all high-tech patent litigation in U.S. district courts in Q2 2024. Patent Assertion Entities alone accounted for 62.8% of total cases. NPE district court filings jumped 21.6% in 2025 compared to 2024. The Eastern District of Texas remains the preferred venue.
Small companies bear the worst of it. Research shows that NPE lawsuits consume roughly a quarter of targeted small firms' R&D spending. Many trolls deliberately target companies with revenues under $100 million and time their suits to coincide with funding rounds or acquisition events — maximizing the pressure to settle.
Against that backdrop, $500,000 to generate 5 million defensive disclosures looks like the best insurance policy in tech.
The Legal Case That Hasn't Been Made Yet
There is a critical caveat: no U.S. court has ruled on whether AI-generated text constitutes valid prior art under § 102(a)(1). The legal argument is strong, but it's untested.
A detailed analysis by PatentNext (June 2024) identifies the strongest counterargument. § 102(a)(2) — which covers prior patents and patent applications — explicitly requires that the reference "names another inventor." Since Thaler v. Vidal (Fed. Cir. 2022) confirmed that only humans can be named as inventors, AI-generated output arguably cannot qualify as prior art under this subsection.
But § 102(a)(1) — which covers "printed publications" and things "otherwise available to the public" — contains no such inventor requirement. The statute is facially neutral about authorship. A strict textualist reading says: if the disclosure is public and sufficiently detailed, it's prior art. Period.
The counterargument from PatentNext centers on the concept of "conception." Under U.S. patent law, an invention requires a "definite and permanent idea of the complete and operative invention" formed "in the mind of the inventor" (Burroughs Wellcome Co. v. Barr Labs., Fed. Cir. 1994). If AI can't conceive, can its output really describe an invention? Or is it just text that resembles a description?
This is the argument that will eventually reach the Federal Circuit. But it's worth noting what the counterargument must prove: not that AI can't invent (that's settled — it can't, per Thaler), but that a publication must describe a conceived invention in order to qualify as prior art. That's a different and much harder claim. The § 102(a)(1) text doesn't say "described an invention" — it says "the claimed invention was described in a printed publication." The publication doesn't need to know it's describing the invention. It just needs to contain the same technical disclosure.
In April 2024, the USPTO published a formal Request for Comments asking the public to weigh in on exactly this issue: can AI-generated content constitute prior art? How should patent examiners evaluate it? At what volume does AI-generated prior art become an "undue barrier to patentability"?
The fact that the USPTO is asking these questions — rather than dismissing them — tells you everything about where this is heading.
The Enabling Disclosure Problem
All Prior Art's 107,000 Markov-chain paragraphs have a fundamental weakness: most of them fail the enabling disclosure test. Under patent law, prior art must be detailed enough that a "person having ordinary skill in the art" (PHOSITA) could reproduce the invention. Gibberish that sounds patent-like but describes nothing reproducible doesn't qualify.
This is where the 2016-to-2026 gap matters most. Markov chains couldn't write enabling disclosures. GPT-4 can. A well-prompted modern language model can generate:
- System architecture descriptions with component interactions
- Pseudocode with actual logic flow
- Parameter ranges and optimization thresholds
- Materials specifications and manufacturing steps
- Claims-style enumeration of every variation and combination
The quality floor for AI-generated technical writing is now high enough that many outputs would satisfy the PHOSITA enablement standard — not all, but enough to matter at scale. A 10% hit rate across 5 million generated disclosures is 500,000 valid prior art references.
A legal analysis from Solve Intelligence proposes an intriguing alternative framework: instead of asking "was this generated by a human?", ask "has this publication received sufficient attention from a human audience?" Under this standard, a curated, indexed, and searchable database of AI-generated disclosures — one that human engineers actually read and reference — would qualify more easily than raw machine output dumped into the void.
What an Operational System Would Look Like
The prototype already exists in concept. Scaling it requires four components that didn't exist together until recently:
Generation. LLMs produce enabling disclosures at roughly $0.05-0.15 each. The generation strategy isn't random — it's combinatorial. Take known techniques from every patent class, cross with deployment domains, enumerate variations. Target the claim language patterns that NPEs favor.
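The enumeration step can be sketched as a cross product over the three axes. The technique, vertical, and deployment lists below are hypothetical placeholders, as is the prompt template; a real system would draw its axes from patent classification data:

```python
# Sketch of the combinatorial generation strategy: one LLM prompt
# per (technique, vertical, deployment) cell. Lists and template
# are illustrative assumptions, not a real corpus.
from itertools import islice, product

techniques = ["federated averaging", "contrastive pretraining", "quantized inference"]
verticals = ["agricultural sensing", "retail logistics", "dental imaging"]
deployments = ["edge device", "cloud cluster", "on-device mobile"]

PROMPT = (
    "Write an enabling technical disclosure: apply {t} to {v}, "
    "deployed on {d}. Include system architecture, pseudocode, "
    "and parameter ranges."
)

def generation_jobs(limit=None):
    """Yield one prompt per cell of the combinatorial matrix."""
    combos = product(techniques, verticals, deployments)
    if limit is not None:
        combos = islice(combos, limit)
    for t, v, d in combos:
        yield PROMPT.format(t=t, v=v, d=d)

for prompt in generation_jobs(limit=2):
    print(prompt)
```

At full scale the same loop runs over hundreds of techniques and verticals; the only thing that changes is the size of the input lists.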
Timestamping. Each disclosure gets archived to the Internet Archive's Wayback Machine, an IPFS content hash, and optionally a blockchain timestamp. The goal is irrefutable proof of publication date, since § 102 requires that prior art predate the patent filing.
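A minimal sketch of the timestamping record, assuming a plain SHA-256 digest stands in for an IPFS content hash (real IPFS CIDs use a multihash encoding, and actual Wayback Machine submission happens over its web API, neither of which is shown here):

```python
# Build a publication record for one disclosure: content fingerprint
# plus a UTC publication timestamp. The sha256 digest is a stand-in
# for an IPFS content hash; archive submission is out of scope.
import hashlib
import json
from datetime import datetime, timezone

def publication_record(disclosure_text: str, url: str) -> dict:
    digest = hashlib.sha256(disclosure_text.encode("utf-8")).hexdigest()
    return {
        "url": url,
        "sha256": digest,  # proves the text hasn't changed since publication
        "published_utc": datetime.now(timezone.utc).isoformat(),
    }

rec = publication_record("A system comprising ...", "https://example.org/d/1")
print(json.dumps(rec, indent=2))
```

The point of the hash is tamper evidence: any later edit to the disclosure produces a different digest, so the archived copy and the timestamp stay bound together.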
Indexing. This is the underappreciated challenge. Patent examiners search the USPTO's internal databases (PatFT, AppFT), Google Patents, and Google Scholar. If AI-generated disclosures don't appear in those search engines, examiners won't find them, and they won't be cited. The database needs structured HTML with semantic markup, proper schema.org metadata, and submission to Google's academic indexing pipelines.
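The metadata requirement can be illustrated with a JSON-LD block per disclosure page. The `TechArticle` type and field names follow schema.org vocabulary; whether examiner-facing search engines actually consume these fields is an open question, so treat this as a sketch:

```python
# Emit a schema.org JSON-LD block for one disclosure page, giving
# indexers a machine-readable publication date (the date Sec. 102
# cares about). Field choices follow schema.org's TechArticle type.
import json

def jsonld_block(title: str, date_iso: str, url: str) -> str:
    data = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": title,
        "datePublished": date_iso,
        "url": url,
        "isAccessibleForFree": True,
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )

print(jsonld_block(
    "Federated anomaly detection for irrigation pumps",
    "2026-01-15",
    "https://example.org/d/1",
))
```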
Quality filtering. Not every generated disclosure is enabling. A human review layer — or, more practically, a second AI pass that evaluates whether the disclosure meets PHOSITA standards — filters out the noise. IBM's Technical Disclosure Bulletin succeeded because it was editorially curated. The AI version needs a quality floor, even if that floor is maintained algorithmically.
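Before any expensive second-pass review, a cheap heuristic gate can discard obvious junk. The marker patterns and thresholds below are assumptions for illustration, not a validated PHOSITA enablement test:

```python
# Hypothetical quality gate: reject disclosures that lack the
# structural markers of an enabling description (pseudocode verbs,
# numeric parameters, architecture vocabulary). Thresholds are
# illustrative assumptions only.
import re

MARKERS = {
    "pseudocode": re.compile(r"\b(for each|while|if .* then|return)\b", re.I),
    "parameters": re.compile(r"\b\d+(\.\d+)?\s*(ms|mb|%|epochs?|layers?)\b", re.I),
    "architecture": re.compile(r"\b(module|component|pipeline|interface)\b", re.I),
}

def passes_quality_gate(text: str, min_markers: int = 2, min_words: int = 150) -> bool:
    """Keep a disclosure only if it is long enough and hits enough markers."""
    if len(text.split()) < min_words:
        return False
    hits = sum(1 for rx in MARKERS.values() if rx.search(text))
    return hits >= min_markers

print(passes_quality_gate("Too short to enable anything."))  # False
```

A disclosure that survives this gate would still go to the second AI pass; the heuristic only keeps the expensive reviewer from wasting cycles on gibberish, which is the algorithmic analogue of the TDB's editorial floor.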
The Meta-Irony
The companies most loudly arguing that AI-generated content should be copyrightable are the same companies whose technology makes this defensive publication strategy possible. OpenAI's legal position is that its models' output should receive some form of IP protection. Anthropic has been more cautious but hasn't renounced output copyright. Google has staked similar ground.
If they win — if AI-generated text becomes copyrightable — the defensive publication strategy weakens. Copyrighted text can be restricted, licensed, taken down. The power of AI-generated prior art lies precisely in its public domain status.
So the AI companies' copyright fight, if successful, would inadvertently protect the patent troll ecosystem they're presumably trying to disrupt. The most useful legal feature of their technology — its non-copyrightability — is the one they're trying to eliminate.
The best weapon against patent trolls, in other words, depends on AI companies losing their own IP battle.
What We Don't Know
This analysis has significant limitations. The central legal question — whether AI-generated publications qualify as § 102(a)(1) prior art — is genuinely unresolved. The argument presented here follows the textualist reading, but courts regularly deviate from strict textualism when the policy consequences are dramatic enough. A ruling that AI-generated text constitutes valid prior art could, as the USPTO's RFC acknowledges, create "undue barriers to patentability" by flooding the prior art corpus with machine-generated noise. Courts may decide that preventing this outcome justifies reading a human-authorship requirement into § 102(a)(1), even though the text doesn't contain one.
The enabling disclosure quality claims are also speculative. Whether GPT-4-class models can consistently produce disclosures that would survive adversarial scrutiny in litigation — where opposing counsel is paid $800/hour to find flaws — has not been tested. The gap between "sounds technical" and "enables a PHOSITA to reproduce the invention" is wide enough for a patent attorney to drive a truck through.
The combinatorial generation cost estimates ($0.05-0.15 per disclosure) reflect current API pricing and could change in either direction. More significantly, the 10% quality hit rate is an assumption without empirical backing. The actual rate could be 1% or 40%. Nobody has run the experiment.
Finally, there's a gaming risk this analysis hasn't fully addressed. If defensive publication at scale becomes viable, patent trolls could use the same technology offensively: generate their own disclosures, then file patents on slight variations that dodge their own published prior art. The arms race potential is real.
The Strongest Counterargument
The most persuasive case against AI-generated defensive publications isn't legal — it's practical. Patent trolls don't win because prior art doesn't exist. They win because finding and presenting prior art is expensive. The average invalidity search costs $15,000-30,000. A full inter partes review at the PTAB costs $200,000-500,000.
Adding 5 million more documents to the prior art corpus could make this problem worse, not better. If patent examiners can't efficiently search a database of AI-generated disclosures — and the USPTO's current search infrastructure was not designed for this volume — the flood of prior art becomes noise, not signal. The trolls' patents still get issued, the prior art still exists somewhere in a sea of machine-generated text, and defendants still have to spend six figures finding it.
The counterpoint to this counterargument is that AI also improves prior art search. Companies like Solve Intelligence are building AI-powered prior art search tools that can process exactly the kind of massive, unstructured corpus that AI-generated prior art would create. The generation side and the search side scale together.
But "AI can find AI-generated prior art" has a recursive quality that should give everyone pause.
The Bottom Line
Somewhere between copyright law and patent law there is a gap — a structural mismatch between the two regimes. Copyright law says AI-generated text belongs to no one. Patent law says publicly available text can invalidate anyone's patent. That gap is an invitation. IBM spent 40 years and untold millions filling it with human-written disclosures. AI can do the same thing for the price of a mid-tier SaaS subscription. The legal question is unsettled. The economics are not. If you can generate 5 million defensive disclosures for less than the cost of one patent troll settlement, someone will. The only question is whether the courts decide it counts.