💻 Computing

51% of Code on GitHub Is Now AI-Generated. The Developers Who Write It Trust It Less Than Ever.

Ninety percent of professional developers use AI coding tools daily. Only 29% trust the accuracy of what those tools produce, down from 40% a year ago. Cross-referencing three major surveys reveals an implied technical debt injection rate that nobody in the industry has bothered to calculate.

Abstract visualization of code flowing through translucent layers, with fracture lines suggesting hidden instability beneath a polished surface

Twenty-nine percent. That is the share of professional developers who trust the accuracy of AI-generated code, according to Stack Overflow's 2025 Developer Survey, which polled 65,000 respondents across every major programming language and industry vertical. One year earlier the figure was 40 percent, and two years before that, when GitHub Copilot was still generating breathless conference keynotes and LinkedIn thought leadership, positive sentiment toward AI coding tools exceeded 70 percent among the same population.

Over the same period, adoption did not merely grow. It went vertical. JetBrains surveyed 10,000 professional developers across eight languages in January 2026 and found that 90 percent regularly use at least one AI tool for coding at work. GitHub's platform data shows over 51 percent of all committed code is now AI-generated or AI-assisted, meaning that for the first time in the repository's 18-year history, machines are writing more code than the humans who employ them.

Nobody has found this contradiction interesting enough to sit down and calculate what it implies.

The Adoption Numbers

JetBrains' AI Pulse survey, conducted across every major software market with localization into eight languages, reveals a market where AI coding tools have achieved near-total penetration: 74 percent of developers now use specialized AI coding tools rather than general-purpose chatbots for their work, a distinction that matters because purpose-built tools integrate directly into the editor and shape every keystroke in ways that a browser-tab chatbot cannot. GitHub Copilot leads awareness at 76 percent with 29 percent work adoption, but its growth curve has flattened into a plateau that looks increasingly permanent. Cursor holds 69 percent awareness and 18 percent work adoption with similarly stalled momentum.

Claude Code is the exception that makes the stagnation elsewhere visible. Launched in May 2025 with roughly 3 percent market share, it reached 18 percent work adoption by January 2026, a 6x growth rate in nine months that corresponds to a customer satisfaction score of 91 percent and a Net Promoter Score of 54, the highest of any AI coding tool measured by any major survey. In the United States and Canada, adoption climbs even higher to 24 percent. Anthropic reported a $2.5 billion annualized run rate from Claude Code within its first year, and 71 percent of developers who use AI coding agents for multi-file autonomous workflows chose it over every competitor including Copilot.

The Trust Collapse

Here is where the data becomes genuinely strange, because Stack Overflow's survey of 65,000 developers shows trust in AI code accuracy plummeting to 29 percent while 46 percent explicitly distrust the output, a ratio that means for every developer who trusts AI code there are nearly two who have concluded the opposite. Only 3 percent report "high trust." In a profession built on precision, that is indistinguishable from zero.

Sixty-one percent of respondents agree that AI generates code which "looks correct but isn't reliable." Developers are telling surveyors in unambiguous terms and statistically significant sample sizes that the tools they use daily produce plausible-looking bugs, and then those same developers open their editors the next morning and accept the suggestions anyway, because the alternative is falling behind colleagues who accepted theirs yesterday.

This is not hypocrisy. It is time-horizon economics. Developers save an average of 3.6 hours per week by using AI coding tools, a number that appears on performance dashboards immediately. GitHub Copilot offers completions for 46 percent of the code developers write, though only 30 to 31 percent of its suggestions survive developer review. Speed wins the daily calculus even when quality will lose the quarterly audit.

The Vulnerability Data

A peer-reviewed study published in Empirical Software Engineering in early 2026 analyzed 2,315 C, C++, and C# code snippets from the DevGPT dataset and found 56 vulnerabilities across 48 files, then tested GPT-4.1, GPT-5, and Claude Opus 4.1 on whether they could detect and repair the flaws their predecessors had introduced. Detection rates improved from roughly 50 percent to 75-80 percent between October 2024 and September 2025. Progress is real. One in four vulnerabilities still evades even the best models.

More importantly, Belozerov, Barclay, and Sami concluded that "LLM-generated code is about as likely to contain vulnerabilities as developer-written code," a finding that cuts in two directions simultaneously: AI code is not catastrophically worse than human code on a per-snippet basis, but it is being produced at a volume and velocity that human code never approached, and the review infrastructure that catches bugs before they reach production was designed for human authorship rates, not machine ones.

The Math Nobody Did

Consider the implied technical debt injection rate that emerges when you combine these three independent datasets. If 51 percent of committed code is AI-generated, and 61 percent of developers acknowledge that AI code looks correct but is unreliable, then the global codebase is absorbing plausible-looking defects at a rate proportional to both the volume of AI output and the false confidence that plausible-looking code inspires in reviewers who are already pressed for time. A developer reviewing Copilot suggestions accepts about 30 percent of them, and of that accepted code, some fraction contains latent bugs that passed review precisely because they resembled working code in every surface detail.

We can bound this estimate. GitHub Copilot processes billions of completions daily, and at a 46 percent completion rate with 30 percent acceptance, roughly 14 percent of all coding keystrokes result in committed AI suggestions. The Springer study found 56 vulnerabilities across 2,315 snippets, a per-snippet rate of roughly 2.4 percent; if that rate holds and one in four defects evades detection, then for every 100 accepted AI code blocks approximately 0.6 contain an undetected vulnerability. That sounds manageable until you multiply by billions of annual completions, by the compounding effect of vulnerabilities interacting across modules, and by the clustering pattern where AI-generated code replicates identical flawed logic across multiple files because models draw from overlapping training distributions. Nobody publishes this composite number, and calculating it requires crossing three independent sources with incompatible methodologies: Stack Overflow measures sentiment, JetBrains measures adoption, Springer measures defect rates. Merge them and you get a profession sprinting toward a cliff while accurately describing the cliff to every surveyor who asks.
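The arithmetic above fits in a few lines. Here is a back-of-the-envelope sketch using only the figures quoted in this article; the variable names and the assumption that the three sources can be multiplied together as if independent are ours, which is exactly why the result is directional rather than precise.

```python
# Back-of-the-envelope composite estimate using only figures quoted
# in the article. Treats three unrelated surveys as if their rates
# compose independently -- they do not, so read this as directional.

COMPLETION_RATE = 0.46     # Copilot: share of code it offers to complete
ACCEPTANCE_RATE = 0.30     # Copilot: share of suggestions developers accept
VULNS_FOUND = 56           # Springer study: vulnerabilities found...
SNIPPETS_ANALYZED = 2315   # ...across this many analyzed snippets
DETECTION_EVASION = 0.25   # one in four flaws evades the best models

# Share of all coding activity that ends up as committed AI suggestions
committed_ai_fraction = COMPLETION_RATE * ACCEPTANCE_RATE        # ~0.138

# Per-snippet vulnerability rate, then the share that slips past detection
vuln_rate = VULNS_FOUND / SNIPPETS_ANALYZED                      # ~0.024
undetected_per_100_blocks = vuln_rate * DETECTION_EVASION * 100  # ~0.6

print(f"committed AI fraction of keystrokes: {committed_ai_fraction:.1%}")
print(f"undetected vulns per 100 accepted AI blocks: {undetected_per_100_blocks:.2f}")
```

Every input is a point estimate from a different population measured at a different time, so the honest error bars are wide; the point of writing it out is that the industry has all of these numbers and has not combined them.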

Why It Keeps Accelerating

Individual rationality drives collective risk because a developer who refuses AI tools ships less code than peers who accept them, and performance reviews at most companies do not yet penalize technical debt from AI-generated code for the simple reason that the debt has not matured long enough to produce visible production failures at organizational scale. By the time those failures arrive, the AI-authored code will be entangled in systems where distinguishing it from human-written code is functionally impossible, and the developer who accepted the suggestion will have moved to another team or another company entirely. Quarterly reports capture the 3.6 hours saved per developer per week. Security incidents from AI-introduced vulnerabilities manifest on timescales that accounting systems are structurally unable to attribute to their origin.

Strongest Counterargument

AI-generated code might not add net technical debt at all, because it replaces code that would have been equally buggy if written by humans under the same time pressure, and the Springer paper explicitly says LLM-generated code has "about" the same vulnerability rate as developer-written code rather than a higher one. If developers are trading one source of bugs for a faster source with identical defect density, the net quality impact could be zero or marginally positive.

This argument carries the most weight for junior developers writing boilerplate under deadline pressure, where Copilot's suggestion may genuinely be better than what the human would have produced alone. For senior engineers working on novel architecture with unusual constraints, the calculus reverses sharply: AI tools excel at common patterns and fail on edge cases that experienced developers catch through hard-won instinct. Most codebases contain both kinds of work in proportions that vary wildly across teams, and no current survey distinguishes between them.

Limitations

GitHub's 51 percent figure conflates "AI-generated" and "AI-assisted," which means a developer who uses Copilot to autocomplete a line they were already writing gets counted identically to one who prompts an agent to write an entire module from a natural-language description, inflating the headline number relative to the amount of code that AI truly authored without meaningful human guidance.

Stack Overflow's trust metric captures developer sentiment rather than measured defect rates, and the relationship between how much developers say they distrust AI code and how many bugs that code actually contains remains unquantified by any peer-reviewed study at production scale. Our technical debt injection calculation is an estimate built by combining independent datasets with different methodologies, sample populations, collection periods, and definitions of key terms. Treat it as directional rather than precise. Claude Code's market data comes primarily from Anthropic and JetBrains, with limited independent verification of the $2.5 billion annualized run rate.

What You Can Do

If you manage a development team, start measuring what percentage of your merged pull requests originate from AI suggestions and correlate that against defect reports from the same code paths. Most organizations track neither metric, and the handful that track acceptance rates almost never connect them to downstream quality data. You will be among the first teams in your industry to have this number.
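The correlation described above needs only two data feeds most teams already have. The sketch below is a minimal illustration, assuming hypothetical record shapes (merged PRs tagged with an `ai_assisted` flag, plus a set of PR ids that later surfaced in defect reports); real inputs would come from your VCS and issue tracker, and the tagging itself is the hard part.

```python
# Minimal sketch: defect rate of AI-assisted vs. human-authored PRs.
# Data shapes are hypothetical placeholders for VCS / issue-tracker exports.

from collections import Counter

merged_prs = [
    {"id": 101, "ai_assisted": True},
    {"id": 102, "ai_assisted": False},
    {"id": 103, "ai_assisted": True},
    {"id": 104, "ai_assisted": True},
]

# PR ids that later appeared as the origin of a defect or incident report
defect_origins = {103, 104}

def defect_rate_by_origin(prs, defects):
    """Return {ai_assisted_flag: defect rate} across merged PRs."""
    totals, defective = Counter(), Counter()
    for pr in prs:
        totals[pr["ai_assisted"]] += 1
        defective[pr["ai_assisted"]] += pr["id"] in defects
    return {flag: defective[flag] / totals[flag] for flag in totals}

rates = defect_rate_by_origin(merged_prs, defect_origins)
print(rates)  # defect rate for AI-assisted vs. human-authored PRs
```

Even this toy version surfaces the number the article says almost nobody tracks: the quality differential between the two authorship paths, measured on your own code rather than a vendor's benchmark.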

If you write code, review AI suggestions the way you would review a pull request from a new hire who is fast, confident, fluent in your codebase's idioms, and occasionally wrong about edge cases in ways that look exactly like being right.

If you are an investor evaluating AI coding companies, ask for defect-rate data alongside the productivity metrics every pitch deck contains, because a company reporting "time saved" and "acceptance rate" without defect correlation is showing you half of a double-entry ledger.

If you run a security team, audit AI-heavy repositories for vulnerability clustering: AI-generated code tends to replicate identical flawed patterns across multiple files because models draw from overlapping training distributions, meaning one vulnerability template can silently become dozens of production instances before any individual review catches the pattern.
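One crude way to hunt for that clustering is to fingerprint normalized code blocks so that the same flawed logic, regenerated across files, hashes to the same group. The sketch below is a deliberately simple illustration of the idea (masking identifiers and collapsing whitespace before hashing), not a production-grade clone or vulnerability detector; the repo contents are hypothetical.

```python
# Sketch of a clustering audit: hash normalized snippets so that
# near-identical logic regenerated across files groups together.
# The normalization is intentionally crude -- an illustration only.

import hashlib
import re
from collections import defaultdict

def fingerprint(snippet: str) -> str:
    """Mask identifiers and collapse whitespace so similar logic hashes alike."""
    canon = re.sub(r"\b[A-Za-z_]\w*\b", "ID", snippet)  # mask every identifier
    canon = re.sub(r"\s+", " ", canon).strip()          # collapse whitespace
    return hashlib.sha256(canon.encode()).hexdigest()[:12]

def cluster(files: dict) -> dict:
    """Group file names whose contents share a fingerprint."""
    groups = defaultdict(list)
    for name, code in files.items():
        groups[fingerprint(code)].append(name)
    return {h: names for h, names in groups.items() if len(names) > 1}

# Hypothetical repo: two files repeat the same insecure comparison pattern
repo = {
    "auth.py":  "token = input()\nif token == secret: grant()",
    "admin.py": "key = input()\nif key == passwd: allow()",
    "util.py":  "print('hello')",
}
print(cluster(repo))  # auth.py and admin.py cluster on one shared pattern
```

If a pattern in one cluster turns out to be vulnerable, every other member of that cluster is a candidate instance of the same flaw, which is precisely the multiplication effect the article warns about.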

The Bottom Line

Software development has crossed an adoption threshold that its quality infrastructure was never designed to support. More than half the code on GitHub is AI-generated, nine in ten professionals use the tools, and fewer than three in ten trust what comes out. Springer's vulnerability analysis suggests the distrust is well-calibrated, and the gap between what developers believe about AI code and what they do with it is widening every quarter because the incentives reward speed on timescales that matter to managers and punish quality on timescales that do not show up until the managers have been promoted. Somebody needs to publish the actual composite technical debt number. Until then, the industry is navigating by instruments it knows are unreliable, logging that knowledge diligently in annual surveys that nobody cross-references with the production incident database.

Sources

  1. Stack Overflow 2025 Developer Survey: AI trust, adoption, and sentiment data (2025)
  2. JetBrains AI Pulse Survey: Which AI Coding Tools Do Developers Actually Use at Work? (April 2026)
  3. Belozerov, Barclay, Sami: LLM-Generated Code Vulnerability Analysis, Empirical Software Engineering (2026)
  4. GitHub Platform Data: AI-Generated Code Statistics (2026)
  5. GitHub Copilot Research: Acceptance rates and productivity data (2024)