
Big Tech Is Spending $630 Billion on AI Infrastructure That Gets 90% Cheaper. Both Numbers Are Real.

Gartner projects inference costs will fall 90% by 2030. Hyperscalers are pouring $630 billion into AI capex anyway. This is not a contradiction. It is the Jevons Paradox playing out at GPU scale.


Six hundred and thirty billion dollars.

That is the combined AI capital expenditure planned by the major hyperscalers for 2025-2026, according to Reuters Breakingviews. Meta alone guided $115 billion to $135 billion for 2026, up from $72 billion in 2025. Microsoft, Google, and Amazon are each on similar trajectories. S&P Global pegged the figure at $635 billion in a March 31 analysis.

Now hold that number in one hand and pick up another: Gartner projected on March 25, 2026 that performing inference on a 1-trillion-parameter large language model will cost providers over 90% less in 2030 than in 2025. LLMs in 2030, the firm estimates, will be up to 100 times more cost-efficient than the earliest models of comparable size from 2022.

Both numbers are real. A reasonable person might expect that if the product gets 90% cheaper, the factories building it would slow down. Coal miners in 1860s England had a reasonable expectation too.

The Coal Question, Revisited

In 1865, the English economist William Stanley Jevons published The Coal Question, a study of Britain's coal dependency. His central observation was counterintuitive: James Watt's steam engine had made coal use dramatically more efficient, yet Britain's total coal consumption did not decline. It increased tenfold in the following decades. Efficiency made steam power economical for thousands of applications that were previously too expensive, and the explosion in use cases swamped the per-unit savings.

This phenomenon, now called the Jevons Paradox, has repeated itself across technologies. When U.S. telecom companies laid roughly $2 trillion in fiber optic infrastructure between 1999 and 2001, bandwidth costs collapsed 99% by 2005. Most of the infrastructure was initially stranded: more than 85% of fiber went dark. But total bandwidth consumption grew 100 times over. The dark fiber lit up. Solar module costs have fallen 99% since 1976. Global solar investment did not decrease. It hit $382 billion in 2024, per BloombergNEF.

AI inference is now following the same pattern. Per-unit cost is collapsing. Unit consumption is about to explode. And the companies writing the checks know this.

The Token Multiplication Problem

Understanding why cheaper inference means more spending requires looking at what AI agents actually consume. A standard chatbot query uses roughly 500 to 2,000 tokens. But the industry is pivoting from chatbots to agents, and agents are hungry.

A code review agent reading a pull request, cross-referencing documentation, running test scenarios, and writing a summary might consume 15,000 to 60,000 tokens per task. Gartner's own analysis puts the range at 5 to 30 times more tokens per agentic task compared to a standard chatbot exchange. Will Sommer, Gartner's Senior Director Analyst, was blunt in the report: "CPOs should not confuse the deflation of commodity tokens with the democratization of frontier reasoning. As commoditized intelligence trends toward near-zero cost, the compute and systems needed to support advanced reasoning remain scarce."

This is where the math gets uncomfortable.

The Crossover Calculation

Start with a 2025 baseline. GPT-4-class inference costs roughly $1.00 per million tokens at API pricing. A typical chatbot task uses about 1,000 tokens. Cost per task: $0.001. Multiply by 50 tasks per enterprise user per day (a rough approximation: one Slack question, a few email drafts, some meeting summaries, a handful of code completions), and you get $0.05 per user per day.

Now project to 2030 using Gartner's 90% cost reduction. Inference drops to $0.10 per million tokens. Sounds cheap. But the typical task is no longer a chatbot query. It is an agentic workflow consuming 15,000 tokens (the midpoint of Gartner's 5-30x range). Cost per agentic task: $0.0015.

That is 50% more expensive per task than the 2025 chatbot interaction, despite a 90% unit cost reduction.

Metric                 2025 (Chatbot)   2030 (Agentic)   Change
Cost per 1M tokens     $1.00            $0.10            -90%
Tokens per task        1,000            15,000           +1,400%
Cost per task          $0.001           $0.0015          +50%
Tasks per user/day     50               2,500            +4,900%
Daily cost per user    $0.05            $3.75            +7,400%

Volume is the decisive factor. If agentic AI handles 50 times more tasks per user per day (automating email triage, scheduling, code generation, data analysis, customer interactions, document review), the daily cost per enterprise user rises from $0.05 to $3.75. At 100 million enterprise users, that is $137 billion per year in inference compute alone.

Put differently: the 90% unit cost drop does not save money. It creates $137 billion in annual demand that did not exist at 2025 prices. A 15,000-token agentic task costs $0.015 at 2025's rate, too expensive to run thousands of times per user per day, but becomes viable at $0.0015 in a world where the business value per automated task far exceeds the cost.
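The arithmetic above can be sketched in a few lines. Every figure here is one of the article's illustrative assumptions (the $1.00 and $0.10 token prices, the 1,000- and 15,000-token task sizes, the 50 and 2,500 daily task counts, the 100 million users), not measured data:

```python
# Sketch of the crossover calculation. All inputs are the article's
# illustrative assumptions, not measured figures.

TOKENS_PER_MILLION = 1_000_000

def cost_per_task(price_per_m_tokens, tokens_per_task):
    """Inference cost in dollars for one task at a per-million-token price."""
    return price_per_m_tokens * tokens_per_task / TOKENS_PER_MILLION

# 2025 baseline: chatbot queries at GPT-4-class API pricing.
cost_2025 = cost_per_task(price_per_m_tokens=1.00, tokens_per_task=1_000)    # $0.001

# 2030 projection: 90% cheaper tokens, but agentic tasks at the
# midpoint of Gartner's 5-30x token-multiplication range.
cost_2030 = cost_per_task(price_per_m_tokens=0.10, tokens_per_task=15_000)   # $0.0015

daily_2025 = cost_2025 * 50       # 50 chatbot tasks/user/day    -> $0.05
daily_2030 = cost_2030 * 2_500    # 2,500 agentic tasks/user/day -> $3.75

# Aggregate demand at 100 million enterprise users.
annual_2030 = daily_2030 * 100_000_000 * 365   # ~$137 billion/year
```

Running the numbers confirms the paradox in miniature: the per-token price falls 90%, yet the per-task cost rises 50% and the per-user daily cost rises 75-fold.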

What the $630 Billion Is Actually Buying

Here is the part that should concern investors. Hyperscaler capex totaling $630 billion is heavily weighted toward training infrastructure: Nvidia H100 and B200 GPU clusters housed in centralized mega-data-centers. Meta disclosed in its Q4 2025 earnings call that its capex increase is funding "several massive AI data centers" in pursuit of what it calls superintelligence. Capex rose 49% in Q4 2025, outpacing revenue growth of 24%. Operating margin dropped 7 percentage points.

But the 2030 inference world may need something different. Edge inference on phones, cars, and AR glasses. Specialized inference chips that are not GPU-based. Heterogeneous compute mixing cloud, edge, and on-device processing. Models themselves are getting smaller and more efficient. DeepSeek R1, released in early 2025, achieved competitive benchmark scores at a fraction of GPT-4's parameter count, and its training cost was reported at $294,000, compared to GPT-4's estimated $79 million.

How much of the $630 billion is building training infrastructure that becomes less valuable as models shrink? No company discloses the split between training and inference capex. That opacity is itself a risk factor.

The Strongest Case for the Builders

None of these companies are making this bet blindly. Their logic follows the AWS playbook: Amazon Web Services saw cloud computing unit prices fall dramatically over a decade, yet total cloud revenue grew because volume increases vastly outpaced price declines. AWS revenue hit $105 billion in 2024, up from $3.1 billion in 2013, despite per-unit prices that dropped more than 90% over that period.

Their wager is that they will control the infrastructure handling the 100x volume increase, becoming the electric utilities of AI: low-margin, high-volume infrastructure monopolies. If agents process 2,500 tasks per user per day instead of 50, someone has to run those inference clusters. Microsoft, Google, Meta, and Amazon are betting that "someone" is them.

This is a reasonable bet. Cloud infrastructure has natural monopoly dynamics: data gravity, integration depth, regulatory compliance overhead, and latency requirements all favor incumbents. What remains uncertain is not whether volume will absorb capacity. It almost certainly will. What remains uncertain is whether $630 billion is the right amount for the 2025-2028 transition period, when agentic AI adoption has not yet scaled to fill the capacity being built.

The Telecom Echo

Fiber optics offer a cautionary parallel. Between 1999 and 2001, telecom companies built roughly $2 trillion in infrastructure. Bandwidth demand eventually absorbed every strand of that fiber and more. But the companies that built it largely went bankrupt in the interim. WorldCom, Global Crossing, and dozens of smaller carriers collapsed not because their capacity was unneeded, but because the timing mismatch between build-out costs and revenue ramp destroyed their balance sheets.

Crucially, the hyperscalers have an advantage their telecom predecessors lacked: massive cash-generating businesses funding the build-out. Meta's advertising revenue hit $58.14 billion in Q4 2025 alone. Google and Microsoft have similarly deep pockets. They can absorb years of negative returns on AI capex in ways that standalone infrastructure companies cannot.

Still, a 49% capex increase outpacing 24% revenue growth is a trajectory with a mathematical endpoint. Meta can sustain this as long as advertising margins hold. If ad revenue growth slows while AI capex continues accelerating, the gap narrows fast.
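That mathematical endpoint can be made concrete with a rough compounding sketch. The growth rates are from Meta's disclosed Q4 2025 figures; the starting capex-to-revenue ratio of 0.38 (roughly $72 billion of capex against about $190 billion of revenue) is this sketch's assumption, and the extrapolation of one quarter's growth rates over multiple years is obviously a simplification:

```python
# Rough sketch: if capex grows 49%/year and revenue 24%/year,
# how long until capex equals revenue? The 0.38 starting ratio
# is an assumption, not a disclosed figure.

capex_growth = 1.49     # 49% annual capex growth (Meta, Q4 2025)
revenue_growth = 1.24   # 24% annual revenue growth (Meta, Q4 2025)
ratio = 0.38            # assumed starting capex/revenue ratio

years = 0
while ratio < 1.0:      # capex would consume all revenue at ratio 1.0
    ratio *= capex_growth / revenue_growth
    years += 1

print(years)            # the gap closes in roughly six years at these rates
```

The point is not the specific year but the shape of the curve: a 25-point growth gap compounds quickly, which is why the trajectory cannot persist unless revenue growth reaccelerates or capex growth cools.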

Limitations

Several assumptions in this analysis deserve scrutiny. Gartner's 90% cost reduction is a projection, not a measurement. Actual deflation could be faster: GPT-4 pricing dropped 97% in 18 months, overshooting most forecasts. It could also stall if semiconductor supply chains tighten or if frontier model architectures resist efficiency gains.

The Jevons Paradox is not a natural law. It is an observed tendency. In some cases, efficiency gains do reduce total consumption: LED lighting cut total U.S. electricity used for illumination despite massive adoption, because the efficiency gains were large enough to outpace the rebound effect. Whether AI follows the coal pattern or the LED pattern depends on how elastic demand for automated cognition turns out to be.

Gartner's agentic token multiplication factor (5-30x) is based on early deployments. Real-world agentic token usage data at enterprise scale remains sparse. And the 50x task volume increase is this article's projection, not a published forecast, and assumes broad agentic AI adoption across enterprise workflows by 2030.

Finally, no hyperscaler publicly discloses the breakdown between training and inference capex. Our stranded-asset analysis is directional, not precise.

What You Can Do

If you invest in hyperscaler stocks: Watch the ratio of capex to inference revenue, not just total capex. Companies that efficiently convert infrastructure spending into inference revenue at scale will separate from those burning cash on training clusters that lose strategic value as models shrink. Amazon's AWS margin trajectory from 2013-2024 is the benchmark: high initial capex, declining unit prices, growing total revenue.

If you run enterprise IT: Do not sign long-term compute contracts at 2026 prices. Inference costs are falling fast enough that a three-year commitment signed today will look expensive by 2028. Negotiate contracts with annual price resets tied to public API pricing benchmarks.
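A quick sketch shows the size of the lock-in penalty. This assumes Gartner's 90%-over-five-years decline happens smoothly (a geometric annual decline of about 37%) and uses a $1.00-per-million-token starting price and flat usage; all of these are illustrative assumptions:

```python
# Sketch: cost of a 3-year fixed-price contract vs. annual resets,
# assuming Gartner's 90% five-year decline happens smoothly.
# Starting price and flat usage are illustrative assumptions.

start_price = 1.00                   # $/million tokens at 2026 signing
annual_decline = 0.1 ** (1 / 5)      # ~0.631: 90% total drop over 5 years

locked = [start_price] * 3                                     # fixed for 3 years
market = [start_price * annual_decline**y for y in range(3)]   # annual resets

overpay = sum(locked) - sum(market)
# At flat usage, the locked contract pays ~$0.97 extra per million tokens
# over three years -- roughly 48% more than market-tracking pricing.
```

Even under gentler decline assumptions the direction holds: in a deflating market, price-reset clauses are worth more than a discount on today's rate.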

If you build AI applications: Optimize token efficiency now, while it still differentiates. Build agentic systems that accomplish tasks in fewer tokens without sacrificing quality. When tokens approach near-free, this advantage evaporates, but for the next two to three years, token-efficient architectures are a competitive moat.

If you work in energy policy: Plan grid capacity for the volume curve, not the efficiency curve. Even as per-token energy efficiency improves, total AI energy consumption will rise. Current DOE grid planning models do not adequately account for agentic AI's multiplicative effect on compute demand.

The Bottom Line

William Stanley Jevons would recognize this moment. A technology is getting dramatically cheaper. Its builders know it is getting cheaper. They are building more factories anyway, because cheaper does not mean less. It means more people can afford it, more tasks become viable, and total demand overwhelms the per-unit savings.

The $630 billion question is not whether AI infrastructure will be needed. It will. The question is whether the infrastructure being built in 2025-2026 matches the infrastructure that will be needed in 2030. Training clusters running H100s may not be what a world of 100 million enterprise users running 2,500 agentic tasks per day actually requires. If the future is distributed edge inference on specialized silicon, some of that $630 billion is building the wrong kind of factory.

Coal consumption increased tenfold after Watt's engine. Bandwidth consumption increased a hundredfold after the fiber bust. AI token consumption is on a steeper trajectory than either. The paradox is not that Big Tech is spending $630 billion while costs fall 90%. The paradox would be if they spent less.

Sources

  1. Gartner: "Predicts That by 2030, Performing Inference on an LLM With 1 Trillion Parameters Will Cost GenAI Providers Over 90% Less Than in 2025" (March 25, 2026)
  2. Reuters Breakingviews: "How Big Tech's $630 billion AI splurge will fall short" (March 26, 2026)
  3. Meta Platforms Q4 2025 Earnings: $115-135B 2026 capex guidance, 49% capex increase, $58.14B Q4 ad revenue (January 2026)
  4. Jevons Paradox: William Stanley Jevons, The Coal Question (1865), documenting tenfold coal consumption increase after efficiency gains (Wikipedia)
  5. BloombergNEF: Global energy transition investment hit $2.1 trillion in 2024, solar investment at $382B despite 99% module cost decline since 1976
  6. GPUnex: AI training costs 2026, GPT-4 at $79M vs DeepSeek R1 at $294K (February 2026)
  7. S&P Global Market Intelligence: AI capex at $635B faces energy shock test (March 31, 2026)