💻 AI & Infrastructure

Google's Gemini 3.5 Flash Costs 3× More Per Token. It Just Slashed Subscription Prices 60%. Both Moves Make the Same Bet.

At I/O 2026, Google raised API token prices 3× while cutting its top AI subscription from $250 to $100. The throughput math makes both moves rational: each GPU now generates 14× more revenue per hour. And once Gemini Spark has 90 days of your email patterns, you're not leaving.

Abstract visualization of diverging price arrows — one rising for API tokens, one falling for consumer subscriptions — converging on a central glowing agent icon

Three times. Gemini 3.5 Flash, the model Google shipped to every developer and consumer product on its platform today, costs $1.50 per million input tokens and $9.00 per million output tokens — a 3× increase over Gemini 3 Flash's $0.50 and $3.00 respectively. Artificial Analysis, given pre-release access, found the full benchmark suite cost 5.5× more to run than the previous Flash, driven by higher token prices and additional inference turns that agentic workloads consume as they chain reasoning across multiple tool calls.

On the same stage, Google cut its top consumer AI subscription from $250 per month to $100 and launched Gemini Spark — a 24/7 personal agent that monitors your email, tracks credit card statements, automates tasks across more than 30 third-party apps through MCP, and runs in cloud VMs around the clock whether you're awake or not.

Raise prices for developers, cut prices for consumers, and launch the most compute-intensive consumer product you've ever offered, all from the same keynote. Not confusion. Arithmetic.

The Revenue-Per-GPU-Hour Calculation Nobody Published

Gemini 3.5 Flash outputs at 284 tokens per second (independently measured), approximately four times the throughput of comparable frontier models, with an optimized variant hitting 12× the speed at the same quality level, according to DeepMind chief technologist Koray Kavukcuoglu.

Here is the calculation Google did not publish. At 284 output tokens per second and $9.00 per million tokens, a single GPU serving 3.5 Flash generates roughly $9.20 in output revenue per hour. The previous Flash, at an estimated 72 tokens per second and $3.00 per million, generated $0.78 per hour.

Model Output TPS Price / 1M Output Revenue / GPU-Hour
Gemini 3 Flash ~72 $3.00 $0.78
Gemini 3.5 Flash 284 $9.00 $9.20
Improvement 3.9× 3.0× 11.8×

Multiply throughput by price increase: each GPU earns nearly twelve times what it earned last quarter. Google can afford to subsidize consumer subscriptions because the API side of the business covers the discount and then some.

The Three-Number Convergence

Every major AI lab has converged on the same three subscription price points in under 18 months.

Tier Google OpenAI Anthropic
~$20/mo AI Pro $19.99 Plus $20.00 Pro $20.00
~$100/mo Ultra $100 Pro $100 Max $100
~$200/mo Ultra $200 Pro $200 Max $200

This convergence tells you the prices are set by consumer psychology, not cost-plus margins. The product is being priced like SaaS, not a metered utility, because the strategic objective is subscriber count.

Spark and the 90-Day Lock-In Window

Gemini Spark is not a chatbot with a rebrand — it is a persistent agent running in Google Cloud that reads your Gmail, monitors your calendar, scans credit card statements for hidden subscriptions, and connects to over 30 services via MCP, asking permission before high-stakes actions and running while your phone is off.

Switching costs for a chatbot hover near zero. The switching cost for a personal agent that has been monitoring your email for 90 days is functionally infinite — not because of data portability, but because the trained behavioral context is not data in a downloadable file. Spark knows that you always ignore promotional emails from one retailer but read every one from another, knows your Thursday meetings run over and your Friday ones finish early, knows you pay the electric bill on the 15th but forget the water bill until the 22nd, and none of that knowledge transfers because it emerged from thousands of small interactions that would take exactly as long to rebuild on a competing platform as they took to build the first time.

Google is selling you an agent at a steep discount because the first 90 days of usage make the next five years of switching cost unbearable.

The Market Share Problem

As of May 2026, ChatGPT holds 60.6% of the generative AI chatbot market while Gemini holds 15.1% — and Microsoft Copilot at 12.5% is closer to Google than Google is to OpenAI, a competitive humiliation for the company that invented the Transformer architecture underlying every one of these models. Google has every distribution advantage imaginable — 750 million monthly Gemini users, 350 million paid subscriptions, Q1 revenue of $109.9 billion — but chat products have near-zero switching costs, OpenAI built the habit loop first, and distribution alone cannot close the gap.

Google also quietly shifted from daily prompt limits to "compute-used" billing where limits refresh every five hours and heavy users hit caps faster than light ones — airline yield management applied to inference, turning the $100 subscription into a floor rather than a ceiling while ensuring long-tail power users who consume 50× the average compute become profitable instead of loss leaders.

Agent products — Spark scanning your email, Daily Brief curating your morning, information agents monitoring the web — create switching costs that compound daily, and Google's entire I/O keynote was a two-hour argument that the platform where your agent lives becomes as sticky as the platform where your phone number lives.

Strongest Counterargument

The strongest case against this analysis is that Google has tried the bundle-and-lock-in playbook before — Google+, Allo, Duo, Wave, Inbox by Gmail, Stadia — and Spark could join the graveyard as easily as it could break the pattern, especially if the always-on agent turns out to be a feature people try for a week and abandon the way most users tried Google Assistant routines and never came back. If the habit doesn't form, the switching cost never materializes, and the subsidy evaporates.

Limitations

The revenue-per-GPU-hour calculation uses API list prices, which overstate what enterprise customers pay after volume discounts; Google's internal TPU cost is substantially lower, making the 11.8× figure a revenue multiplier, not a margin multiplier. The 72 TPS estimate for Gemini 3 Flash comes from Artificial Analysis historical data, not Google disclosures. Market share data from First Page Sage measures consumer chatbot usage and does not capture enterprise deployment where Google's position is stronger. The 90-day lock-in window is an analytical framework, not a measured threshold.

What You Can Do

If you are evaluating AI subscriptions, the $100 Google Ultra tier is the best dollar-per-feature value in the market, but only if you already live in Google's ecosystem — if your email runs through Outlook and your docs live in Notion, Spark has no data to learn from and you're paying $100 for a personal assistant that knows nothing about your life. If you build on AI APIs, 3.5 Flash at $1.50/$9.00 per million tokens is still cheaper than GPT-5.5 at $5.00/$30.00 and Claude Opus 4.7 at $5.00/$25.00, but DeepSeek V4 Flash at $0.14/$0.28 is now 10× cheaper on input and 32× cheaper on output — for any latency-insensitive batch workload, at least benchmark the Chinese alternative before committing to Google's pricing. If you compete with Google, the signal is unambiguous: the race is not to build the best model but to build the agent people live inside long enough that leaving becomes unthinkable.

The Bottom Line

Google charged developers more and consumers less because it can afford to, and because the flywheel mechanics demand it: each GPU running 3.5 Flash earns twelve times what it earned running the previous model, and that enterprise revenue surplus pays for the consumer discount that drives the subscriber growth that feeds the agent's behavioral data that creates the switching cost that makes the subscriber permanent, which in turn justifies the next round of infrastructure investment to Alphabet's board. Whether this flywheel actually spins or stalls in the Google product graveyard is the seventy-five-billion-dollar-capex question of 2026. But the math, for once, is on Google's side.