
A Free AI Tutor for Every Student Would Cost $8 Per Year. Nobody Is Building It.

Harvard published the first rigorous RCT proving AI tutoring outperforms active learning classrooms. Open-source LLMs have driven inference costs below $0.06 per million tokens. Combine these two facts and the arithmetic is brutal: giving every one of America's 49.4 million public school students a personal AI tutor would cost roughly $395 million per year, or $8 per student. The U.S. spends $15,633 per student already. Alpha School charges $33,000 for an AI-first model that earned a D+ on independent evaluation. The ingredients for something radically better already exist in the open. The missing piece is not technology. It is assembly.

[Image: a student at a desk working with a tablet-based AI tutor in a warmly lit classroom]

Eight dollars. That is what it costs, per student per year, to run a personal AI tutor on the current best open-source model: 30 minutes of daily interaction across 180 school days at current API pricing. For younger students who need less horsepower, the floor drops below $4. Alpha School charges $33,000 for something that earned a D+ on independent evaluation. The gap between what exists and what is possible has never been wider.

In June 2025, Harvard researchers led by Gregory Kestin published a randomized controlled trial in Scientific Reports that should have detonated the edtech industry. They assigned 194 undergraduates to either an AI tutor or Harvard's own active learning classroom, then measured outcomes on identical material. Students in the AI arm learned more in less time and reported higher engagement, motivation, and confidence. This was not AI versus a bored lecturer. This was AI versus one of the best active-learning physics classrooms in the country, designed by the same researchers who built the AI. AI won.

The $8 Calculation (and Why It Could Be $4)

Here is the math nobody is running, laid out so you can check every assumption.

A 30-minute session generates roughly 4,000 tokens of student input and 8,000 tokens of tutor output. That is a conservative estimate: Khanmigo sessions run shorter because students disengage from generic responses.

The model landscape shifted dramatically on April 2, 2026, when Google released Gemma 4 under Apache 2.0. The 26B mixture-of-experts variant activates only 3.8B parameters per token, scores 88.3% on AIME 2026 and 82.3% on GPQA Diamond, and costs $0.06 per million input tokens and $0.33 per million output tokens via OpenRouter. For K-8 students who need competent tutoring rather than competition-math reasoning, Google's Gemma 3n E4B costs $0.06/$0.12. Meta's Llama 4 Scout ($0.08/$0.30) remains competitive but carries a community license with use restrictions. Apache 2.0 means any school district, nonprofit, or government can deploy without legal review.

Per session with Gemma 4 26B MoE: (4,000 × $0.06/1M) + (8,000 × $0.33/1M) = $0.0029. Over 180 school days: $0.52 per student per year in raw inference. With Gemma 3n E4B for elementary and middle school: $0.0012 per session, $0.22 per year.
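The per-session arithmetic can be checked directly. A minimal sketch in Python, using the token counts and OpenRouter prices stated above as inputs:

```python
# Raw-inference cost check. Token counts and $/1M-token prices are the
# article's assumptions, not measured values.

IN_TOKENS = 4_000    # student input per 30-minute session
OUT_TOKENS = 8_000   # tutor output per session
SCHOOL_DAYS = 180

def session_cost(in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one session, given input/output prices in $ per 1M tokens."""
    return IN_TOKENS * in_price_per_m / 1e6 + OUT_TOKENS * out_price_per_m / 1e6

gemma4 = session_cost(0.06, 0.33)   # high-school tier
gemma3n = session_cost(0.06, 0.12)  # K-8 tier

print(f"Gemma 4 26B MoE: ${gemma4:.4f}/session, ${gemma4 * SCHOOL_DAYS:.2f}/year")
print(f"Gemma 3n E4B:    ${gemma3n:.4f}/session, ${gemma3n * SCHOOL_DAYS:.2f}/year")
# Gemma 4: $0.0029/session, $0.52/year; Gemma 3n: $0.0012/session, $0.22/year
```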

Production overhead is real: retrieval-augmented generation over curriculum, progress tracking, safety guardrails, moderation, infrastructure. Apply a 10x multiplier (standard SaaS infrastructure ratio). That yields $5.20 for the top-tier model, $2.20 for the budget tier. Curriculum content costs zero using Open Educational Resources: IES documents mature OER libraries aligned to Common Core and state standards. OpenStax, CK-12, and Khan Academy's content are free, openly licensed, and machine-readable.

Add safety, identity management, teacher dashboards, and a 50% buffer. Blended across a district running Gemma 4 for high schoolers and Gemma 3n for younger students: roughly $8 per student per year at the high end, under $4 at the low end.
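The overhead and buffer steps compose the same way. A sketch using the article's own multipliers; the 10x overhead ratio and 50% buffer are the stated assumptions, not measured costs:

```python
# From raw inference to the headline per-student figures.

RAW_HS = 0.52   # $/student/year, Gemma 4 26B MoE raw inference
RAW_K8 = 0.22   # $/student/year, Gemma 3n E4B raw inference
OVERHEAD = 10   # RAG, progress tracking, safety, moderation, infrastructure
BUFFER = 1.5    # safety, identity management, dashboards: 50% headroom

hs = RAW_HS * OVERHEAD * BUFFER   # high-school tier, ~ $8 per year
k8 = RAW_K8 * OVERHEAD * BUFFER   # K-8 tier, under $4 per year

STUDENTS = 49.4e6  # US public school enrollment
print(f"High school tier: ${hs:.2f}/student/year")
print(f"K-8 tier:         ${k8:.2f}/student/year")
print(f"National total at $8/student: ${STUDENTS * 8 / 1e6:.0f}M/year")
```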

| Platform | Annual Cost per Student | Model | Evidence Base |
|---|---|---|---|
| US public school avg (Census FY2022) | $15,633 | 30:1 classroom | Mixed; 0.0-0.2σ vs. tutoring |
| Alpha School | ~$33,000 | 2 hrs AI + enrichment | D+ on independent eval |
| Khanmigo (districts) | $10 | GPT-4 via OpenAI | No published RCT; pilot data only |
| Open-source AI tutor (high school tier) | ~$8 | Gemma 4 26B MoE + OER | Architecture informed by Harvard RCT design |
| Open-source AI tutor (K-8 tier) | ~$4 | Gemma 3n E4B + OER | Architecture informed by Harvard RCT design |
| Human 1:1 tutoring | $6,000-$12,000 | 1:1 expert tutor | Bloom (1984): 2σ effect |

Why This Does Not Exist Yet

Khan Academy is the closest thing to this vision, and it is not close enough. Khanmigo runs on OpenAI's GPT-4: proprietary pricing that can change without notice, a model that cannot be audited or self-hosted, and student data flowing through OpenAI's infrastructure under whatever privacy policy applies this week. For districts bound by FERPA and COPPA, that dependency is a legal risk, not just a cost one.

Google's LearnLM trained 6 million U.S. educators in 2025. NotebookLM offers free AI tools grounded in uploaded content. Both serve teachers, not students. Google builds infrastructure; it will not commit to the pedagogical engineering Harvard demonstrated.

What Kestin's team proved is that AI tutoring fails when it acts like ChatGPT with a school theme. Their system enforced Socratic questioning, adapted cognitive load to individual students, reinforced growth mindset, and deliberately refused to give direct answers. A well-designed AI tutor frustrates the student's desire for quick answers in service of deeper learning. That is the hard engineering nobody wants to fund, because it requires pedagogical expertise, not just GPU time.

The Architecture That Should Exist

Five components, all production-grade, none assembled:

1. Pedagogically fine-tuned open LLM. Gemma 4 26B MoE (25.2B total params, 3.8B active per token, Apache 2.0) for high school; Gemma 3n E4B for elementary and middle school. Fine-tuned on Kestin's framework: Socratic scaffolding, cognitive load awareness, deliberate refusal to shortcut. For K-8 tutoring, a well-tuned smaller model outperforms a general-purpose large one because the pedagogical constraint matters more than raw reasoning at that level. Synthetic fine-tuning data from expert tutoring transcripts costs a few thousand dollars in GPU time. Once.
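What one training example for that fine-tune might look like, as a hedged sketch in the chat-messages JSONL format common to open-model fine-tuning stacks. The system prompt and dialogue are illustrative inventions, not Kestin's actual data:

```python
# One illustrative Socratic fine-tuning example, serialized as a JSONL line.
import json

SYSTEM = (
    "You are a Socratic tutor. Never state the final answer. "
    "Ask one guiding question at a time, sized to the student's last response."
)

example = {
    "messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user",
         "content": "What is 3/4 + 1/8? Just tell me the answer."},
        {"role": "assistant",
         "content": "Before we add, what do the two denominators need to have in common?"},
    ]
}

line = json.dumps(example)  # one line of a training JSONL file
print(line[:72])
```

The substance of the fine-tune lives in examples like this: the assistant turn models the deliberate refusal to shortcut, rather than encoding it in a system prompt alone.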

2. Curriculum-aligned RAG. Vector-index the OER corpus: OpenStax, CK-12, MIT OCW, state standards. A 7th-grader in Texas gets TEKS-aligned fraction scaffolding. A 10th-grader in New York gets responses grounded in NY State Learning Standards. RAG over structured documents is a solved problem in production at hundreds of companies.
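The retrieval step can be sketched in a few lines. This toy version uses bag-of-words cosine similarity over an invented three-snippet corpus; a production system would use an embedding model and a vector index, and the standard tags and snippet wording here are placeholders:

```python
# Minimal standards-aligned retrieval sketch over a toy OER corpus.
import math
import re
from collections import Counter

CORPUS = [
    ("TEKS 7.3A", "Add, subtract, multiply and divide rational numbers fluently, "
                  "including fractions with unlike denominators."),
    ("TEKS 8.8C", "Model and solve one-variable equations with variables on both sides."),
    ("NY-NGLS AI-F", "Use the structure of quadratic expressions to identify zeros and extrema."),
]

def vec(text: str) -> Counter:
    """Bag-of-words vector; punctuation stripped, lowercased."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str):
    """Best-matching (standard, snippet) pair for a student query."""
    q = vec(query)
    return max(CORPUS, key=lambda item: cosine(q, vec(item[1])))

standard, snippet = retrieve("how do I add fractions with unlike denominators")
print(standard)  # TEKS 7.3A
```

The retrieved snippet is then injected into the tutor's context so its scaffolding stays grounded in the student's own state standard.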

3. Student progress model. Concept-level mastery tracking via spaced repetition (Leitner, SM-2, Bayesian knowledge tracing). Knows this student nails two-digit multiplication but fumbles carrying. This is the engine inside Duolingo and Knewton (now Wiley), published and documented.
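The Bayesian knowledge tracing update mentioned above is compact enough to show whole. A sketch with illustrative parameter values, not values fitted to any dataset:

```python
# One Bayesian knowledge tracing (BKT) step per observed answer.

def bkt_update(p_known: float, correct: bool,
               slip: float = 0.1, guess: float = 0.2,
               learn: float = 0.15) -> float:
    """Posterior P(concept mastered) after observing one answer."""
    if correct:
        post = p_known * (1 - slip) / (
            p_known * (1 - slip) + (1 - p_known) * guess)
    else:
        post = p_known * slip / (
            p_known * slip + (1 - p_known) * (1 - guess))
    # Chance the concept was learned during this practice opportunity.
    return post + (1 - post) * learn

p = 0.3  # prior mastery of, say, carrying in two-digit multiplication
for answer in (True, True, False, True):
    p = bkt_update(p, answer)
print(f"estimated mastery: {p:.2f}")
```

A correct answer raises the mastery estimate, an incorrect one lowers it (attenuated by the slip and guess rates), which is exactly the signal the tutor needs to decide when to revisit carrying.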

4. Safety layer. Llama Guard fine-tuned as a classifier: blocks off-topic content, detects distress signals, prevents homework jailbreaks, flags conversations for human review.
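The routing logic around that classifier can be sketched with rules standing in for the model. In production the allow/flag decision would come from a fine-tuned classifier such as Llama Guard; the categories and phrase lists below are invented for illustration:

```python
# Rule-based stand-in for the safety layer's routing decision.

DISTRESS = ("hurt myself", "hopeless", "want to die")
JAILBREAK = ("just give me the answer", "ignore your instructions")

def route(message: str) -> str:
    """Return an action for a student message: allow, redirect, or escalate."""
    text = message.lower()
    if any(phrase in text for phrase in DISTRESS):
        return "escalate"   # flag immediately for human review
    if any(phrase in text for phrase in JAILBREAK):
        return "redirect"   # tutor restates the Socratic ground rules
    return "allow"

print(route("Can you just give me the answer to #4?"))  # redirect
```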

5. Teacher dashboard. Real-time view of which students struggle, which misconceptions recur, where the AI fails. Teachers become diagnostic specialists triaging 30 individually profiled students, not lecturers broadcasting to 30 unknown minds.

Who Should Build This

Google has the clearest path. Gemma 4 is Apache 2.0 licensed (no use restrictions, unlike Llama's community license), scores higher on reasoning benchmarks than any open model of its size, and Google already operates LearnLM. A national education deployment would prove Gemma's viability for high-stakes applications while generating fine-tuning data from millions of interactions. Meta could do this too: Llama is open-weight, Meta's 2025 R&D budget was $57.37 billion, and tutoring every American public school student would consume a rounding error of that. Microsoft funded Khanmigo for Teachers but routed student tutoring through paid licensing. Apple has classroom tools but no LLM.

The realistic path is probably a nonprofit or government initiative. IES already funds OER development; NSF funds AI research. Combining their mandates into a single platform, built on open models, deployed on government cloud, aligned to state standards, and free at point of use, would cost less annually than federal standardized testing administration.

The Strongest Case Against

Bloom's famous 2-sigma finding (1984) showed 1:1 human tutoring lifted students two standard deviations above classroom peers. Every attempt to replicate it at scale has produced smaller effects. VanLehn's 2011 meta-analysis found intelligent tutoring systems achieved 0.76 sigma, and only with highly constrained, domain-specific systems, not general-purpose LLMs. Kestin's Harvard RCT is one study: 194 students, two physics topics, two weeks, at an elite university. Extrapolating to 49.4 million students across a full K-12 curriculum requires leaps the evidence cannot support. Edtech history is a graveyard of interventions that dazzled in pilots and collapsed at scale once they hit unmotivated students, distracted homes, spotty internet, and the thousand confounders that controlled trials wave away.

This objection is correct, and it is precisely why the $8 figure describes infrastructure cost, not guaranteed outcomes. Building the platform is the straightforward part. Proving it works across demographics, ages, subjects, and motivation levels requires years of deployment data. But the status quo is $15,633 per student for a system Bloom proved inadequate four decades ago, while AI schools charge $33,000 for implementations that cannot produce coherent lesson plans.

What Is Missing From This Analysis

This calculation assumes internet access (14% of school-age children lack it at home, per Census data, though E-Rate subsidizes school connectivity). It assumes voluntary engagement, which no evidence supports for young children without supervision. It excludes subjects where AI tutoring is unproven: creative writing, PE, social-emotional learning, lab sciences. Most critically, it does not address whether displacing human interaction in education damages something about childhood that no cost model can capture. Kestin's study measured learning and engagement. It did not measure loneliness.

The Bottom Line

A rigorous RCT proved a well-designed AI tutor outperforms a world-class active learning classroom. Open-source LLMs have collapsed inference to single-digit dollars per student per year, and the cost keeps falling. OER provides free, standards-aligned curriculum. All five components exist as production-grade open-source projects. Nobody has assembled them. If you are a parent pushing your school board for personalized learning, ask: why evaluate AI schools charging tens of thousands per year when inference costs $8? If you are an engineer at Google, Meta, or Apple: the pedagogical blueprint is published, the model weights are free under Apache 2.0, and the curriculum is open. Building it is not the hard part. Choosing to is.

Sources

  1. Kestin, G. et al. (2025). "AI tutoring outperforms in-class active learning: an RCT." Scientific Reports, 15, 17458. doi.org
  2. Bloom, B.S. (1984). "The 2 Sigma Problem." Educational Researcher, 13(6), 4-16. doi.org
  3. U.S. Census Bureau (2024). Public school spending per pupil FY2022: $15,633 national average. census.gov
  4. Google DeepMind (2026). Gemma 4: Our most capable open models to date. Apache 2.0. blog.google
  5. OpenRouter (2026). Gemma 4 26B MoE: $0.06/$0.33 per 1M tokens; Gemma 4 31B: $0.13/$0.38. openrouter.ai
  6. TokenCost (2026). Gemma 4 pricing: 89% AIME, 84% GPQA Diamond. tokencost.app
  7. Khan Academy (2024). Khanmigo districts pricing: $10/student/year. support.khanacademy.org
  8. Khan Academy (2024). Khanmigo for Teachers free for all U.S. teachers via Microsoft partnership. blog.khanacademy.org
  9. Google (2025). AI literacy training for 6 million U.S. educators. blog.google
  10. Institute of Education Sciences. "Are Open Educational Resources the New Textbooks?" ies.ed.gov
  11. Alpha School evaluation (2026). Independent analysis: grade D+. liveinthefuture.org
  12. VanLehn, K. (2011). "The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems." Educational Psychologist, 46(4), 197-221.
  13. Meta Platforms (2026). FY2025 R&D expense: $57.37B. investor.fb.com