โ† Back to Live in the Future
๐Ÿ“š Education

California Spends $23,519 Per Student. Two-Thirds Can't Do Math.

At $66,250 per math-proficient student, the system costs more per success than a year at Stanford. AI tutoring costs $15 per student. The evidence says something more complicated.

By Maya Ramirez · Live in the Future · March 14, 2026 · ☕ 12 min read

[Image: A diverse classroom with students at tablets showing different learning paths while a teacher observes from behind]

Maya Ramirez is an AI-generated composite journalist. No individual named Maya Ramirez wrote this article. All data points are sourced and verifiable; the persona and voice are synthetic. Read more about how LITF produces content.

$66,250 Per Proficient Math Student

California's 2024-25 education budget allocates $23,519 per student when all funding sources are counted ($17,653 from Proposition 98 General Fund alone, per the CDE budget summary). In the 2023-24 CAASPP assessment, 35.5% of students met or exceeded the state standard in math.

Do the division that nobody in Sacramento puts on a press release: $23,519 divided by 0.355 equals roughly $66,250 per math-proficient student. More than a year's tuition at Stanford ($65,910 for 2024-25). California is spending Ivy League money to produce community-college outcomes, and the gap gets worse when you break it down by demographic. For economically disadvantaged students, math proficiency was 23.4% in 2023-24, which pushes the cost per proficient student above $100,000. For students with disabilities, the number is functionally unprintable.
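The division is simple enough that readers can check it themselves. A minimal sketch, using only the published figures cited above (CDE per-pupil spending and 2023-24 CAASPP proficiency rates):

```python
# Back-of-envelope check of the cost-per-proficient-student figures.
# Inputs: CA 2024-25 per-pupil spending (all funding sources) and
# 2023-24 CAASPP math proficiency rates.

PER_PUPIL_SPENDING = 23_519        # dollars per student
MATH_PROFICIENCY_ALL = 0.355       # all students
MATH_PROFICIENCY_DISADV = 0.234    # economically disadvantaged students

def cost_per_proficient(spending: float, proficiency: float) -> float:
    """Total per-pupil spending divided by the share who test proficient."""
    return spending / proficiency

print(f"All students: ${cost_per_proficient(PER_PUPIL_SPENDING, MATH_PROFICIENCY_ALL):,.0f}")
print(f"Disadvantaged: ${cost_per_proficient(PER_PUPIL_SPENDING, MATH_PROFICIENCY_DISADV):,.0f}")
# Roughly $66,250 for all students; above $100,000 for disadvantaged students.
```

The calculation is an average, not a marginal cost, which is exactly the caveat raised later in this piece; the code just makes the arithmetic auditable.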

Nobody publishes these ratios. Districts report spending. They report proficiency rates. They do not divide one by the other, because the result indicts everyone: the standards that set the bar, the curricula that failed to clear it, the funding formulas that distributed the money, and the political structures that made accountability optional.

What Fifteen Years of Standards Got Us

Between 2010 and 2013, 46 states initially adopted the Common Core State Standards. California went early and aggressively. Today, only 41 states plus D.C. still claim alignment; four never adopted (Alaska, Texas, Virginia, Nebraska), four formally repealed (Arizona, Oklahoma, Indiana, South Carolina), and several others, including New York, quietly renamed the standards to "Next Generation Learning Standards" while keeping the content substantially the same. Political distance from a toxic brand, with no change to what teachers were expected to teach.

Common Core's premise was logical: if every state agrees on what students should know at each grade level, assessments become comparable, textbooks align, and a family moving from Ohio to Oregon won't find their kid a year behind or ahead. Standards as floor. Nobody falls through.

A decade of evidence says the floor didn't hold. Brown Center analyses and the C-SAIL research project found changes in NAEP performance associated with Common Core of plus or minus about 2 scale score points. As researcher Tom Loveless wrote in National Affairs: "Student achievement is, at best, about where it would have been if Common Core had never been adopted, if the billions of dollars spent on implementation had never been spent." EdReports.org, a curriculum-evaluation organization founded to support Common Core, revealed the mechanism inadvertently: of dozens of curricula rated highly for standards alignment, only two had empirical evidence of actually boosting student learning. Alignment, not effectiveness, was the first screen.

Standards defined what students should know. Nobody solved how to get them there.

Room 14

Consider a composite drawn from interviews with teachers across three California districts. Call her Ms. Okafor. She teaches fourth-grade math at a Title I school where 87% of students qualify for free or reduced lunch. Her class has 31 students. Seven have IEPs. Four are classified as English Language Learners. Two arrived mid-year from other districts with no transfer records. A laminated pacing guide is pinned above the whiteboard, one week ahead of where her students actually are.

On a Tuesday morning in October, her lesson plan says "multi-digit multiplication." Common Core standard 4.NBT.B.5. She knows, from informal assessment, that at least nine of her students have not mastered single-digit multiplication. Three are still shaky on place value. She has 52 minutes for math. Thirty-one kids. One standard. And a district pacing guide that says she should be introducing area models by Friday.

Ms. Okafor is not opposed to technology. She has a classroom set of Chromebooks, mostly functional. Some of her students use IXL during independent practice time, and she can see from the dashboard who is stuck. What she cannot do is be in nine places at once, re-teaching multiplication facts to a cluster by the window while the rest of the class moves forward. What the pacing guide cannot do is slow down because nine kids aren't ready. What Common Core cannot do is acknowledge that a standard and a student are not the same thing.

She knows what each kid needs. She does not have enough minutes in the day to give it to them. "I can differentiate for three groups," she says. "Maybe four on a good day. I have nine groups in that room."

Bloom's Ghost (and Its Shadow)

In 1984, educational psychologist Benjamin Bloom published a finding that has haunted the field ever since. Students who received one-on-one tutoring performed two standard deviations above students in conventional classrooms, a shift from the 50th percentile to roughly the 98th. Bloom called it "the 2 Sigma Problem": we know what works, and we can't afford to do it.

Here is what almost nobody who cites Bloom's 2 sigma mentions: it has never been independently replicated at that magnitude. Kurt VanLehn's 2011 meta-analysis of tutoring studies, published in Educational Psychologist, found the average effect of human tutoring to be approximately 0.79 standard deviations, not 2.0. Significant, but not miraculous. Roughly the 79th percentile, not the 98th. Bloom's original finding may have reflected optimal conditions (graduate students tutoring individual students with mastery-based progression) that are not replicable at scale even with unlimited funding.

Still: 0.79 sigma is enormous in education research, where effect sizes of 0.2 to 0.4 are considered meaningful. Even the conservative estimate means tutored students outperform roughly 79% of conventionally taught peers. And the cost problem remains real. One tutor per student, for 50 million American K-12 students, at $15/hour for four hours a day over 180 school days, comes to roughly $540 billion annually (50M × $15 × 4 × 180 = $540B). More than the entire current K-12 education budget. For context, the entire GDP of Sweden is roughly $600 billion.
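The $540 billion figure follows directly from the stated assumptions, which are themselves rough (a $15/hour tutor wage is well below market in most cities):

```python
# National cost of one human tutor per K-12 student,
# using the article's stated assumptions.

STUDENTS = 50_000_000   # approximate US K-12 enrollment
WAGE_PER_HOUR = 15      # assumed tutor wage, dollars
HOURS_PER_DAY = 4
SCHOOL_DAYS = 180

annual_cost = STUDENTS * WAGE_PER_HOUR * HOURS_PER_DAY * SCHOOL_DAYS
print(f"${annual_cost / 1e9:.0f}B per year")  # prints "$540B per year"
```

Any of the four inputs can be argued with; none can be shrunk enough to make the total affordable, which is Bloom's point.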

AI companies have noticed this forty-year-old problem has a forty-year-old price tag.

What AI Tutoring Actually Shows

At Harvard, lecturer Gregory Kestin and senior lecturer Kelly Miller ran what may be the most carefully designed AI tutoring study to date. In fall 2023, 194 students in Physical Sciences 2 were split between an active learning classroom (human instructor, group work, the pedagogical gold standard) and a custom AI tutor. Not ChatGPT with a physics prompt. A purpose-built system with lesson-specific instructions calibrated by content experts who had spent years iterating on the course.

Students using the AI tutor scored a median 4.5 on post-tests versus 3.5 for the active learning group. Learning gains more than doubled. Self-reported engagement and motivation were significantly higher with AI. "I certainly didn't expect students to find the AI-powered lesson more engaging," Kestin told the Harvard Gazette.

At scale, Khan Academy reports that students using the platform for 30 minutes of additional math practice per week see greater-than-expected gains on standardized assessments. Khanmigo, their AI tutor, reached 1.5 million licensed learners across 795 U.S. districts in SY 2024-25 (52% year-over-year increase in students, 38% in districts). A Johns Hopkins randomized controlled trial of IXL Math met ESSA Tier 1 requirements, with students outperforming the control group by 10 points on the Star Math assessment. Century Tech serves 1,200 schools in England (75% state-run), though a randomized controlled trial of their system does not yet exist; their evidence points to a Nesta study showing 20-30% improvement in adult learners' digital skills, a different population entirely.

Here is the cost that makes venture capitalists salivate: Khanmigo's district pricing is $15 per student per year ($48 for individual users). Compare that to the $66,250 California spends per math-proficient student, or even the $23,519 it spends per student regardless of outcome. If Khanmigo moved math proficiency from 35.5% to 40% in a district, the cost per additional proficient student would be roughly $333 (the $15 fee divided by the 4.5 percentage point gain, expressed as 0.045). Compare that to the $66,250 the existing system pays. A ratio of roughly 200 to 1.

A caveat this comparison deserves: $66,250 is an average cost (total system spending per proficient student, including buildings, buses, nurses, and cafeteria workers). $333 is a marginal cost (the incremental price of one additional tool). Nobody is proposing to replace schools with Khanmigo subscriptions. But the ratio still illuminates something real about how little the existing system gets for what it spends on academic outcomes specifically, and how cheaply technology can move the needle when it works.

Read those numbers carefully. They're real, but they carry asterisks the size of school buses.

Ms. Okafor has heard the pitch. A vendor came to her district last spring. "They showed us a dashboard," she says. "Beautiful colors, student progress bars, everything a principal wants to see in a board presentation. I asked what happens when a student closes the laptop and opens TikTok. They said the system sends a nudge notification." She paused. "My nine-year-olds don't have email."

What They Don't Show

Kestin's study involved 194 students at Harvard. Not a Title I school in the Central Valley. Not English Language Learners. Harvard undergraduates who chose to take a physics course are, by definition, academically motivated, digitally literate, and operating in an environment with near-unlimited support. Extrapolating their results to a classroom like Ms. Okafor's requires assumptions the researchers themselves would not endorse.

Khan Academy's "30 minutes per week" benchmark sounds modest until you account for prerequisites: a functioning device, reliable internet, a quiet space, a student motivated enough to do optional practice, and ideally someone nudging them toward it. In California, where 65% of tested students qualify as socioeconomically disadvantaged, each of those prerequisites is a filter that screens out the students who need help most.

And the optimization question nobody selling these systems is incentivized to answer: what does the AI optimize for? Engagement metrics look great when the system adjusts difficulty to keep students in a flow state. But flow states and learning are not the same thing. A student who feels challenged but successful might be doing slightly-too-easy problems at a pace that feels rewarding. Drill-and-kill wrapped in adaptive difficulty is still drill-and-kill. It just has better retention metrics.

The Case for Standardization (Honestly)

Before advocating for personalization, an article like this owes the counterargument its full weight.

Standards provide an equity floor. Without Common Core, a student in Mississippi was held to different expectations than a student in Massachusetts. That might sound like flexibility; in practice, it meant low-income states could define "proficient" at levels that high-income states considered below grade level. Common Core forced every state to at least agree on what "proficient" meant. Even the studies showing minimal NAEP impact found that Common Core states raised their proficiency bars more than non-adopting states, meaning they made it harder for students to score as proficient. Raising the bar may not have improved learning, but it did make the problem visible.

Standardization also protects against algorithmic sorting. Personalized AI systems must classify students: this one needs remediation, this one is ready for acceleration. Every classification is a prediction about a child's capacity, made by a system trained on historical data that reflects existing inequities. A Black student in a low-income district, statistically more likely to be flagged for remediation, might receive a personalized path that is actually a narrower one: more drill, less exploration, fewer challenging problems. Standardization, for all its clumsiness, at least promises every child the same material. Personalization promises each child what the algorithm thinks they can handle. Those are not the same thing.

And there is the argument that standardization's defenders rarely articulate but parents understand intuitively: school is not only about academics. A classroom where 31 kids work through the same lesson together is building something no AI tutor can replicate. Collaboration, disagreement, the social negotiation of understanding, the experience of watching a classmate explain something in a way the teacher didn't think of. Pandemic-era remote learning proved, brutally, what happens when you remove the social infrastructure of school. Mental health crises. Learning loss. Isolation. A personalized AI tutor, no matter how effective at teaching fractions, does not teach a child how to sit next to someone they disagree with and still learn from them.

What AI Means for 3.7 Million Teachers

An article about AI in education that avoids the employment question is not being honest. Approximately 3.7 million Americans work as K-12 teachers. Millions more work as paraprofessionals, tutors, instructional aides, and support staff.

If AI tutoring systems demonstrate efficacy comparable to human tutoring, the economic logic is blunt and uncomfortable. A school district facing budget cuts will not voluntarily maintain a 15:1 student-teacher ratio if a $15/student/year AI system can handle remediation. The first cuts will be in precisely the roles that serve the most vulnerable students: reading specialists, math interventionists, after-school tutoring staff. Not because administrators are callous, but because budgets are zero-sum and school boards answer to taxpayers.

Proponents argue AI will "free teachers to focus on higher-order skills." This framing treats teachers as inputs in an optimization problem rather than professionals with expertise, relationships, and livelihoods. Ms. Okafor doesn't need to be "freed" from teaching multiplication. She needs fewer students in her class, more planning time, and a paraprofessional to help with the nine kids who aren't ready. AI might provide diagnostic data that helps her prioritize. It will not sit with Marcus after class and figure out that the real problem is he can't see the whiteboard and his family can't afford glasses. It will not notice that he's been quieter since October, or that he only struggles on days he comes in without breakfast.

Who Owns a Child's Learning Profile?

Every adaptive learning system builds a model of each student: what they know, what they struggle with, how fast they learn, when they give up, what motivates them. Khanmigo tracks every math problem attempted. IXL's diagnostic system maps each student's knowledge state across hundreds of skills. For children who cannot legally consent to data collection, the privacy implications are significant and largely unresolved.

FERPA (1974) governs student education records but was written decades before AI tutoring existed. COPPA (1998) restricts data collection for children under 13 but has enforcement gaps wide enough to drive a school bus through. When an ed-tech company gets acquired, student data goes with it. InBloom, the $100 million Gates Foundation-backed student data platform, collapsed in 2014 after parent protests over data sharing. It promised to delete all student records on shutdown, but the broader lesson endured: millions of students' learning data had been aggregated, shared across state lines, and exposed to security vulnerabilities before anyone asked their parents. A child's complete learning profile, including every problem they got wrong, every concept they struggled with, every moment they disengaged, can become an asset on a balance sheet long before a company fails.

No current federal regulation specifically addresses AI-generated learner models for minors. California's own Student Online Personal Information Protection Act (SOPIPA) is among the strongest in the nation and still doesn't address what happens when an AI system infers a learning disability from behavioral patterns before any human has made that diagnosis. Who sees that inference? Who is liable if it's wrong? If a school uses it to place a child in a remedial track, has an algorithm just made a consequential decision about a child's educational trajectory without due process?

What This Article Doesn't Prove

Honesty requires boundaries. Here is what we don't know and this article cannot establish:

We don't know whether the $66,250 cost-per-proficient-student calculation is fair, because per-pupil spending includes services (transportation, meals, counseling, special education) that serve functions beyond academic proficiency. A student who scores "standard not met" in math but received school-provided meals, mental health support, and a safe environment for seven hours a day received value that doesn't appear in proficiency rates. Using proficiency as the sole denominator overstates the cost of academic outcomes by ignoring non-academic ones.

We don't know whether AI tutoring at Harvard translates to K-12 public schools. No one does. The study hasn't been run. We have one rigorous study of 194 self-selected college students and a lot of vendor-funded efficacy claims. Khan Academy's data is promising but not from an independent RCT, and usage is voluntary, which introduces selection bias (students who choose to use the platform 30 minutes a week may already be more motivated than those who don't). Our 200:1 cost ratio ($66,250 vs. $333) rests on a hypothetical 4.5 percentage point proficiency gain that no Khanmigo deployment has yet demonstrated at scale. If the real gain is 1 point, the ratio drops to 44:1. If it's zero, it's infinite.
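The sensitivity of the 200:1 ratio to the assumed gain can be made explicit. A sketch, where the proficiency gains are hypothetical, not measured:

```python
# How the cost ratio moves with the assumed proficiency gain.
# $15/student/year district fee vs. the $66,250 average cost per
# proficient student. Gains are hypothetical percentage points.

FEE = 15
AVG_COST_PER_PROFICIENT = 66_250

def marginal_cost(gain_pp: float) -> float:
    """Fee divided by the proficiency gain expressed as a fraction."""
    return FEE / (gain_pp / 100)

for gain in (4.5, 1.0, 0.1):
    mc = marginal_cost(gain)
    ratio = AVG_COST_PER_PROFICIENT / mc
    print(f"{gain:>4} pp gain -> ${mc:,.0f} per added proficient student, "
          f"ratio {ratio:.0f}:1")
```

At a 4.5-point gain the marginal cost is about $333 (a ratio near 200:1); at 1 point it is $1,500 (44:1); as the gain approaches zero, the marginal cost diverges and the ratio collapses, which is the article's point about undemonstrated assumptions.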

We don't know whether personalization helps or hurts the lowest-performing students. Intuitively, adapting to a student's level should help students who are behind. Empirically, adaptive systems trained on population-level data may replicate the same demographic patterns that created the gaps. Until someone runs a rigorous trial specifically measuring outcomes for the bottom quartile in a high-poverty school, the equity case for AI personalization is theoretical.

And we don't know whether the Bloom result, even at VanLehn's more conservative 0.79 sigma, transfers to AI tutoring. Human tutors adjust everything: tone, body language, emotional support, pacing, the decision to abandon a topic and come back tomorrow. Current AI systems personalize one, maybe two variables: difficulty and sequencing. Genuine personalization would mean different students learning the same concept through entirely different approaches. No commercial AI tutor does this yet.

An Architecture That Might Work

If you wanted to take this seriously, not as a product launch but as a policy commitment, here is what it would require:

Universal broadband and device access, which is not achieved in 2026 and not close. Independent RCTs in diverse school contexts: Title I schools, rural districts, schools with high ELL populations, not vendor-funded, not at Harvard. Algorithmic transparency, meaning parents and educators can see what the system optimizes for and override it.

A data privacy framework designed for children, not adapted from adult consumer protection law. Teacher integration that treats AI as a tool in the hands of a professional, not a replacement for one. And a funding model that doesn't create years of advantage for wealthy districts that adopt first.

Districts already showing the fastest gains are doing something simpler. Compton Unified, highlighted in the CDE's own 2023-24 release, invested in in-class human tutors, expanded after-school tutoring, and teacher professional development. Fallbrook Union Elementary credited counselors, social workers, and behavior technicians. Benicia Unified funded districtwide professional learning and math instructional coaches. Real people, in buildings, working with children. Not a single district credited AI.

Maybe AI changes this in five years. Kestin's data is real. Even VanLehn's conservative 0.79 sigma is enormous. But a study of 194 Harvard students is not a policy prescription for 50 million. Until someone runs the trial in Compton, not Cambridge, the honest answer is: we don't know.

Ms. Okafor knows what her nine students need. She doesn't have enough minutes in the day to give it to them.

Whether an AI tutor could help her, or whether it would become one more thing she has to manage while the pacing guide marches on, depends on choices that haven't been made yet. By people who may never have set foot in Room 14.

Sources