It Costs $26.5 Million a Year to Train a Humanoid Robot's Brain. The Workers Doing It Earn $30/Hour — the Same as the Workers They're Replacing.
Inside the teleoperation data factories that are the real bottleneck of the humanoid robotics industry, where humans wearing VR headsets generate training data at $0.51–$2.55 per action, at least 66,000 times more expensive per unit than the data used to train a language model.
June 20, 2026
In a converted warehouse in Shanghai's Pudong district, 200 people show up for 17-hour shifts wearing VR headsets and exoskeleton gloves. They are not gamers. They are not warehouse workers, though the building used to serve that purpose. They are teleoperators at AgiBot, one of China's largest humanoid robot companies, and they spend their days doing something profoundly strange: performing mundane tasks while tethered to machines that record every twitch of their fingers, every shift of their weight, every split-second correction when a cup starts to tilt. Each operator generates between 150 and 250 usable data points per day. The facility collectively produces 30,000 to 50,000 behavioral recordings daily, according to a KrASIA report and multiple industry analyses. AgiBot has accumulated more than 4 million real-robot data points since late 2024.
This is a data factory for physical movement, the kind of facility that barely existed three years ago and now sits at the center of a $4–5 billion annual investment wave that the humanoid robotics industry absorbed in 2025 alone, according to a June 2026 GlobeNewsWire market report, against roughly $500 million in actual revenue. China accounted for more than 80% of all humanoid robot installations that year. AgiBot shipped 10,000 units by March 2026, doubling its 5,100-unit total from CES in January in just three months, according to eWeek, and the company is now planning a Hong Kong IPO at a $5.1–6.4 billion valuation. But behind those production numbers sits a hidden cost that nobody in the industry seems willing to quantify publicly: training a humanoid robot to do anything useful requires a staggering amount of real-world behavioral data, and that data must be generated by human beings who physically perform the tasks while connected to the machine.
The $2.55 Data Point
Let's run the numbers that the industry isn't publishing.
AgiBot's Shanghai data factory employs 200 operators across 17-hour daily shifts, as reported by Reuters, and the wage structure reveals the industry's most underappreciated cost advantage. Machine operators in Shanghai earn approximately ¥32/hour ($4.40 USD) based on SalaryExpert 2026 data. Robot teleoperation demands somewhat higher skills than basic machine operation, including VR dexterity, spatial reasoning, and the patience to repeat a single grasping motion forty times before the system captures a clean demonstration, all of which suggests a conservative blended rate of $6/hour. At 260 working days per year:
AgiBot's annual data collection labor cost: 200 operators × $6/hr × 17 hrs/day × 260 days = $5.3 million.
Now consider what the same operation would cost in the United States, where Figure AI posts teleoperator positions at $25–35/hour and Indeed listings from other robotics firms cluster around $25–30, putting a reasonable midpoint at $30/hour:
US-equivalent annual cost: 200 operators × $30/hr × 17 hrs/day × 260 days = $26.5 million.
Output in both cases is approximately 40,000 data points per day, or 10.4 million per year, putting the cost per data point at $0.51 in China and $2.55 in the US.
| Metric | China (AgiBot) | US Equivalent |
|---|---|---|
| Operators | 200 | 200 |
| Hourly wage (est.) | $6.00 | $30.00 |
| Daily shift length | 17 hours | 17 hours |
| Working days/year | 260 | 260 |
| Annual labor cost | $5.3M | $26.5M |
| Output (data points/day) | ~40,000 | ~40,000 |
| Cost per data point | $0.51 | $2.55 |
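The arithmetic behind the table can be reproduced in a few lines. The sketch below uses only the figures quoted above (200 operators, 17 hours per day, 260 days per year, roughly 40,000 data points per day, and the estimated $6 and $30 hourly wages); the function names are illustrative, and the wages are this article's estimates rather than reported payroll data.

```python
def annual_labor_cost(operators, hourly_wage, hours_per_day, days_per_year):
    """Total yearly wage bill, assuming every operator is paid for every hour."""
    return operators * hourly_wage * hours_per_day * days_per_year


def cost_per_data_point(annual_cost, points_per_day, days_per_year):
    """Yearly wage bill divided by yearly output of data points."""
    return annual_cost / (points_per_day * days_per_year)


# Figures taken from the article; the wages are estimates.
OPERATORS = 200
HOURS_PER_DAY = 17
DAYS_PER_YEAR = 260
POINTS_PER_DAY = 40_000

for label, wage in [("China (AgiBot)", 6.00), ("US equivalent", 30.00)]:
    cost = annual_labor_cost(OPERATORS, wage, HOURS_PER_DAY, DAYS_PER_YEAR)
    unit = cost_per_data_point(cost, POINTS_PER_DAY, DAYS_PER_YEAR)
    print(f"{label}: ${cost / 1e6:.1f}M per year, ${unit:.2f} per data point")
```

Because the headcount, hours, and output are held equal in both scenarios, the per-unit gap is simply the wage ratio: five to one.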
For comparison, training GPT-4 cost an estimated $100 million on approximately 13 trillion tokens, which works out to $0.0000077 per token, meaning a single robot data point, one recorded sequence of a human reaching, grasping, and placing an object, costs between 66,000 and 330,000 times more than one unit of language model training data. The comparison is deliberately imprecise: tokens and robotic action sequences are categorically different objects, in the same way that a pixel and a paragraph are categorically different objects, but the magnitude matters because it explains why the biggest bottleneck in robotics isn't hardware, isn't compute, and isn't algorithms, but rather the per-unit cost of generating physical-world data through human demonstration.
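For readers who want to check the multiplier, here is a minimal sketch of the comparison. It takes the GPT-4 estimates quoted above ($100 million, roughly 13 trillion tokens) at face value and divides the per-data-point costs by the per-token cost; the variable names are illustrative, and the result inherits all the imprecision of comparing tokens with demonstration episodes.

```python
# Estimates quoted in the article, not official figures.
GPT4_TRAINING_COST_USD = 100e6
GPT4_TRAINING_TOKENS = 13e12

cost_per_token = GPT4_TRAINING_COST_USD / GPT4_TRAINING_TOKENS

# Per-data-point labor costs derived earlier in the article.
ROBOT_COST_PER_POINT = {"China": 0.51, "US": 2.55}

ratios = {
    region: cost / cost_per_token
    for region, cost in ROBOT_COST_PER_POINT.items()
}

print(f"cost per token: ${cost_per_token:.7f}")
for region, ratio in ratios.items():
    print(f"{region}: about {ratio:,.0f}x the cost of one token")
```

The unrounded ratios come out slightly above 66,000 and 330,000; the figures in the text are rounded down.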
The $30/Hour Coincidence
Here is the number that stopped me cold.
According to the Bureau of Labor Statistics, US manufacturing production and nonsupervisory workers earned an average of $30.19 per hour in May 2026, while team assemblers, the category that best describes the people doing repetitive assembly on a factory line, earn a median of $22.19/hour and a mean of $23.30.
Figure AI advertises humanoid robot teleoperator positions at $25–35/hour, 1X Technologies posts similar roles at $22–31/hour, and the midpoint across all available job listings lands at roughly $30/hour, within a dollar of the manufacturing production worker average, which means the workers training humanoid robots earn almost exactly the same wage as the factory workers those robots are designed to replace.
This is not a coincidence, and the reason is structural: both jobs demand the same physical attributes, namely manual dexterity, hand-eye coordination, tolerance for repetitive precision tasks, the ability to stand and move for eight-hour shifts, and the capacity to lift fifty pounds. Figure AI's job postings literally specify "enjoys repetitive, precision-focused work" and "thrives in fast-paced, metrics-driven environments," language that, read aloud without context, describes an automotive assembly worker applying for a position at a company building robots whose explicit purpose is to make automotive assembly workers unnecessary.
What the Scaling Laws Say
A landmark paper presented at ICLR 2025 titled "Data Scaling Laws in Imitation Learning for Robotic Manipulation" offers the first rigorous look at how much data robots actually need, and the findings simultaneously encourage and terrify anyone trying to build a business case around humanoid automation. The researchers collected more than 40,000 demonstrations and executed 15,000+ real-world robot rollouts under controlled conditions, arriving at two conclusions that reshape the economics of the entire data factory model.
Generalization performance, they found, follows a power-law relationship with the number of environments and objects the robot encounters during training, and the diversity of those settings matters far more than the sheer volume of repetitions in any single environment. "Once the number of demonstrations per environment or object reaches a certain threshold, additional demonstrations have minimal effect," the paper states, which means that the path to capable robots runs not through more data points in the same factory, but through more factories, more kitchens, more hospitals, more warehouses, each one requiring its own data collection team, its own teleoperation rigs, its own months of labor.
A second, more seductive finding offers a counterpoint: an efficient data collection strategy can be remarkably compact for narrow tasks, with four collectors working for a single afternoon generating enough data for two manipulation tasks to achieve approximately 90% success rates in entirely novel environments with unseen objects. That sounds cheap right up until you realize it covers two tasks out of the hundreds a commercially viable humanoid robot needs, each of which must be demonstrated across dozens of diverse environments if the power-law scaling is to hold.
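To make the power-law claim concrete, the sketch below fits a curve of the form y = a * n^b in log-log space and asks how much the number of environments must grow to halve a failure rate. The data is synthetic and the exponent (-0.5) is an assumption chosen purely for illustration; neither comes from the ICLR paper, whose fitted values are not quoted here.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via linear regression on logs."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    covariance = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    variance = sum((u - mean_x) ** 2 for u in log_x)
    b = covariance / variance
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, illustrative data only (NOT measurements from the paper):
# failure rate as a function of the number of training environments.
ENVIRONMENTS = [1, 2, 4, 8, 16, 32]
ASSUMED_A, ASSUMED_B = 0.8, -0.5  # hypothetical parameters
failure_rates = [ASSUMED_A * n ** ASSUMED_B for n in ENVIRONMENTS]

a, b = fit_power_law(ENVIRONMENTS, failure_rates)

# Under y = a * n**b, halving y requires multiplying n by 0.5 ** (1 / b).
multiplier_to_halve = 0.5 ** (1 / b)

print(f"fitted exponent b = {b:.2f}")
print(f"environments must grow {multiplier_to_halve:.1f}x to halve the failure rate")
```

With this assumed exponent, every halving of the failure rate requires four times as many environments, which is the sense in which diversity-driven scaling multiplies data collection sites rather than shrinking them. A different exponent would change the multiplier but not the shape of the trade-off.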
From Data Factory to Production Line
Sanctuary AI demonstrated what sits at the other end of this data pipeline on June 17, 2026, when the Vancouver-based company announced that its Physical AI system achieved a 99.5%+ task success rate on a wire-plugging operation at a global Tier 1 automotive supplier, completing each cycle in 2.54 seconds to match live production benchmarks. Wire harness assembly is one of the hardest tasks to automate in automotive manufacturing because wires are flexible, they bend unpredictably, and connectors require sub-millimeter alignment under force, the kind of dexterous, contact-rich manipulation that humans handle instinctively but robots historically cannot perform at production speed.
Sanctuary's approach is notable because the company deliberately separated the intelligence from the humanoid form factor, deploying its Carbon AI model on existing commercial robotic arms rather than waiting for humanoid hardware to reach mass commercialization, thereby achieving production-grade reliability on equipment factories already own. "Physical AI adoption is gated by AI that meets both performance and cycle time requirements," said Olivia Norton, co-founder and CTO, a statement that reads differently when you know the data cost behind reaching that performance level.
Physical Intelligence, the San Francisco startup behind the π0 foundation model, demonstrates the broad-generalization approach at the opposite end of the spectrum. Their π0 model was pre-trained on more than 10,000 hours of robot data, with fine-tuning for new tasks requiring only 1–20 additional hours of demonstrations, and in an experiment covered by TechCrunch, the model's successor π0.7 successfully operated an air fryer it had essentially never seen in training; the researchers found only two marginally relevant episodes in the entire training dataset, yet the model had somehow synthesized fragments of unrelated experience into functional competence with the unfamiliar appliance.
Encouraging results, all of them, and also expensive to replicate at the foundation layer. Those 10,000+ hours of pre-training data represent an enormous upfront investment in teleoperation labor, and even π0.7 still needed human coaching to jump from 5% to 95% success on the air fryer task, which means the gap between "two episodes in the training set were enough for a research demo" and "99.5% reliability on a production line" is precisely where the data factory's labor costs accumulate.
China's Structural Advantage
The numbers explain why China dominates, and they also reveal why closing the gap will be extraordinarily difficult for Western competitors. With a 5:1 labor cost advantage in data collection, Chinese firms can generate the same training data for one-fifth the price: AgiBot's $5.3 million annual data factory in Shanghai would cost $26.5 million at US rates. That differential compounds with scale. AgiBot shipped 10,000 robots in 15 months, while its closest Western competitor, Agility Robotics, operates the only paid commercial humanoid deployment among Western companies, at a single GXO-operated warehouse. Figure AI's 11-month pilot program at BMW concluded without transitioning to a paid commercial relationship, a fact that reads less like a setback and more like a consequence of the data cost asymmetry.
State Grid Corporation of China announced in its 2026 Embodied Intelligence Development Plan a centralized procurement of approximately 8,500 embodied intelligence devices, including 500 humanoid robots, 3,000 dual-arm inspection robots, and 5,000 quadruped robot dogs, for a total budget of ¥5.8 billion ($800 million), as reported by OFweek Robotics. That single government contract exceeds the entire global humanoid robot revenue for 2025, and it names AgiBot, UBTECH, Unitree, DEEP Robotics, and Fourier as target manufacturers, all of which operate data collection facilities in China at Chinese labor rates.
Data factory cost structures also explain a pattern visible in the funding numbers: the humanoid robotics sector operated at a funding-to-revenue ratio of roughly 8–10:1 in 2025, absorbing $4–5 billion in investment against roughly $500 million in revenue, and while part of that gap covers hardware development and headcount, a significant and growing fraction is the cost of generating enough real-world training data to make the hardware useful, a cost that is ongoing, not one-time, because every new task category, every new environment, every new object type requires its own data collection campaign.
Limitations
Several caveats constrain the precision of these calculations, starting with the fact that Chinese teleoperator wages are estimated from broader machine operator salary data in Shanghai rather than robot-specific teleoperation postings, which do not appear in English-language salary databases. AgiBot's figure of 30,000–50,000 data points per day originates from a LinkedIn article citing KrASIA, and the definition of "data point" remains ambiguous; it likely refers to complete demonstration episodes rather than individual action frames, which would be orders of magnitude more numerous and would make the cost per action even lower. The cost-per-data-point calculation assumes full utilization of all 200 operators during working hours, though real efficiency is certainly lower, which means the actual cost per useful data point is higher than the figures reported here. The GPT-4 comparison deliberately juxtaposes dissimilar units to illustrate the magnitude gap, not to claim mathematical equivalence between a text token and a robotic action sequence comprising synchronized joint positions, force readings, camera frames, and task-completion metadata.
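One way to see how much these caveats matter is to vary the two least certain inputs, the hourly wage and the daily output, across the ranges already quoted in this article. The sketch below is a rough sensitivity check, not a cost model: the $4.40 figure is the unadjusted Shanghai machine-operator rate, $6 and $30 are the estimated rates used above, and 30,000 to 50,000 is the reported daily output range. The function and label names are illustrative.

```python
OPERATORS = 200
HOURS_PER_DAY = 17


def unit_cost(hourly_wage, points_per_day):
    """Daily wage bill divided by daily output of data points."""
    daily_labor_cost = OPERATORS * hourly_wage * HOURS_PER_DAY
    return daily_labor_cost / points_per_day


# Wage scenarios and output range, all taken from figures in the article.
WAGES = {
    "China base rate": 4.40,
    "China blended rate": 6.00,
    "US midpoint": 30.00,
}
OUTPUTS = [30_000, 40_000, 50_000]

for label, wage in WAGES.items():
    row = ", ".join(f"{n:,}/day: ${unit_cost(wage, n):.2f}" for n in OUTPUTS)
    print(f"{label}: {row}")
```

Across the reported output range, the blended Chinese rate yields roughly $0.41 to $0.68 per data point and the US midpoint roughly $2.04 to $3.40, so the headline figures of $0.51 and $2.55 sit in the middle of a band rather than at a precise value.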
The Strongest Case Against This Analysis
The most compelling counterargument is that data factory economics are temporary, destined to collapse as foundation models improve and synthetic data matures. Physical Intelligence's π0.7 demonstrates that robots can generalize from dramatically less data than brute-force scaling would require, and synthetic data generation through simulation platforms like SoftMimicGen and NVIDIA Isaac Sim could eventually replace human teleoperators entirely, a trajectory that, if realized, would turn the entire data factory model into a transitional phase whose costs evaporate within a few years.
That argument has been the argument for a decade, and the evaporation has not arrived. Even π0.7, the most impressive generalist model in robotics today, required a foundation of 10,000+ hours of real-world pre-training data before it could generalize at all, and synthetic data consistently underperforms real-world demonstrations for contact-rich manipulation tasks, a limitation the SoftMimicGen paper at RSS 2026 explicitly acknowledges by framing its contribution as addressing "a crucial gap" in the synthetic data paradigm for deformable objects. Sanctuary AI's 99.5% production-line success rate was achieved with real-world training, not simulation, and the ICLR scaling laws paper demonstrates that diversity of real environments drives generalization, which compounds the cost rather than reducing it because it means more data collection sites, not fewer, and each new site represents another team of teleoperators wearing VR headsets for $30 an hour.
The Bottom Line
Every humanoid robot that performs a useful task in the real world carries a hidden labor cost in its past: the humans who wore VR headsets and performed that task hundreds or thousands of times so the machine could learn. That labor costs $0.51 per data point in Shanghai and $2.55 in San Francisco, and a single commercially viable robot needs millions of such data points across hundreds of task types. In the US, the workers generating this data earn about $30/hour, the same rate as the factory workers those robots will replace. Anyone with a 401(k) allocation in manufacturing should understand what this industry is building. Anyone investing in humanoid robotics should understand what they are actually paying for: not hardware, not software, but an army of humans doing the work first so machines can copy them.
What You Can Do
If you work in manufacturing, start tracking which of your tasks involve repetitive manipulation of objects on flat surfaces, because these are the tasks robotics companies are collecting data on right now, and they will be automatable first. If you invest in humanoid robotics companies, ask about their data collection costs as a percentage of operating expenses; companies that report high R&D spending without specifying the data collection component are obscuring their true cost structure. If you are an engineer or an operator between jobs, robot teleoperation is a real and growing employment category paying $25–35/hour in the US, so search "humanoid robot operator" on Indeed or Greenhouse.