🤖 AI Weekly

OpenAI Killed Sora, Apple Opened Siri, and Arm Built an AGI Chip: The Week AI Got Real About What Works

Sora burned an estimated $15 million a day for $2.1 million in total revenue. Apple invited Gemini and Claude inside its walled garden. Arm shipped silicon designed for agents, not chatbots. And the White House told Congress to preempt state AI laws it deems burdensome. This was the week the industry stopped pretending that everything scales.

AI weekly roundup illustration

Most weeks in AI produce a steady drip of incremental benchmark improvements and breathless LinkedIn posts about "the future of work." This was not most weeks. Between March 23 and 29, OpenAI euthanized its most hyped consumer product. Apple conceded it can't build competitive AI alone. Arm revealed silicon built for a computing paradigm that barely exists yet. And a White House that has spent years avoiding AI legislation suddenly told Congress exactly what to do.

Individually, any of these would be a significant story. Together, they trace a single throughline: the AI industry is entering its correction phase. Not a downturn. Not a bust. A reckoning with the gap between what's theoretically possible and what actually makes economic sense to run.

1. OpenAI Kills Sora: $15 Million Per Day, $2.1 Million Total

On March 24, OpenAI shut down Sora, its AI video generation app, six months after launch. The numbers tell the story more eloquently than any press release: estimated peak inference costs of $15 million per day, against total in-app revenue of $2.1 million across its entire lifespan. Downloads had dropped 66% from their November 2025 peak. A reported $1 billion deal with Disney collapsed alongside it.

The Sora team is pivoting to robotics research under a new model codenamed "Spud." Sam Altman told employees it could "meaningfully accelerate the overall economy." It may power OpenAI's planned "super app" combining ChatGPT, Codex, and the Atlas browser. Release is expected within weeks.

Why it matters: Sora was the proof-of-concept for "AI video replaces Hollywood." Its death is the loudest signal yet that generating video at consumer scale doesn't pencil out. The inference cost wall is real, and it just claimed its first flagship product. Every competitor in generative video (Runway, Pika, Kling) should be doing back-of-envelope math today.

Why it might not: OpenAI killing a money pit to focus on higher-value targets (robotics world models) is actually rational capital allocation. Sora dying doesn't mean video AI is dead. It means consumer video AI at $20/month subscriptions can't fund $15M/day in GPU time. Enterprise pricing or specialized verticals (film pre-viz, game asset pipelines) might still work.
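The gap between those two numbers is worth making concrete. A quick back-of-envelope sketch, using the externally estimated figures above (the $20/month price point is illustrative, and this ignores every cost except inference):

```python
# Back-of-envelope Sora economics using the analyst-estimated figures above.
# The subscription price is an illustrative consumer tier, not OpenAI pricing.

DAILY_INFERENCE_COST = 15_000_000   # USD per day, external analyst estimate
MONTHLY_PRICE = 20                  # USD per subscriber per month, illustrative
DAYS_PER_MONTH = 30

monthly_cost = DAILY_INFERENCE_COST * DAYS_PER_MONTH       # $450,000,000/month
breakeven_subscribers = monthly_cost / MONTHLY_PRICE       # paying users needed

print(f"Monthly inference cost: ${monthly_cost:,}")
print(f"Subscribers needed to cover inference alone: {breakeven_subscribers:,.0f}")
```

Covering inference alone would take 22.5 million paying subscribers, before a single dollar of training, staff, or margin. Against $2.1 million in lifetime revenue, the shutdown decision writes itself.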

2. Apple Opens Siri to Gemini and Claude in iOS 27

On March 27, Apple announced plans to make Siri interoperable with external AI assistants in iOS 27. Users will be able to route queries from Siri to Google Gemini, Anthropic Claude, or other AI chatbots installed through the App Store. This is separate from Apple's reported $1 billion/year deal to use Google's Gemini models to power Siri's backend intelligence.

Apple is calling the feature "Extensions." It works like the existing ChatGPT integration launched in 2024, but generalized to any AI provider willing to build the connector.

Why it matters: This is Apple admitting it lost the AI assistant war. After 15 years of Siri being a punchline, Cupertino is giving up on building the best AI and instead positioning the iPhone as a neutral platform for all AIs. That's the same strategic pivot Apple made with the App Store in 2008: if you can't build every app, own the distribution layer. If Siri becomes the router, Apple takes 30% of every AI subscription funneled through it.

Why it might not: This could be vaporware announced at WWDC in June and shipping in 2027. Apple's track record on AI feature timelines is poor. Apple Intelligence was announced in June 2024, and core features were still rolling out 18 months later. The Extensions API could launch half-baked, with only ChatGPT and Gemini supported, while the deeper Siri-Gemini backend integration does the actual heavy lifting.

3. Arm Unveils AGI CPU for Agent Workloads

Arm launched a new CPU architecture specifically designed for agentic AI workloads in data centers, with Meta as the lead partner. Unlike GPUs optimized for training and batch inference, this chip targets the multi-step, tool-calling, branching workflows that define agent systems.

Why it matters: Every major AI lab is pivoting from chatbots to agents. OpenAI has Codex and the planned super app. Anthropic has Claude computer use. Google has Project Mariner. But agents have fundamentally different compute profiles than chat: they need fast single-threaded performance for sequential reasoning, low-latency memory access for tool state, and efficient branch prediction for decision trees. Designing silicon for this workload is a bet that agents aren't a fad.

Why it might not: Agent AI remains mostly demos and prototypes. Shipping custom silicon for a workload that barely exists in production is a multi-billion-dollar gamble on a timeline nobody can predict. If agents plateau at "fancy chatbot that sometimes calls APIs," this chip is a write-off.

4. White House Tells Congress to Preempt State AI Laws

On March 20, the White House released its National Policy Framework for Artificial Intelligence: Legislative Recommendations, urging Congress to establish a federal AI policy that would preempt state laws. The framework is organized around six themes: protecting children, managing AI's energy footprint, intellectual property, preventing AI censorship of lawful speech, workforce readiness, and favoring a "lighter-touch" federal approach over a new standalone AI regulator.

The preemption language is the real news. The White House explicitly calls on Congress to override state AI laws deemed "unduly burdensome," according to analysis from Morgan Lewis. This is aimed directly at California's SB 1047 and its progeny across a dozen states.

Why it matters: A patchwork of 50 different state AI regulations is the scenario every tech company fears most. Federal preemption would create a single regulatory surface. The framework's "lighter-touch" language and explicit rejection of a new AI regulatory agency signals that Washington wants to avoid the heavy compliance burden that Europe's AI Act has created.

Why it might not: Legislative recommendations from the White House are suggestions, not law. Congress has failed to pass comprehensive tech regulation for two decades (no federal privacy law, no Section 230 reform, no social media regulation). AI may follow the same pattern: urgent White House memos followed by years of committee hearings and zero legislation.

5. OpenAI Open-Sources the Responses API

On March 25, OpenAI released its Responses API as open source, making the interface that powers ChatGPT's tool-use and structured output available to any developer. This is separate from the models themselves: the API layer, not the weights.

Why it matters: OpenAI open-sourcing anything is still news, given the company's complicated relationship with the word "open." Releasing the Responses API positions it as a standard that competitors might adopt, similar to how Docker open-sourced containerization and then built a business on enterprise tooling around it. If third-party models can plug into the Responses format, OpenAI's API becomes the lingua franca of agent-tool interaction.

Why it might not: An API specification without model weights is like publishing a restaurant menu without a kitchen. The value lives in the models that consume the API, not the API itself. Anthropic, Google, and Meta all have their own tool-use protocols. Standardization attempts in AI have historically failed because the frontier moves too fast for any standard to stay relevant.
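For readers who haven't touched the format: a Responses-style request is, at bottom, a structured payload pairing user input with tool definitions. The sketch below follows OpenAI's publicly documented Responses API shape from memory; treat the exact field names as an assumption, and the `get_weather` tool as hypothetical.

```python
# A minimal Responses-style request payload, sketched as a plain dict.
# Field names follow OpenAI's public Responses API docs as best recalled;
# the model name and the get_weather tool are illustrative placeholders.

request = {
    "model": "gpt-4.1",                      # placeholder model name
    "input": "What's the weather in Berlin?",
    "tools": [
        {
            "type": "function",
            "name": "get_weather",           # hypothetical tool
            "description": "Look up current weather for a city",
            "parameters": {                  # JSON Schema for the arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

print(sorted(request.keys()))
```

The standardization bet is that this envelope, not any particular model behind it, becomes the thing third parties target.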

6. AWS Puts Cerebras on Bedrock: 5x Inference Speed

AWS deployed Cerebras CS-3 systems on Bedrock, its managed AI service, claiming 5x token throughput compared to GPU-based inference on equivalent models. Combined with AWS's Trainium custom chips and the existing NVIDIA partnership (1 million+ GPUs), Amazon now offers three distinct inference silicon options on a single platform.

Why it matters: Cerebras has always been the "wafer-scale chip that's fast but nobody can access." Putting it on Bedrock removes the distribution problem overnight. If the 5x throughput claim holds at scale, it changes the inference cost calculus for every production AI system on AWS. This matters most for latency-sensitive applications: real-time agents, voice interfaces, and interactive coding assistants.

Why it might not: Cerebras's throughput advantage has historically come with caveats about model size limitations and workload specificity. A 5x speedup on a cherry-picked benchmark may become a 1.5x speedup on diverse production traffic. GPU inference is also improving rapidly with NVIDIA's Dynamo 1.0 software stack, which claims up to 7x Blackwell performance improvements.
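The cost calculus is easy to see with toy numbers. Assuming, hypothetically, that a Cerebras-backed endpoint and a GPU endpoint cost the same per hour, per-token cost falls in direct proportion to throughput; every number below is a made-up placeholder, and only the ratios matter:

```python
# Illustrative effect of a throughput multiplier on per-token inference cost,
# under the (hypothetical) assumption of equal hourly instance pricing.
# All rates are placeholders; only the ratios are meaningful.

def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """USD cost to generate one million tokens at a steady rate."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

gpu = cost_per_million_tokens(hourly_rate_usd=10.0, tokens_per_second=100)
cerebras_claimed = cost_per_million_tokens(10.0, tokens_per_second=500)  # the 5x claim
cerebras_hedged = cost_per_million_tokens(10.0, tokens_per_second=150)   # 1.5x on mixed traffic

print(f"GPU: ${gpu:.2f}/M tokens")
print(f"Cerebras at claimed 5x: ${cerebras_claimed:.2f}/M tokens")
print(f"Cerebras at hedged 1.5x: ${cerebras_hedged:.2f}/M tokens")
```

A true 5x cuts per-token cost by 80%; a real-world 1.5x cuts it by a third. Whether Bedrock pricing actually passes that difference through is a separate question.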

7. What Nobody Is Talking About: Alibaba's XuanTie C950

Alibaba announced the XuanTie C950, a RISC-V 5nm CPU designed for AI workloads. This is significant for a reason that has nothing to do with benchmarks: it's a Chinese company building high-performance AI compute on an open instruction set architecture that isn't subject to US export controls on x86 or Arm.

RISC-V is the Linux of chip architectures. It's open, royalty-free, and anyone can build on it. While the US has spent three years tightening export controls on NVIDIA GPUs and advanced semiconductor manufacturing equipment, China has been quietly building an alternative compute stack on architecture that Washington can't embargo.

A US congressional panel warned this week about China's growing competitive edge in AI manufacturing and robotics. The XuanTie C950 is a concrete data point for why that warning exists.

Why it matters: US AI dominance is built on a hardware moat: NVIDIA designs the chips, TSMC fabs them, and export controls keep China two generations behind. RISC-V erodes the design side of that moat. If Chinese firms can build competitive AI inference chips on open architecture, the export control strategy needs fundamental rethinking.

Why it might not: A 5nm RISC-V CPU for inference is still far from replacing NVIDIA GPUs for training. China's semiconductor manufacturing remains behind on leading-edge nodes (SMIC's 7nm process has low yields). Alibaba announcing a chip and Alibaba shipping a chip at scale are very different events. But the trajectory matters more than any single product.

Connecting the Dots

Zoom out and these seven stories tell a single narrative: the AI industry is differentiating between what works and what doesn't, faster than anyone expected.

Sora's death and Apple's Siri pivot both reflect the same lesson. Consumer AI products that burn compute without proportional revenue are getting killed. Companies that can't build frontier AI are becoming platforms instead. The economics of inference are now the binding constraint on every product decision, from video generation to voice assistants to agent frameworks.

Meanwhile, the hardware layer is fracturing. Arm builds agent-specific silicon. Cerebras offers wafer-scale inference through AWS. Alibaba develops outside the US-controlled supply chain entirely. NVIDIA remains the center of gravity, but the idea that a single architecture serves all AI workloads is dying alongside Sora.

Washington is paying attention. The White House framework's urgency around federal preemption signals that regulators recognize AI deployment is moving faster than state legislatures can respond. Whether Congress actually acts is another question. It usually doesn't.

One year ago, the dominant narrative was "everything gets smarter, everything scales, valuations only go up." This week suggests the sequel is more nuanced: some things scale, some things die, and the survivors are the ones who got the economics right before the compute bill arrived.

Limitations

Sora's $15M/day inference cost is an estimate from external analysts, not an OpenAI-confirmed figure. Apple's iOS 27 features may change before the June WWDC announcement. Arm's AGI CPU benchmarks haven't been independently verified. Alibaba's XuanTie C950 performance claims are from the company's own announcement. OpenAI's revenue figures come from news reports, not audited financials. All forward-looking statements from companies should be treated with appropriate skepticism.