Samsung Added Moon Craters That Weren't There. Your Phone Does Something Similar 4 Trillion Times Per Shutter Press.
Every flagship ISP now runs dozens of neural network passes between photon and JPEG. Apple processes at the raw sensor level. Google merges 15 frames you never took. Samsung hallucinated lunar detail from a training set. And NVIDIA's DLSS 5 just proved that gamers will reject AI-generated visuals before photographers even notice. The line between capture and fabrication is dissolving. Nobody agreed where it was.
Four trillion. That's how many distinct ISP operations Apple's A18 Pro runs between the moment a photon strikes the iPhone 16 Pro's sensor and the moment a JPEG lands in your camera roll. Apple's Photonic Engine captures nine frames simultaneously (four short exposures for motion freezing, four standard frames for detail, one long exposure for light gathering), then runs pixel-by-pixel machine learning analysis on the uncompressed raw sensor data, before demosaicing, before noise reduction, before tone mapping. It runs Deep Fusion texture optimization on data the human eye will never see in its original form. By the time you look at the "photo," it has been reconstructed by machine learning from photonic fragments that no longer exist anywhere in memory.
This is not post-processing. This is not a filter. This is a fundamental re-architecture of what the word "photograph" means, happening at the silicon level, inside chips that every major semiconductor company on Earth is now racing to redesign. And Samsung's moon scandal was not the exception. It was the preview.
The Silicon Arms Race
There are exactly seven companies on Earth designing ISP/DSP pipelines that matter for consumer photography. Each has made radically different architectural bets:
ISP/DSP Pipeline Comparison (2026)

| Company | Chip / ISP | AI Compute | Key Innovation | Designs HW? |
|---|---|---|---|---|
| Apple | A18 Pro / Photonic Engine | 35 TOPS (Neural Engine) | ML at raw sensor stage, before compression | Yes (custom) |
| Google | Tensor G4 / Custom ISP + TPU | ~30 TOPS (est.) | HDR+ multi-frame merge (15 frames), Night Sight | Yes (co-design w/ Samsung LSI) |
| Qualcomm | Snapdragon 8 Elite / Spectra ISP | 73 TOPS (Hexagon NPU) | Cognitive ISP: real-time semantic segmentation during capture, 4.3 Gpixel/s | Yes (IP licensor) |
| Samsung | Exynos 2400 / ISOCELL | ~35 TOPS | Scene Optimizer: AI-trained detail enhancement per object class | Yes (sensors + SoC) |
| MediaTek | Dimensity 9400 / Imagiq 1090 | ~46 TOPS | AI-NR (noise reduction) + multi-frame HDR fusion | Yes (SoC) |
| GoPro | GP3 (5nm, Q2 2026) | Dedicated AI NPU (TOPS TBD) | Real-time scene recognition, 2x GP2 pixel processing | Yes (custom SoC) |
| Sony | IMX500 (AITRIOS) | On-sensor NPU | AI inference on the CMOS die itself, zero-latency classification | Yes (world's #1 image sensor) |
Apple's architectural bet is the most radical. The Photonic Engine processes at the raw data stage. Not raw-ish. Not "early in the pipeline." At the literal photon-to-electron conversion output, before the Bayer filter pattern has been demosaiced into RGB. According to Apple's technical documentation, the Camera Interface on A18 Pro moves sensor data to the Neural Engine faster than any previous generation, enabling Deep Fusion to run on uncompressed data that would previously have been too large to process in real time. The result is texture preservation in shadows and highlight roll-off that no pixel-level algorithm could produce from a compressed JPEG.
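To make "before demosaicing" concrete, here is a minimal sketch of the step that conventional pipelines run first and that Apple's ML now runs ahead of: reconstructing RGB from a Bayer mosaic. The crude 2x2-block interpolation and the function names are my own simplifications for illustration, not Apple's implementation.

```python
import numpy as np

def make_bayer_mosaic(rgb):
    """Simulate an RGGB Bayer sensor: each photosite records
    only one color channel of the scene in front of it."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red sites
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green sites
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green sites
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue sites
    return mosaic

def demosaic_nearest(mosaic):
    """Crude demosaic: every output pixel borrows the missing
    channels from its 2x2 Bayer quad. Real ISPs interpolate far
    more carefully; Apple's point is to run ML on `mosaic` itself,
    before this reconstruction ever happens."""
    h, w = mosaic.shape
    rgb = np.zeros((h, w, 3))
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            r = mosaic[y, x]
            g = (mosaic[y, x + 1] + mosaic[y + 1, x]) / 2
            b = mosaic[y + 1, x + 1]
            rgb[y:y + 2, x:x + 2] = (r, g, b)
    return rgb

# A flat gray scene survives the round trip exactly.
scene = np.full((4, 4, 3), 0.5)
out = demosaic_nearest(make_bayer_mosaic(scene))
```

Any detail the demosaic step smears is gone for good in a conventional pipeline, which is why operating upstream of it matters.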
Google's approach is architecturally opposite but equally aggressive. HDR+ captures 15 underexposed frames in rapid succession when you press the shutter. You don't see 14 of them. They're merged through a multi-frame fusion algorithm that aligns, weights, and combines pixel data across the burst stack to produce a single frame with more dynamic range and less noise than any individual capture could achieve. Night Sight extends this to exposures up to 6 seconds, handheld, by using the gyroscope to compensate for hand shake across the frame stack. The "photo" is a statistical composite of moments you never experienced simultaneously.
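The fusion idea behind HDR+ can be sketched in a few lines. This toy assumes the 15-frame burst is already aligned and simply averages it; the real algorithm aligns tiles and weights them by local motion, but the underlying noise math (averaging N frames cuts noise by roughly the square root of N) is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def merge_burst(frames):
    """Toy HDR+-style merge: average an aligned burst of noisy
    short exposures into one cleaner frame. Averaging N frames
    reduces zero-mean noise std by ~sqrt(N)."""
    return np.stack(frames).mean(axis=0)

true_scene = np.full((64, 64), 0.2)   # dim scene, each capture underexposed
burst = [true_scene + rng.normal(0, 0.05, true_scene.shape)
         for _ in range(15)]          # 15 noisy captures, as in HDR+

single_noise = np.std(burst[0] - true_scene)
merged_noise = np.std(merge_burst(burst) - true_scene)
# merged_noise comes out near single_noise / sqrt(15)
```

The merged frame is exactly the "statistical composite" described above: no single captured frame contains it.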
Qualcomm's Spectra ISP on the Snapdragon 8 Elite takes yet another approach: cognitive processing. The triple 18-bit ISP processes 4.3 gigapixels per second while simultaneously running semantic segmentation. The camera doesn't just see pixels. It identifies objects (face, sky, foliage, fabric) and applies different processing algorithms to each class in real time. Your skin gets one neural network. The sky behind you gets a different one. The flowers in the foreground get a third. One "photo," three simultaneous AI models.
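A toy version of that per-class routing, with invented class IDs and simple gain functions standing in for the real neural networks:

```python
import numpy as np

# Illustrative class IDs and "enhancers"; the real system runs a
# distinct neural network per class, not a scalar gain.
SKY, SKIN, FOLIAGE = 0, 1, 2
ENHANCERS = {
    SKY:     lambda px: px * 1.10,   # e.g. lift and denoise sky
    SKIN:    lambda px: px * 1.00,   # e.g. preserve skin tones
    FOLIAGE: lambda px: px * 1.25,   # e.g. boost foliage detail
}

def cognitive_process(image, mask):
    """Route each segmented pixel class through its own
    enhancement, producing one frame processed three ways."""
    out = image.copy()
    for cls, enhance in ENHANCERS.items():
        out[mask == cls] = enhance(image[mask == cls])
    return out

image = np.full((4, 4), 0.4)
mask = np.zeros((4, 4), dtype=int)   # top half sky...
mask[2:] = FOLIAGE                   # ...bottom half foliage
result = cognitive_process(image, mask)
```

The point of the sketch: identical input pixels leave the pipeline with different values purely because of what the segmenter decided they depict.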
The Moon Scandal: A Case Study in Computational Fabrication
In March 2023, Reddit user u/ibreakphotos ran an experiment that should have ended Samsung's marketing career. They downloaded a high-resolution image of the moon, downscaled it to 170×170 pixels, applied a Gaussian blur that destroyed all detail, displayed it on a monitor in a dark room, and photographed it with a Samsung Galaxy S23 Ultra at 100x Space Zoom.
The resulting photo had craters.
Not blurry approximations. Actual, sharp, correctly positioned lunar craters that did not exist in the source image. The detail was physically impossible to resolve from the input; it had been deliberately destroyed. Samsung's response, published via their newsroom, confirmed what happened: Scene Optimizer uses "an AI deep learning model" trained on hundreds of moon images to recognize the moon as an object and apply "AI-based detail enhancement." The model identifies the lunar disc, determines its phase, and composites learned texture onto the capture.
Samsung's own technical explanation is revealing in what it does and doesn't say. The process: Scene Optimizer confirms the presence of a moon. Multi-frame processing synthesizes 10+ captures. Then the "deep learning-based AI detail enhancement engine" adds detail "even further." The word "even" is doing a lot of work. It means: after the actual optical and computational recovery is complete, the AI adds more. Detail that was never captured. Detail from a training set.
Samsung users can disable this by turning off Scene Optimizer. Most don't know it exists. And here's the part that kept me up at night: the moon is the only object Samsung admits to enhancing this way. How many other object classes has Scene Optimizer been trained on that Samsung hasn't disclosed?
What Reviewers Actually Found
The marketing claims are impressive. HyperSmooth 6.0. FlowState. RockSteady 3.0. PureVideo. Cognitive ISP. But I spent two weeks reading every side-by-side comparison, every mountain bike handlebar test, every motorcycle vibration shootout from independent reviewers. Here's the honest picture.
Stabilization: GoPro HyperSmooth 6.0 vs. Insta360 FlowState vs. DJI RockSteady 3.0
Jordan Hetrick's detailed comparison (cameras mounted to the same helmet, same trail, same conditions) found that HyperSmooth's predictive model uses an NPU to anticipate motion before it happens, essentially pre-steering the image plane. DJI's RockSteady 3.0 uses dual IMUs fused with optical flow analysis, making it better at high-frequency micro-vibrations: the rapid chatter from backpack suspension, engine vibration, rough pavement. Insta360's FlowState produces comparable results but through a completely different architecture: the 360° cameras capture a full sphere, then reframe in post, giving mathematically unlimited stabilization headroom.
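Stripped of prediction, optical flow, and rolling-shutter correction, the common core of all three systems is gyro-integrated crop shifting: integrate the gyro's angular rate into an angle, then move the crop window the opposite way. A deliberately simplified sketch (all parameter values invented):

```python
import numpy as np

def stabilizing_shift(gyro_rate, dt, focal_px):
    """Toy electronic stabilization: integrate angular rate (rad/s)
    into a per-frame camera angle, then offset the crop window by
    the opposite amount. Uses the small-angle approximation
    shift ~ focal_length_in_pixels * angle."""
    angle = np.cumsum(gyro_rate) * dt   # rad, accumulated per frame
    return -focal_px * angle            # px, compensating crop offset

# Constant 0.1 rad/s drift over 5 frames at 30 fps, 1000 px focal length:
shifts = stabilizing_shift(np.full(5, 0.1), dt=1/30, focal_px=1000.0)
# the crop window slides steadily opposite the rotation
```

What separates the shipping products is everything layered on top of this loop: GoPro predicts the next `gyro_rate` samples before they arrive, DJI fuses two IMUs with optical flow, and Insta360 sidesteps the crop-margin limit entirely by capturing the full sphere.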
Real-world motorcycle testing found GoPro maintains "a slight edge in extreme movement scenarios like mountain biking at speed," while DJI's HorizonSteady keeps the horizon level during camera rotation in ways GoPro cannot match without an accessory.
The honest answer? For 90% of users, the stabilization differences are invisible. All three systems produce gimbal-equivalent results in normal conditions. The differences only emerge in specific edge cases:
Where Each Stabilization System Breaks Down
| Scenario | Winner | Why |
|---|---|---|
| High-speed MTB | GoPro | Predictive NPU anticipates trajectory, pre-corrects |
| Engine vibration (motorcycle) | DJI | Dual-IMU optical flow detects micro-vibrations faster |
| Horizon lock during tumbles | DJI | HorizonSteady maintains level beyond ±45° |
| Low light | Insta360 | 1/1.3" sensor (vs GoPro 1/1.9") + PureVideo AI denoising |
| Post-production flexibility | Insta360 (360° cams) | Full sphere = infinite reframing = infinite stabilization |
| Battery under load | DJI | ~4 hrs (Action 5 Pro) vs ~90 min (GoPro Hero 13) |
A one-year long-term review of the Hero 13 found HyperSmooth 6.0 "produces gimbal-like smoothness during intense activities" but noted that "stabilization struggles in low light, where slight shakiness becomes noticeable." The same review confirmed the 1,900mAh Enduro battery delivers 1.6-1.8 hours at 4K/30fps, roughly half what DJI's Action 5 Pro achieves.
The TS2 Tech three-way shootout crystallized it: "GoPro leads in resolution and ecosystem. DJI leads in battery life and waterproofing [20m vs 10m]. Both produce footage that's hard to tell apart in a blind test at 4K."
The Diffusion Model at the Gate
Everything described above (multi-frame fusion, semantic segmentation, per-object neural networks, predictive stabilization) uses discriminative AI. Models trained to recognize, classify, and enhance. The next wave is generative. And it's already here.
On May 29, 2025, researchers published DarkDiff: a framework that retasks a pre-trained Stable Diffusion model for camera ISP low-light enhancement. Instead of training a diffusion model from scratch on low-light data, DarkDiff takes an existing generative model, the same architecture used to create AI art, and repurposes it to enhance raw sensor data from extreme low-light captures. The paper demonstrates that it "outperforms the state-of-the-art in perceptual quality across three challenging low-light raw image benchmarks." The authors' affiliations include institutions with deep ties to Apple's computational photography research.
The implications are severe. A diffusion model doesn't just denoise; it generates plausible detail. When it enhances a shadow area, it's not recovering photons that were captured. It's predicting what photons should have been there, based on patterns learned from millions of training images. Samsung's moon craters were an early, crude version of this. DarkDiff is the sophisticated one.
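A four-pixel toy makes the distinction concrete. The "learned prior" and single blend step below stand in for a real diffusion model's score function over many steps; the point is only that the output's detail comes from the prior, not from the measurement:

```python
import numpy as np

# Invented numbers for illustration only.
learned_prior = np.array([0.0, 1.0, 0.0, 1.0])  # texture the model "expects"
measurement   = np.array([0.5, 0.5, 0.5, 0.5])  # flat signal: detail destroyed

def generative_restore(x, prior, strength):
    """One step of prior-guided restoration: pull the observation
    toward the learned pattern. A diffusion model does this
    implicitly, via its denoiser, over many reverse steps."""
    return (1 - strength) * x + strength * prior

restored = generative_restore(measurement, learned_prior, strength=0.8)
# restored == [0.1, 0.9, 0.1, 0.9]: alternating contrast that was
# never in the measurement -- the Samsung-moon failure in miniature
```

A classical denoiser can only smooth what it was given; a generative restorer can manufacture structure the sensor never recorded, and nothing in the output file distinguishes the two.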
Qualcomm demonstrated on-device Stable Diffusion running in under one second on the Snapdragon 8 Gen 3's 73-TOPS Hexagon NPU. The 8 Elite can generate images from text prompts (512×512, 20 inference steps) without touching a cloud server. This means the hardware to run a generative model inside the ISP pipeline already exists in hundreds of millions of pockets. Nobody has shipped it in the capture pipeline yet. The question is when, not whether.
The DLSS Parallel: Gamers Got There First
If you want to see how the photography world will react to generative enhancement, look at what happened to NVIDIA in March 2026.
DLSS 5 introduced neural rendering: AI that enhances game visuals in real time, adding photorealistic lighting and material detail that the game engine didn't actually render. Early demos showed DLSS 5 dramatically altering the appearance of existing games, giving them a more photorealistic look. The backlash was immediate and brutal.
On Reddit: "paves over the original art direction." On X: "AI filter." On gaming forums: "AI slop." Users posted side-by-side comparisons showing facial features subtly changing, lighting appearing overly harsh, materials looking generic. The complaint wasn't that it looked worse โ it often looked more technically impressive. The complaint was that it wasn't real. The game's artists didn't make those choices. An AI did.
Bethesda and NVIDIA were forced to issue a joint statement: "Developers will retain full, detailed artistic control over DLSS 5's effects." The effect is adjustable and optional. But the damage was done.
Photography hasn't had its DLSS moment yet. But when DarkDiff or something like it ships in a consumer ISP, and a photographer realizes that the beautiful shadow detail in their low-light capture was hallucinated by a diffusion model rather than recovered from photons, the same backlash will follow. And unlike gaming, photography has institutions with rules.
The Institutions Drawing Lines
The Associated Press updated its standards in 2023: "We do not alter any elements of our photos, video or audio. Therefore, we do not allow the use of generative AI to add or subtract any elements." Reuters maintains the same position. The World Press Photo Foundation disqualified a Pulitzer Prize-winning photographer in April 2023 for AI manipulation.
These rules were written for the editing stage, the moment after capture when a human decides to alter an image. They were not written for the ISP. Nobody at the AP asked whether the photon-to-JPEG pipeline inside a Canon or Nikon constitutes "alteration," because for a century it didn't. A chemical emulsion or a CCD array records light. Processing develops or interpolates it. The output is a photograph.
But if a diffusion model runs inside the ISP, the "photograph" contains hallucinated detail that was never captured as light. Is that image AP-publishable? Nobody has a policy for this because nobody had to think about it until Samsung put moon craters into a blurred circle and Apple's researchers published a framework for doing the same thing to shadows.
In-Sensor Intelligence: The Next Frontier
Sony's IMX500 hints at where this ends. It's the world's first image sensor with an integrated AI processor directly on the CMOS die. The neural network runs on-sensor: zero-latency classification, object detection, and scene analysis happening at the point of photon capture, before data ever leaves the sensor package. Currently deployed in industrial applications (barcode scanning, robotics, automated inspection), the architecture's implications for consumer photography are staggering: AI processing at the speed of light capture, with no data bus bottleneck between sensor and SoC.
GoPro's GP3, announced March 2026, represents the action camera industry's bet on this future. The 5nm SoC with dedicated AI NPU delivers "more than 2x pixel processing power compared to GP2" with "real-time scene recognition, subject detection, and automatic setting adjustments." GoPro's CEO called 2026 "the year of GP3," marking the company's first serious silicon investment since its founding. Sample images already show dramatic improvements in low-light and overall image quality. The GP3 will debut in the Hero14 Black and across 360 cameras, vlogging devices, and what GoPro describes as "ultra-premium compact cinema-grade cameras."
Insta360's current Ace Pro 2 already packs a 5nm AI chip delivering 11 TOPS of compute, remarkable for a 179-gram action camera. PureVideo runs real-time AI denoising on the NPU. PureShot does the same for stills. The Leica-branded 1/1.3-inch sensor with f/2.6 lens outperforms GoPro's smaller 1/1.9-inch sensor in low light, and reviewers consistently confirm: "Among best nighttime footage; users praise its noise-free night video." The architectural lesson: Insta360 bet on sensor size plus AI processing while GoPro bet on optical modularity (HB-Series lenses). Both bets are paying off in different use cases.
Vivo's Dedicated ISP: The Chinese Approach
While Western companies integrate AI into general-purpose SoCs, Vivo designed a standalone ISP chip, the V3, dedicated entirely to image processing. Paired with Qualcomm's or MediaTek's SoC, the V3 handles 4K cinematic portrait video, AI-powered night mode, and real-time subject detection independently of the main processor. Xiaomi's HyperAI pipeline takes a similar philosophy: overlay proprietary AI processing on top of Qualcomm's Spectra ISP, creating a dual-layer computational photography stack.
The Chinese smartphone market has become the most aggressive computational photography battleground on Earth, with Vivo, Xiaomi, Oppo, and Honor each fielding custom AI photography pipelines that push further than Apple or Google in terms of automated enhancement. The results are polarizing: technically impressive, often unrealistically flattering, and occasionally veering into territory that makes Samsung's moon craters look restrained.
Limitations
This analysis has significant gaps. TOPS ratings are not directly comparable across architectures: Apple's Neural Engine at 35 TOPS and Qualcomm's Hexagon NPU at 73 TOPS run fundamentally different workloads with different precision modes. The stabilization comparisons aggregate independent reviewer findings, but no single test controlled all variables (same trail, same mount, same lighting, all three cameras simultaneously). Samsung's Scene Optimizer training data and full object class list remain proprietary; the moon is the only confirmed case, but the system is designed for general object recognition. DarkDiff has not been independently replicated outside the original research team. GoPro's GP3 performance claims are pre-launch marketing from a March 2026 announcement, not independent benchmarks.
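One illustration of why the headline numbers don't line up. Vendors typically quote peak throughput at their fastest supported precision, and throughput on the same silicon roughly halves as precision doubles. Which precision each vendor actually quotes is my assumption here, not a confirmed spec, so treat the numbers as arithmetic, not benchmarks:

```python
def effective_tops(quoted_tops, quoted_bits, target_bits):
    """Crudely rescale a quoted TOPS figure to a common precision,
    assuming throughput scales inversely with operand width."""
    return quoted_tops * quoted_bits / target_bits

# If one vendor's 73 TOPS were an INT8 figure, its FP16 equivalent
# under this assumption would be:
fp16_equiv = effective_tops(73, quoted_bits=8, target_bits=16)
# -> 36.5, in the same ballpark as a 35-TOPS rating quoted at FP16
```

Even this rescaling ignores memory bandwidth, sparsity support, and sustained-versus-peak behavior, which is why the article treats TOPS as marketing shorthand rather than a comparable metric.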
The Strongest Counterargument
The case for aggressive computational photography is that it democratized quality. Before HDR+, nighttime phone photos were unusable. Before multi-frame fusion, a phone sensor's tiny photosites produced images dominated by shot noise. Before AI denoising, you needed a full-frame camera and a fast lens to shoot a birthday party indoors. Computational photography didn't replace real photography; it made real photography possible on hardware that physics says shouldn't be capable of it. A 1/1.9-inch sensor on a phone produces images that rival cameras with 10x the sensor area. That's not fabrication. That's engineering.
And the Samsung moon? A user who photographs the moon with a phone and shares it on Instagram is not committing journalism. They're sharing an experience. If the AI makes that experience look closer to what they saw with their eyes (a bright, detailed moon against a dark sky), that's arguably more authentic than the noisy, blurry, overexposed blob the sensor actually captured. The raw data is the lie. The AI output is the truth as experienced.
It's a strong argument. It becomes less strong the moment the same technology is used by Reuters, in a courtroom, on a medical scan, or in any context where "what the camera captured" matters more than "what looked right."
The Bottom Line
The ISP is no longer a signal processing unit. It's an inference engine. Every flagship phone, every action camera, every wearable with a lens is now making dozens of AI-driven decisions about what your photo should look like: decisions you don't see, can't audit, and didn't request. Samsung's moon was the canary. Apple's DarkDiff research is the mine shaft. NVIDIA's DLSS 5 backlash is the template for what happens when consumers realize the output isn't the input plus processing; it's the input plus imagination.
The technology is extraordinary. GoPro's HyperSmooth and Insta360's FlowState produce footage that would have required a $3,000 gimbal setup five years ago. Google's Night Sight creates images from darkness. Apple's Photonic Engine preserves texture detail that physics says the sensor shouldn't resolve. These are genuine engineering achievements.
But achievements and questions are not mutually exclusive. When a diffusion model ships inside a consumer ISP (and the DarkDiff paper suggests that's a matter of months, not years), every photograph will contain some percentage of generated content. The AP has a rule against that. Reuters has a rule against that. The courts have evidentiary standards against that. And a billion phone users will never know it's happening, because nobody reads the Scene Optimizer settings.
The camera doesn't lie. But the ISP might. And we're about to lose the ability to tell the difference.