AI Matched or Beat Dermatologists in 30 of 38 Studies, but the $171 Office Visit Isn't Dead Yet

The FDA authorized 331 AI and machine-learning-enabled medical devices in 2025 alone, nearly one clearance per day, bringing the cumulative total to 1,430 since tracking began in 1995. A meta-analysis published that same year found AI diagnostic accuracy non-inferior or superior to board-certified dermatologists in 30 of 38 studies, with pooled melanoma detection specificity reaching 94%. The technology demonstrably works.

Whether patients should act on that fact without a doctor in the room is a different question, and the answer is more complicated than the accuracy numbers suggest.

The Accuracy Landscape

Across specialties, the numbers tell a consistent story, though the caveats matter more than the headlines.

In dermatology, a meta-analysis spanning a decade of studies (2013 to 2023) found a pooled sensitivity of 86% and specificity of 94% for AI melanoma detection, with a diagnostic odds ratio of 44.36. That means the odds of the algorithm flagging a dermoscopic image as melanoma are roughly 44 times higher when the lesion really is melanoma than when it is not: a striking result, though one obtained under controlled conditions. A separate 2025 study pitting ChatGPT-4o, Claude 3.5, and Gemini 1.5 Pro against board-certified dermatologists found no statistically significant accuracy difference.
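The diagnostic odds ratio divides the odds of a positive result in patients who have the disease by the odds of a positive result in patients who do not. A minimal sketch of the textbook formula, computed from sensitivity and specificity (the function name is illustrative):

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test in diseased patients divided by the
    odds of a positive test in non-diseased patients."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_pos_diseased = sensitivity / (1 - sensitivity)  # equivalent to TP / FN
    odds_pos_healthy = (1 - specificity) / specificity   # equivalent to FP / TN
    return odds_pos_diseased / odds_pos_healthy


# Pooled melanoma figures quoted in the text.
dor = diagnostic_odds_ratio(0.86, 0.94)
print(f"DOR derived from pooled sensitivity/specificity: {dor:.1f}")
```

Plugging the pooled 86% and 94% into this formula gives about 96, not 44.36. That gap is not necessarily an error: a DOR pooled directly across studies can differ from one derived from pooled sensitivity and specificity, so it is worth checking which method the meta-analysis used before leaning on either number.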

In radiology, AI fracture detection reaches 91.4% sensitivity and 92.1% specificity across 100 studies. On chest X-rays, pooled sensitivity is 88% for pneumonia but only 72% for lung nodules. Pairing the AI with a radiologist as a second reader raises detection rates by 9 to 10 percentage points, a meaningful clinical gain and the strongest case not for replacing radiologists but for augmenting them.
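Sensitivity and specificity come straight from a confusion matrix. The sketch below uses invented counts (1,000 fracture films and 1,000 normal films) chosen only so the percentages match the pooled fracture figures; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical reader-study counts for one AI model."""
    true_pos: int   # fractures the model flagged
    false_neg: int  # fractures the model missed
    true_neg: int   # normal films the model cleared
    false_pos: int  # normal films the model wrongly flagged


def sensitivity(c: ConfusionCounts) -> float:
    # Share of real fractures that were caught.
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    # Share of normal films correctly left alone.
    return c.true_neg / (c.true_neg + c.false_pos)


# Illustrative counts chosen to reproduce the pooled fracture figures.
counts = ConfusionCounts(true_pos=914, false_neg=86, true_neg=921, false_pos=79)
print(f"sensitivity={sensitivity(counts):.1%}, specificity={specificity(counts):.1%}")
```

The same arithmetic explains the nodule figure: at 72% sensitivity, roughly 28 of every 100 nodules go unflagged, which is where a second reader earns its keep.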

Pathology tells a different story. Only 9 of the FDA's 1,430 authorized AI devices address pathology, compared to 1,094 for radiology. Three specialties (radiology, cardiovascular, and neurology) account for 90.6% of all authorizations, leaving enormous swaths of clinical medicine essentially untouched by authorized AI tools, with psychiatry and behavioral health at zero authorizations.
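A share calculation using only the counts quoted above makes the skew concrete. The dictionary holds the article's figures, not data pulled from the FDA's device list.

```python
# Cumulative FDA AI/ML device authorizations, as quoted in the text.
TOTAL_AUTHORIZATIONS = 1430
BY_SPECIALTY = {
    "radiology": 1094,
    "pathology": 9,
    "psychiatry/behavioral health": 0,
}
TOP_THREE_SHARE = 0.906  # radiology + cardiovascular + neurology


def share(count: int, total: int = TOTAL_AUTHORIZATIONS) -> float:
    """Fraction of all authorizations held by one specialty."""
    return count / total


for name, count in BY_SPECIALTY.items():
    print(f"{name:<30}{count:>6}{share(count):>8.2%}")

# Convert the 90.6% share back into an approximate device count.
top_three = round(TOP_THREE_SHARE * TOTAL_AUTHORIZATIONS)
everything_else = TOTAL_AUTHORIZATIONS - top_three
print(f"top three: ~{top_three} devices; all other specialties: ~{everything_else}")
```

Radiology alone holds about 76.5% of authorizations and pathology about 0.6%. The 90.6% figure implies roughly 1,296 devices for the top three specialties, leaving only about 134 for every other field combined.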

What Consumer Symptom-Checkers Actually Deliver

Clinical AI designed for physician use is one thing; consumer symptom-checker apps, which patients download without a prescription and trust without clinical supervision, are another entirely.

A vignette study testing eight popular apps against 200 primary care scenarios found Ada Health leading with 70.5% top-3 diagnostic accuracy, meaning the correct condition appeared among its first three suggestions in 70.5% of cases. GPs averaged 82.1%, a gap of roughly 12 points that is clinically significant but narrower than most physicians would guess. The rest of the field trailed badly: K Health at 36.0%, Babylon at 32.0%, Symptomate at 27.5%. On urgency safety, Ada scored 97.0%, matching the GP average exactly, but three apps scored more than two standard deviations below the physicians' average, meaning they would have sent patients with serious conditions home with reassurance they didn't deserve.
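Top-3 accuracy gives an app credit whenever the gold-standard diagnosis appears anywhere in its short list. A minimal sketch of the scoring, with hypothetical vignettes and a hypothetical `top_k_accuracy` helper rather than the study's actual protocol:

```python
from typing import Sequence


def top_k_accuracy(
    suggestions: Sequence[Sequence[str]], gold: Sequence[str], k: int = 3
) -> float:
    """Fraction of cases whose gold-standard diagnosis appears among the
    first k conditions the app listed."""
    if len(suggestions) != len(gold):
        raise ValueError("need one suggestion list per case")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(suggestions, gold))
    return hits / len(gold)


# Hypothetical vignettes: app output ranked most to least likely.
app_output = [
    ["migraine", "tension headache", "sinusitis"],
    ["gastritis", "GERD", "peptic ulcer"],
    ["viral URI", "allergic rhinitis", "sinusitis"],
    ["muscle strain", "kidney stone", "pyelonephritis"],
]
gold_standard = ["tension headache", "appendicitis", "viral URI", "pyelonephritis"]

print(top_k_accuracy(app_output, gold_standard))       # top-3 scoring
print(top_k_accuracy(app_output, gold_standard, k=1))  # top-1 is stricter
```

Because top-1 accuracy can never exceed top-3 accuracy, Ada's 70.5% is an upper bound on how often its first suggestion is right.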

A 2026 study in NEJM AI found that patients using Ada in a Portuguese hospital network more than doubled their rate of seeking appropriate care, from 29.8% to 64.4%, with no concerning safety signals during follow-up.

The consumer tools are uneven: the best are clinically useful for triage, while the worst are actively dangerous.

The Cost Per Correct Diagnosis

Now consider the money. A primary care visit without insurance costs $100 to $300 in the United States, with a national average of roughly $171 for self-pay patients. Add basic labs and imaging and a single diagnostic episode routinely exceeds $500. That figure assumes no specialist referral, no follow-up, and no misdiagnosis requiring a second opinion.

At the GP's 82.1% accuracy rate, each correct diagnosis costs approximately $208 in visit fees alone ($171 ÷ 0.821), climbing to an estimated $240 once downstream costs of incorrect assessments are included.
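The $208 figure is the visit fee divided by the accuracy rate: if only 82.1% of visits end in a correct diagnosis, each correct one has to carry the cost of the misses too. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def cost_per_correct(cost_per_visit: float, accuracy: float) -> float:
    """Spread the per-visit cost over the fraction of visits that end
    in a correct diagnosis: cost / accuracy."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_visit / accuracy


gp_fee_only = cost_per_correct(171, 0.821)  # self-pay average, GP accuracy
app = cost_per_correct(0, 0.705)            # free app, Ada top-3 accuracy
print(f"GP, visit fee only: ${gp_fee_only:.0f}")
print(f"Free triage app:    ${app:.0f}")
```

The step from $208 to $240 would need an extra per-error cost term for downstream care, which the article does not break out, so the sketch leaves it out. The same formula also shows why this metric flatters a free app: zero divided by any accuracy is zero, and none of the clinical risk of the misses shows up in the number.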

AI triage via Ada Health, at 70.5% accuracy, costs the patient nothing per interaction, but the 29.5% miss rate carries real clinical risk. In roughly 3% of cases, the flip side of its 97.0% urgency-safety score, Ada under-triages, classifying an urgent condition as routine.

The most revealing number belongs to AI-assisted primary care, where the physician uses an AI tool during the visit. Studies show diagnostic accuracy improvements of 10 to 25 percentage points when GPs have AI decision support. In dermatology specifically, AI-assisted GPs raised their diagnostic area under the ROC curve (AUC) from 0.54 to 0.78, the difference between little better than a coin flip and a clinically useful assessment. Estimated cost per correct diagnosis: $185, a 23% reduction from the $240 all-in cost of an unaided visit.
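The size of the hybrid saving depends on which unaided-visit cost it is measured against. A quick check of the relative reduction, using only the article's own estimates:

```python
def pct_reduction(baseline: float, new: float) -> float:
    """Relative saving of `new` versus `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100


HYBRID = 185  # article's estimated cost per correct diagnosis, AI-assisted GP
print(f"vs. $240 all-in unaided visit:   {pct_reduction(240, HYBRID):.0f}%")
print(f"vs. $208 fee-only unaided visit: {pct_reduction(208, HYBRID):.0f}%")
```

Against the fee-only $208 the saving is closer to 11%, so the 23% figure only holds as a comparison with the all-in $240.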

The economics favor the hybrid model, not the bypass model.

The Wearable Data Layer

Continuous monitoring is collapsing the gap between symptom onset and data availability. Oura rings track heart rate variability, skin temperature, and sleep architecture. Fitbit integrates continuous glucose monitor readings with AI-powered health alerts. Apple received FDA clearance for its hypertension notification feature in September 2025, and Samsung is developing AI-driven dementia detection from Galaxy Watch gait and speech data.

A patient wearing a smart ring and a CGM patch arrives at a consultation, whether with a doctor or an AI chatbot, carrying months of continuous, longitudinal biomarker data that no single office visit could replicate. The wearable does not replace the clinician; it replaces the snapshot.
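To make "replaces the snapshot" concrete, here is a minimal sketch that judges one reading against the wearer's own history rather than in isolation. The HRV values are invented, the z-score approach is a generic illustration and is not claimed to be how any vendor computes its alerts, and, as the Limitations section notes, wearable data of this kind is largely unvalidated for standalone decisions.

```python
from statistics import mean, stdev


def deviation_from_baseline(history: list[float], today: float) -> float:
    """z-score of today's reading against the wearer's own past readings."""
    if len(history) < 2:
        raise ValueError("need at least two past readings for a baseline")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        raise ValueError("baseline has no variation")
    return (today - mu) / sigma


# Hypothetical nightly HRV (ms) for one wearer over two weeks.
past_hrv = [62, 58, 65, 60, 63, 59, 61, 64, 60, 62, 57, 63, 61, 60]
today_hrv = 48

z = deviation_from_baseline(past_hrv, today_hrv)
print(f"today is {z:.1f} SD from this wearer's baseline")
```

Seen once in a clinic, a 48 ms reading arrives with no personal baseline to compare against; against this wearer's two weeks of history it sits several standard deviations low. A real system would also have to handle missing nights, sensor noise, and slow drift, which this sketch ignores.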

The Liability Wall

When an AI misdiagnoses melanoma as a benign mole, who pays? Not the platform, not the developer, and not the cloud provider hosting the model. Every consumer AI company disclaims clinical liability in its terms of service, burying the disclaimer in a document no patient reads before typing "is this mole dangerous?" into a chat window. Current malpractice law assigns responsibility to licensed practitioners, and no AI vendor has volunteered to accept that burden.

All 1,430 FDA-authorized AI devices require physician oversight. Consumer chatbots operate in an entirely separate regulatory category, one that permits sophisticated medical reasoning without any corresponding medical accountability. That structural asymmetry is likely to persist until legislation or a landmark wrongful-death lawsuit forces a resolution.

Limitations

The accuracy comparisons draw from controlled studies using curated datasets, not messy real-world presentations. AI performance degrades on atypical cases and patients with multiple comorbidities. The cost-per-diagnosis calculation uses simplified national averages and excludes the value of physical examination. Wearable biomarker data remains largely unvalidated for standalone clinical decisions.

Strongest Counterargument

Doctors themselves are increasingly unavailable. Primary care receives less than 5% of total U.S. health spending, and the workforce is shrinking. Average wait times for a new-patient appointment exceed three weeks in most metro areas, and in some rural counties the nearest primary care physician is a two-hour drive away. If the choice is between an AI with 70% accuracy and no assessment at all, the AI wins by default. Access, not accuracy, may be the constraint that actually matters.
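The access argument can be made concrete with a deliberately crude model: assume only a fraction of patients can reach a GP at all, while everyone can open an app. The access rates below are hypothetical, the model ignores the clinical cost of under-triage, and it treats the vignette study's top-3 accuracies as if they applied to real patients.

```python
def expected_correct(patients: int, access_rate: float, accuracy: float) -> float:
    """Correct assessments when only `access_rate` of patients are seen."""
    return patients * access_rate * accuracy


GP_ACCURACY = 0.821   # vignette-study GP average
APP_ACCURACY = 0.705  # Ada top-3 accuracy; assumed reachable by everyone

# GP access rate at which a GP-only system matches a universally available app.
break_even = APP_ACCURACY / GP_ACCURACY
print(f"break-even GP access: {break_even:.1%}")

# Compare the two systems for 1,000 hypothetical patients.
for access in (1.0, 0.9, 0.8, 0.6):
    gp = expected_correct(1000, access, GP_ACCURACY)
    app = expected_correct(1000, 1.0, APP_ACCURACY)
    print(f"GP access {access:.0%}: GP {gp:.0f} correct vs app {app:.0f}")
```

Under these assumptions, once fewer than about 86% of patients can actually get a GP appointment, the app produces more correct assessments across the population than the GP-only system does.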

The Bottom Line

AI diagnostic accuracy is real, reproducible, and improving across radiology, dermatology, and triage. But accuracy is the easiest part of medicine to automate. Liability, physical examination, treatment planning, and the clinical judgment built from seeing ten thousand patients and knowing which textbook presentation hides the atypical case remain stubbornly human. The highest-value configuration is not AI versus doctor but AI plus doctor, which delivers a lower cost per correct diagnosis than the unaided visit and higher accuracy than either alone. If you wear a health tracker, bring the data to your next appointment; if you use a symptom checker, bring that output too. The $171 visit gets more valuable, not less, when the algorithm has already done the sorting.
