
    The global race to regulate AI medical devices has produced five distinct frameworks. All five leave the same three problems unsolved.

    Laura Ahonen

    Published 5/5/2026

    A new decade-spanning review in Frontiers in Medicine - authored by researchers from Shenyang Pharmaceutical University and Peking Union Medical College Hospital - maps AI medical device regulation across the US, EU, China, Japan, and South Korea from 2015 to 2025. It is the kind of comparative synthesis the field has needed, and what it surfaces is instructive: the regulatory diversity is real, the progress is genuine, and the shared blind spots are harder to escape than any single jurisdiction wants to admit.

    The core tension the paper documents is between pre-market assurance and post-market flexibility. The EU model, anchored in the MDR and now layered with the AI Act, demands extensive clinical evidence before market entry and classifies most AI medical software as high-risk. This is good for catching problems early, but the compliance costs have reportedly pushed manufacturers to withdraw products from EU markets entirely - one analysis cited in the paper suggests up to 75% of medical devices in some EU countries face availability risk under the current framework. The US model takes the opposite bet: roughly 85% of AI medical devices clear via the 510(k) pathway, which relies on comparisons to predicate devices rather than prospective validation of the AI system itself. That route is faster to market and friendlier to innovation, but it shifts the evidentiary burden onto post-market surveillance, which, as multiple studies cited in the review confirm, remains underdeveloped in practice.

    Japan and South Korea occupy an interesting middle position. Japan codified the Post-Approval Change Management Protocol before the US finalized its analogous Predetermined Change Control Plan framework - making it one of the few countries to address algorithm update management in law rather than guidance. South Korea, notably, published the world's first safety evaluation guidelines specifically for generative AI medical software in 2023, acting ahead of both the FDA and EU on that emerging question. Both represent hybrid approaches that attempt to capture pre-market rigor while building in enough flexibility to accommodate the iterative nature of machine learning systems.

    China's trajectory is distinct. The paper describes a stepwise approach through successive NMPA guidelines since 2015 - moving from foundational software regulation to deep learning-specific guidance to a comprehensive registration review framework finalized in 2022. China now classifies AI-assisted diagnostic software as Class III, the highest risk category, and as of December 2024 has approved 126 such devices, predominantly in pulmonary nodule detection and ECG analysis. The localized clinical trial requirement and algorithm locking mandate create their own constraints, but the regulatory pace has been notably faster than many observers expected a decade ago.

    Three problems run through every framework the paper examines, and none of them has been adequately solved anywhere. The first is the frozen AI problem. Most regulatory systems require algorithm locking at market authorization - a design choice that ensures predictability but strips out the adaptive learning that makes these systems clinically valuable in the first place. Researchers cited in the paper found that 78% of manufacturers suspended algorithm updates due to re-certification costs. The result is a class of products frozen at their development-era performance, quietly degrading relative to clinical reality while displaying no regulatory signal of distress.
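    To make the missing "signal of distress" concrete, here is a minimal sketch of a post-market check that compares a locked algorithm's recent sensitivity against the figure it was validated at. It is not drawn from the paper or from any regulator's guidance: the class name, the baseline of 0.90, the tolerance of 0.05, and the 100-case window are all illustrative assumptions.

```python
from collections import deque


class SensitivityMonitor:
    """Hypothetical rolling check on confirmed-positive cases (illustrative only)."""

    def __init__(self, baseline, tolerance, window):
        self.baseline = baseline      # sensitivity reported at validation (assumed)
        self.tolerance = tolerance    # allowed drop before raising a flag (assumed)
        self.outcomes = deque(maxlen=window)  # True = model detected a confirmed case

    def record(self, detected):
        """Log whether the model flagged one clinically confirmed positive case."""
        self.outcomes.append(bool(detected))

    def current_sensitivity(self):
        """Share of confirmed cases in the window that the model detected."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        """True only once the window is full and sensitivity fell below the floor."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.current_sensitivity() < self.baseline - self.tolerance


monitor = SensitivityMonitor(baseline=0.90, tolerance=0.05, window=100)

# Period 1 (toy data): 90 of 100 confirmed cases detected, matching validation.
for detected in [True] * 90 + [False] * 10:
    monitor.record(detected)
flag_at_launch = monitor.degraded()

# Period 2 (toy data): the patient mix shifts and only 80 of 100 are detected.
for detected in [True] * 80 + [False] * 20:
    monitor.record(detected)
flag_after_shift = monitor.degraded()

print(flag_at_launch, flag_after_shift)
```

    The point of the sketch is only that a frozen model can drift below its validated performance without anything in the authorization record changing; a real surveillance system would also need confirmed ground truth, which is often the hard part.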

    The second is the demographic data problem. Fewer than 30% of AI medical devices globally disclose demographic diversity in their training datasets, and only 3.6% of FDA-approved devices reported racial composition data. The performance consequences are not abstract: one study cited found a 12.7% higher missed-diagnosis rate for African American patients than for white patients in a diabetes screening algorithm. Current regulations in both the US and EU address this through generic risk management requirements - language too vague to force meaningful change in how training data is assembled or how subgroup performance is tested before authorization.
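    For readers unfamiliar with what subgroup performance testing involves, the sketch below computes a missed-diagnosis rate (the false negative rate) separately for each demographic group. The function name, the record layout, and the toy numbers are assumptions made for illustration; they do not reproduce the study cited above or any regulator's required method.

```python
from collections import defaultdict


def missed_diagnosis_rate_by_group(records):
    """Return the false negative rate per group.

    records: iterable of (group, has_disease, flagged_by_model) tuples.
    Only patients who truly have the disease count toward the rate.
    """
    positives = defaultdict(int)
    misses = defaultdict(int)
    for group, has_disease, flagged in records:
        if has_disease:
            positives[group] += 1
            if not flagged:
                misses[group] += 1
    return {group: misses[group] / positives[group] for group in positives}


# Toy data (hypothetical): 100 true cases per group, plus some healthy patients.
records = (
    [("group_a", True, True)] * 90 + [("group_a", True, False)] * 10
    + [("group_b", True, True)] * 80 + [("group_b", True, False)] * 20
    + [("group_a", False, False)] * 50 + [("group_b", False, True)] * 5
)

rates = missed_diagnosis_rate_by_group(records)
gap = rates["group_b"] - rates["group_a"]
print(rates, gap)
```

    A single pooled accuracy figure would hide the difference between the two groups here, which is why disclosure of training-set composition and per-group results matters.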

    The third is the clinical validation gap. Approximately 43% of FDA-authorized AI medical devices lack clinical validation data, and only 28% have undergone prospective device validation. Approval based on diagnostic accuracy metrics does not answer the question that matters to patients: does this tool improve outcomes? An imaging AI that detects more small lesions is not self-evidently beneficial - the clinical causal chain is more complicated than sensitivity and specificity scores suggest, and no current regulatory framework systematically requires that chain to be demonstrated.

    The paper's forward-looking section is careful not to overstate how quickly any of this will change. Dynamic regulatory frameworks, real-world performance monitoring, and international harmonization through IMDRF are presented as genuine directions - but the authors note that the foundational conflicts between process-control and comprehensive-validation paradigms continue to block substantive international convergence. Regulatory sandboxes in the UK and Singapore offer one model for testing adaptive approaches. The EU AI Act's transparency requirements for high-risk AI systems may eventually force explainability standards into medical device regulation by the back door. The FDA's Predetermined Change Control Plan framework, finalized in December 2024, represents the most concrete recent advance on the algorithm update problem - though it remains early in implementation.

    What this decade of global regulatory activity makes clear is that no single model is obviously correct. The EU's safety-first approach has real costs in access and availability. The US's flexibility-first approach has real costs in evidentiary quality. The emerging hybrid models in East Asia are pragmatic but untested at scale. What all of them share is a post-market surveillance infrastructure that is not yet adequate for the task of monitoring AI systems that can change, degrade, or perform differently across populations than they did in validation datasets.

    That gap - between what the regulatory architecture assumes about these systems and what they actually do in clinical practice - is the central unresolved problem. A decade of framework-building has made it smaller. It has not made it go away.

    Full paper: https://pmc.ncbi.nlm.nih.gov/articles/PMC12310608/