Mental health AI is already in your pocket. Is anyone making sure it actually works for you?

Jarno Peltokangas

Published 4/28/2026

A new Comment in Nature Computational Science from Stanford HAI's Dr. Nicole Martinez-Martin asks that question directly - and the answer, from a regulatory standpoint, is deeply unsettling.

Mental health is one of the fastest-growing areas for AI deployment. Millions of people are already using apps and chatbots for therapy support, crisis intervention, mood tracking, and diagnosis. Yet this is precisely the domain where the regulatory gaps are most dangerous.

Poorly designed AI doesn't just fail - it actively worsens disparities. Mental health AI is frequently trained on non-representative datasets, which means it can perform well for some groups and dangerously poorly for others. We already know that human clinicians systematically misdiagnose mental illness across racial and ethnic lines. AI trained on data shaped by those same diagnoses doesn't fix the problem - it scales it.

Mental health data is also uniquely sensitive, and current protections are inadequate. Few areas of personal information carry greater stigma risk. Yet much of the data fueling these tools - extracted from apps, biosensors, and consumer platforms - sits largely outside the protections we'd expect for clinical health records. GDPR and HIPAA offer partial coverage at best, and the gaps are significant.

Then there is the question of direct harm. Among the references Martinez-Martin cites is a 2025 preliminary report on the iatrogenic dangers of chatbots. Mental health chatbots have been shown to give advice contrary to clinical guidelines and to fail users during acute crises. Despite this, most direct-to-consumer mental health apps face no mandatory safety evaluation before reaching millions of users. There is no requirement that they work, only that they don't make obviously false advertising claims.

The paper also raises what might be called context collapse - a problem that current regulatory frameworks are largely blind to. A tool validated in one setting, say with English-speaking, digitally literate adults at an academic medical center, may behave completely differently when deployed with elderly populations, non-native speakers, or communities with different cultural frameworks around mental health and help-seeking. Generalizability is assumed rather than demonstrated, and no one is checking.

The solutions Martinez-Martin calls for are not technically exotic. Training data needs to actually reflect the diversity of who will use these tools. Bias evaluation needs to be built into the design process from the start, not treated as a post-hoc audit. Affected communities - particularly those most at risk of being harmed - need to be genuine participants in development, not just subjects of deployment. These are known principles. The problem is that nothing in the current regulatory environment requires them.

The deeper irony is hard to ignore. A clinical AI tool for detecting diabetic retinopathy or flagging sepsis faces substantial FDA scrutiny before it reaches a clinician. A chatbot that might be the only mental health resource someone in crisis reaches for at 2am on a Friday night may never have undergone any formal safety review at all.

Mental health is where trust, human judgment, and cultural context matter most. It is also the domain where consumer AI is moving fastest with the least oversight. That combination should concern anyone working at the intersection of technology, health, and governance.

Full paper: https://www.nature.com/articles/s43588-025-00882-x