ChatGPT gets over 50% wrong when asked about medical emergencies

By: Elora Bain

Looking up a diagnosis on the internet is nothing new. Neither is self-diagnosing five different cancers after reading a forum. But today, some go further and ask artificial intelligence tools like ChatGPT for advice directly. The reflex has become common, especially among younger people, who see these tools as a source of fast, accessible and seemingly reliable answers to their medical concerns.

However, a study led by researchers at the Icahn School of Medicine at Mount Sinai in New York calls for caution. Published in the scientific journal Nature Medicine, it shows that OpenAI's tool can give false recommendations in many cases, especially during real medical emergencies.

To reach this conclusion, the researchers presented ChatGPT with sixty medical scenarios covering twenty-one specialties and varying degrees of urgency. Some situations were benign; others required urgent treatment. In total, nearly 960 interactions were analyzed, varying the fictitious patients' profiles, their social situations and the ways in which they described their symptoms.

ChatGPT gave good advice in only 48.4% of medical emergencies and in only 35.2% of non-emergency situations. In other words, the tool is wrong more than half the time when the situation becomes critical. In several cases, it even recommended simple monitoring for twenty-four to forty-eight hours when patients should have been sent to the emergency room immediately.

When AI misses the warning signs

Among the most worrying errors, the researchers observed that the AI underestimated severe asthma attacks and serious complications of diabetes. Yet these situations are potentially fatal if they are not treated quickly.

The tool also showed its limits when dealing with signs of psychological distress. In several scenarios evoking suicidal thoughts, the safety message meant to direct people to a helpline was triggered in only a minority of cases.

Researcher Girish Nadkarni, co-author of the study, urges caution: "ChatGPT works well for some obvious emergencies, but it struggles when the danger is more subtle." The problem is that these are precisely the situations that require reliable medical judgment.

Beyond the limits of a young and still imperfect technology, the study raises the question of the near-blind trust we are willing to place in AI. The confident tone of large language models and their way of formulating clear answers can create an impression of expertise, when those answers rest only on probabilities, not on a true diagnosis.

At a time when many young people already use these tools to learn about their health, the message is simple: artificial intelligence can help make sense of a symptom, but it should never replace a doctor.

Elora Bain

I'm the editor-in-chief here at News Maven, and a proud Charlotte native with a deep love for local stories that carry national weight. I believe great journalism starts with listening — to people, to communities, to nuance. Whether I’m editing a political deep dive or writing about food culture in the South, I’m always chasing clarity, not clicks.