Artificial intelligence as a health guide?

“The AI often encourages users to see a doctor or go to the emergency room even for the smallest symptoms. This can lead to massive overload,” warns study leader Marvin Kopka. © RDNE Stock Project / Pexels

Is this red patch on my skin dangerous? Should I see a doctor about it? With questions like these, people are increasingly turning to digital helpers. In addition to specialized symptom checker apps, generative artificial intelligence such as ChatGPT also promises quick answers to medical problems. But how useful is the health advice of “Dr. AI”? Two studies show that some apps can indeed help with self-diagnosis and treatment decisions. ChatGPT, on the other hand, tends to classify even harmless symptoms as threatening. The AI could thus prompt people to seek medical help unnecessarily, further contributing to the overload of the health system.

Many mild illnesses disappear on their own without medical help. With some symptoms, however, it makes sense to have them clarified early so as not to overlook a potentially dangerous illness. For many people, it is challenging to distinguish between these cases. AI applications promise to help with this, including large language models such as ChatGPT, but also specialized symptom checker apps such as Ada and HealthWise. But how reliable is artificial intelligence at evaluating symptoms? And can it actually help medical laypeople make better decisions about whether or not to see a doctor?

ChatGPT tends to overestimate

To answer these questions, a team led by Marvin Kopka from the Technical University of Berlin tested various AI applications on real patient cases in a first study. These included large language models such as ChatGPT from OpenAI and Llama 2 from Meta, as well as twelve specialized symptom checker apps. The cases described ranged from medical emergencies such as severe concussion and serious illnesses such as cancer to less serious complaints requiring little or no treatment, such as muscle pain, upset stomach, and skin problems.

“You can see our standardized method as a kind of ‘Stiftung Warentest’, because we can compare the accuracy of various apps, but also identify their strengths and weaknesses,” explains Kopka. For comparison, the researchers also presented the case examples to human medical laypeople, who likewise had to decide, based on the symptoms described, which response was appropriate: wait, promptly consult a family doctor, or go straight to the emergency room.

The result: while many symptom checker apps delivered appropriate recommendations for most patient cases, correctly distinguishing harmless from potentially dangerous symptoms, the large language models performed significantly worse. ChatGPT in particular rated almost every case as an emergency and recommended consulting a doctor as a precaution. “It is harmful to the health system that more and more people use ChatGPT for medical advice,” says Kopka. “The AI often recommends visiting a doctor or the emergency room even for the smallest symptoms. This can lead to massive overload.”

Influence on human decision

The study also showed that the medical laypeople were usually good at recognizing real emergencies and deciding to call the emergency services when necessary. With less serious symptoms, however, they often found it difficult to judge whether they should wait or see a family doctor. Can artificial intelligence improve decisions in these cases?

To find out, Kopka and his team presented real patient cases to 600 additional volunteers in a second study and asked them to choose between waiting, seeing a family doctor, and going to the emergency room. This time, however, the researchers also provided their test subjects with an AI-generated assessment that came either from ChatGPT or from the symptom checker app Ada. It turned out that the subjects did not trust the AI blindly, but only factored it in as one source in their decision-making.

The test subjects who saw ChatGPT’s assessment made no better decisions than those without AI help; they were right equally often with and without ChatGPT. In contrast, subjects who had received information from the symptom checker app improved their accuracy to 64.5 percent. Specialized apps can therefore help people correctly identify the cases in which self-care is sufficient. “In most cases, people already make safe and reasonable decisions,” Kopka summarizes. “In some situations, however, they can benefit from the apps.”

Source: Marvin Kopka (Technical University of Berlin) et al., Scientific Reports, doi: 10.1038/s41598-024-83844-z