Making medical diagnoses and designing treatment plans has so far been the responsibility of doctors. But an advanced AI model may be able to do this just as well, or even better. This is shown by a study that compared physician-generated and AI-generated diagnoses and decisions for numerous real patient cases. The AI proved superior to human doctors, especially when it came to making quick decisions in the emergency room based on limited information. From the researchers’ perspective, the results do not imply that AI could replace doctors. As a support for healthcare professionals, however, such models could potentially improve patient care.
AI language models are trained on large amounts of data in order to give answers that are as human-like as possible. They can analyze and summarize information, respond to questions, and appear to respond empathetically to human problems. But especially in sensitive areas, the standard advice is not to rely on AI-generated information: since language models only evaluate and reproduce patterns, errors can easily occur. Health information is considered particularly critical. Although more and more private individuals now ask ChatGPT and similar chatbots about their symptoms in the hope of medical advice, previous studies have shown that AI can sound convincing and helpful while sometimes providing dangerous misinformation.
Better than doctors?
At the same time, AI models that are confronted with real or constructed medical case reports are becoming increasingly better at generating correct diagnoses and treatment suggestions. A study by a team led by Peter Brodeur from the Beth Israel Deaconess Medical Center in Boston shows how good they are at this. The researchers had OpenAI’s o1 language model evaluate numerous standardized clinical cases as well as real emergency room cases and compared its performance with both other AI models and that of human doctors.
For the study, other doctors evaluated the respective diagnoses and decisions without knowing whether they came from a human or an AI. The result: “In all experiments, OpenAI o1 exceeded the reference values of human doctors and showed continuous improvement over previous generations of clinical AI decision aids,” reports the research team. Compared to human doctors, o1 more often delivered correct diagnoses and more often made the right decisions for the further course of treatment.
Advantage in acute situations
“The differences in performance were particularly pronounced in emergency room cases, where there is the least information about the patient and the greatest urgency to make the right decision,” report Brodeur and his colleagues. While doctors made correct decisions in only about half of these cases due to the limited information, the AI was correct in about two thirds. It was able to make effective use even of fragmented, unstructured data from medical records.
From the researchers’ perspective, these results could have serious implications for future medical care. “Although the use of AI to support clinical decisions is sometimes viewed as a high-risk endeavor, greater use of these tools could help mitigate the human and financial costs of diagnostic errors, delays, and lack of access,” they write.
No replacement for people
From Brodeur and his team’s point of view, an “AI doctor” cannot be a replacement for human physicians. “Diagnoses are important, but they are not everything in medicine,” explains Brodeur’s colleague Adam Rodman. In the emergency room, for example, it is not necessarily important to know the appropriate diagnosis immediately. Instead, the aim is first to stabilize critically ill patients and then to initiate further treatment steps. AI may be able to provide helpful support, but it cannot act alone. Additionally, the study only tested text-based information, leaving out auditory and visual information, which is also important in clinical practice.
“I don’t want any AI doctor companies to try to push doctors out of the process or provide minimal clinical oversight. As one of the lead authors of this study, I don’t believe our results support that,” says Rodman. “What these results support, however, is a robust and ambitious research agenda to explore how we can use these technologies to improve patients’ lives.” Future studies that test the extent to which doctors with AI support actually make better decisions that are safe, effective and fair are therefore important.
Source: Peter Brodeur (Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA) et al., Science, doi: 10.1126/science.adz4433