ChatGPT in Emergency Care: Overprescription and the Road to Improvement

AI models like ChatGPT show potential but require refinement to match human decision-making in the Emergency Department.

Generative AI, particularly large language models (LLMs) such as ChatGPT, is rapidly making its way into healthcare applications. Generative AI refers to systems that create new content, while AI more broadly refers to machines that mimic human intelligence.

However, new research shows that while these AI models offer promise, they are not yet ready for full implementation in critical settings like the Emergency Department (ED).

A study from the University of California, San Francisco (UCSF) reveals that ChatGPT tends to overprescribe medical interventions, leading to potential harm for patients and higher costs for healthcare systems.

The research team wrote about their study and findings in the prestigious, peer-reviewed journal Nature Communications (citation below).


AI and Clinical Decision-Making: The Study

A research team led by Chris Williams, MB BChir, at UCSF set out to determine whether AI could replicate the decision-making processes of human doctors in the ED.

The team evaluated the recommendations of ChatGPT-3.5 and ChatGPT-4 in three critical areas:

  • patient admissions,
  • radiological investigations (such as X-rays),
  • and antibiotic prescriptions.

The AI models were tested on 1,000 anonymized patient visits, with doctors’ notes as the only input.
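
For readers curious how such an evaluation might be wired up, the sketch below shows one plausible loop: feed a physician's note to an LLM and parse a yes/no recommendation. It uses the OpenAI Python client, but the prompt wording, model name, and answer parsing are illustrative assumptions, not the study's actual protocol.

    # Minimal sketch of an evaluation loop like the one described above.
    # The prompt wording, model choice, and answer parsing are illustrative
    # assumptions; they do not reproduce the UCSF team's actual pipeline.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def recommend_admission(note: str) -> bool:
        """Ask the model for a yes/no hospital-admission recommendation."""
        response = client.chat.completions.create(
            model="gpt-4",
            temperature=0,  # deterministic answers for evaluation
            messages=[
                {"role": "system",
                 "content": "You assist in an Emergency Department. "
                            "Answer strictly 'yes' or 'no'."},
                {"role": "user",
                 "content": f"Physician note:\n{note}\n\n"
                            "Should this patient be admitted to hospital?"},
            ],
        )
        answer = response.choices[0].message.content.strip().lower()
        return answer.startswith("yes")

In the study itself, this kind of question was posed for each of the three areas across the 1,000 visits, and the models' answers were scored against what physicians actually decided.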

  • AI vs. Resident Physicians

The results were clear: both versions of ChatGPT were less accurate than resident physicians. Specifically, ChatGPT-4 was 8% less accurate, while ChatGPT-3.5 lagged behind by 24%.

The AI models also tended to err on the side of caution, recommending more interventions than were necessary.
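
To see what "less accurate" and "err on the side of caution" mean in numbers, consider the toy comparison below. The data are invented, not the study's: an over-cautious model answers "yes" too often, which drags down its accuracy even though it rarely misses a true case.

    # Toy illustration with invented data -- not the study's results.
    # An over-cautious model says "yes" too often: accuracy drops and the
    # overprescription rate (intervention recommended but not needed) rises.
    def accuracy(preds, truth):
        return sum(p == t for p, t in zip(preds, truth)) / len(truth)

    def overprescription_rate(preds, truth):
        return sum(p and not t for p, t in zip(preds, truth)) / len(truth)

    needed = [True, False, False, True, False, False, False, True]
    model  = [True, True,  False, True, True,  False, True,  True]

    print(f"accuracy:         {accuracy(model, needed):.2f}")               # 0.62
    print(f"overprescription: {overprescription_rate(model, needed):.2f}")  # 0.38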


Why Overprescription Happens

One key issue is that AI models like ChatGPT are trained on vast amounts of data from the internet.

Much of this data comes from sources aimed at the general public, where the advice is often designed to encourage users to seek medical help.

While this cautious approach might be useful in less critical contexts, it can cause problems in the ED, where overprescription can lead to unnecessary interventions, increased costs, and strained resources.

For example, ordering an unnecessary X-ray or prescribing antibiotics when they aren’t needed can expose patients to risks and overload medical staff.


The Balance Between Caution and Precision

Williams and his team emphasize that AI must strike a balance between excessive caution and the risk of missing critical health conditions.

While it is essential for AI to err on the side of caution in certain situations, the Emergency Department requires more nuanced decision-making.

In these high-stakes environments, overprescribing treatments or tests can have serious consequences for both patients and healthcare systems.
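
One way to picture this trade-off is with a decision threshold applied to a model's risk scores, as in the sketch below; the scores and labels here are made up purely for illustration. Lowering the threshold catches more true emergencies (higher sensitivity) but flags more patients who need nothing (lower specificity), which is exactly the over-prescription pattern the study observed.

    # Illustrative sketch of the caution/precision trade-off using a
    # decision threshold. Scores and labels are invented for illustration.
    def sensitivity(scores, labels, threshold):
        flagged = [s >= threshold for s in scores]
        true_pos = sum(f and l for f, l in zip(flagged, labels))
        return true_pos / sum(labels)

    def specificity(scores, labels, threshold):
        flagged = [s >= threshold for s in scores]
        true_neg = sum(not f and not l for f, l in zip(flagged, labels))
        return true_neg / (len(labels) - sum(labels))

    scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]           # model's risk estimates
    labels = [True, True, False, True, False, False]  # truly needs intervention?

    for t in (0.25, 0.50, 0.75):
        print(f"threshold {t:.2f}: "
              f"sensitivity {sensitivity(scores, labels, t):.2f}, "
              f"specificity {specificity(scores, labels, t):.2f}")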

Currently, LLMs like ChatGPT lack the clinical frameworks necessary to evaluate complex medical scenarios effectively.

Developers need to refine these models to better mimic the decision-making processes that doctors use, ensuring that they neither under-diagnose nor overprescribe.


Looking Ahead

  • ChatGPT & Simple Diagnoses

Despite these limitations, the potential for AI in emergency care is vast. ChatGPT performed better than human doctors in simple diagnostic situations, such as identifying which of two patients was more acutely ill. This suggests that AI has the capacity to enhance healthcare when used appropriately.

However, before LLMs can be fully integrated into ED workflows, substantial improvements are needed.

  • More Collaboration Crucial

The study’s findings underline the importance of collaboration between AI researchers, healthcare professionals, and the public to determine how best to deploy these technologies in clinical settings.

Striking the right balance between safety and efficacy will be key to integrating AI into emergency medicine.


Final Thoughts

While generative AI holds great potential for supporting healthcare professionals, current models like ChatGPT still fall short of the expertise of human doctors, especially in complex environments like the Emergency Department.

Overprescription remains a major challenge, and AI systems must be carefully fine-tuned before they can be trusted with critical healthcare decisions.

The journey toward fully integrating AI into emergency care is ongoing, but with further research and development, it could revolutionize the field in the near future.


Citation

Williams, C.Y.K., Miao, B.Y., Kornblith, A.E., & Butte, A.J. (2024). Evaluating the use of large language models to provide clinical recommendations in the Emergency Department. Nature Communications, 15, 8236. https://doi.org/10.1038/s41467-024-52415-1