AI’s tendency to jump to conclusions poses risks in the medical context

Simple tweaks to familiar medical cases expose blind spots in AI reasoning that clinicians cannot afford to overlook.

Social media users are increasingly relying on AI chatbots to verify information - Copyright AFP Lionel BONAVENTURE

AI models, including ChatGPT, can make basic errors when navigating ethical medical decisions, a new study reveals. The researchers tweaked familiar ethical dilemmas and found that AI often defaulted to intuitive but incorrect responses, sometimes ignoring updated facts. This raises quality assurance concerns.

The findings raise serious concerns about using AI for high-stakes health decisions and underscore the need for human oversight, especially when ethical nuance or emotional intelligence is involved.

The research team, from the Icahn School of Medicine at Mount Sinai, was inspired by Daniel Kahneman’s book “Thinking, Fast and Slow,” which contrasts fast, intuitive reactions with slower, analytical reasoning. It has been observed that large language models (LLMs) falter when classic lateral-thinking puzzles receive subtle tweaks. Building on this insight, the study tested how well AI systems shift between these two modes when confronted with well-known ethical dilemmas that had been deliberately tweaked.

The book’s main thesis is a differentiation between two modes of thought: “System 1” is fast, instinctive, and emotional; “System 2” is slower, more deliberative, and more logical. From framing choices to people’s tendency to replace a difficult question with one that is easy to answer, the book summarizes several decades of research to suggest that people have too much confidence in human judgment.

According to lead researcher Eyal Klang: “AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details. In everyday situations, that kind of thinking might go unnoticed. But in health care, where decisions often carry serious ethical and clinical implications, missing those nuances can have real consequences for patients.”

Experiment 1 – Gender bias

To explore this tendency, the research team tested several commercially available LLMs using a combination of creative lateral thinking puzzles and slightly modified well-known medical ethics cases. In one example, they adapted the classic “Surgeon’s Dilemma,” a widely cited 1970s puzzle that highlights implicit gender bias. In the original version, a boy is injured in a car accident with his father and rushed to the hospital, where the surgeon exclaims, “I can’t operate on this boy — he’s my son!”

The twist is that the surgeon is his mother, though many people don’t consider that possibility due to gender bias. In the researchers’ modified version, they explicitly stated that the boy’s father was the surgeon, removing the ambiguity. Even so, some AI models still responded that the surgeon must be the boy’s mother. The error reveals how LLMs can cling to familiar patterns, even when contradicted by new information.
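The check the researchers describe can be pictured as a simple evaluation harness: present the modified dilemma, then flag any response that still gives the classic answer the updated facts rule out. The sketch below is a hypothetical illustration of that idea, not the study’s actual code; the prompt wording and the function name are assumptions for demonstration.

```python
# Hypothetical sketch of the study's style of test: a modified
# "Surgeon's Dilemma" that explicitly names the father as the surgeon,
# plus a check that flags responses still defaulting to "mother".

MODIFIED_PROMPT = (
    "A boy is injured in a car accident and rushed to the hospital. "
    "His father, the surgeon, says: 'I can't operate on this boy -- "
    "he's my son!' Who is the surgeon?"
)

def defaults_to_familiar_answer(response: str) -> bool:
    """Return True if the response gives the classic 'mother' answer,
    ignoring that the prompt already states the father is the surgeon."""
    return "mother" in response.lower()

# A model that pattern-matches the original puzzle fails the check:
print(defaults_to_familiar_answer("The surgeon is the boy's mother."))  # True
# A model that reads the updated facts passes:
print(defaults_to_familiar_answer("The surgeon is the boy's father."))  # False
```

In the actual study, the responses would come from commercial LLM APIs rather than hand-written strings; the point of the harness is only that the scoring is mechanical once the prompt’s facts are fixed.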

Experiment 2 – Refusing a life-saving blood transfusion

In another example to test whether LLMs rely on familiar patterns, the researchers drew from a classic ethical dilemma in which religious parents refuse a life-saving blood transfusion for their child. Even when the researchers altered the scenario to state that the parents had already consented, many models still recommended overriding a refusal that no longer existed.

The findings do not suggest that AI has no place in medical practice, yet they do highlight the need for thoughtful human oversight, especially in situations that require ethical sensitivity, nuanced judgment, or emotional intelligence.

The research team plans to expand their work by testing a wider range of clinical examples. They’re also developing an “AI assurance lab” to systematically evaluate how well different models handle real-world medical complexity.

The research is published in the journal npj Digital Medicine and it is titled “Pitfalls of large language models in medical ethics reasoning.”

Written By

Dr. Tim Sandle is Digital Journal's Editor-at-Large for science news. Tim specializes in science, technology, environmental, business, and health journalism. He is additionally a practising microbiologist and an author. He is also interested in history, politics, and current affairs.
