February 18, 2025


Doctors Gave AI Chatbots a Cognitive Test. Their Diagnosis? Early Dementia.


Israeli neurologists gave leading AI chatbots the same cognitive exam used to assess U.S. presidents’ mental fitness. Their December study was intended as a lighthearted entry for the jovial Christmas edition of The BMJ—but it found “real flaws” in the technology increasingly used to guide clinical decision-making, Dr. Roy Dayan, one of its authors, told Newsweek.

The BMJ is one of the world’s most rigorous medical journals, subjecting all articles to a thorough peer-review process. It holds its Christmas edition to those same standards but features more creative, “light-hearted fare”—serious research with a quirky or satirical bent. This year’s features examined how bedtime stories affect children’s health, why Norwegian emergency helicopters occasionally pick up polar bears, and the inspiration for a country-style love song to the United Kingdom’s National Health Service.

Sometimes, the articles give a deeper look at the social issues that are top-of-mind for medical professionals. That was certainly the case for Dayan’s study, according to the senior neurologist at Jerusalem’s Hadassah Medical Center.

Dayan said he and his colleagues were inspired by the steady stream of studies showing AI outperforming doctors. Over the past two years, research has shown that ChatGPT can ace the MCAT and the United States Medical Licensing Exam. Large language models (LLMs) can produce more accurate diagnoses than physicians in certain specialties and even garner higher patient satisfaction scores when responding to digital inquiries.

MRIs like these can help diagnose dementia, along with cognitive tests like the MoCA.

Peter Garrard Beck, Getty Images

Reports from international medical journals and major media organizations have deliberated whether AI will eventually replace doctors. It’s not an unreasonable question: 10 percent of consumers believe AI should replace doctors in the foreseeable future, according to a June survey from IT services and consulting firm CustomerTimes.

If AI is going to take charge, it should be put through its paces, Dayan reasoned: “We thought it would be interesting to examine ChatGPT with our tools, the way we check patients if we suspect cognitive degeneration.”

Does AI Have Dementia?

Dayan and his colleagues—Dr. Benjamin Uliel, senior neurologist and cognitive specialist at Hadassah Medical Center, and Gal Koplewitz, senior data scientist at Tel Aviv University and London-based QuantumBlack Analytics—administered the Montreal Cognitive Assessment (MoCA) to five leading LLMs (ChatGPT 4, GPT-4o, Claude, Gemini 1 and Gemini 1.5).

The MoCA assesses cognitive impairment by asking patients to perform a variety of simple tasks. For example: copy this drawing of a cube. Name as many words as you can that begin with the letter “F.” Starting at 100, keep subtracting seven.

To Dayan’s surprise, none of the models obtained the full score of 30 points. Most scored between 18 and 25, indicating mild cognitive impairment associated with early dementia.

Every model outperformed the average person in attention and memory-related tasks. But they all faltered on visuospatial tasks, such as those that asked them to draw figures or orient themselves in space and time.

Researchers also showed the chatbots the “cookie theft” picture from the Boston Diagnostic Aphasia Examination, an image of a boy standing on a stool to steal cookies while his mother washes dishes. Patients are asked to describe the drawing while analysts assess their speech and language function. All the models correctly interpreted parts of the drawing, according to the study—however, none of them expressed concern that the boy was about to fall.

This lack of empathy is commonly associated with frontotemporal dementia, according to the study’s authors.

Hadassah Medical Center in Jerusalem, Israel, where Dr. Roy Dayan and Dr. Benjamin Uliel work as senior neurologists.

Moshe Einhorn, Getty Images

Notably, older AI models performed worse on the MoCA than newer versions. The authors drew a parallel between this “dementia” in aging AI models and the dementia risk in aging human brains.

The study was written more “tongue-in-cheek” for The BMJ’s Christmas edition, Dayan said, acknowledging that, methodologically, LLMs should not be evaluated with tests designed for people. Still, he hopes the results spark conversations about the differences between AI and human doctors—and the important roles that both can play.

Visuospatial awareness is important when making a diagnosis, especially in specialties like neurology, where answers may be hidden beneath the surface, according to Dayan. He uses a patient’s body language and intonation to inform his diagnoses. AI can respond to what a patient says, but how they say it is equally important.

Empathy is also a critical part of health care. Research has demonstrated the positive effects of empathy on patient health and recovery. A 2024 study found that for chronic pain patients, physician empathy was more strongly associated with favorable outcomes than opioid therapy, lumbar spine surgery and nonpharmacological treatments.

Amid the churn of articles showing how ChatGPT can outperform doctors on board exams, “people immediately said, ‘Okay, so doctors are obsolete,'” Dayan said. “We tried to show that still, sometimes you need a person-to-person interaction.”

What Medical Leaders Are Saying About AI and Empathy

The study has elicited a range of reactions from physicians and health care executives.

Dr. Robert Pearl—the former CEO of the Permanente Medical Group, who currently serves as a clinical professor of plastic surgery at Stanford University School of Medicine and a faculty member at Stanford Graduate School of Business—came to a different conclusion than the study’s authors. The LLMs’ shortcomings did not remind him of cognitive decline in the elderly, but of cognitive development in children.

AI has made significant improvements in a short period of time, Pearl told Newsweek. ChatGPT was released just over two years ago. If it’s this smart at age two, it’s likely to be a prodigious five-year-old.

He treats AI as a medical student who is still learning. While he would never trust a student to make a definitive diagnosis and prescribe treatment, he trusts AI as a research aide and assistant—but always makes sure to double-check its work.

In fact, Pearl wrote his book, ChatGPT, MD: How AI-Empowered Patients & Doctors Can Take Back Control of American Medicine, published in April 2024, by collaborating with ChatGPT as he would a medical student. Ninety-eight percent of the information ChatGPT provided was “superb,” but it hallucinated the other two percent, Pearl said.

Still, he believes this technology is only growing more powerful and will eventually save hundreds of thousands of lives each year.

“One of my great concerns is that we ignore, as a society, so many failures of medicine today,” Pearl said. “Four hundred thousand people die every year from misdiagnosis. I want us to ask a question: How can this technology reduce that number?”

AI could make care more affordable for patients and save time for doctors, according to Dr. Robert Pearl.

Getty Images

AI could also reduce the rampant burnout among doctors—shifting the daily duties of their profession and allowing them to lean into the human side of their practice.

“Patients value very much your expertise,” Pearl said, “but for the most part, they also want to have the empathy of the doctor, the face-to-face relationship, the metaphorical holding of the hands.”

Dr. Thomas Thesen, an associate professor of neuroscience at Dartmouth’s Geisel School of Medicine and in the college’s Department of Computer Science, drew similar conclusions from the study.

“Asking those models to do these multimodal tests of how we actually test humans is a little bit like asking your calculator to do pushups,” Thesen told Newsweek. “It can’t do it, but it can do other things well—what it’s been trained to do or constructed to do.”

However, the study raises important questions that medical faculty at Dartmouth have been mulling, Thesen said. The school’s curriculum teaches medical students how to responsibly deal with the growing body of digital health and AI tools.

In some cases, AI has been helpful in empathy-building, Thesen said. He uses an AI model to train medical students by simulating patient interactions. The AI gives feedback on the student’s bedside manner, prompting them to acknowledge the patient’s pain or ask more open-ended questions.

But there is a level of empathy that robots will never be able to emulate, according to Thesen.

“The idea that ‘there’s somebody who cares for me’ has a big influence on people’s behaviors, patients’ compliance and their general outlook on the therapeutic relationship,” Thesen said. “My feeling is that we will lose this effect if we only outsource this to AI.”

Dr. Roshini Pinto-Powell, associate dean of admissions at Dartmouth’s Geisel School of Medicine, elaborated on Thesen’s concerns.

Patients often report that AI responds to their inquiries with more empathy than doctors, studies have shown. But there’s a vital difference between human and technological expressions of empathy, per Pinto-Powell.

Cognitive empathy is an understanding of a person’s distress, while affective empathy allows you to actually feel their distress, according to Pinto-Powell. Clinical empathy takes affective empathy a step further—it motivates a doctor to do something about a person’s distress.

AI will never be able to grasp affective or clinical empathy, Pinto-Powell said: “And I think clinical empathy is critical.” For that reason, she agrees with the BMJ study’s conclusion that AI is not coming for her job anytime soon.

When doctors see ChatGPT outperform them, they tend to worry. It’s a common response to the unknown, according to Pinto-Powell. But when she pores over medical school applications, she is not looking for high MCAT scores. She’s looking for effort, service, clinical work, coachability.

In Pinto-Powell’s eyes, AI does not stand a chance against the applicants who care deeply about people.

“You take a brilliant student who thinks they know it all…I don’t want them,” Pinto-Powell said. “That’s the deadliest kind of student to have.”

