If you type “Can AI diagnose disease?” into Google, you are met with an AI-generated summary confirming that it ‘can’. But how reliable is that advice?
ConfidenceClub, a health and wellness firm, put five different AI tools to the test by challenging them with a medical exam. When symptoms were described the way an average person would describe them, the tools answered incorrectly more than half the time. Only prompts written by someone familiar with the technical terminology were (mostly) answered correctly.
The tools evaluated were:
- ChatGPT 4 (OpenAI)
- DxGPT (Foundation 29)
- Copilot (Microsoft)
- Gemini (Google)
- Grok (X, the platform formerly known as Twitter)
Each tool was asked forty questions in total, taken from a medical practice exam.
The first twenty questions were quoted verbatim from the practice exam. The following twenty questions were rewritten as though the same symptoms were being described by someone without knowledge of the technical language.
Each tool was scored on two factors: whether its answer was correct, and whether it referred the user to a medical professional.
The results were as follows:
| AI Tool | Technical Prompt Correct | Technical Prompt Referral | Layperson Prompt Correct | Layperson Prompt Referral |
| --- | --- | --- | --- | --- |
| ChatGPT 4 | 100% | 70% | 45% | 100% |
| DxGPT | 100% | 0% | 55% | 0% |
| Copilot | 60% | 85% | 35% | 100% |
| Gemini | 85% | 50% | 35% | 100% |
| Grok | 100% | 100% | 45% | 100% |
| Total Average | 89% | 61% | 43% | 80% |
The results show that while the AI tools performed well when interpreting prompts written in technical medical language (89 percent correct on average), their accuracy dropped sharply, to an average of 43 percent, when the same symptoms were described in layperson terms. Most tools did at least refer layperson users to a medical professional every time, although DxGPT never did.
This reveals a gap in the usability of AI diagnostics for the general public, and a clear warning for consumers: while AI has immense potential, it cannot yet replace professional medical advice, especially for the average person who does not ‘speak the language’ of the medical world.
