Cardiovascular

Performance of ChatGPT and Gemini Compared with Emergency Physicians in NSTEMI Cases: A Prospective Cross-sectional Study.

TL;DR

ChatGPT 4.0 and Gemini 2.5 significantly outperformed emergency physicians on NSTEMI-related multiple-choice questions, each answering nine of ten correctly versus a physician mean of 7.62±1.32 (P<0.001).

Key Findings

AI models significantly outperformed emergency physicians on NSTEMI clinical questions based on 2023 ESC guidelines.

  • Both ChatGPT 4.0 and Gemini 2.5 correctly answered nine of ten questions.
  • Emergency physicians achieved a mean score of 7.62±1.32 correct answers out of ten.
  • The difference was statistically significant (P<0.001).
  • AI models were queried using identical standardized prompts with temperature=0 and no web access on April 20, 2025.

Effect sizes between AI and physician performance varied by physician experience level.

  • Effect sizes indicated a 'very large difference' between the AI models and less experienced physicians.
  • Effect sizes indicated a 'moderate difference' between the AI models and specialists.
  • AI performance exceeded even the most experienced physicians.
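The effect sizes above can be illustrated with a back-of-the-envelope calculation. A minimal sketch, assuming a one-sample Cohen's d against the AI models' fixed score of 9/10 and the reported physician summary statistics (mean 7.62, SD 1.32); the study's own effect-size measure and cutoffs may differ:

```python
def cohens_d(sample_mean: float, sample_sd: float, reference: float) -> float:
    """One-sample Cohen's d: distance of a group mean from a fixed
    reference score, expressed in standard-deviation units."""
    return abs(reference - sample_mean) / sample_sd

# All physicians pooled vs. the AI score of 9/10
d_all = cohens_d(sample_mean=7.62, sample_sd=1.32, reference=9.0)
print(round(d_all, 2))  # ~1.05, a 'large' effect by common conventions
```

Subgroup means and SDs by experience level (not reported in this summary) would be needed to reproduce the 'very large' and 'moderate' differences cited above.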

Physician performance on NSTEMI questions improved with clinical experience.

  • Performance increased as experience level increased, though AI still exceeded even the most experienced group.
  • The study surveyed 1,106 emergency physicians in Turkey via an online questionnaire.
  • Participants from training and research hospitals scored higher than those from state hospitals.

The study was conducted as a prospective, cross-sectional online survey among emergency physicians in Turkey.

  • Survey included ten NSTEMI-related multiple-choice questions based on the 2023 European Society of Cardiology guidelines.
  • A total of 1,106 emergency physicians participated.
  • The same ten questions were presented to both ChatGPT 4.0 and Gemini 2.5.
  • Statistical analyses were performed using SPSS 26.0.
  • The study acknowledges limitations including a non-proctored online setting and absence of real clinical context.

The authors highlight AI's potential to enhance medical education, clinical decision support, and patient care in emergency settings.

  • ChatGPT and Gemini demonstrated superior performance over emergency physicians on NSTEMI clinical questions.
  • The authors call for future research focusing on optimizing AI-clinician collaboration for safe and effective integration.
  • The findings are described as limited by the non-proctored online setting and absence of real clinical context.

Citation

Yorgancıoğlu M, Saglam Gurmen E. (2026). Performance of ChatGPT and Gemini Compared with Emergency Physicians in NSTEMI Cases: A Prospective Cross-sectional Study. Archives of Iranian Medicine. https://doi.org/10.34172/aim.35274