Cardiovascular

Performance of ChatGPT and Gemini Compared with Emergency Physicians in NSTEMI Cases: A Prospective Cross-sectional Study.

TL;DR

ChatGPT 4.0 and Gemini 2.5 significantly outperformed emergency physicians on NSTEMI-related multiple-choice questions, each answering nine of ten correctly versus a physician mean of 7.62±1.32 (P<0.001).

Key Findings

AI models significantly outperformed emergency physicians on NSTEMI clinical questions based on 2023 ESC guidelines.

  • Both ChatGPT 4.0 and Gemini 2.5 correctly answered nine of ten questions.
  • Emergency physicians achieved a mean score of 7.62±1.32 correct answers out of ten.
  • The difference was statistically significant (P<0.001).
  • AI models were queried using identical standardized prompts with temperature=0 and no web access on April 20, 2025.

Effect sizes between AI and physician performance varied by physician experience level.

  • Effect sizes indicated a 'very large difference' between the AI models and less experienced physicians.
  • Effect sizes indicated a 'moderate difference' between the AI models and specialists.
  • AI performance exceeded even the most experienced physicians.
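The effect sizes above can be illustrated with a back-of-the-envelope calculation. A minimal sketch, assuming a one-sample Cohen's d against the AI models' fixed score of 9/10 and the reported physician summary statistics (mean 7.62, SD 1.32); the study's own effect-size measure and cutoffs may differ:

```python
def cohens_d(sample_mean: float, sample_sd: float, reference: float) -> float:
    """One-sample Cohen's d: distance of a group mean from a fixed
    reference score, expressed in standard-deviation units."""
    return abs(reference - sample_mean) / sample_sd

# All physicians pooled vs. the AI score of 9/10
d_all = cohens_d(sample_mean=7.62, sample_sd=1.32, reference=9.0)
print(round(d_all, 2))  # ~1.05, a 'large' effect by common conventions
```

Subgroup means and SDs by experience level (not reported in this summary) would be needed to reproduce the 'very large' and 'moderate' differences cited above.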

Physician performance on NSTEMI questions improved with clinical experience.

  • Performance increased as experience level increased, though AI still exceeded even the most experienced group.
  • The study surveyed 1,106 emergency physicians in Turkey via an online questionnaire.
  • Participants from training and research hospitals scored higher than those from state hospitals.

The study was conducted as a prospective, cross-sectional online survey among emergency physicians in Turkey.

  • Survey included ten NSTEMI-related multiple-choice questions based on the 2023 European Society of Cardiology guidelines.
  • A total of 1,106 emergency physicians participated.
  • The same ten questions were presented to both ChatGPT 4.0 and Gemini 2.5.
  • Statistical analyses were performed using SPSS 26.0.
  • The study acknowledges limitations including a non-proctored online setting and absence of real clinical context.

The authors highlight AI's potential to enhance medical education, clinical decision support, and patient care in emergency settings.

  • ChatGPT and Gemini demonstrated superior performance over emergency physicians on NSTEMI clinical questions.
  • The authors call for future research focusing on optimizing AI-clinician collaboration for safe and effective integration.
  • The findings are described as limited by the non-proctored online setting and absence of real clinical context.

Citation

Yorgancıoğlu M, Saglam Gurmen E. (2026). Performance of ChatGPT and Gemini Compared with Emergency Physicians in NSTEMI Cases: A Prospective Cross-sectional Study. Archives of Iranian Medicine. https://doi.org/10.34172/aim.35274