Development and validation of an interpretable machine learning model for early risk prediction of acute myocardial infarction.

TL;DR

An interpretable XGBoost model trained on 108-dimensional clinical features predicted acute myocardial infarction risk with an accuracy of 0.864 on the test set and 0.932 on the external validation patients that fell within its applicability domain. SHAP analysis identified Hs-cTnI as the primary predictor, followed by nine additional clinical features.

Key Findings

The weighted XGBoost model achieved the best overall performance among all ML algorithms tested for AMI risk prediction.

  • Accuracy of 0.864 on the test set
  • F1-score of 0.797 on the test set
  • Prediction uncertainty was lower than 0.01 on the test set
  • Model was tuned using GridSearchCV hyperparameter optimization and evaluated via 5-fold cross-validation
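
As a reading aid for the two metrics above, the following minimal sketch shows how accuracy and F1-score are derived from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the function names are assumptions.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the study): 40 true AMI cases, 60 non-AMI cases.
tp, fp, fn, tn = 30, 5, 10, 55

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Because F1 ignores true negatives, it can sit below accuracy when the positive class is the minority, which is one plausible reason the reported F1-score (0.797) is lower than the reported accuracy (0.864).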

The model was externally validated on an independent cohort collected from January 2025 to April 2025, demonstrating strong generalizability.

  • External validation cohort consisted of 532 patients
  • 488 patients fell within the applicability domain of the model
  • Accuracy on the independent validation dataset (within applicability domain) was 0.932
  • The external cohort was collected from a separate time period (January–April 2025) relative to the training cohort (January 2020–January 2024)
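
The summary does not say how the applicability domain was defined. The sketch below assumes one simple convention, a per-feature range check against the training data, purely to illustrate the idea of excluding out-of-domain patients before scoring. The feature values and function names are hypothetical; only the 488 and 532 counts come from the text above.

```python
def fit_ranges(train_rows):
    """Record the (min, max) of each feature column in the training data."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def in_domain(row, ranges):
    """A sample is in-domain if every feature lies inside the training range."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(row, ranges))


# Hypothetical two-feature training rows (illustrative values only).
train = [(0.01, 22.0), (0.50, 31.5), (0.20, 27.0)]
ranges = fit_ranges(train)

# Hypothetical external patients: the second and third fall outside the ranges.
external = [(0.10, 25.0), (0.90, 24.0), (0.30, 35.0)]
flags = [in_domain(row, ranges) for row in external]

# Coverage implied by the reported counts: 488 of 532 external patients.
coverage = 488 / 532
outside = 532 - 488
print(flags, f"coverage={coverage:.3f}", f"outside={outside}")
```

On the reported counts, about 91.7% of the external cohort fell inside the domain, so the 0.932 accuracy describes those 488 patients rather than all 532.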

SHAP analysis identified Hs-cTnI (high-sensitivity cardiac troponin I) as the primary predictor of AMI risk.

  • SHAP (SHapley Additive exPlanations) method was used to interpret the model and rank feature importance
  • The top 10 features selected were: Hs-cTnI, NT-proBNP, LDL-C, CG (creatinine-based glomerular filtration), D-dimer, AST, PLT (platelet count), GLU (glucose), female sex, and BMI
  • SHAP analysis revealed nonlinear interactions among metabolic profile, coagulation status, and demographic factors
  • The top 10 features were used to simplify the model while maintaining predictive performance
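
To make the idea behind SHAP concrete, here is a minimal brute-force computation of exact Shapley values for a toy three-feature risk function. It is a sketch only: the toy function, its coefficients, the feature values, and the all-zero baseline are assumptions made for illustration, and the SHAP library uses much faster algorithms for tree models such as XGBoost rather than enumerating every feature subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; features absent from a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    troponin, ldl, bmi = features
    return 2.0 * troponin + 0.5 * ldl + 0.1 * bmi + 0.3 * troponin * ldl


x = [1.0, 2.0, 3.0]          # hypothetical patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the patient and the prediction at the baseline, and the interaction term is split evenly between the two features involved. This additive property is what allows features to be ranked by their mean absolute contribution.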

The study enrolled 7,939 patients from a single hospital using a retrospective cohort design with 108-dimensional clinical features.

  • Patients were enrolled from the Second Hospital of Shandong University from January 2020 to January 2024
  • 108-dimensional clinical features were collected, composed of epidemiological data and biochemical data
  • Data preprocessing was applied prior to model construction
  • Multiple ML algorithms were tested with GridSearchCV hyperparameter tuning
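
The tuning procedure named above, grid search scored by k-fold cross-validation, can be sketched without any libraries. This is not the study's pipeline: a one-dimensional k-nearest-neighbour classifier stands in for XGBoost, the data are synthetic, and all function names are hypothetical. scikit-learn's GridSearchCV automates the same loop over a parameter grid.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (labels are 0 or 1)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def cross_val_accuracy(data, k, n_folds=5):
    """Mean validation accuracy over n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        val = [p for i, p in enumerate(data) if i % n_folds == fold]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in val)
        scores.append(correct / len(val))
    return sum(scores) / n_folds


def grid_search(data, grid, n_folds=5):
    """Score every candidate hyperparameter and return the best one with all scores."""
    results = {k: cross_val_accuracy(data, k, n_folds) for k in grid}
    best = max(results, key=results.get)
    return best, results


# Synthetic data: label is 1 when the value is at least 10, with one noisy label.
data = [(float(v), 1 if v >= 10 else 0) for v in range(20)]
data[3] = (3.0, 1)

best_k, results = grid_search(data, grid=[1, 3, 5])
print(best_k, results)
```

Each candidate is judged only on folds it was not trained on, which is why the selected hyperparameters still need a separate held-out test set, as in the study, for an unbiased performance estimate.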

An interactive web server embedding the optimal model was developed and made publicly accessible to facilitate clinical use.

  • The web server is available at https://www.mips.net.cn
  • The server is described as enhancing practicability of the model in clinical settings
  • The model was described as offering 'wide applicability and strong robustness'

Traditional AMI risk assessment tools were identified as limited due to reliance on a restricted number of variables and static thresholds.

  • The paper notes that AMI remains a leading cause of global morbidity and mortality
  • Early prediction was identified as critical for timely intervention
  • The ML approach was proposed to overcome the limitations of traditional tools by integrating multimodal clinical data

What This Means

This research suggests that a machine learning model can predict the risk of acute myocardial infarction (heart attack) by analyzing a much wider range of patient data than traditional risk tools use. Using records from nearly 8,000 hospital patients, researchers trained several types of machine learning algorithms and found that an XGBoost-based model performed best. It correctly classified patients about 86% of the time on test data and about 93% of the time on a new group of patients collected later, counting only the patients who fell within the model's applicability domain. The model analyzed 108 different clinical measurements, including blood tests and patient characteristics, to make its predictions.

To make the model understandable to clinicians, the researchers used a technique called SHAP analysis, which shows which factors most influenced each prediction. The ten most important predictors identified were: a high-sensitivity cardiac troponin I test (Hs-cTnI), which detects heart muscle injury; a heart failure marker (NT-proBNP); bad cholesterol (LDL-C); a kidney function measure (CG); a clotting marker (D-dimer); a liver enzyme (AST); platelet count; blood glucose; female sex; and BMI. This combination reflects how heart attack risk involves metabolic, clotting, and demographic factors that interact in complex, nonlinear ways that simple scoring systems may miss.

The researchers also built a publicly available website (https://www.mips.net.cn) where clinicians can enter patient data and receive a risk prediction from the model. Integrating a broad range of routine clinical data into an interpretable machine learning tool could help healthcare providers identify patients at risk for heart attacks earlier, potentially enabling faster and more targeted interventions. However, as this is a single-center retrospective study, further validation across diverse populations and healthcare settings would be important before widespread adoption.

Citation

Cui S, Gao L, Zhang N, Zhang H, Gong N. (2026). Development and validation of an interpretable machine learning model for early risk prediction of acute myocardial infarction. International Journal of Medical Informatics. https://doi.org/10.1016/j.ijmedinf.2026.106489