Cardiovascular

Development and validation of an interpretable machine learning model for early risk prediction of acute myocardial infarction.

Cui S, Gao L, et al. • International Journal of Medical Informatics • 2026

PubMed: 42172727 • DOI: 10.1016/j.ijmedinf.2026.106489

TL;DR

An interpretable machine learning model based on XGBoost and 108 clinical features predicted acute myocardial infarction (AMI) risk with an accuracy of 0.864 on the test set and 0.932 on an external validation cohort (patients within the model's applicability domain). SHAP analysis identified Hs-cTnI as the primary predictor, alongside nine additional clinical features.

Key Findings

Results

The weighted XGBoost model achieved the best overall performance among all ML algorithms tested for AMI risk prediction.

Accuracy of 0.864 on the test set
F1-score of 0.797 on the test set
Prediction uncertainty was lower than 0.01 on the test set
Model was tuned using GridSearchCV hyperparameter optimization and evaluated via 5-fold cross-validation
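The summary reports accuracy and F1 but not the underlying confusion matrix, so the sketch below only shows how the two metrics are defined for a binary AMI / no-AMI label, plus one common way of weighting the minority class. It is a stdlib Python illustration under stated assumptions: the toy labels are invented, and the summary does not say how the "weighted" XGBoost model was weighted. The n_negative / n_positive ratio shown here is a widely used heuristic that XGBoost accepts through its `scale_pos_weight` parameter; it is assumed, not confirmed, for this study.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 = AMI and 0 = no AMI."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all predictions (both classes) that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (AMI) class; ignores tn."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def positive_class_weight(y_train: Sequence[int]) -> float:
    """Assumed weighting heuristic: n_negative / n_positive.

    XGBoost accepts a value like this via its scale_pos_weight parameter;
    the study's actual weighting scheme is not stated in the summary.
    """
    n_pos = sum(1 for y in y_train if y == 1)
    n_neg = len(y_train) - n_pos
    return n_neg / n_pos


# Hypothetical toy labels (3 AMI cases out of 10), not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(accuracy(y_true, y_pred), f1(y_true, y_pred), positive_class_weight(y_true))
```

Because F1 ignores true negatives, it can sit well below accuracy when the positive class is the minority, which is consistent with (though not proof of) the gap between the reported 0.864 accuracy and 0.797 F1-score.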

Results

The model was externally validated on an independent cohort collected from January 2025 to April 2025, demonstrating strong generalizability.

External validation cohort consisted of 532 patients
488 patients fell within the applicability domain of the model
Accuracy on the independent validation dataset (within applicability domain) was 0.932
The external cohort was collected from a separate time period (January–April 2025) relative to the training cohort (January 2020–January 2024)
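The summary does not say how the applicability domain was defined. As an assumption, the sketch below uses one of the simplest definitions, a per-feature range (bounding-box) check: a new patient counts as in-domain only if every feature lies within the minimum and maximum observed in training. The two features and their values are hypothetical placeholders; distance- or leverage-based definitions, which are also used in practice, would give different coverage.

```python
from typing import Sequence

Bounds = list[tuple[float, float]]


def fit_range_domain(X_train: Sequence[Sequence[float]]) -> Bounds:
    """Record the (min, max) of each feature over the training rows."""
    n_features = len(X_train[0])
    return [
        (min(row[j] for row in X_train), max(row[j] for row in X_train))
        for j in range(n_features)
    ]


def in_domain(x: Sequence[float], bounds: Bounds) -> bool:
    """Assumed rule: inside the domain only if every feature is within its training range."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(x, bounds))


def coverage(X_new: Sequence[Sequence[float]], bounds: Bounds) -> float:
    """Fraction of new samples that fall inside the domain."""
    return sum(in_domain(x, bounds) for x in X_new) / len(X_new)


# Hypothetical two-feature data (feature_a, feature_b); values are invented.
X_train = [[0.01, 50.0], [2.5, 120.0], [0.3, 80.0]]
X_new = [[0.5, 90.0], [10.0, 100.0], [0.02, 60.0], [1.0, 130.0]]
bounds = fit_range_domain(X_train)
in_domain_rows = [x for x in X_new if in_domain(x, bounds)]
print(coverage(X_new, bounds))  # 2 of 4 rows are inside the ranges
```

On the reported counts, 488 of 532 external patients (about 91.7%) were in-domain, so 44 patients were excluded and the 0.932 accuracy describes the in-domain subset only.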

Results

SHAP analysis identified Hs-cTnI (high-sensitivity cardiac troponin I) as the primary predictor of AMI risk.

SHAP (SHapley Additive exPlanations) method was used to interpret the model and rank feature importance
The top 10 features selected were: Hs-cTnI, NT-proBNP, LDL-C, CG (a creatinine-based estimate of glomerular filtration), D-dimer, AST, PLT (platelet count), GLU (glucose), female sex, and BMI
SHAP analysis revealed nonlinear interactions among metabolic profile, coagulation status, and demographic factors
The top 10 features were used to simplify the model while maintaining predictive performance
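A common way to turn per-patient SHAP values into a global ranking like the top-10 list above is to average each feature's absolute SHAP value across patients and sort. The sketch assumes this mean-|SHAP| criterion (the summary does not state the exact ranking rule), takes the SHAP values as an already-computed samples-by-features matrix, and uses invented numbers purely to show the mechanics.

```python
from typing import Sequence


def rank_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]], feature_names: Sequence[str]
) -> list[tuple[str, float]]:
    """Rank features by mean absolute SHAP value across samples (largest first)."""
    n_samples = len(shap_values)
    scores = [
        (name, sum(abs(row[j]) for row in shap_values) / n_samples)
        for j, name in enumerate(feature_names)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def top_k_features(
    shap_values: Sequence[Sequence[float]], feature_names: Sequence[str], k: int = 10
) -> list[str]:
    """Names of the k highest-ranked features, e.g. for retraining a smaller model."""
    return [name for name, _ in rank_by_mean_abs_shap(shap_values, feature_names)[:k]]


# Invented SHAP values for 3 patients x 4 features, for illustration only.
names = ["Hs-cTnI", "NT-proBNP", "LDL-C", "BMI"]
shap_matrix = [
    [2.0, -0.5, 0.3, 0.1],
    [-1.5, 0.8, -0.2, 0.05],
    [1.0, 0.2, 0.4, -0.15],
]
print(top_k_features(shap_matrix, names, k=2))
```

Taking absolute values before averaging keeps a feature that pushes some predictions up and others down from cancelling out to a misleadingly small importance.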

Methods

The study enrolled 7,939 patients from a single hospital using a retrospective cohort design with 108-dimensional clinical features.

Patients were enrolled from the Second Hospital of Shandong University from January 2020 to January 2024
A total of 108 clinical features were collected, comprising epidemiological and biochemical data
Data preprocessing was applied prior to model construction
Multiple ML algorithms were tested with GridSearchCV hyperparameter tuning
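Scikit-learn's `GridSearchCV` evaluates every combination in a hyperparameter grid with k-fold cross-validation and keeps the best-scoring one. The stdlib sketch below reproduces that loop so the mechanics are visible. It swaps in a hypothetical k-nearest-neighbour classifier for XGBoost and uses toy data, and the grid (`k` in {1, 3, 5}) is illustrative, not the one used in the study. Folds are assigned round-robin (sample i is held out in fold i mod 5), whereas scikit-learn would default to stratified folds for a classifier.

```python
from itertools import product
from statistics import mean
from typing import Sequence


def kfold_indices(n: int, n_folds: int = 5) -> list[tuple[list[int], list[int]]]:
    """Round-robin split: sample i is held out in fold i % n_folds."""
    return [
        (
            [i for i in range(n) if i % n_folds != f],  # training indices
            [i for i in range(n) if i % n_folds == f],  # validation indices
        )
        for f in range(n_folds)
    ]


def knn_predict(train_X, train_y, x: Sequence[float], k: int) -> int:
    """Hypothetical stand-in model: majority vote of the k nearest training points."""
    neighbours = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, xi)), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = [yi for _, yi in neighbours[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def cv_accuracy(X, y, params: dict, n_folds: int = 5) -> float:
    """Mean validation accuracy over the folds for one hyperparameter setting."""
    fold_scores = []
    for train_idx, val_idx in kfold_indices(len(X), n_folds):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], params["k"]) == y[i] for i in val_idx)
        fold_scores.append(correct / len(val_idx))
    return mean(fold_scores)


def grid_search(X, y, grid: dict, n_folds: int = 5) -> tuple[dict, float]:
    """Try every combination in the grid; keep the first one with the best CV score."""
    keys = sorted(grid)
    best_params, best_score = {}, float("-inf")
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        score = cv_accuracy(X, y, params, n_folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy, perfectly separable 1-D data (not study data): class 0 near 0, class 1 near 10.
X = [(0.1 * i,) for i in range(10)] + [(10 + 0.1 * i,) for i in range(10)]
y = [0] * 10 + [1] * 10
print(grid_search(X, y, {"k": [1, 3, 5]}))
```

Ties keep the first combination tried, so on this perfectly separable toy data every `k` scores 1.0 and `k = 1` is returned.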

Results

An interactive web server embedding the optimal model was developed and made publicly accessible to facilitate clinical use.

The web server is available at https://www.mips.net.cn
The server is described as enhancing the model's practicability in clinical settings
The model was described as offering 'wide applicability and strong robustness'

Background

Traditional AMI risk assessment tools were described as limited by their reliance on a small number of variables and static thresholds.

The paper notes that AMI remains a leading cause of global morbidity and mortality
Early prediction was identified as critical for timely intervention
The ML approach was proposed to overcome the limitations of traditional tools by integrating multimodal clinical data

What This Means

This research suggests that a machine learning model that analyzes a wide range of patient data could predict the risk of acute myocardial infarction (heart attack), addressing limitations of traditional scoring tools that rely on a few variables and fixed thresholds. Using records from nearly 8,000 hospital patients, the researchers trained several types of machine learning algorithms and found that an XGBoost-based model performed best. It classified patients correctly about 86% of the time on test data and about 93% of the time on a new group of patients collected later (counting only the 488 of 532 patients whose data fell within the range the model was built to handle). The model drew on 108 clinical measurements, including blood tests and patient characteristics.

To make the model understandable to clinicians, the researchers used a technique called SHAP analysis, which shows which factors most influenced each prediction. The ten most important predictors were: a high-sensitivity cardiac troponin test (Hs-cTnI), a heart failure marker (NT-proBNP), "bad" cholesterol (LDL-C), a kidney function measure (CG), a clotting marker (D-dimer), a liver enzyme (AST), platelet count, blood glucose, female sex, and BMI. This combination reflects how heart attack risk involves metabolic, clotting, and demographic factors interacting in complex, nonlinear ways that simple scoring systems may miss. The researchers also built a publicly available website (https://www.mips.net.cn) where clinicians can enter patient data and receive a risk prediction from the model.

Integrating a broad range of routine clinical data into an interpretable machine learning tool could help healthcare providers identify patients at risk of heart attack earlier, potentially enabling faster and more targeted interventions. However, because this is a single-center retrospective study, further validation across diverse populations and healthcare settings would be needed before widespread adoption.


Citation

Cui S, Gao L, Zhang N, Zhang H, Gong N. (2026). Development and validation of an interpretable machine learning model for early risk prediction of acute myocardial infarction. International Journal of Medical Informatics. https://doi.org/10.1016/j.ijmedinf.2026.106489
