Cardiovascular

Machine Learning-Based Risk Prediction for Coronary Heart Disease Complicated by Hyperhomocysteinemia: Retrospective Study.

Du M, Lyu M, et al. • JMIR Medical Informatics • 2026

TL;DR

The LightGBM model demonstrated high accuracy and interpretability in forecasting coronary heart disease (CHD) risk among patients with hyperhomocysteinemia (HHcy). SHAP analysis identified age and activated partial thromboplastin time as the most influential predictors.

Key Findings

Results

The LightGBM model achieved superior performance among seven machine learning models for predicting CHD risk in patients with hyperhomocysteinemia.

LightGBM achieved an area under the receiver operating characteristic curve of 0.807 and F1-score of 0.606 on the test set
The model demonstrated good calibration with a Brier score of 0.2415
Seven models were compared: logistic regression, k-nearest neighbors, decision tree, random forest, extreme gradient boosting (XGBoost), LightGBM, and stacking
LightGBM also yielded high clinical net benefit as assessed by decision curve analysis
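
The metrics named above can be made concrete with a minimal sketch in plain Python. The labels and predicted probabilities below are invented toy values, not study data, and the 0.5 decision threshold and the function names are assumptions for illustration only; the study's own evaluation code is not shown in this summary.

```python
def auc_roc(y_true, y_score):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(y_true, y_score, threshold):
    """True positives, false positives, false negatives at a probability threshold."""
    pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return tp, fp, fn


def f1_at_threshold(y_true, y_score, threshold=0.5):
    """F1-score: harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(y_true, y_score, threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def brier_score(y_true, y_score):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


def net_benefit(y_true, y_score, threshold):
    """Decision curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _ = confusion_counts(y_true, y_score, threshold)
    n = len(y_true)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy example (hypothetical values, not from the study)
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.3, 0.8]

print("AUC:", auc_roc(y_true, y_score))
print("F1:", f1_at_threshold(y_true, y_score))
print("Brier:", brier_score(y_true, y_score))
print("Net benefit at 0.5:", net_benefit(y_true, y_score, 0.5))
```

For orientation, a Brier score of 0 is perfect, and a constant prediction of 0.5 scores 0.25. Decision curve analysis repeats the net benefit calculation across a range of thresholds and compares the model against treat-all and treat-none strategies.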

Results

SHAP analysis identified age and activated partial thromboplastin time as the most influential predictors of CHD risk in patients with hyperhomocysteinemia.

SHAP (SHapley Additive exPlanations) was applied to interpret the best-performing LightGBM model
The ranked order of predictor importance was: age, activated partial thromboplastin time, hypertension, weight, carotid plaque, and continuous drinking history
Six core variables were used as model inputs in total
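
To illustrate the idea behind SHAP, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. It is not the tree-based SHAP procedure applied to LightGBM in the study: the `toy_risk` function, its coefficients, and the patient and baseline values are all hypothetical, chosen only so the arithmetic is easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this only suits a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(f):
    """Hypothetical score with arbitrary coefficients (not from the study)."""
    return (0.02 * f["age"] + 0.01 * f["aptt"] + 0.3 * f["hypertension"]
            + 0.1 * f["hypertension"] * f["carotid_plaque"])


# Hypothetical patient and reference values
patient = {"age": 70, "aptt": 30, "hypertension": 1, "carotid_plaque": 1}
baseline = {"age": 60, "aptt": 35, "hypertension": 0, "carotid_plaque": 0}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that SHAP rankings rely on. A global ranking like the one reported above is typically obtained by averaging the absolute contributions across all patients.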

Methods

Six core variables were selected as inputs for the machine learning models predicting CHD risk in HHcy patients.

The six variables were: age, weight, hypertension, continuous drinking history, activated partial thromboplastin time (APTT), and carotid plaque
A correlation heat map illustrated low collinearity among the selected variables, ensuring model stability
Variable selection was performed prior to model construction
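
A collinearity check of the kind a correlation heat map visualizes can be sketched as follows. The column values are invented toy numbers, and the 0.8 cutoff for flagging a pair is an assumption; this summary does not state which correlation coefficient or threshold the authors used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / sqrt(vx * vy)


def correlation_matrix(columns):
    """Pairwise correlations as a nested dict: matrix[a][b]."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def high_collinearity_pairs(matrix, cutoff=0.8):
    """Pairs whose absolute correlation meets the (assumed) cutoff."""
    names = list(matrix)
    return [(a, b, matrix[a][b])
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(matrix[a][b]) >= cutoff]


# Toy columns (hypothetical values, not study data)
columns = {
    "age": [50, 60, 70, 80],
    "weight": [80, 60, 75, 65],
    "aptt": [30, 28, 33, 29],
}

matrix = correlation_matrix(columns)
flagged = high_collinearity_pairs(matrix)
print("Flagged pairs:", flagged)
```

An empty list of flagged pairs corresponds to the low-collinearity pattern described above. Strongly correlated predictors would otherwise make importance rankings harder to interpret.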

Methods

This single-center retrospective study used electronic medical records of patients diagnosed with hyperhomocysteinemia; the data were split into training, validation, and test sets.

The full dataset was randomly divided into training (n=364, 70%), validation (n=78, 15%), and test (n=78, 15%) sets
Data were collected from a single center via electronic medical records
The study design was retrospective
Performance evaluation metrics included AUC-ROC, accuracy, F1-score, calibration curve, Brier score, and decision curve analysis
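
The 70/15/15 random split described above can be sketched as follows. The seed and the helper name `split_indices` are assumptions, and the sketch does not stratify by outcome because this summary does not say whether the authors did.

```python
import random


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=42):
    """Shuffle row indices and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed (assumed) for reproducibility
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Total sample size implied by the reported set sizes
n_total = 364 + 78 + 78
train, val, test = split_indices(n_total)
print(len(train), len(val), len(test))
```

Keeping the test set untouched until the final evaluation is what allows the reported test-set metrics to be read as an estimate of out-of-sample performance.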

Citation

Du M, Lyu M, Liu H, Li Y, Yan H, Li X. (2026). Machine Learning-Based Risk Prediction for Coronary Heart Disease Complicated by Hyperhomocysteinemia: Retrospective Study. JMIR Medical Informatics. https://doi.org/10.2196/80809
