Machine Learning-Based Risk Prediction for Coronary Heart Disease Complicated by Hyperhomocysteinemia: Retrospective Study.

TL;DR

The LightGBM model demonstrated high accuracy and interpretability in forecasting coronary heart disease (CHD) risk among patients with hyperhomocysteinemia (HHcy). SHAP analysis identified age and activated partial thromboplastin time (APTT) as the most influential predictors.

Key Findings

The LightGBM model achieved superior performance among seven machine learning models for predicting CHD risk in patients with hyperhomocysteinemia.

  • LightGBM achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.807 and an F1-score of 0.606 on the test set
  • The authors report good calibration, with a Brier score of 0.2415 (lower is better; 0 is a perfect score)
  • Seven models were compared: logistic regression, k-nearest neighbors, decision tree, random forest, extreme gradient boosting (XGBoost), LightGBM, and stacking
  • LightGBM also yielded high clinical net benefit as assessed by decision curve analysis
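
The metrics named above can be made concrete with a minimal sketch. It is not the study's code: the labels and probabilities are made-up toy values, the 0.5 cutoff is an assumption, and the function names are hypothetical. It only shows how AUC-ROC, F1-score, Brier score, and the net benefit used in decision curve analysis are usually computed from predicted probabilities.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count true positives, false positives, and false negatives at a cutoff."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    return tp, fp, fn


def auc_roc(y_true, y_prob):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_prob, threshold=0.5):
    """Harmonic mean of precision and recall at the given cutoff."""
    tp, fp, fn = confusion_counts(y_true, y_prob, threshold)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit: TP/n minus FP/n weighted by the threshold odds."""
    tp, fp, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy values for illustration only; they are not data from the study.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]

print("AUC-ROC:", auc_roc(toy_labels, toy_probs))
print("F1:", f1_score(toy_labels, toy_probs))
print("Brier:", brier_score(toy_labels, toy_probs))
print("Net benefit at 0.5:", net_benefit(toy_labels, toy_probs, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies.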

SHAP analysis identified age and activated partial thromboplastin time as the most influential predictors of CHD risk in patients with hyperhomocysteinemia.

  • SHAP (SHapley Additive exPlanations) was applied to interpret the best-performing LightGBM model
  • The ranked order of predictor importance was: age, activated partial thromboplastin time, hypertension, weight, carotid plaque, and continuous drinking history
  • Six core variables were used as model inputs in total
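
To illustrate the idea behind the SHAP ranking, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. It is an assumption-laden toy, not the study's pipeline: the summary does not say which SHAP implementation was used, `toy_model` is a hypothetical three-feature function, and "absent" features are simply replaced by baseline values. Brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition take their baseline value (an assumption
    of this sketch; SHAP libraries use more refined background handling).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical model with one interaction term between features 0 and 2."""
    return 2 * z[0] + 1 * z[1] + z[0] * z[2]


instance = [1, 1, 1]
baseline = [0, 0, 0]
contributions = shapley_values(toy_model, instance, baseline)
print(contributions)

# Ranking features by absolute contribution mirrors how an importance order is read off.
ranking = sorted(range(len(contributions)), key=lambda i: -abs(contributions[i]))
print(ranking)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that makes SHAP values comparable across features.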

Six core variables were selected as inputs for the machine learning models predicting CHD risk in HHcy patients.

  • The six variables were: age, weight, hypertension, continuous drinking history, activated partial thromboplastin time (APTT), and carotid plaque
  • A correlation heat map illustrated low collinearity among the selected variables, ensuring model stability
  • Variable selection was performed prior to model construction
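
A collinearity check of the kind a correlation heat map visualizes can be sketched as follows. This is an illustration under stated assumptions: the column names and values are made up, the 0.7 cutoff is an arbitrary choice rather than the study's criterion, and Pearson correlation is assumed because the summary does not name the coefficient used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def high_collinearity_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical columns for illustration only; not variables or data from the study.
toy_columns = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2, 1, 4, 3, 5],
    "var_c": [3, 5, 1, 4, 2],
}

for name_a, name_b, r in high_collinearity_pairs(toy_columns):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

An empty result at the chosen cutoff corresponds to the "low collinearity" reading of a heat map, where no pair of inputs is strongly correlated.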

This single-center retrospective study used electronic medical records of patients diagnosed with hyperhomocysteinemia; the records were split into training, validation, and test sets.

  • The full dataset (n=520) was randomly divided into training (n=364, 70%), validation (n=78, 15%), and test (n=78, 15%) sets
  • Data were collected from a single center via electronic medical records
  • The study design was retrospective
  • Performance evaluation metrics included AUC-ROC, accuracy, F1-score, calibration curve, Brier score, and decision curve analysis
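
The 70/15/15 split can be reproduced arithmetically with a short sketch. The function name, the seed, and the use of plain shuffling are assumptions; the summary says only that the division was random and does not state whether it was stratified by outcome.

```python
import random


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle record indices and cut them into train, validation, and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed (assumed) for reproducibility
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


total_records = 364 + 78 + 78  # set sizes reported in the summary
train_idx, val_idx, test_idx = split_indices(total_records)
print(len(train_idx), len(val_idx), len(test_idx))
```

Assigning the remainder to the last set guarantees that every record lands in exactly one partition even when the fractions do not divide the total evenly.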

Citation

Du M, Lyu M, Liu H, Li Y, Yan H, Li X. (2026). Machine Learning-Based Risk Prediction for Coronary Heart Disease Complicated by Hyperhomocysteinemia: Retrospective Study. JMIR Medical Informatics. https://doi.org/10.2196/80809