Enhancing Hypertension Risk Diagnosis Using a Hybrid Machine Learning Framework: Leveraging Body Composition Data.

TL;DR

A dual-scenario hybrid machine learning framework that combines unsupervised clustering with supervised classification on noninvasive body composition features reached 98.23% accuracy in binary hypertension risk prediction (healthy vs. hypertensive). Clustering augmentation improved generalization, particularly for ensemble-based learners.

Key Findings

K-Means clustering identified five physiological subgroups among hypertensive individuals with validated cluster quality metrics.

  • Clustering was performed exclusively on hypertensive individuals using an unsupervised approach inspired by self-labeling principles.
  • Cluster quality was validated using Silhouette index (0.3371), Davies-Bouldin index (1.0094), and Calinski-Harabasz index (720.10).
  • Significant intercluster variability was observed across key indicators including FATP, RLFATP, LLFATP, FATM, and age (p < 0.001).
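
The clustering and validation step can be illustrated with a minimal sketch. It assumes scikit-learn and runs on synthetic data from `make_blobs`, not the study cohort; the five synthetic columns merely stand in for features such as FATP, RLFATP, LLFATP, FATM, and age, and the standardization step and `n_init` setting are assumptions, not details reported in the summary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the hypertensive cohort (NOT the study data):
# 500 subjects, 5 numeric features.
X, _ = make_blobs(n_samples=500, n_features=5, centers=5, random_state=0)

# Standardize so no single feature dominates the Euclidean distances (assumed step).
X_scaled = StandardScaler().fit_transform(X)

# Partition the subjects into five clusters, as in the reported result.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# The three internal validity indices named in the summary.
scores = {
    "silhouette": silhouette_score(X_scaled, labels),                 # higher is better, range [-1, 1]
    "davies_bouldin": davies_bouldin_score(X_scaled, labels),         # lower is better, >= 0
    "calinski_harabasz": calinski_harabasz_score(X_scaled, labels),   # higher is better
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

The values printed here describe the synthetic blobs only and should not be compared with the indices reported for the study.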

In Scenario 1, SVM with random oversampling achieved the best performance for hypertensive subgroup discrimination.

  • SVM with random oversampling achieved accuracy = 99.08%, F1 = 98.04%, and AUC = 99.98%.
  • Five models were evaluated for subgroup classification within the hypertensive population.
  • This scenario prioritized interpretability through subgroup discovery rather than binary classification.
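
Random oversampling, the balancing technique paired with SVM above, can be sketched in a few lines of standard-library Python. The helper name `random_oversample` and the toy rows are hypothetical; the summary does not describe the authors' implementation, so this only shows the general idea of duplicating minority-class rows until every class matches the largest one.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class. Original rows are kept in place."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


# Hypothetical training split: subgroup 0 has 4 rows, subgroup 1 has 2, subgroup 2 has 1.
X_train = [[30.1], [31.4], [29.8], [32.0], [22.5], [23.1], [40.2]]
y_train = [0, 0, 0, 0, 1, 1, 2]

X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))
```

As a general precaution, oversampling is usually applied to the training split only, so that duplicated rows cannot leak into the evaluation data.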

In Scenario 2, the ExtraTrees classifier on the cluster-augmented dataset achieved superior binary classification performance between healthy and hypertensive subjects.

  • ExtraTrees achieved accuracy = 98.23%, recall = 98.30%, precision = 98.17%, F1 = 98.23%, and AUC = 99.87%.
  • Five models were tested: ExtraTrees, KNN, SVM, Gaussian Naive Bayes, and Decision Tree, across multiple configurations.
  • The cluster-augmented dataset yielded the best results compared to non-augmented configurations.
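  • The authors report that Scenario 2 demonstrated the highest predictive accuracy and stability overall; its figures describe the binary task and are not directly comparable with the subgroup classification results of Scenario 1.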
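
For readers less familiar with the metrics listed above, the following sketch shows how accuracy, precision, recall, and F1 follow from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical and chosen for easy arithmetic; they are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn: hypertensive subjects predicted correctly / missed
    tn/fp: healthy subjects predicted correctly / flagged in error
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 hypertensive and 100 healthy subjects.
m = binary_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included because it depends on the ranking of predicted scores rather than on a single confusion matrix.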

Clustering and feature selection both improved model generalization, particularly for ensemble-based learners.

  • The cluster-augmented dataset consistently outperformed non-augmented datasets across configurations.
  • Ensemble-based learners such as ExtraTrees showed the greatest benefit from clustering augmentation.
  • Feature selection combined with clustering contributed to improved generalization in Scenario 2.
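
The summary does not say how the cluster-augmented dataset was built. One plausible reading, shown below purely as an assumption, is to fit K-Means on the hypertensive training rows, assign every subject to its nearest centroid, and append that cluster index as an extra feature before training ExtraTrees. The sketch assumes scikit-learn and NumPy and uses synthetic data from `make_classification`; the helper `augment` and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic binary dataset (NOT the study data): label 1 plays the role of "hypertensive".
X, y = make_classification(n_samples=600, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# ASSUMPTION: clusters are learned from hypertensive training rows only.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_train_s[y_train == 1])


def augment(X_scaled):
    """Append the index of the nearest cluster centroid as one extra column."""
    cluster_ids = kmeans.predict(X_scaled).reshape(-1, 1)
    return np.hstack([X_scaled, cluster_ids])


X_train_aug, X_test_aug = augment(X_train_s), augment(X_test_s)

clf = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_train_aug, y_train)
print(f"test accuracy: {clf.score(X_test_aug, y_test):.3f}")
```

Comparing this score against the same classifier trained on `X_train_s` alone would be the natural way to check whether the added column helps on a given dataset.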

The study used noninvasive body composition features as predictors for hypertension risk, including fat-related physiological measures.

  • Key features included FATP (fat percentage), RLFATP (right leg fat percentage), LLFATP (left leg fat percentage), FATM (fat mass), and age.
  • The framework was designed to be noninvasive, aimed at enhancing both interpretability and predictive reliability.
  • The dual-scenario design allowed both subgroup discovery (Scenario 1) and binary classification (Scenario 2) using the same body composition data.

Integrating unsupervised clustering with supervised classification produced a robust and explainable framework for personalized hypertension risk prediction.

  • Scenario 1 provided interpretability through physiological subgroup discovery among hypertensive individuals.
  • Scenario 2 provided higher predictive accuracy and stability for distinguishing healthy from hypertensive subjects.
  • The authors conclude this approach contributes to early detection and precision healthcare for hypertension management.

Citation

Mirzaye A, Saadatfar H, Nematollahi M, Banerjee B. (2026). Enhancing Hypertension Risk Diagnosis Using a Hybrid Machine Learning Framework: Leveraging Body Composition Data. BioMed Research International. https://doi.org/10.1155/bmri/6335947