A dual-scenario hybrid machine learning framework that integrates unsupervised clustering with supervised classification on noninvasive body composition features achieved superior hypertension risk prediction, with the ExtraTrees classifier on cluster-augmented data yielding an accuracy of 98.23% and an AUC of 99.87%.
Key Findings
Results
Five physiological subgroups were identified among hypertensive individuals via K-Means clustering with validated cluster quality metrics.
Clustering was performed exclusively on hypertensive individuals using an unsupervised approach inspired by self-labeling principles.
Cluster quality was validated using the Silhouette index (0.3371), the Davies-Bouldin index (1.0094), and the Calinski-Harabasz index (720.10).
Significant intercluster variability was observed across key indicators including FATP, RLFATP, LLFATP, FATM, and age (p < 0.001).
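To make the first of these validation metrics concrete, the following is a minimal sketch of how a Silhouette index can be computed from points and cluster labels. It is not the authors' code: the function name silhouette_index and the toy data are invented for illustration, and Euclidean distance is assumed.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D data (invented for illustration): two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_index(toy_points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_index(toy_points, [0, 1, 0, 1])   # labels cut across them
print(f"good labeling: {good:.3f}, bad labeling: {bad:.3f}")
```

The index ranges from -1 to 1, with higher values indicating tighter, better-separated clusters. In practice, scikit-learn provides silhouette_score, davies_bouldin_score, and calinski_harabasz_score in sklearn.metrics. Whether the study used those implementations is not stated here.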
Results
In Scenario 1, the SVM model with random oversampling achieved the best performance for hypertensive subgroup discrimination.
SVM with random oversampling achieved accuracy = 99.08%, F1 = 98.04%, and AUC = 99.98%.
Five models were tested for subgroup classification within the hypertensive population.
This scenario prioritized interpretability through subgroup discovery rather than binary healthy vs. hypertensive classification.
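Random oversampling, the balancing step paired with the SVM above, can be sketched as follows. This is an illustrative version under stated assumptions, not the study's implementation: the function name random_oversample, the seed, and the toy labels are hypothetical. It duplicates randomly chosen samples from each smaller class until every class matches the size of the largest one.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Resample smaller classes with replacement up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw the missing number of rows at random from the class itself.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced subgroup labels (invented): class sizes 7, 2, and 1.
X_toy = [[i] for i in range(10)]
y_toy = [0] * 7 + [1] * 2 + [2] * 1
X_bal, y_bal = random_oversample(X_toy, y_toy)
print(Counter(y_bal))
```

Oversampling is normally applied to the training split only, because duplicated rows that leak into the test split can inflate the reported scores. The imbalanced-learn library offers an equivalent RandomOverSampler class.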
Results
In Scenario 2, the ExtraTrees classifier on a cluster-augmented dataset achieved the best binary classification performance between healthy and hypertensive subjects.
ExtraTrees achieved accuracy = 98.23%, recall = 98.30%, precision = 98.17%, F1 = 98.23%, and AUC = 99.87%.
Five models were evaluated: ExtraTrees, KNN, SVM, Gaussian Naive Bayes, and Decision Tree, across multiple configurations.
The cluster-augmented dataset outperformed non-augmented configurations, confirming the benefit of integrating clustering information.
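The reported metrics follow standard definitions, sketched below from binary confusion-matrix counts. The counts in the example are invented for illustration, and the averaging scheme used in the study is not stated here. The last lines check that the reported precision and recall are consistent with the reported F1 under the harmonic-mean definition.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the study.
example = binary_metrics(tp=90, fp=5, fn=10, tn=95)
print(example)

# Consistency check on the reported Scenario 2 values (in percent).
reported_f1 = f1_from(precision=98.17, recall=98.30)
print(f"F1 implied by reported precision and recall: {reported_f1:.2f}%")
```

The implied value rounds to the reported F1 of 98.23%. AUC cannot be derived from a single confusion matrix because it depends on the ranking of predicted scores across all thresholds.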
Results
Clustering and feature selection both improved model generalization, particularly for ensemble-based learners.
The cluster-augmented dataset yielded the best overall results in Scenario 2.
Ensemble-based learners such as ExtraTrees showed the greatest benefit from clustering augmentation and feature selection.
Scenario 2 demonstrated higher predictive accuracy and stability than Scenario 1.
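One plausible way to build a cluster-augmented dataset is to append each sample's nearest-centroid index as an extra feature before training the classifier. The sketch below rests on that assumption. This summary does not describe how the authors attached cluster information, nor how samples outside the clustered hypertensive group were handled. The centroids, feature values, and function names are all hypothetical.

```python
import math


def nearest_centroid(row, centroids):
    """Index of the centroid closest to `row` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: math.dist(row, centroids[k]))


def augment_with_cluster(X, centroids):
    """Return a copy of X with the nearest-centroid index appended to each row."""
    return [list(row) + [nearest_centroid(row, centroids)] for row in X]


# Hypothetical centroids over two features (fat percentage, age); a real
# pipeline would take them from the fitted K-Means model.
centroids = [(20.0, 30.0), (35.0, 55.0)]
X_raw = [(22.0, 28.0), (33.0, 60.0)]
X_aug = augment_with_cluster(X_raw, centroids)
print(X_aug)
```

Because a cluster index is a nominal value, one-hot encoding it may be preferable for distance-based learners such as KNN or SVM. Tree ensembles such as ExtraTrees can usually split on the raw index directly.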
Methods
The study used noninvasive body composition features as inputs for hypertension risk prediction.
Key features included FATP (fat percentage), RLFATP (right leg fat percentage), LLFATP (left leg fat percentage), FATM (fat mass), and age.
The noninvasive nature of body composition measurements was highlighted as contributing to clinical applicability.
The framework was designed to enhance both interpretability and predictive reliability using these features.
Conclusions
Integrating unsupervised clustering with supervised classification was found to offer a robust and explainable framework for personalized hypertension risk prediction.
Scenario 1 provided interpretability through subgroup discovery among hypertensive individuals.
Scenario 2 provided higher predictive accuracy and stability through binary classification.
The combined dual-scenario approach was presented as contributing to early detection and precision healthcare.
Mirzaye A, Saadatfar H, Nematollahi M. (2026). Enhancing Hypertension Risk Diagnosis Using a Hybrid Machine Learning Framework: Leveraging Body Composition Data. BioMed Research International. https://doi.org/10.1155/bmri/6335947