A dual-scenario hybrid machine learning framework that combines unsupervised clustering with supervised classification on noninvasive body composition features achieved up to 98.23% accuracy for hypertension risk prediction. Clustering augmentation improved generalization, particularly for ensemble-based learners.
Key Findings
Results
K-Means clustering identified five physiological subgroups among hypertensive individuals with validated cluster quality metrics.
Clustering was performed exclusively on hypertensive individuals using an unsupervised approach inspired by self-labeling principles.
Cluster quality was validated using the Silhouette index (0.3371; range -1 to 1, higher is better), the Davies-Bouldin index (1.0094; lower is better), and the Calinski-Harabasz index (720.10; higher is better).
Significant intercluster variability was observed across key indicators including FATP, RLFATP, LLFATP, FATM, and age (p < 0.001).
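To make the first of these validity indices concrete, the following is a minimal sketch of the mean Silhouette coefficient in plain Python. It assumes Euclidean distance and uses a hypothetical four-point toy dataset; it is not the authors' code and does not reproduce the reported value of 0.3371.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the listed members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_labels = [0, 0, 1, 1]
toy_score = silhouette_score(toy_points, toy_labels)
print(round(toy_score, 3))
```

Well-separated toy groups score close to 1, whereas a value near 0.34 on real physiological data indicates clusters that overlap to some degree.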
Results
In Scenario 1, SVM with random oversampling achieved the best performance for hypertensive subgroup discrimination.
SVM with random oversampling achieved accuracy = 99.08%, F1 = 98.04%, and AUC = 99.98%.
Five models were evaluated for subgroup classification within the hypertensive population.
This scenario prioritized interpretability through subgroup discovery rather than binary classification.
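Random oversampling balances the subgroup classes by duplicating randomly chosen samples from the smaller classes until every class matches the largest one. The sketch below illustrates that idea in plain Python; the function name, the seed, and the toy rows are assumptions for illustration, and the summary does not state which implementation the authors used.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw with replacement from the class's own rows to fill the gap.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical imbalanced cluster labels: class 0 dominates.
rows = [(1.0,), (1.1,), (0.9,), (1.2,), (5.0,), (9.0,)]
labels = [0, 0, 0, 0, 1, 2]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

When this technique is used with a held-out evaluation, it is normally applied to the training split only, so that duplicated samples do not leak into the test data.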
Results
In Scenario 2, the ExtraTrees classifier on the cluster-augmented dataset achieved superior binary classification performance between healthy and hypertensive subjects.
ExtraTrees achieved accuracy = 98.23%, recall = 98.30%, precision = 98.17%, F1 = 98.23%, and AUC = 99.87%.
Five models were tested: ExtraTrees, KNN, SVM, Gaussian Naive Bayes, and Decision Tree, across multiple configurations.
The cluster-augmented dataset yielded the best results compared to non-augmented configurations.
Scenario 2 demonstrated the highest predictive accuracy and stability overall.
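The reported metrics are related through the confusion matrix, and F1 is the harmonic mean of precision and recall. The following sketch computes them for a binary task in plain Python; the toy labels are hypothetical, and only the precision and recall figures passed to `f1_from` come from the summary above.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }

# Hypothetical labels: 1 = hypertensive, 0 = healthy.
toy_metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
print(toy_metrics)

# Consistency check on the reported figures (precision 98.17%, recall 98.30%).
reported_f1 = f1_from(0.9817, 0.9830)
print(reported_f1)
```

Applied to the reported precision and recall, the harmonic mean comes out at roughly 0.982, in line with the reported F1 of 98.23%.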
Results
Clustering and feature selection both improved model generalization, particularly for ensemble-based learners.
The cluster-augmented dataset consistently outperformed non-augmented datasets across configurations.
Ensemble-based learners such as ExtraTrees showed the greatest benefit from clustering augmentation.
Feature selection combined with clustering contributed to improved generalization in Scenario 2.
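One plausible reading of cluster augmentation is that each subject's feature vector gains an extra column holding a cluster identifier. The sketch below shows that reading in plain Python by assigning every row to its nearest centroid. This is an assumption: the summary does not describe how the augmented dataset was built, nor how subjects outside the clustered hypertensive group were handled, and the centroids and rows here are hypothetical.

```python
import math

def augment_with_cluster(rows, centroids):
    """Append the index of the nearest centroid to each feature row."""
    augmented = []
    for row in rows:
        nearest = min(range(len(centroids)),
                      key=lambda k: math.dist(row, centroids[k]))
        augmented.append(list(row) + [nearest])  # cluster id becomes a new feature
    return augmented

# Hypothetical two-feature rows and two previously fitted centroids.
centroids = [(0.0, 0.0), (10.0, 10.0)]
rows = [(0.1, 0.2), (9.5, 10.1)]
augmented_rows = augment_with_cluster(rows, centroids)
print(augmented_rows)
```

Tree ensembles can split directly on such a column, which is one possible reason they would benefit most from the extra feature.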
Methods
The study used noninvasive body composition features as predictors for hypertension risk, including fat-related physiological measures.
Key features included FATP (fat percentage), RLFATP (right leg fat percentage), LLFATP (left leg fat percentage), FATM (fat mass), and age.
The framework was designed to be noninvasive and to enhance both interpretability and predictive reliability.
The dual-scenario design allowed both subgroup discovery (Scenario 1) and binary classification (Scenario 2) using the same body composition data.
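The dual-scenario design can be pictured as two different targets derived from one table of subjects. The sketch below is illustrative only: the `Subject` record, its field names, and all values are hypothetical, and the real dataset contains more features than the five named above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Subject:
    fatp: float      # fat percentage
    rlfatp: float    # right leg fat percentage
    llfatp: float    # left leg fat percentage
    fatm: float      # fat mass
    age: int
    hypertensive: bool
    cluster: Optional[int] = None  # set only for clustered hypertensive subjects

def features(s):
    return [s.fatp, s.rlfatp, s.llfatp, s.fatm, s.age]

def scenario1_dataset(subjects):
    """Subgroup discrimination: hypertensive subjects only, target = cluster id."""
    kept = [s for s in subjects if s.hypertensive and s.cluster is not None]
    return [features(s) for s in kept], [s.cluster for s in kept]

def scenario2_dataset(subjects):
    """Binary classification: all subjects, target = 1 if hypertensive else 0."""
    return [features(s) for s in subjects], [int(s.hypertensive) for s in subjects]

# Hypothetical subjects for illustration.
subjects = [
    Subject(31.0, 29.5, 29.8, 24.1, 58, True, cluster=2),
    Subject(27.4, 25.0, 25.3, 19.6, 49, True, cluster=0),
    Subject(22.1, 21.0, 21.2, 14.3, 35, False),
]
x1, y1 = scenario1_dataset(subjects)
x2, y2 = scenario2_dataset(subjects)
print(y1, y2)
```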
Conclusions
Integrating unsupervised clustering with supervised classification produced a robust and explainable framework for personalized hypertension risk prediction.
Scenario 1 provided interpretability through physiological subgroup discovery among hypertensive individuals.
Scenario 2 provided higher predictive accuracy and stability for distinguishing healthy from hypertensive subjects.
The authors conclude this approach contributes to early detection and precision healthcare for hypertension management.
Mirzaye A, Saadatfar H, Nematollahi M, Banerjee B. (2026). Enhancing Hypertension Risk Diagnosis Using a Hybrid Machine Learning Framework: Leveraging Body Composition Data. BioMed Research International. https://doi.org/10.1155/bmri/6335947