Body Composition

Body composition phenotyping of obesity in children aged 6-18 years: multi-strategy clustering and interpretable machine learning.

Wang Y, Shi S, Cai J • Annals of human biology • 2026

PubMed 42233406 DOI 10.1080/03014460.2026.2661592

TL;DR

Four body composition-based phenotypes of childhood obesity were consistently identified using complementary clustering approaches, with cross-validated random forest classification achieving 85.9% accuracy and macro AUC 0.953, and SHAP analysis prioritizing fat mass and BMI as the most discriminative features.

Key Findings

Results

Four distinct body composition phenotypes of obesity were identified in children aged 6-18 years using multiple unsupervised clustering strategies.

The four phenotypes were: low-fat/high-muscle, balanced, high-fat/low-muscle, and mixed high-fat/high-muscle.
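Three complementary clustering approaches were used: K-means on principal components (PCA-K-means), hierarchical clustering on principal components (HCPC), and DBSCAN on a UMAP embedding (UMAP-DBSCAN).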
The study enrolled 78 obese children from a single-centre outpatient clinic in a cross-sectional design.
Thirteen BIA-derived indices were Z-standardised for analysis.
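A minimal sketch of the first two strategies, assuming scikit-learn and a synthetic stand-in for the 78 × 13 matrix of BIA indices (the study data are not reproduced on this page). The number of retained principal components (5) is an assumption, and Ward clustering on the components is a simplified substitute for HCPC, which additionally consolidates the tree cut with K-means. The UMAP-DBSCAN strategy is omitted because it needs the separate umap-learn package. The last lines show how agreement between strategies (ARI) and cluster separability (silhouette) can be quantified.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the study data: 78 children x 13 BIA indices.
X_raw, _ = make_blobs(n_samples=78, n_features=13, centers=4, cluster_std=2.0, random_state=0)

# Z-standardise each index (mean 0, SD 1) so no index dominates through its scale.
Z = StandardScaler().fit_transform(X_raw)

# Project onto the leading principal components (5 is an assumed choice).
pcs = PCA(n_components=5, random_state=0).fit_transform(Z)

# Strategy 1: K-means with k = 4 on the principal components (PCA-K-means).
km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pcs)

# Strategy 2: Ward hierarchical clustering on the same components,
# a simplified stand-in for HCPC (no K-means consolidation step here).
hc_labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(pcs)

# Agreement between the two partitions and separability of the K-means partition.
ari = adjusted_rand_score(km_labels, hc_labels)
sil = silhouette_score(Z, km_labels)
print(f"ARI between strategies: {ari:.2f}; mean silhouette: {sil:.2f}")
```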

Results

Cluster stability was high across all four phenotypes as measured by Jaccard bootstrap resampling.

Jaccard bootstrap was performed with 100 resamples.
Cluster-wise Jaccard values were 0.789, 0.812, 0.835, and 0.858, with a range of 0.78-0.86.
The adjusted Rand index (ARI) between clustering solutions was 0.94, indicating strong agreement across methods.
Mean silhouette score was 0.41, reflecting moderate cluster separability.
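The Jaccard bootstrap can be sketched as below, loosely following Hennig's clusterboot procedure (R package fpc): refit the clustering on each bootstrap resample and, for every original cluster, record its best Jaccard overlap with any resampled cluster, counting only the children drawn into that resample. K-means with k = 4 on Z-standardised synthetic data stands in for the study's actual pipeline, and the helper names are hypothetical. In the fpc documentation, mean values of about 0.75 or more are described as indicating a stable cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets of sample indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bootstrap_jaccard(X, k=4, n_boot=100, seed=0):
    """Mean best-match Jaccard per original cluster over bootstrap resamples (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    base = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    original = [set(np.flatnonzero(base == c).tolist()) for c in range(k)]
    scores = np.zeros((n_boot, k))
    for b in range(n_boot):
        idx = rng.choice(len(X), size=len(X), replace=True)  # resample children with replacement
        labels = KMeans(n_clusters=k, n_init=10, random_state=b).fit_predict(X[idx])
        drawn = set(idx.tolist())
        # Each bootstrap cluster as a set of distinct original indices (duplicates collapse).
        boot = [set(idx[labels == c].tolist()) for c in range(k)]
        for c, members in enumerate(original):
            restricted = members & drawn  # only children present in this resample
            scores[b, c] = max(jaccard(restricted, bc) for bc in boot)
    return scores.mean(axis=0)


X_raw, _ = make_blobs(n_samples=78, n_features=13, centers=4, cluster_std=2.0, random_state=0)
Z = StandardScaler().fit_transform(X_raw)
stability = bootstrap_jaccard(Z)  # 100 resamples, as in the study
print("cluster-wise mean Jaccard:", np.round(stability, 3))
```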

Results

A random forest classifier achieved high cross-validated performance in distinguishing the four body composition phenotypes.

Out-of-fold accuracy was 85.9% using five-fold cross-validation.
Macro AUC was 0.953 and micro AUC was 0.965.
The classifier was trained on BIA-derived body composition indices.
SHAP (SHapley Additive exPlanations) was used to interpret feature contributions.
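The evaluation protocol can be sketched with scikit-learn: out-of-fold class probabilities from stratified five-fold cross-validation, then accuracy, one-vs-rest macro AUC, and micro AUC over the binarised labels. The synthetic data, the use of K-means labels as the target phenotypes, and the forest settings (300 trees) are assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.preprocessing import StandardScaler, label_binarize

# Synthetic stand-in: 78 children x 13 indices, Z-standardised.
X_raw, _ = make_blobs(n_samples=78, n_features=13, centers=4, cluster_std=3.0, random_state=1)
Z = StandardScaler().fit_transform(X_raw)

# Target = phenotype labels from a clustering step (assumed K-means, k = 4).
y = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0)

# Out-of-fold probabilities: each child is scored by a forest that never saw it.
proba = cross_val_predict(rf, Z, y, cv=cv, method="predict_proba")
pred = proba.argmax(axis=1)  # columns follow the sorted labels 0..3

accuracy = accuracy_score(y, pred)
# Macro AUC: unweighted mean of the per-class one-vs-rest AUCs.
macro_auc = roc_auc_score(y, proba, multi_class="ovr", average="macro")
# Micro AUC: pools every (child, class) decision into a single ROC curve.
Y = label_binarize(y, classes=np.arange(4))
micro_auc = roc_auc_score(Y, proba, average="micro")
print(f"OOF accuracy {accuracy:.3f}, macro AUC {macro_auc:.3f}, micro AUC {micro_auc:.3f}")
```

Because the phenotype labels are derived from the same indices the forest is trained on, this kind of accuracy measures how reproducibly the partition can be recovered, not how well the phenotypes relate to health outcomes.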

Results

SHAP analysis identified fat mass and BMI as the most important features for discriminating between phenotypes.

Fat mass and BMI were prioritized as the top SHAP-ranked features.
BIA-derived basal metabolic indices also contributed, with the caveat that they are estimated by the device rather than measured directly.
Lean mass and skeletal muscle mass showed opposing contributions in SHAP values.
SHAP was applied to provide interpretability for the random forest classifier.
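The summary does not say which SHAP explainer was used; for random forests the shap library's TreeExplainer is the usual choice. To keep the example dependency-free, this sketch instead computes exact Shapley values by brute force: every feature coalition is enumerated, and a coalition's value is the forest's mean predicted class probabilities when coalition features take the explained child's values and the rest come from background rows (the quantity shap's model-agnostic KernelExplainer approximates). This is only feasible for a handful of features, since 13 indices would need 8,192 coalitions per child; the 5-feature synthetic data and the helper name are assumptions.

```python
import itertools
import math

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier


def exact_shap(model, x, background):
    """Exact Shapley values of predict_proba for one sample (hypothetical helper).

    Returns an array of shape (n_features, n_classes).
    """
    n = x.shape[0]
    masks = list(itertools.product((0, 1), repeat=n))  # all 2**n coalitions
    hybrids = []
    for mask in masks:
        on = np.array(mask, dtype=bool)
        rows = background.copy()
        rows[:, on] = x[on]  # coalition features take the explained sample's values
        hybrids.append(rows)
    probs = model.predict_proba(np.vstack(hybrids))
    values = probs.reshape(len(masks), len(background), -1).mean(axis=1)
    v = dict(zip(masks, values))
    phi = np.zeros((n, values.shape[1]))
    for i in range(n):
        for mask in masks:
            if mask[i]:
                continue
            s = sum(mask)
            weight = math.factorial(s) * math.factorial(n - s - 1) / math.factorial(n)
            with_i = mask[:i] + (1,) + mask[i + 1:]
            phi[i] += weight * (v[with_i] - v[mask])  # weighted marginal contribution of feature i
    return phi


# Synthetic stand-in with only 5 features, so 2**5 = 32 coalitions stay cheap.
X, y = make_blobs(n_samples=78, n_features=5, centers=4, cluster_std=3.0, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
background = X[:20]
shap_values = np.array([exact_shap(rf, x, background) for x in X[:10]])  # (10, 5, 4)

# Global importance: mean |SHAP| over children and classes, largest first.
importance = np.abs(shap_values).mean(axis=(0, 2))
print("features ranked by mean |SHAP|:", np.argsort(-importance))
```

A signed SHAP value shows whether a feature pushes a child toward or away from a given phenotype, which is how opposing contributions (such as those reported for lean mass and skeletal muscle mass) become visible.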

Background

BMI alone does not fully capture body composition heterogeneity in childhood obesity, motivating the use of BIA-derived measures for phenotyping.

The study characterized body composition with 13 BIA-derived indices rather than relying on BMI alone.
The identification of four phenotypes, including a 'mixed high-fat/high-muscle' group and a 'low-fat/high-muscle' group, suggests heterogeneity that BMI alone does not reveal.
The authors state that 'body composition heterogeneity in childhood obesity is not fully captured by BMI.'
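Why BMI cannot separate fat from muscle can be made concrete with a simple two-compartment identity, assumed here for illustration: body weight = fat mass + fat-free mass, so BMI = FMI + FFMI, where FMI (fat mass index) and FFMI (fat-free mass index) are each a mass divided by height squared. The two children below are hypothetical, with made-up numbers: identical BMI, very different composition.

```python
def indices(weight_kg, height_m, fat_mass_kg):
    """Return BMI, fat mass index (FMI) and fat-free mass index (FFMI), all in kg/m^2."""
    h2 = height_m ** 2
    fmi = fat_mass_kg / h2
    ffmi = (weight_kg - fat_mass_kg) / h2
    return weight_kg / h2, fmi, ffmi


# Two hypothetical children with the same height and weight (illustrative numbers only).
child_a = indices(weight_kg=60.0, height_m=1.50, fat_mass_kg=27.0)  # more fat, less lean mass
child_b = indices(weight_kg=60.0, height_m=1.50, fat_mass_kg=18.0)  # less fat, more lean mass
for name, (bmi, fmi, ffmi) in [("A", child_a), ("B", child_b)]:
    print(f"child {name}: BMI={bmi:.1f}  FMI={fmi:.1f}  FFMI={ffmi:.1f}")
# child A: BMI=26.7  FMI=12.0  FFMI=14.7
# child B: BMI=26.7  FMI=8.0  FFMI=18.7
```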

Conclusions

The authors explicitly characterize this study as exploratory and hypothesis-generating, requiring external validation before any clinical application.

The framework is described as 'exploratory, single-centre' and 'hypothesis-generating.'
External validation with 'outcome-linked markers and imaging' is stated as necessary before clinical applicability.
The sample size was limited to 78 participants from a single outpatient clinic.
The cross-sectional design precludes causal or longitudinal inference.

What This Means

This research suggests that children with obesity are not all the same when it comes to body composition, that is, how much fat, muscle, and other tissue they carry. Using measurements from a bioelectrical impedance analyzer (BIA), the researchers studied 78 children with obesity aged 6 to 18 and found four groups: one with relatively low fat and high muscle, one with a balanced profile, one with high fat and low muscle, and one with both high fat and high muscle.

These groupings appeared consistently across three different clustering methods and held up when the data were repeatedly resampled, suggesting they are not an artifact of a single method or of chance variation in this sample. To check that the four groups were distinguishable, the researchers trained a machine learning model (a random forest) to assign children to the groups; it identified group membership correctly about 86% of the time for children it had not been trained on. An interpretability tool called SHAP showed that fat mass and BMI were the most important factors for telling the groups apart, while lean mass and skeletal muscle mass also played a role, pushing the model's predictions in opposing directions. So although BMI is relevant, it misses important differences in how much of a child's weight is fat versus muscle.

Taken together, the findings suggest that childhood obesity may be better understood as several body composition subtypes rather than a single condition. This could eventually help tailor interventions: a child with high fat and low muscle might have different health risks and needs than one with high fat and high muscle. However, the authors are clear that this is a small, single-clinic study and that the findings are preliminary. Larger studies linked to health outcomes, using imaging to verify the BIA measurements, would be needed before these phenotypes could be used in clinical practice.

Citation

Wang Y, Shi S, Cai J. (2026). Body composition phenotyping of obesity in children aged 6-18 years: multi-strategy clustering and interpretable machine learning. Annals of human biology. https://doi.org/10.1080/03014460.2026.2661592