Body Composition

Integrating body composition analysis and machine learning for non-invasive identification of metabolic dysfunction-associated fatty liver disease: a large-scale health examination-based study.

TL;DR

Tree-based machine learning models integrating body composition parameters, particularly visceral fat rating, achieved high discriminative performance (AUC > 0.96 internal, > 0.95 external) for non-invasive identification of MAFLD in a large health examination cohort.

Key Findings

Tree-based machine learning algorithms achieved the highest discriminative performance for MAFLD classification among eight models evaluated.

  • Extreme gradient boosting, gradient boosting decision tree, and LightGBM achieved the highest performance
  • Internal validation AUC values exceeded 0.96 for these tree-based models
  • External validation AUC values were above 0.95 for these models
  • Performance was evaluated using tenfold cross-validation internally and an independent external cohort
  • Eight machine learning models in total were constructed and compared
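
The tenfold cross-validation described above can be illustrated as a simple index split. This is a minimal sketch using only the Python standard library, not the authors' pipeline: the function name `kfold_indices`, the shuffling seed, and the choice of an unstratified split are all assumptions made for illustration. Only the cohort size (23,348) and the number of folds (10) come from the summary.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: the summary does not say how the folds were built.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Primary cohort size taken from the summary; every record is held out exactly once.
folds = list(kfold_indices(23348, k=10))
```

Each model would be fitted on `train_idx` and scored on `test_idx` in turn, and the ten held-out AUC values averaged. The external cohort plays a different role: it is never used for fitting, only for a final check of generalizability.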

Visceral fat rating was consistently the most important predictor of MAFLD across all machine learning models and subgroups.

  • Visceral fat rating ranked as the top predictor in model-based importance analysis
  • Waist circumference and body mass index ranked next in importance
  • Visceral fat remained a robust predictor across all stratified subgroups including sex, age, and BMI groups
  • Logistic regression confirmed independent associations of visceral fat rating with MAFLD after adjustment for key confounders

The study used a large retrospective cohort of 23,348 adults for model development with an independent external validation cohort of 3,357 participants.

  • Primary cohort included 23,348 adults who underwent health check-ups between 2017 and 2021 at a tertiary hospital in China
  • External validation cohort comprised 3,357 participants from 2022 to 2023
  • Body composition was assessed via bioelectrical impedance analysis (BIA)
  • MAFLD was diagnosed based on hepatic steatosis plus metabolic risk criteria
  • A total of 13 features including body composition indicators and basic demographics were initially considered

Feature selection was guided by multicollinearity diagnostics and model-based importance analysis, resulting in a refined set of predictors.

  • Initial feature pool included 13 variables comprising body composition indicators and basic demographics
  • Multicollinearity diagnostics were applied to identify and address redundant features
  • Model-based importance analysis further guided the final feature selection
  • The final selected features included visceral fat rating, waist circumference, and body mass index as top contributors
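
The summary does not say which multicollinearity diagnostic was used. One common choice is the variance inflation factor (VIF), which for a model with exactly two predictors reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below assumes that two-predictor case; the helper names and the numeric values are made up for illustration and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_vif(x, y):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).

    With more predictors, r^2 would be replaced by the R^2 from regressing
    one predictor on all the others.
    """
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Made-up illustrative values (assumption), not measurements from the cohort.
bmi = [21.0, 24.5, 27.0, 30.5, 33.0]
waist_cm = [72.0, 85.0, 90.0, 104.0, 108.0]

r = pearson_r(bmi, waist_cm)
vif = pairwise_vif(bmi, waist_cm)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A commonly cited rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic redundancy. Whether the authors applied such a threshold is not stated.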

Stratified analyses revealed variable patterns in MAFLD prediction across sex, age, and BMI groups.

  • Stratified analyses were conducted across sex, age, and body mass index subgroups
  • Patterns of predictor importance varied across these demographic and anthropometric strata
  • Visceral fat rating remained a robust predictor in all subgroups despite variable patterns in other predictors
  • Logistic regression confirmed independent associations of visceral fat rating with MAFLD after adjustment for key confounders within subgroups

Body composition analysis via bioelectrical impedance analysis was evaluated as a non-invasive approach for MAFLD screening in routine health examination settings.

  • MAFLD is described as closely linked to obesity, insulin resistance, and metabolic syndrome
  • Conventional indicators used in routine health examinations often fail to capture deeper metabolic disturbances
  • BIA-derived body composition parameters were used as the primary input features for the machine learning models
  • The authors concluded these parameters support 'scalable screening and aiding diagnostic assessment in routine health examination, clinical, and public health settings'

Model performance was evaluated using multiple metrics including AUC, accuracy, recall, F1 score, and calibration metrics.

  • Area under the receiver operating characteristic curve (AUC) was a primary performance metric
  • Additional metrics included accuracy, recall, F1 score, and calibration metrics
  • Tenfold cross-validation was used for internal validation
  • An independent external cohort from 2022 to 2023 was used to assess generalizability
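
The metrics listed above can be computed from true labels and predicted probabilities as follows. This is a minimal sketch using only the Python standard library. The function names, the 0.5 decision threshold, the toy data, and the use of the Brier score as the calibration metric are assumptions, since the summary does not name the specific calibration metrics or thresholds used.

```python
def auc_rank(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Pairwise O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, recall, and F1 score at a fixed decision threshold (assumed 0.5)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Toy labels and predicted probabilities (made up, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

auc = auc_rank(y_true, scores)
metrics = threshold_metrics(y_true, scores)
brier = brier_score(y_true, scores)
```

AUC depends only on how the scores rank cases, so it needs no threshold. Accuracy, recall, and F1 change with the chosen cutoff, and the Brier score reflects how close the predicted probabilities are to the observed outcomes.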


Citation

He Y, Cao Y, Chen Z, Xiang R, Wang F. (2026). Integrating body composition analysis and machine learning for non-invasive identification of metabolic dysfunction-associated fatty liver disease: a large-scale health examination-based study. Scientific Reports. https://doi.org/10.1038/s41598-026-37852-w