By modeling disease outcomes in the UK Biobank and using those models to predict pseudo-outcomes in the Human Phenotype Project cohort, the authors identified individual biomarkers across gut microbiome, liver ultrasound, and other modalities. The approach recapitulated known biomarkers, revealed less-attested ones, and identified systemic biomarkers correlated with many diseases as well as sex-specific biomarkers.
Key Findings
Background
Integrating UK Biobank disease models with Human Phenotype Project data enabled identification of biomarkers in data modalities that large longitudinal cohorts do not collect.
The Human Phenotype Project (HPP) contains data modalities not found in the UK Biobank (UKBB), including microbiome, liver ultrasound, and continuous glucose monitoring.
The UKBB provides a much larger cohort with longer follow-up and a large number of disease outcomes already tracked.
Disease outcomes were modeled in the UKBB and used to predict 'pseudo-outcomes' in the HPP, which were then correlated with unique HPP measurements.
The framework enables transfer of knowledge from large longitudinal cohorts to smaller, more deeply phenotyped cohorts.
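The pseudo-outcome idea described above can be illustrated with a minimal sketch. It assumes the disease model uses only features measured in both cohorts and that the model is a plain logistic regression; the summary does not state the authors' model class, so this is an illustrative stand-in. All data are synthetic, and names such as `fit_logistic`, `predict_risk`, and `unique_marker` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient descent; returns weights (intercept first)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_risk(w, X):
    """Predicted outcome probability for each row of X."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# Synthetic stand-in for the large cohort: shared features plus an observed outcome.
n_large, n_small, n_shared = 5000, 500, 3
true_w = np.array([1.0, -0.5, 0.0])  # assumed effect sizes, for illustration only
X_large = rng.normal(size=(n_large, n_shared))
p_large = 1.0 / (1.0 + np.exp(-(X_large @ true_w)))
y_large = (rng.random(n_large) < p_large).astype(float)

# Synthetic stand-in for the small cohort: the same shared features, no outcome,
# plus one measurement that exists only in this cohort.
X_small = rng.normal(size=(n_small, n_shared))
unique_marker = 0.8 * X_small[:, 0] + rng.normal(scale=0.6, size=n_small)

# 1) Model the outcome in the large cohort.
w = fit_logistic(X_large, y_large)
# 2) Predict a pseudo-outcome for every participant in the small cohort.
pseudo_outcome = predict_risk(w, X_small)
# 3) Correlate the pseudo-outcome with the cohort-specific measurement.
r = np.corrcoef(pseudo_outcome, unique_marker)[0, 1]
print(f"correlation between pseudo-outcome and unique marker: {r:.2f}")
```

In this toy setup the unique marker correlates with the pseudo-outcome only because it shares signal with one of the model's input features, which is also a useful caveat when reading such correlations.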
Results
The pseudo-outcome modeling approach successfully recapitulated known biomarkers across the spectrum of diseases studied.
Known biomarkers were recapitulated 'across the spectrum of diseases studied.'
Biomarkers were identified from gut microbiome, liver ultrasound, and other modalities.
The method also revealed 'less-attested biomarkers in a range of different modalities.'
Results
Multivariate analysis identified the contribution of each measurement modality in predicting each disease pseudo-outcome.
Multivariate analysis was applied to assess the relative contribution of each modality.
Modalities analyzed included gut microbiome, liver ultrasound, continuous glucose monitoring, and others available in the HPP.
This analysis provided 'a broad perspective across the landscape of many diseases through the lens of many modalities.'
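One simple way to think about per-modality contribution is to ask how much of the variance in a pseudo-outcome each modality's measurements explain on their own. The sketch below uses in-sample R² from ordinary least squares as that score; this is an assumption for illustration, not the authors' stated multivariate method. The modality names and data are synthetic placeholders.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit of y on X (with intercept)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
n = 400

# Hypothetical modalities, each a block of measurements for the same participants.
modalities = {
    "modality_a": rng.normal(size=(n, 4)),
    "modality_b": rng.normal(size=(n, 3)),
}

# Synthetic pseudo-outcome driven by one feature of modality_a plus noise.
pseudo_outcome = 1.5 * modalities["modality_a"][:, 0] + rng.normal(size=n)

# Score each modality separately against the same pseudo-outcome.
contribution = {name: r_squared(X, pseudo_outcome) for name, X in modalities.items()}
for name, score in contribution.items():
    print(f"{name}: R^2 = {score:.2f}")
```

In-sample R² grows with the number of features, so a real analysis would more likely use held-out data or cross-validation when comparing modalities of different sizes.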
Results
Systemic biomarkers correlated with many diseases were identified across the cohort.
The study explicitly identified 'systemic biomarkers correlated with many diseases' as a distinct category of finding.
These systemic biomarkers were identified through correlation with predicted pseudo-outcomes across multiple disease models.
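A systemic biomarker in this sense can be pictured as a row of a biomarker-by-disease correlation matrix with many large entries. The sketch below builds that matrix and counts, per biomarker, the pseudo-outcomes whose absolute correlation exceeds a cutoff. The cutoff of 0.3 is an arbitrary illustrative choice, and the data are synthetic; the summary does not give the authors' criterion.

```python
import numpy as np

def correlation_matrix(biomarkers, pseudo_outcomes):
    """Pearson correlation of every biomarker column with every pseudo-outcome column."""
    B = (biomarkers - biomarkers.mean(axis=0)) / biomarkers.std(axis=0)
    P = (pseudo_outcomes - pseudo_outcomes.mean(axis=0)) / pseudo_outcomes.std(axis=0)
    return B.T @ P / len(B)

rng = np.random.default_rng(2)
n, n_biomarkers, n_diseases = 600, 5, 6

# A shared latent factor drives biomarker 0 and every pseudo-outcome (synthetic).
latent = rng.normal(size=n)
biomarkers = rng.normal(size=(n, n_biomarkers))
biomarkers[:, 0] += 2.0 * latent
pseudo_outcomes = rng.normal(size=(n, n_diseases)) + latent[:, None]

corr = correlation_matrix(biomarkers, pseudo_outcomes)

# Count, for each biomarker, how many pseudo-outcomes pass the illustrative cutoff.
threshold = 0.3
diseases_per_biomarker = (np.abs(corr) > threshold).sum(axis=1)
print(diseases_per_biomarker)
```

With real data, a significance test with multiple-testing correction would be a more defensible criterion than a fixed correlation cutoff.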
Results
Sex-specific biomarkers with higher correlation to a pseudo-outcome for one sex compared to the other were identified.
The study identified 'sex-specific biomarkers with higher correlation to a pseudo-outcome for one sex as compared to the other.'
Sex-specific biomarkers were identified across the range of disease pseudo-outcomes modeled.
This represents a distinct analytical output of the pseudo-outcome correlation framework.
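To make the notion of a sex-specific biomarker concrete, the sketch below computes the biomarker's correlation with a pseudo-outcome separately in each sex and compares the two with Fisher's z-transformation, a standard test for the difference between two independent correlations. Whether the authors used this test is not stated in the summary, so treat it as one plausible choice. The data and the name `fisher_z_diff` are illustrative.

```python
import math
import numpy as np

def fisher_z_diff(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent Pearson correlations."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return float(z), p

rng = np.random.default_rng(3)
n_f, n_m = 300, 300

# Synthetic data: the marker tracks the pseudo-outcome in one group only.
marker_f = rng.normal(size=n_f)
pseudo_f = marker_f + rng.normal(size=n_f)
marker_m = rng.normal(size=n_m)
pseudo_m = rng.normal(size=n_m)

r_f = np.corrcoef(marker_f, pseudo_f)[0, 1]
r_m = np.corrcoef(marker_m, pseudo_m)[0, 1]
z, p = fisher_z_diff(r_f, n_f, r_m, n_m)
print(f"r_female={r_f:.2f}, r_male={r_m:.2f}, z={z:.1f}, p={p:.2g}")
```

A biomarker would be flagged as sex-specific here when the correlation is clearly higher in one sex and the difference is unlikely under the null of equal correlations.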
Pellow D, Geva G, Godneva A, Reisner Y, Talmor-Barkan Y, Segal E. (2026). Analysis of biomarkers in the Human Phenotype Project using disease models from UK Biobank. Med (New York, N.Y.). https://doi.org/10.1016/j.medj.2025.100993