A multimodal sleep foundation model for disease prediction.

TL;DR

SleepFM, a multimodal sleep foundation model trained on over 585,000 hours of PSG recordings, predicts 130 conditions, including all-cause mortality, dementia, and cardiovascular diseases, with a C-Index of at least 0.75 from a single night of sleep.

Key Findings

SleepFM accurately predicts 130 conditions with a C-Index of at least 0.75 from a single night of sleep.

  • Bonferroni-corrected P < 0.01 for all 130 conditions
  • Predictions include all-cause mortality (C-Index 0.84), dementia (0.85), myocardial infarction (0.81), heart failure (0.80), chronic kidney disease (0.79), stroke (0.78), and atrial fibrillation (0.78)
  • The model was trained on over 585,000 hours of PSG recordings from approximately 65,000 participants across several cohorts
  • Disease risk predictions are derived from latent sleep representations that capture physiological and temporal structure of sleep
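
For readers unfamiliar with the C-Index values quoted above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is a generic textbook formulation, not the authors' evaluation code; the function name `concordance_index` and the toy follow-up data are assumptions made for illustration. A pair of participants counts as comparable when the one with the shorter follow-up time actually experienced the event, and the pair is concordant when that participant was also assigned the higher risk score.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    times  -- observed follow-up time per participant
    events -- True if the event occurred, False if censored
    risks  -- model risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose earlier member was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four participants, one censored at t=6.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risks))  # 1.0 (perfect ranking)
```

Under this definition, a model that assigns everyone the same risk scores 0.5, which is why 0.5 is read as chance level and 1.0 as perfect ranking.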

SleepFM was trained using a new contrastive learning approach that accommodates multiple PSG configurations.

  • The model is described as a multimodal sleep foundation model
  • Training data came from a curated dataset across several cohorts totaling approximately 65,000 participants
  • The contrastive learning approach was designed to address challenges in standardization, generalizability, and multimodal integration
  • The approach enables scalable, label-efficient analysis of polysomnography data
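
The summary does not spell out the contrastive objective, so the following is only a sketch of a generic InfoNCE-style loss between two signal modalities, not SleepFM's actual training method. All names (`info_nce`, `eeg`, `ecg`) and the temperature value are assumptions for illustration. The idea it shows: embeddings of two modalities recorded over the same time window are pulled together, while embeddings from different windows in the batch act as negatives.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] and positives[i] are embeddings of two modalities from the
    same window; every positives[j] with j != i serves as a negative.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # cross-entropy with target index i
    return total / len(a)


# Hypothetical 2-D embeddings for three windows.
eeg = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
ecg_aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
ecg_shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]
print(info_nce(eeg, ecg_aligned) < info_nce(eeg, ecg_shuffled))  # True
```

One plausible way to accommodate recordings with different channel sets, again an assumption rather than something stated here, is to compute such a loss only over the modalities present in each recording.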

SleepFM demonstrates strong transfer learning performance on the Sleep Heart Health Study dataset, which was excluded from pretraining.

  • The Sleep Heart Health Study dataset was held out entirely from the pretraining process
  • Transfer learning performance on this out-of-distribution dataset was described as 'strong'
  • This demonstrates the model's generalizability to novel datasets not seen during training
  • Transfer learning capability supports the model's potential for broad clinical application

SleepFM performs competitively with specialized sleep-staging models on common sleep analysis tasks.

  • Achieved mean F1 scores of 0.70–0.78 for sleep staging
  • Achieved accuracy of 0.69 for classifying sleep apnea severity
  • Achieved accuracy of 0.87 for classifying sleep apnea presence
  • Compared favorably against specialized models U-Sleep and YASA
  • These results were achieved despite SleepFM being a general-purpose foundation model rather than task-specific
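
As a reference for the sleep-staging metric above, here is a minimal sketch of a mean F1 score, assuming "mean F1" refers to the unweighted average of per-stage F1 scores (often called macro F1). The summary does not state how the averaging was done, and the stage labels and epoch sequences below are hypothetical.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical epoch-level stages for six 30-second epochs.
stages = ["W", "N1", "N2", "N3", "REM"]
y_true = ["W", "N1", "N2", "N3", "REM", "N2"]
y_pred = ["W", "N2", "N2", "N3", "REM", "N2"]
print(round(macro_f1(y_true, y_pred, stages), 2))  # 0.76
```

Because every stage carries equal weight, a rarely occurring stage such as N1 can pull the mean down sharply even when overall accuracy is high, as in this example where five of six epochs are correct.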

Polysomnography is underutilized as a clinical tool due to challenges in standardization, generalizability, and multimodal integration.

  • PSG is described as 'the gold standard for sleep analysis'
  • PSG captures rich physiological signals across multiple modalities
  • Prior to SleepFM, these challenges limited the broader application of PSG data for disease prediction
  • The complex relationship between sleep and disease remains 'poorly understood' according to the authors

SleepFM produces latent sleep representations that capture the physiological and temporal structure of sleep.

  • These representations are described as enabling 'accurate prediction of future disease risk'
  • The representations are derived from a single night of PSG recording
  • The model integrates multiple PSG signal modalities into unified latent representations
  • The authors describe this as the model learning 'the language of sleep from multimodal sleep recordings'

What This Means

This research suggests that a single night of sleep, as measured by polysomnography (a comprehensive sleep study that records brain waves, heart rate, breathing, and other signals), contains enough information to predict a person's risk of developing dozens of serious health conditions years into the future. The researchers developed an artificial intelligence system called SleepFM, trained on sleep recordings from approximately 65,000 people totaling over 585,000 hours of data. The model learned patterns across all the physiological signals recorded during sleep simultaneously, rather than analyzing each signal in isolation.

From just one night of sleep data, the model predicted 130 different health conditions with a C-Index of at least 0.75. For example, it predicted all-cause mortality with a C-Index of 0.84 and dementia with a C-Index of 0.85 (where 0.5 would be random chance and 1.0 would be perfect prediction). It also performed comparably to specialized AI tools on standard sleep analysis tasks such as identifying sleep stages and detecting sleep apnea.

Importantly, the model generalized well to a dataset it had never seen during pretraining (the Sleep Heart Health Study), suggesting it learned broadly applicable features of sleep physiology rather than just memorizing patterns from its training data.

These findings suggest that sleep studies, which are already collected for diagnosing sleep disorders, could potentially be repurposed as a broad health screening tool. Rather than being used only to diagnose conditions like sleep apnea, the same recordings could yield much richer health information through AI models like SleepFM, potentially identifying people at elevated risk for heart disease, kidney disease, stroke, dementia, and many other conditions before symptoms appear. This could make sleep studies more valuable in clinical practice and support earlier, more targeted preventive care.

Citation

Thapa R, Kjaer M, He B, Covert I, Moore H IV, Hanif U, et al. (2026). A multimodal sleep foundation model for disease prediction. Nature Medicine. https://doi.org/10.1038/s41591-025-04133-4