SleepGPT, a time-frequency foundation model based on a generative pretrained transformer, sets a new benchmark for sleep decoding by achieving superior performance in sleep staging, pathology classification, data generation, and spindle detection across diverse polysomnography datasets.
Key Findings
Methods
SleepGPT was pretrained on a large-scale polysomnography dataset comprising 86,335 hours of recordings from 8,377 subjects using a multi-pretext pretraining strategy.
The model is based on a generative pretrained transformer architecture adapted for sleep decoding.
Pretraining used polysomnography (PSG) data, which typically includes multiple physiological channels recorded during sleep.
A multi-pretext pretraining strategy was employed to enable the model to learn diverse representations from the data.
The scale of pretraining data (86,335 hours from 8,377 subjects) represents a large foundation for a sleep-specific model.
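The multi-pretext idea can be illustrated with a minimal sketch: several self-supervised objectives (here, masked reconstruction and GPT-style next-segment prediction) are combined into one weighted pretraining loss. The specific pretext tasks, function names, and weights below are illustrative assumptions, not the paper's actual objectives.

```python
import numpy as np

def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only on masked (hidden) time points."""
    diff = (pred - target) ** 2
    return float(diff[mask].mean())

def next_segment_loss(pred_next, true_next):
    """MSE for predicting the following signal segment (GPT-style pretext)."""
    return float(((pred_next - true_next) ** 2).mean())

def multi_pretext_loss(pred, target, mask, pred_next, true_next, w=(1.0, 0.5)):
    """Weighted sum of pretext losses; the weights are assumed hyperparameters."""
    return (w[0] * masked_reconstruction_loss(pred, target, mask)
            + w[1] * next_segment_loss(pred_next, true_next))

rng = np.random.default_rng(0)
x = rng.standard_normal(256)           # one toy 1-D signal segment
mask = rng.random(256) < 0.15          # ~15% of time points masked
loss = multi_pretext_loss(rng.standard_normal(256), x, mask,
                          rng.standard_normal(64), rng.standard_normal(64))
```

Training on such unlabeled objectives is what lets a foundation model exploit the full 86,335 hours of recordings without stage or event annotations.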
Methods
SleepGPT incorporates a unified time-frequency fusion module that enables deep cross-domain interaction between time-domain and frequency-domain information.
Previously prevailing deep-learning models relied on dual encoders that isolate time-domain and frequency-domain information, limiting generalizability.
The unified time-frequency fusion module contrasts with the dual-encoder approach by enabling joint processing of both domains.
This architectural choice was designed to overcome limitations of task-specific designs in existing models.
The module is described as enabling 'deep cross-domain interaction' rather than parallel but separate processing.
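The contrast with a dual-encoder design can be sketched in miniature: time-domain patches and spectral patches are embedded into a single token sequence, so one transformer could attend jointly across both domains rather than processing them in separate encoders. All shapes, window sizes, and projection weights below are toy assumptions, not the paper's actual module.

```python
import numpy as np

def patch_time(x, patch_len=64):
    """Split a 1-D signal into non-overlapping time-domain patches."""
    n = len(x) // patch_len
    return x[:n * patch_len].reshape(n, patch_len)        # (n_patches, patch_len)

def patch_freq(x, win=64):
    """Magnitude spectra of windowed frames (frequency-domain patches)."""
    frames = patch_time(x, win)
    return np.abs(np.fft.rfft(frames * np.hanning(win)))  # (n_frames, win//2 + 1)

def fuse_tokens(x, d_model=32, seed=0):
    """Project both patch types to d_model and concatenate into ONE sequence."""
    rng = np.random.default_rng(seed)
    t, f = patch_time(x), patch_freq(x)
    w_t = rng.standard_normal((t.shape[1], d_model)) / np.sqrt(t.shape[1])
    w_f = rng.standard_normal((f.shape[1], d_model)) / np.sqrt(f.shape[1])
    # A dual-encoder design would feed t and f to separate models; here they
    # share one sequence, so attention can mix the two domains directly.
    return np.concatenate([t @ w_t, f @ w_f], axis=0)

x = np.random.default_rng(1).standard_normal(1920)   # toy 30-s epoch at 64 Hz
tokens = fuse_tokens(x)                              # joint time+freq sequence
```

With a 64-sample patch length, the 1,920-sample toy epoch yields 30 time tokens and 30 frequency tokens in one fused sequence.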
Methods
SleepGPT includes a channel-adaptive mechanism that accommodates variable channel configurations across different PSG datasets.
Different PSG datasets may record different numbers and types of physiological channels (e.g., EEG, EOG, EMG).
The channel-adaptive mechanism allows the model to handle these varying configurations without requiring dataset-specific retraining.
This feature contributes to the model's generalizability across diverse PSG datasets.
The mechanism is presented as a key architectural innovation enabling the 'all-in-one' nature of SleepGPT.
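One common way to realize channel adaptivity, shown here as a hypothetical sketch (the channel vocabulary, dimensions, and weights are illustrative, not SleepGPT's actual mechanism), is to tag each channel's projected signal with a learned channel embedding looked up by name, so any subset of channels forms a valid input.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed channel vocabulary and model dimensions (illustrative only).
CHANNEL_VOCAB = {"EEG-C3": 0, "EEG-C4": 1, "EOG-L": 2, "EOG-R": 3, "EMG-chin": 4}
D_MODEL, EPOCH_LEN = 16, 128
CHANNEL_EMB = rng.standard_normal((len(CHANNEL_VOCAB), D_MODEL))
W_SIG = rng.standard_normal((EPOCH_LEN, D_MODEL)) / np.sqrt(EPOCH_LEN)

def embed_channels(signals):
    """Map {channel name -> signal of length EPOCH_LEN} to channel tokens.

    Each present channel is projected and offset by its channel embedding;
    absent channels simply produce no token, so montages of different sizes
    feed the same model without retraining.
    """
    tokens = [sig @ W_SIG + CHANNEL_EMB[CHANNEL_VOCAB[name]]
              for name, sig in signals.items() if name in CHANNEL_VOCAB]
    return np.stack(tokens)               # (n_channels_present, D_MODEL)

# Two datasets with different channel configurations use the same function.
a = embed_channels({"EEG-C3": np.zeros(EPOCH_LEN), "EOG-L": np.zeros(EPOCH_LEN)})
b = embed_channels({"EEG-C4": np.zeros(EPOCH_LEN),
                    "EOG-R": np.zeros(EPOCH_LEN),
                    "EMG-chin": np.zeros(EPOCH_LEN)})
```

The token count simply tracks how many known channels a dataset supplies, which is what makes cross-dataset evaluation possible without per-dataset retraining.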
Results
SleepGPT achieved superior performance in sleep staging compared to existing approaches across diverse PSG datasets.
Evaluations were conducted across diverse PSG datasets to assess generalizability.
Sleep staging involves classifying sleep into standard stages (e.g., Wake, N1, N2, N3, REM).
SleepGPT is described as setting 'a new benchmark' for sleep staging performance.
The model outperformed supervised, task-specific deep-learning models in this task.
Results
SleepGPT demonstrated superior performance in sleep-related pathology classification.
Sleep-related pathology classification is one of the four evaluated sleep decoding tasks.
The model's performance on pathology classification was part of its benchmark-setting evaluation across diverse datasets.
This task tests the model's ability to generalize beyond sleep staging to clinically relevant disorder identification.
The model's unified architecture enabled strong performance on this task without task-specific design modifications.
Results
SleepGPT achieved superior performance in sleep data generation as evaluated across diverse PSG datasets.
Sleep data generation is identified as one of the four primary sleep decoding tasks evaluated.
The generative pretrained transformer architecture is particularly suited to data generation tasks.
Strong generation performance suggests the model learned meaningful representations of sleep physiology.
This capability is noted as part of the model's 'all-in-one' functionality.
Results
SleepGPT demonstrated superior performance in sleep spindle detection.
Sleep spindle detection is a fine-grained event detection task within sleep recordings.
Sleep spindles are brief bursts of oscillatory neural activity (typically ~11–16 Hz) occurring during NREM sleep, with relevance to memory consolidation and health.
The model's ability to detect spindles alongside other tasks demonstrates its versatility as a foundation model.
This is one of the four main sleep decoding tasks on which SleepGPT set a new benchmark.
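For context, a classical rule-based baseline for spindle detection (not SleepGPT's learned detector) band-passes the EEG to the sigma band, takes a smoothed amplitude envelope, and thresholds it. The filter bounds, threshold, and smoothing window below are common illustrative choices.

```python
import numpy as np

def bandpass_fft(x, fs, lo=11.0, hi=16.0):
    """Zero out FFT bins outside [lo, hi] Hz (a simple ideal band-pass)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def detect_spindles(x, fs, thresh_sd=2.0):
    """Return a boolean mask of candidate spindle samples."""
    sigma = bandpass_fft(x, fs)
    envelope = np.abs(sigma)                       # crude amplitude envelope
    k = int(0.1 * fs)                              # ~100 ms smoothing window
    smooth = np.convolve(envelope, np.ones(k) / k, mode="same")
    return smooth > smooth.mean() + thresh_sd * smooth.std()

# Toy signal: low-level noise plus a 13 Hz burst between t = 4 s and t = 5 s.
fs = 128
t = np.arange(10 * fs) / fs
x = 0.2 * np.random.default_rng(0).standard_normal(len(t))
burst = (t > 4) & (t < 5)
x[burst] += np.sin(2 * np.pi * 13 * t[burst])
mask = detect_spindles(x, fs)
```

A foundation model replaces such hand-tuned thresholds with representations learned during pretraining, which is part of why a single model can handle spindle detection alongside staging and classification.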
Results
SleepGPT revealed channel- and stage-specific physiological patterns underlying sleep decoding.
The model's analysis uncovered patterns specific to individual recording channels (e.g., specific EEG leads).
Stage-specific patterns were identified, meaning different sleep stages have distinct physiological signatures captured by the model.
These findings provide interpretability insights into what the model learned during pretraining.
The discovery of these patterns is presented as an additional scientific contribution beyond benchmark performance.
Background
Prevailing deep-learning models for sleep decoding rely on supervised, task-specific designs and dual encoders that isolate time-domain and frequency-domain information, limiting generalizability and scalability.
This limitation motivated the development of SleepGPT as a foundation model approach.
Dual encoders process time-domain and frequency-domain information separately rather than jointly.
Task-specific designs require separate models or significant modification for different sleep decoding tasks.
The supervised-only approach limits the ability to leverage large unlabeled PSG datasets.
What This Means
This research introduces SleepGPT, an artificial intelligence system designed to analyze sleep recordings (polysomnography, or PSG) in a more flexible and comprehensive way than previous approaches. The system was trained on over 86,000 hours of sleep data from more than 8,000 people, allowing it to learn a broad range of patterns related to how the brain and body behave during sleep. Unlike previous AI models that were built separately for each specific task and processed different types of signal information (like brain waves over time versus brain wave frequencies) in isolation, SleepGPT combines all of this information in a unified way and can adapt to different types of sleep recording setups.
This research suggests that SleepGPT outperforms existing methods across four important sleep analysis tasks: identifying sleep stages (like deep sleep, light sleep, and REM sleep), classifying sleep-related disorders, generating realistic synthetic sleep data, and detecting specific sleep events called sleep spindles. The model also revealed interpretable patterns tied to specific recording channels and sleep stages, offering new insights into the physiology of sleep. These capabilities all come from a single model rather than requiring separate specialized tools for each task.
The practical significance of this work is that a single, adaptable AI model could potentially assist clinicians and researchers in analyzing sleep studies more efficiently and accurately, even when different hospitals or research centers use different recording setups. This research suggests that applying large-scale foundation model approaches — similar to those behind modern language AI — to sleep medicine could substantially improve how sleep disorders are detected and understood, and may support the development of better sleep health monitoring tools.
Huang W, Wang Y, Cheng H, Xu W, Li T, Wu X, et al. (2026). A unified time-frequency foundation model for sleep decoding. Nature Communications. https://doi.org/10.1038/s41467-025-67970-4