Using machine learning for wearables to understand the association of sleep with future morbidity

Abstract

Sleep is essential to life and is structurally complex. Our understanding of how sleep is associated with health and morbidity draws primarily on studies that use self-reported sleep diaries, which capture the subjective experience of sleep. However, sleep diaries are limited: they are typically collected at a single point in time, are one-dimensional (often capturing only sleep duration), and correlate poorly with objective, device-measured sleep parameters. The accepted standard for sleep measurement is laboratory-based polysomnography, which is not feasible at scale due to its high cost and technical complexity. By contrast, wrist-worn accelerometers are viable for deployment in large-scale epidemiological studies because of their portability and low user burden. I therefore aimed to develop a machine learning method for sleep stage classification and to evaluate its utility for providing insights into the association between sleep and health outcomes.

I first conducted a systematic review to assess the agreement between accelerometer-based sleep staging and polysomnography, with a secondary aim of comparing the level of agreement achieved by different methods. The review found that existing sleep staging methods were limited by their reliance on hand-crafted features and on small labelled datasets, highlighting the need to improve accelerometer-based sleep staging.

A self-supervised deep neural network was then developed to automatically extract features from 700,000 person-days of unlabelled raw accelerometry. To systematically evaluate the generalisability of the self-supervised features, the pre-trained network was tested on seven human activity recognition datasets, a task for which more open-access benchmark datasets are available. The self-supervised network generalised across activity classes, devices, device placements and populations, with a relative F1 improvement of 2.5%-100% (median: 18.4%) over the same network without self-supervision.

The self-supervised feature extractor was then used to build a sleep stage classifier (SleepNet) based on a deep recurrent neural network. SleepNet achieved state-of-the-art performance for the classification of sleep and sleep stages, using ~1,500 nights of multi-centre polysomnography as the ground truth. Overall, the derived sleep parameters showed fair agreement with polysomnography, with a kappa score of 0.37 (SD: 0.16) for three-class classification between wake, rapid-eye-movement (REM) sleep and non-rapid-eye-movement (NREM) sleep. On external validation, the difference between polysomnography and the model classifications was 34.7 minutes (95% limits of agreement (LoA): -37.8 to 107.2 minutes) for total sleep duration, 2.6 minutes (95% LoA: -68.4 to 73.4 minutes) for REM duration, and 32.1 minutes (95% LoA: -54.4 to 118.5 minutes) for NREM duration.
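To make these agreement statistics concrete, the sketch below shows how a three-class kappa score and Bland-Altman 95% limits of agreement could be computed. This is a minimal illustration, not the thesis code: the arrays, epoch labels and durations are all hypothetical.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-epoch stagings for one night (0 = wake, 1 = NREM, 2 = REM),
# e.g. one label per 30-second polysomnography epoch.
psg_stages = np.array([0, 1, 1, 1, 2, 2, 1, 0, 1, 2])  # ground truth (PSG)
net_stages = np.array([0, 1, 1, 2, 2, 2, 1, 1, 1, 2])  # model predictions

# Three-class chance-corrected agreement between wake, NREM and REM.
kappa = cohen_kappa_score(psg_stages, net_stages)
print(f"Kappa score: {kappa:.2f}")

# Bland-Altman 95% limits of agreement for a summary measure such as
# total sleep duration (minutes), one value per participant-night.
psg_tst = np.array([420.0, 380.5, 455.0, 402.5])  # hypothetical PSG durations
net_tst = np.array([450.0, 410.0, 460.0, 455.0])  # hypothetical model durations

diff = net_tst - psg_tst
bias = diff.mean()                    # mean difference (systematic bias)
half_width = 1.96 * diff.std(ddof=1)  # half-width of the 95% LoA
print(f"Bias: {bias:.1f} min, "
      f"95% LoA: {bias - half_width:.1f} to {bias + half_width:.1f} min")
```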
Finally, SleepNet was used to infer overnight sleep duration and sleep efficiency (the proportion of time in bed spent asleep) in the UK Biobank accelerometer dataset, to examine their associations with mortality outcomes. Short sleepers (<6 hours) had a higher risk of mortality than participants with a normal sleep duration (6 to 7.9 hours), regardless of whether they had low sleep efficiency (hazard ratio (HR): 1.58; 95% confidence interval (CI): 1.19 to 2.11) or high sleep efficiency (HR: 1.45; 95% CI: 1.16 to 1.81).

In conclusion, accelerometer-based sleep classification showed fair agreement with polysomnography. Applying the derived sleep parameters to datasets with longitudinal follow-up could transform our understanding of how sleep contributes to human health and well-being.
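Hazard ratios of this kind are typically estimated with Cox proportional hazards models; the abstract does not spell out the modelling code, so the following is a hypothetical sketch using the lifelines package. The column names, category coding and synthetic data are all assumptions, and a real analysis would adjust for additional covariates.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(42)
n = 1000

# Synthetic survival data: one row per participant with follow-up time,
# a death indicator, and a combined sleep duration/efficiency category.
df = pd.DataFrame({
    "followup_years": rng.uniform(1.0, 10.0, n),
    "died": rng.integers(0, 2, n),    # 1 = died during follow-up
    "age": rng.normal(60.0, 8.0, n),  # example covariate; real models adjust for more
    "sleep_group": rng.choice(
        ["normal", "short_low_eff", "short_high_eff"], size=n
    ),
})

# One-hot encode the exposure, with normal sleepers as the reference group.
df = pd.get_dummies(df, columns=["sleep_group"], dtype=float)
df = df.drop(columns=["sleep_group_normal"])

cph = CoxPHFitter()
cph.fit(df, duration_col="followup_years", event_col="died")

# exp(coef) is the hazard ratio for each group versus the reference.
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```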
