Advances in camera-based physiological monitoring have enabled the robust,
non-contact measurement of respiration and the cardiac pulse, which are known
to be indicative of the sleep stage. This has led to research into camera-based
sleep monitoring as a promising alternative to "gold-standard" polysomnography,
which is cumbersome, expensive to administer, and hence unsuitable for
longer-term clinical studies. In this paper, we introduce SleepVST, a
transformer model which enables state-of-the-art performance in camera-based
sleep stage classification (sleep staging). After pre-training on contact
sensor data, SleepVST outperforms existing methods for cardio-respiratory sleep
staging on the SHHS and MESA datasets, achieving total Cohen's kappa scores of
0.75 and 0.77 respectively. We then show that SleepVST can be successfully
transferred to cardio-respiratory waveforms extracted from video, enabling
fully contact-free sleep staging. Using a video dataset of 50 nights, we
achieve a total accuracy of 78.8% and a Cohen's κ of 0.71 in four-class
video-based sleep staging, setting a new state of the art in the domain.

Comment: CVPR 2024 Highlight Paper
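The abstract reports "total" accuracy and Cohen's κ. A minimal sketch of how such scores might be computed follows. It assumes that "total" means pooling all scored epochs across nights before scoring, rather than averaging per-night scores. The four stage names (Wake, Light, Deep, REM) and the helper names are hypothetical illustrations, not taken from the paper. Cohen's κ is computed as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the two label distributions.

```python
from collections import Counter

# Hypothetical four-class labelling; the paper's exact class names are assumed.
STAGES = ("Wake", "Light", "Deep", "REM")


def accuracy(y_true, y_pred):
    """Fraction of epochs where the predicted stage matches the reference stage."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both sequences use one identical class throughout, so agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def total_scores(nights):
    """Pool epochs from every night (assumed meaning of 'total'), then score once.

    nights: list of (y_true, y_pred) pairs, one pair of epoch-label lists per night.
    """
    y_true = [label for t, _ in nights for label in t]
    y_pred = [label for _, p in nights for label in p]
    return accuracy(y_true, y_pred), cohens_kappa(y_true, y_pred)


if __name__ == "__main__":
    night_1 = (["Wake", "Light", "Light"], ["Wake", "Light", "Deep"])
    night_2 = (["Deep", "REM", "REM"], ["Deep", "REM", "Light"])
    acc, kappa = total_scores([night_1, night_2])
    print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")  # accuracy=0.667, kappa=0.556
```

Pooling weights every epoch equally, so longer nights contribute more to the total score. A per-night average would instead weight every night equally.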