Video-based remote physiological measurement utilizes facial videos to
measure the blood volume change signal, which is also called remote
photoplethysmography (rPPG). Supervised rPPG methods have been shown to
achieve good performance. However, they require facial videos with ground
truth (GT) physiological signals, which are often costly and difficult to
obtain. In this paper, we propose
Contrast-Phys+, a method that can be trained in both unsupervised and
weakly-supervised settings. We employ a 3DCNN model to generate multiple
spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a
contrastive loss function. We further incorporate the GT signals into
contrastive learning to adapt to partial or misaligned labels. The contrastive
loss encourages rPPG/GT signals from the same video to be grouped together,
while pushing those from different videos apart. We evaluate our methods on
five publicly available datasets that include both RGB and near-infrared
videos. Contrast-Phys+ outperforms the state-of-the-art supervised methods,
even when using partially available or misaligned GT signals, or no labels at
all. Additionally, we highlight the advantages of our methods in terms of
computational efficiency, noise robustness, and generalization.
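
As an illustration of the contrastive objective described above, the following
is a minimal PyTorch sketch under our own assumptions: it compares normalized
power spectral densities of the sampled signals with a mean squared distance,
and the helper names psd, pairwise_mse, and contrastive_rppg_loss are
hypothetical, not the paper's implementation.

```python
import torch

def psd(signals: torch.Tensor) -> torch.Tensor:
    """Normalized power spectral density of a batch of 1-D rPPG signals.

    signals: (N, T) waveforms. Returns (N, T//2 + 1) PSDs, each summing to 1.
    """
    spec = torch.fft.rfft(signals, dim=-1)
    power = spec.real**2 + spec.imag**2
    return power / power.sum(dim=-1, keepdim=True)

def pairwise_mse(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Mean squared distance between every row of x and every row of y."""
    return ((x.unsqueeze(1) - y.unsqueeze(0)) ** 2).mean(dim=-1)

def contrastive_rppg_loss(sig_a: torch.Tensor, sig_b: torch.Tensor) -> torch.Tensor:
    """Pull together PSDs of rPPG signals sampled from the same video;
    push apart PSDs of signals sampled from different videos.

    sig_a, sig_b: (N, T) spatiotemporal rPPG signals from videos A and B.
    """
    psd_a, psd_b = psd(sig_a), psd(sig_b)
    n = psd_a.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool, device=sig_a.device)
    # Positive pairs: signals from the same video share one underlying pulse,
    # so their PSD distance is minimized (self-pairs excluded).
    pos = (pairwise_mse(psd_a, psd_a)[off_diag].mean()
           + pairwise_mse(psd_b, psd_b)[off_diag].mean())
    # Negative pairs: signals from different videos carry different pulses,
    # so their PSD distance enters the loss with a negative sign.
    neg = pairwise_mse(psd_a, psd_b).mean()
    return pos - neg

# Usage: sig_a and sig_b would come from a 3DCNN applied to two videos.
sig_a, sig_b = torch.randn(4, 300), torch.randn(4, 300)
loss = contrastive_rppg_loss(sig_a, sig_b)
```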