Heart rates estimation using rPPG methods in challenging imaging conditions
Abstract. The cardiovascular system plays a crucial role in maintaining the body’s equilibrium by regulating blood flow and oxygen supply to different organs and tissues. While contact-based techniques like electrocardiography and photoplethysmography are commonly used in healthcare and clinical monitoring, they are not practical for everyday use because they require skin contact. Non-contact alternatives like remote photoplethysmography (rPPG) have therefore gained significant attention in recent years. However, extracting accurate heart rate information from rPPG signals under challenging imaging conditions, such as image degradation and occlusion, remains difficult. This thesis investigates how effectively rPPG methods extract heart rate information under these conditions. It evaluates both traditional rPPG approaches and pre-trained deep learning rPPG models in the presence of real-world image transformations, such as occlusion of the face by sunglasses or facemasks, as well as image degradation caused by noise artifacts and motion blur. The study also explores various image restoration techniques to enhance the performance of the selected rPPG methods and experiments with several methods of fine-tuning the best-performing pre-trained model. The research was conducted on three databases, namely UBFC-rPPG, UCLA-rPPG, and UBFC-Phys, and includes comprehensive experiments. The results of this study offer valuable insights into the efficacy of rPPG in practical scenarios and its potential as a non-contact alternative to traditional cardiovascular monitoring techniques.
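As a rough illustration of the final step that rPPG pipelines commonly share (turning a pulse-like signal into a heart rate), the sketch below picks the strongest frequency within a plausible heart-rate band. It is not the thesis's method: the function names, the 0.7 to 4.0 Hz band, and the synthetic test signal are all assumptions made for this example.

```python
import math


def estimate_heart_rate_bpm(signal, fs, lo_hz=0.7, hi_hz=4.0):
    """Return the dominant frequency of `signal` inside [lo_hz, hi_hz], in beats per minute.

    `fs` is the sampling rate in Hz (for video, the frame rate).
    The band limits are an assumed plausible heart-rate range, not values from the thesis.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n  # frequency of DFT bin k in Hz
        if freq < lo_hz or freq > hi_hz:
            continue  # skip bins outside the heart-rate band
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


def synthetic_pulse(bpm, fs, seconds):
    """Hypothetical test signal: a pulse sinusoid plus a slow 0.2 Hz drift."""
    f = bpm / 60.0
    n = int(fs * seconds)
    return [
        math.sin(2 * math.pi * f * i / fs) + 0.3 * math.sin(2 * math.pi * 0.2 * i / fs)
        for i in range(n)
    ]


if __name__ == "__main__":
    pulse = synthetic_pulse(bpm=72, fs=30, seconds=10)
    print(estimate_heart_rate_bpm(pulse, fs=30))
```

With a 10-second window the frequency resolution is 0.1 Hz, or 6 beats per minute, which hints at why short, noisy, or occluded recordings make accurate estimation hard.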
Detecting and removing visual distractors for video aesthetic enhancement
Personal videos often contain visual distractors: accidentally captured objects that can distract viewers from the main subjects. We propose a method to automatically detect and localize these distractors by learning from a manually labeled dataset. To achieve spatially and temporally coherent detection, we propose extracting features at the Temporal-Superpixel (TSP) level using a traditional SVM-based learning framework. We also experiment with end-to-end learning using Convolutional Neural Networks (CNNs), which achieves slightly higher performance than other methods. The classification result is further refined in a post-processing step based on graph-cut optimization. Experimental results show that our method achieves an accuracy of 81% and a recall of 86%. We demonstrate several ways of removing the detected distractors to improve the video quality, including video hole filling, video frame replacement, and camera path re-planning. The user study results show that our method can significantly improve the aesthetic quality of videos.
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
We address the problem of video representation learning without
human-annotated labels. While previous efforts address the problem by designing
novel self-supervised tasks using video data, the learned features are computed merely
on a frame-by-frame basis and are therefore not applicable to many video analytic
tasks where spatio-temporal features prevail. In this paper we propose a
novel self-supervised approach to learn spatio-temporal features for video
representation. Inspired by the success of two-stream approaches in video
classification, we propose to learn visual features by regressing both motion
and appearance statistics along spatial and temporal dimensions, given only the
input video data. Specifically, we extract statistical concepts (fast-motion
region and the corresponding dominant direction, spatio-temporal color
diversity, dominant color, etc.) from simple patterns in both spatial and
temporal domains. Unlike prior puzzle tasks that are hard even for humans to solve,
the proposed approach is consistent with inherent human visual habits and
therefore easy to answer. We conduct extensive experiments with C3D to validate
the effectiveness of our proposed approach. The experiments show that our
approach can significantly improve the performance of C3D when applied to video
classification tasks. Code is available at
https://github.com/laura-wang/video_repres_mas.
Comment: CVPR 201
Recent Advances in Image Restoration with Applications to Real World Problems
In the past few decades, imaging hardware has improved tremendously in terms of resolution, enabling widespread use of images in many diverse applications on Earth and in planetary missions. However, practical issues associated with image acquisition still affect image quality. Some of these issues, such as blurring, measurement noise, mosaicing artifacts, and low spatial or spectral resolution, can seriously affect the accuracy of the aforementioned applications. This book intends to provide the reader with a glimpse of the latest developments and recent advances in image restoration, which includes image super-resolution, image fusion to enhance spatial, spectral, and temporal resolution, and the generation of synthetic images using deep learning techniques. Some practical applications are also included.
Driver-centric Risk Object Identification
A massive number of traffic fatalities are due to driver errors. To reduce
fatalities, intelligent driving systems that assist drivers in
identifying potential risks are urgently needed. Risky situations are generally
defined based on collision prediction in existing research. However, collisions
are only one type of risk in traffic scenarios. We believe a more generic
definition is required. In this work, we propose a novel driver-centric
definition of risk, i.e., risky objects influence driver behavior. Based on
this definition, a new task called risk object identification is introduced. We
formulate the task as a cause-effect problem and present a novel two-stage risk
object identification framework, taking inspiration from models of situation
awareness and causal inference. A driver-centric Risk Object Identification
(ROI) dataset is curated to evaluate the proposed system. We demonstrate
state-of-the-art risk object identification performance compared with strong
baselines on the ROI dataset. In addition, we conduct extensive ablative
studies to justify our design choices.
Comment: Submitted to TPAM
Video Transformers: A Survey
Transformer models have shown great success handling long-range interactions,
making them a promising tool for modeling video. However, they lack inductive
biases and scale quadratically with input length. These limitations are further
exacerbated when dealing with the high dimensionality introduced with the
temporal dimension. While there are surveys analyzing the advances of
Transformers for vision, none focus on an in-depth analysis of video-specific
designs. In this survey we analyze the main contributions and trends of works
leveraging Transformers to model video. Specifically, we first delve into how videos
are handled at the input level. Then, we study the architectural changes made
to deal with video more efficiently, reduce redundancy, re-introduce useful
inductive biases, and capture long-term temporal dynamics. In addition we
provide an overview of different training regimes and explore effective
self-supervised learning strategies for video. Finally, we conduct a
performance comparison on the most common benchmark for Video Transformers
(i.e., action classification), finding them to outperform 3D ConvNets even with
less computational complexity.
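The quadratic scaling mentioned in the abstract can be made concrete with a small back-of-the-envelope sketch. It assumes a common tokenization in which a video is cut into non-overlapping spatio-temporal patches; the patch size of 16 pixels and the tubelet length of 2 frames are illustrative assumptions, not values taken from the survey, and the function names are hypothetical.

```python
def video_token_count(frames, height, width, patch=16, tubelet=2):
    """Number of tokens when a clip is split into non-overlapping
    tubelet x patch x patch blocks (assumed tokenization)."""
    return (frames // tubelet) * (height // patch) * (width // patch)


def attention_pairs(tokens):
    """Full self-attention compares every token with every token,
    so the number of pairwise scores is tokens squared."""
    return tokens * tokens


if __name__ == "__main__":
    for frames in (16, 32, 64):
        n = video_token_count(frames, 224, 224)
        print(frames, n, attention_pairs(n))
```

Under these assumptions, doubling the clip length doubles the token count but quadruples the number of attention scores, which is one reason the architectural changes discussed in the survey aim to reduce redundancy.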