Deep Learning for Crowd Anomaly Detection
Today, public areas across the globe are monitored by an increasing number of surveillance cameras. This widespread use produces an ever-growing volume of data that cannot realistically be examined in real time. Consequently, efforts to understand crowd dynamics have drawn attention to automatic systems for detecting anomalies in crowds. This thesis surveys the methods used in the literature for this purpose, focusing on those that incorporate dense optical flow in a feature extraction stage for crowd anomaly detection. To this end, five deep learning architectures are trained on optical flow maps estimated by three deep learning-based techniques. More specifically, a 2D convolutional network, a 3D convolutional network, an LSTM-based convolutional recurrent network, a pre-trained variant of the latter, and a ConvLSTM-based autoencoder are trained on both regular frames and optical flow maps estimated by LiteFlowNet3, RAFT, and GMA on the UCSD Pedestrian 1 dataset. The experimental results show that, while prone to overfitting, the use of optical flow maps may improve the performance of supervised spatio-temporal architectures.
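The abstract does not say how the ConvLSTM-based autoencoder turns its output into anomaly decisions. One common approach for reconstruction-based video anomaly detectors is to score each frame by its reconstruction error and then min-max normalize that error into a per-video "regularity score". The sketch below assumes that scheme. The function names and the 0.5 threshold are hypothetical and are not taken from the thesis. Frames are plain nested lists so that the example needs only the standard library.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # hypothetical: a grayscale frame or flow-magnitude map


def reconstruction_error(frame: Frame, reconstruction: Frame) -> float:
    """Sum of squared pixel differences between a frame and its reconstruction."""
    return float(sum(
        (a - b) ** 2
        for row_a, row_b in zip(frame, reconstruction)
        for a, b in zip(row_a, row_b)
    ))


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalise errors over one video: 1.0 = most regular, 0.0 = least regular."""
    lo, hi = min(errors), max(errors)
    span = hi - lo
    if span == 0:  # every frame was reconstructed equally well
        return [1.0] * len(errors)
    return [1.0 - (e - lo) / span for e in errors]


def flag_anomalies(errors: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Flag frames whose regularity score falls below an assumed threshold."""
    return [s < threshold for s in regularity_scores(errors)]


# Per-frame errors for a short clip; the fourth frame reconstructs poorly.
errors = [1.0, 1.2, 0.8, 5.0, 1.1]
print(flag_anomalies(errors))  # [False, False, False, True, False]
```

Because the normalization uses the minimum and maximum error of the whole video, this scoring is offline. A streaming detector would need a fixed or running reference for the error range instead.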
Saliency-based Video Summarization for Face Anti-spoofing
Due to the growing availability of face anti-spoofing databases, researchers
are increasingly focusing on video-based methods, which use hundreds to
thousands of frames, and on assessing how this affects performance. However,
there is no clear consensus on the exact number of frames per video needed to
improve face anti-spoofing performance. Inspired by visual saliency theory, we
present a video summarization method for face anti-spoofing that aims to
improve both the performance and the efficiency of deep learning models by
leveraging visual saliency. In particular, saliency information is extracted
from the differences between the Laplacian and Wiener filter outputs of the
source images, identifying the most visually salient regions within each
frame. The source images are then decomposed into base and detail layers to
better represent the important information. Next, weighting maps are computed
from the saliency information, indicating the importance of each pixel in the
image. By linearly combining the base and detail layers using these weighting
maps, the method fuses the source images into a single representative image
that summarizes the entire video. The key contribution of the proposed method
lies in demonstrating how visual saliency can serve as a data-centric approach
to improving the performance and efficiency of face presentation attack
detection models. By focusing on the most salient images, or on the most
salient regions within them, a more representative and diverse training set
can be created, potentially leading to more effective models. To validate the
method's effectiveness, a simple deep learning architecture (CNN-RNN) was
used, and the experimental results showed state-of-the-art performance on five
challenging face anti-spoofing datasets
- …
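The saliency-based summarization described in this abstract can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It rests on several assumptions that the abstract does not state:

- Frames are grayscale float arrays.
- The "Laplacian output" is a 4-neighbour discrete Laplacian.
- The Wiener filter uses the local adaptive gain rule of `scipy.signal.wiener`, with noise estimated as the mean local variance, but applies edge padding rather than SciPy's zero padding.
- Saliency is the absolute difference of these two outputs.
- Base layers come from a box (mean) filter.
- Following a common two-scale fusion convention, base layers are averaged while detail layers are weighted by per-pixel saliency normalized across frames.

All function names (`box_mean`, `summarize_video`, and so on) are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def box_mean(img: np.ndarray, size: int = 3) -> np.ndarray:
    """Local mean over an odd size x size window; edge padding keeps the shape."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    padded = np.pad(img, size // 2, mode="edge")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def laplacian(img: np.ndarray) -> np.ndarray:
    """4-neighbour discrete Laplacian (assumed kernel), edge-padded."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img


def wiener(img: np.ndarray, size: int = 3, eps: float = 1e-12) -> np.ndarray:
    """Local adaptive Wiener filter; noise = mean local variance (scipy.signal.wiener's rule)."""
    mu = box_mean(img, size)
    var = np.maximum(box_mean(img ** 2, size) - mu ** 2, 0.0)
    noise = var.mean()
    # Gain is 0 where local variance <= noise (output = local mean),
    # otherwise 1 - noise/var; eps only guards against division by zero.
    gain = np.maximum(var - noise, 0.0) / np.maximum(var, noise + eps)
    return mu + gain * (img - mu)


def saliency(img: np.ndarray, size: int = 3) -> np.ndarray:
    """Assumed saliency map: |Laplacian response - Wiener-filtered image|."""
    return np.abs(laplacian(img) - wiener(img, size))


def summarize_video(frames, size: int = 3, eps: float = 1e-12) -> np.ndarray:
    """Fuse a (T, H, W) stack of grayscale frames into one (H, W) summary image."""
    frames = np.asarray(frames, dtype=np.float64)
    sal = np.stack([saliency(f, size) for f in frames])
    # Per-pixel weights normalised across frames (sum to ~1 wherever saliency > 0).
    weights = sal / (sal.sum(axis=0, keepdims=True) + eps)
    base = np.stack([box_mean(f, size) for f in frames])  # base layers
    detail = frames - base                                 # detail layers
    fused_base = base.mean(axis=0)                 # assumption: uniform average
    fused_detail = (weights * detail).sum(axis=0)  # saliency-weighted details
    return fused_base + fused_detail


rng = np.random.default_rng(0)
video = rng.random((16, 32, 32))  # 16 synthetic 32x32 frames
summary = summarize_video(video)
print(summary.shape)  # (32, 32)
```

Averaging the base layers preserves the overall appearance of the clip. Saliency weighting then takes detail, pixel by pixel, from the frames where that detail is most salient. If the same weight map were applied to both layers, the decomposition would have no effect: base plus detail equals the original frame, so the result would reduce to a saliency-weighted average of the raw frames.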