2,309 research outputs found
State of the Art in Face Recognition
Notwithstanding the tremendous effort to solve the face recognition problem, it is not possible yet to design a face recognition system with a potential close to human performance. New computer vision and pattern recognition approaches need to be investigated. Even new knowledge and perspectives from different fields like, psychology and neuroscience must be incorporated into the current field of face recognition to design a robust face recognition system. Indeed, many more efforts are required to end up with a human like face recognition system. This book tries to make an effort to reduce the gap between the previous face recognition research state and the future state
Object-Centric Learning with Slot Attention
Learning object-centric representations of complex scenes is a promising step
towards enabling efficient abstract reasoning from low-level perceptual
features. Yet, most deep learning approaches learn distributed representations
that do not capture the compositional properties of natural scenes. In this
paper, we present the Slot Attention module, an architectural component that
interfaces with perceptual representations such as the output of a
convolutional neural network and produces a set of task-dependent abstract
representations which we call slots. These slots are exchangeable and can bind
to any object in the input by specializing through a competitive procedure over
multiple rounds of attention. We empirically demonstrate that Slot Attention
can extract object-centric representations that enable generalization to unseen
compositions when trained on unsupervised object discovery and supervised
property prediction tasks
Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles
Video Anomaly Detection (VAD) is an important topic in computer vision.
Motivated by the recent advances in self-supervised learning, this paper
addresses VAD by solving an intuitive yet challenging pretext task, i.e.,
spatio-temporal jigsaw puzzles, which is cast as a multi-label fine-grained
classification problem. Our method exhibits several advantages over existing
works: 1) the spatio-temporal jigsaw puzzles are decoupled in terms of spatial
and temporal dimensions, responsible for capturing highly discriminative
appearance and motion features, respectively; 2) full permutations are used to
provide abundant jigsaw puzzles covering various difficulty levels, allowing
the network to distinguish subtle spatio-temporal differences between normal
and abnormal events; and 3) the pretext task is tackled in an end-to-end manner
without relying on any pre-trained models. Our method outperforms
state-of-the-art counterparts on three public benchmarks. Especially on
ShanghaiTech Campus, the result is superior to reconstruction and
prediction-based methods by a large margin.Comment: Accepted by ECCV'2022; Code is available at
https://github.com/gdwang08/Jigsaw-VA
- …