Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video
Manual annotations of temporal bounds for object interactions (i.e. start and
end times) are typical training input to recognition, localization and
detection algorithms. For three publicly available egocentric datasets, we
uncover inconsistencies in ground truth temporal bounds within and across
annotators and datasets. We systematically assess the robustness of
state-of-the-art approaches to changes in labeled temporal bounds, for object
interaction recognition. As boundaries are trespassed, a drop of up to 10% is
observed for both Improved Dense Trajectories and Two-Stream Convolutional
Neural Network.
We demonstrate that such disagreement stems from a limited understanding of
the distinct phases of an action, and propose annotating based on the Rubicon
Boundaries, inspired by a similarly named cognitive model, for consistent
temporal bounds of object interactions. Evaluated on a public dataset, we
report a 4% increase in overall accuracy, and an increase in accuracy for 55%
of classes when Rubicon Boundaries are used for temporal annotations.
Comment: ICCV 201
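As an illustration of how disagreement between two labeled temporal bounds can be quantified, the sketch below computes the temporal intersection-over-union of two (start, end) segments. The choice of IoU as the measure and the function name `temporal_iou` are assumptions made for this example; the abstract does not state which measure the authors use.

```python
def temporal_iou(a, b):
    """Temporal intersection-over-union of two segments.

    a and b are (start, end) tuples in seconds with start <= end.
    Hypothetical helper for illustration only.
    """
    # Overlap is the distance between the later start and the earlier end.
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    # Two zero-length segments have no meaningful overlap ratio.
    return intersection / union if union > 0 else 0.0


# Two annotators label the same interaction with shifted bounds.
annotator_1 = (1.0, 3.0)
annotator_2 = (2.0, 4.0)
agreement = temporal_iou(annotator_1, annotator_2)
```

A value of 1.0 means the two bounds coincide and 0.0 means they do not overlap at all.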
Face recognition technologies for evidential evaluation of video traces
Human recognition from video traces is an important task in forensic investigations and evidence evaluations. Compared with other biometric traits, the face is one of the most widely used modalities for human recognition because its collection is non-intrusive and requires less cooperation from the subjects. Moreover, face images taken at a long distance can still provide reasonable resolution, whereas most other biometric modalities, such as iris and fingerprint, do not have this merit. In this chapter, we discuss automatic face recognition technologies for evidential evaluations of video traces. We first introduce the general concepts in both forensic and automatic face recognition, then analyse the difficulties in face recognition from videos. We summarise and categorise the approaches for handling different uncontrollable factors in difficult recognition conditions. Finally, we discuss some challenges and trends in face recognition research in both forensics and biometrics. Given its merits tested in many deployed systems and its great potential in other emerging applications, considerable research and development efforts are expected to be devoted to face recognition in the near future.
Automated pharyngeal phase detection and bolus localization in videofluoroscopic swallowing study: Killing two birds with one stone?
The videofluoroscopic swallowing study (VFSS) is a gold-standard imaging
technique for assessing swallowing, but analysis and rating of VFSS recordings
are time-consuming and require specialized training and expertise. Researchers
have recently demonstrated that it is possible to automatically detect the
pharyngeal phase of swallowing and to localize the bolus in VFSS recordings via
computer vision, fostering the development of novel techniques for automatic
VFSS analysis. However, training of algorithms to perform these tasks requires
large amounts of annotated data that are seldom available. We demonstrate that
the challenges of pharyngeal phase detection and bolus localization can be
solved together using a single approach. We propose a deep-learning framework
that jointly tackles pharyngeal phase detection and bolus localization in a
weakly-supervised manner, requiring only the initial and final frames of the
pharyngeal phase as ground truth annotations for the training. Our approach
stems from the observation that bolus presence in the pharynx is the most
prominent visual feature upon which to infer whether individual VFSS frames
belong to the pharyngeal phase. We conducted extensive experiments with
multiple convolutional neural networks (CNNs) on a dataset of 1245 bolus-level
clips from 59 healthy subjects. We demonstrated that the pharyngeal phase can
be detected with an F1-score higher than 0.9. Moreover, by processing the class
activation maps of the CNNs, we were able to localize the bolus with promising
results, obtaining correlations with ground truth trajectories higher than 0.9,
without any manual annotations of bolus location used for training purposes.
Once validated on a larger sample of participants with swallowing disorders,
our framework will pave the way for the development of intelligent tools for
VFSS analysis to support clinicians in swallowing assessment.
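The weak supervision described above (only the first and last frames of the pharyngeal phase are annotated) and the frame-level F1-score can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names `frame_labels` and `f1_score`, and the assumption that both annotated frames are inclusive, are hypothetical.

```python
def frame_labels(n_frames, start, end):
    """Expand a (start, end) annotation into per-frame binary labels.

    Assumes both bounds are inclusive frame indices (an assumption
    made for this sketch).
    """
    return [1 if start <= i <= end else 0 for i in range(n_frames)]


def f1_score(y_true, y_pred):
    """Frame-level F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# A 6-frame clip whose pharyngeal phase spans frames 2 to 4.
truth = frame_labels(6, 2, 4)
predicted = [0, 1, 1, 1, 0, 0]  # made-up classifier output
score = f1_score(truth, predicted)
```

Here the prediction is shifted one frame early, so two of the three phase frames are recovered with one false positive and one miss.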
Multiple object tracking using a neural cost function
This paper presents a new approach to the tracking of multiple objects in CCTV surveillance using a combination of simple neural cost functions based on Self-Organizing Maps, and a greedy assignment algorithm. Using a reference standard data set and an exhaustive search algorithm for benchmarking, we show that the cost function plays the most significant role in realizing high levels of performance. The neural cost function’s context-sensitive treatment of appearance, change of appearance and trajectory yields better tracking than a simple, explicitly designed cost function. The algorithm matches 98.8% of objects to within 15 pixels.
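The greedy assignment step mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the paper's cost function is a Self-Organizing Map model, whereas `euclidean_cost` here is a plain pixel-distance stand-in, and the names `greedy_assign` and `max_cost` are hypothetical.

```python
from math import hypot


def euclidean_cost(track, detection):
    """Stand-in cost: pixel distance between two (x, y) centroids.

    NOT the neural cost function from the paper; used only so the
    sketch is runnable.
    """
    return hypot(track[0] - detection[0], track[1] - detection[1])


def greedy_assign(tracks, detections, cost=euclidean_cost, max_cost=float("inf")):
    """Pair tracks with detections by repeatedly taking the cheapest free pair.

    Returns a dict mapping track index to detection index.
    """
    # Score every candidate pair once and visit them in order of cost.
    pairs = sorted(
        (cost(t, d), i, j)
        for i, t in enumerate(tracks)
        for j, d in enumerate(detections)
    )
    used_tracks, used_detections, matches = set(), set(), {}
    for c, i, j in pairs:
        if c > max_cost:
            break  # every remaining pair is at least this expensive
        if i in used_tracks or j in used_detections:
            continue  # one side of this pair is already matched
        matches[i] = j
        used_tracks.add(i)
        used_detections.add(j)
    return matches


tracks = [(0, 0), (10, 10)]
detections = [(11, 10), (1, 0), (50, 50)]
matches = greedy_assign(tracks, detections, max_cost=15)
```

Unlike an exhaustive search, the greedy pass never revisits a match, so its quality depends heavily on how well the cost function separates correct from incorrect pairs.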