A data augmentation methodology for training machine/deep learning gait recognition algorithms
There are several confounding factors that can reduce the accuracy of gait recognition systems. These factors can reduce the distinctiveness of, or alter, the features used to characterise gait; they include variations in clothing, lighting, pose and environment, such as the walking surface. Full invariance to all confounding factors is challenging in the absence of high-quality labelled training data. We introduce a simulation-based methodology and a subject-specific dataset which can be used for generating synthetic video frames and sequences for data augmentation. With this methodology, we generated a multi-modal dataset. In addition, we supply simulation files that provide the ability to simultaneously sample from several confounding variables. The basis of the data is real motion capture data of subjects walking and running on a treadmill at different speeds. Results from gait recognition experiments suggest that information about the identity of subjects is retained within synthetically generated examples. The dataset and methodology allow studies into fully-invariant identity recognition spanning a far greater number of observation conditions than would otherwise be possible.
Covariate conscious approach for Gait recognition based upon Zernike moment invariants
Gait recognition, i.e. identification of an individual from his/her walking
pattern, is an emerging field. While existing gait recognition techniques
perform satisfactorily in normal walking conditions, their performance tends to
suffer drastically with variations in clothing and carrying conditions. In this
work, we propose a novel covariate cognizant framework to deal with the
presence of such covariates. We describe gait motion by forming a single 2D
spatio-temporal template from a video sequence, called the Average Energy Silhouette
image (AESI). Zernike moment invariants (ZMIs) are then computed to screen the
parts of the AESI affected by covariates. Following this, features are extracted
from Spatial Distribution of Oriented Gradients (SDOGs) and novel Mean of
Directional Pixels (MDPs) methods. The obtained features are fused together to
form the final well-endowed feature set. Experimental evaluation of the
proposed framework on three publicly available datasets, i.e. CASIA dataset B,
OU-ISIR Treadmill dataset B and USF Human-ID challenge dataset, against recently
published gait recognition approaches proves its superior performance. Comment: 11 pages
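The abstract above does not spell out how the AESI template is computed. As a rough illustration only, the sketch below assumes it is the per-pixel mean of aligned binary silhouette frames, as in other averaged-silhouette gait templates; the function name and the toy frames are hypothetical, and the paper's actual preprocessing may differ.

```python
def average_energy_silhouette(silhouettes):
    """Average aligned binary silhouette frames into one 2D template.

    Assumption: each frame is a list of rows with values 0 (background)
    or 1 (subject), and all frames share the same height and width.
    """
    frames = list(silhouettes)
    if not frames:
        raise ValueError("at least one silhouette frame is required")
    count = len(frames)
    height = len(frames[0])
    width = len(frames[0][0])
    # Each output pixel is the fraction of frames in which it was foreground.
    return [
        [sum(frame[r][c] for frame in frames) / count for c in range(width)]
        for r in range(height)
    ]


# Toy example: three 2x2 silhouette frames (hypothetical data).
frames = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
]
aesi = average_energy_silhouette(frames)
```

Pixels that are foreground in every frame keep the value 1, while pixels covered only part of the time take fractional values, which is what lets a single 2D image carry temporal information about the gait.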
Face recognition technologies for evidential evaluation of video traces
Human recognition from video traces is an important task in forensic investigations and evidence evaluations. Compared with other biometric traits, the face is one of the most widely used modalities for human recognition because its collection is non-intrusive and requires less cooperation from the subjects. Moreover, face images taken at a long distance can still provide reasonable resolution, while most biometric modalities, such as iris and fingerprint, do not have this merit. In this chapter, we discuss automatic face recognition technologies for evidential evaluations of video traces. We first introduce the general concepts in both forensic and automatic face recognition, then analyse the difficulties in face recognition from videos. We summarise and categorise the approaches for handling different uncontrollable factors in difficult recognition conditions. Finally, we discuss some challenges and trends in face recognition research in both forensics and biometrics. Given its merits tested in many deployed systems and great potential in other emerging applications, considerable research and development efforts are expected to be devoted to face recognition in the near future.
View-Invariant Object Category Learning, Recognition, and Search: How Spatial and Object Attention Are Coordinated Using Surface-Based Attentional Shrouds
Air Force Office of Scientific Research (F49620-01-1-0397); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
ICface: Interpretable and Controllable Face Reenactment Using GANs
This paper presents a generic face animator that is able to control the pose
and expressions of a given face image. The animation is driven by human
interpretable control signals consisting of head pose angles and the Action
Unit (AU) values. The control information can be obtained from multiple sources
including external driving videos and manual controls. Due to the interpretable
nature of the driving signal, one can easily mix the information between
multiple sources (e.g. pose from one image and expression from another) and
apply selective post-production editing. The proposed face animator is
implemented as a two-stage neural network model that is learned in a
self-supervised manner using a large video collection. The proposed
Interpretable and Controllable face reenactment network (ICface) is compared to
the state-of-the-art neural network-based face animation techniques in multiple
tasks. The results indicate that ICface produces better visual quality while
being more versatile than most of the comparison methods. The introduced model
could provide a lightweight and easy to use tool for a multitude of advanced
image and video editing tasks. Comment: Accepted in WACV-202
The Neural Representation Benchmark and its Evaluation on Brain and Machine
A key requirement for the development of effective learning representations
is their evaluation and comparison to representations we know to be effective.
In natural sensory domains, the community has viewed the brain as a source of
inspiration and as an implicit benchmark for success. However, it has not been
possible to test representational learning algorithms directly against
the representations contained in neural systems. Here, we propose a new
benchmark for visual representations on which we have directly tested the
neural representation in multiple visual cortical areas in macaque (utilizing
data from [Majaj et al., 2012]), and on which any computer vision algorithm
that produces a feature space can be tested. The benchmark measures the
effectiveness of the neural or machine representation by computing the
classification loss on the ordered eigendecomposition of a kernel matrix
[Montavon et al., 2011]. In our analysis we find that the neural representation
in visual area IT is superior to visual area V4. In our analysis of
representational learning algorithms, we find that three-layer models approach
the representational performance of V4 and the algorithm in [Le et al., 2012]
surpasses the performance of V4. Impressively, we find that a recent supervised
algorithm [Krizhevsky et al., 2012] achieves performance comparable to that of
IT for an intermediate level of image variation difficulty, and surpasses IT at
a higher difficulty level. We believe this result represents a major milestone:
it is the first learning algorithm we have found that exceeds our current
estimate of IT representation performance. We hope that this benchmark will
assist the community in matching the representational performance of visual
cortex and will serve as an initial rallying point for further correspondence
between representations derived in brains and machines. Comment: The v1 version contained incorrectly computed kernel analysis curves
and KA-AUC values for V4, IT, and the HT-L3 models. They have been corrected
in this version.