Unsupervised Learning of Landmarks by Descriptor Vector Exchange
Equivariance to random image transformations is an effective method to learn
landmarks of object categories, such as the eyes and the nose in faces, without
manual supervision. However, this method does not explicitly guarantee that the
learned landmarks are consistent with changes between different instances of
the same object, such as different facial identities. In this paper, we develop
a new perspective on the equivariance approach by noting that dense landmark
detectors can be interpreted as local image descriptors equipped with
invariance to intra-category variations. We then propose a direct method to
enforce such an invariance in the standard equivariant loss. We do so by
exchanging descriptor vectors between images of different object instances
prior to matching them geometrically. In this manner, the same vectors must
work regardless of the specific object identity considered. We use this
approach to learn vectors that can simultaneously be interpreted as local
descriptors and dense landmarks, combining the advantages of both. Experiments
on standard benchmarks show that this approach can match, and in some cases
surpass, state-of-the-art performance among existing methods that learn
landmarks without supervision. Code is available at
www.robots.ox.ac.uk/~vgg/research/DVE/.
Comment: ICCV 2019
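The exchange step can be pictured with a minimal sketch: each descriptor of one instance is replaced by a similarity-weighted average of another instance's descriptors, and the result must still match the correct locations in a warped copy of the first image. The softmax matching, function names, and array shapes below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def exchange_descriptors(desc_a, desc_b):
    """Replace each descriptor of instance A with a similarity-weighted
    average of descriptors from a different instance B of the same category."""
    sim = desc_a @ desc_b.T                        # (N, M) dot-product similarities
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)              # softmax over B's locations
    return w @ desc_b                              # (N, D) exchanged vectors

def matching_loss(desc_exchanged, desc_warped, true_corr):
    """The exchanged vectors must still match the correct locations in a
    geometrically warped copy of image A (cross-entropy on soft matches).
    true_corr[i] is the index in desc_warped corresponding to location i."""
    sim = desc_exchanged @ desc_warped.T
    p = np.exp(sim - sim.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[np.arange(len(true_corr)), true_corr] + 1e-12))
```

Because the loss is computed with B's vectors standing in for A's, the same descriptors are forced to work across object identities, which is the invariance the paper targets.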
Unsupervised landmark analysis for jump detection in molecular dynamics simulations
Molecular dynamics is a versatile and powerful method to study diffusion in
solid-state ionic conductors, requiring minimal prior knowledge of equilibrium
or transition states of the system's free energy surface. However, the analysis
of trajectories for relevant but rare events, such as a jump of the diffusing
mobile ion, is still rather cumbersome, requiring prior knowledge of the
diffusive process in order to get meaningful results. In this work, we present
a novel approach to detect the relevant events in a diffusive system without
assuming prior information regarding the underlying process. We start from a
projection of the atomic coordinates into a landmark basis to identify the
dominant features in a mobile ion's environment. Subsequent clustering in
landmark space enables a discretization of any trajectory into a sequence of
distinct states. As a final step, the use of the smooth overlap of atomic
positions descriptor allows distinguishing between different environments in a
straightforward way. We apply this algorithm to ten Li-ionic systems and
conduct in-depth analyses of cubic Li7La3Zr2O12, tetragonal
Li10GeP2S12, and the β-eucryptite LiAlSiO4. We
compare our results to existing methods, underscoring strong points,
weaknesses, and insights into the diffusive behavior of the ionic conduction in
the materials investigated.
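The pipeline described above, projecting coordinates onto a landmark basis, clustering in landmark space, and reading jumps off the resulting state sequence, can be sketched as follows. The Gaussian landmark features, the tiny farthest-point k-means, and all names are illustrative assumptions; the SOAP-descriptor step for distinguishing environments is omitted.

```python
import numpy as np

def landmark_features(positions, landmarks, sigma=1.0):
    """Project each frame's mobile-ion coordinate onto a fixed landmark
    basis: feature k is a Gaussian of the distance to landmark k."""
    d = np.linalg.norm(positions[:, None, :] - landmarks[None, :, :], axis=-1)
    return np.exp(-d**2 / (2 * sigma**2))          # (frames, n_landmarks)

def discretize(features, k=2, iters=50):
    """Tiny k-means in landmark space (farthest-point init, Lloyd steps);
    returns one state label per frame, turning a trajectory into a
    sequence of distinct states."""
    centers = [features[0]]
    for _ in range(k - 1):
        d = np.min([((features - c)**2).sum(-1) for c in centers], axis=0)
        centers.append(features[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((features[:, None] - centers[None])**2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

def jumps(labels):
    """A 'jump' of the mobile ion is any frame where the state changes."""
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1
```

On a toy trajectory that hops between two sites, the state sequence changes exactly once, at the hop, without any prior knowledge of the diffusive process.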
Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition
Facial micro-expression (ME) recognition has posed a huge challenge to
researchers due to its subtlety in motion and the limited available databases. Recently,
handcrafted techniques have achieved superior performance in micro-expression
recognition but at the cost of domain specificity and cumbersome parametric
tunings. In this paper, we propose an Enriched Long-term Recurrent
Convolutional Network (ELRCN) that first encodes each micro-expression frame
into a feature vector through CNN module(s), then predicts the micro-expression
by passing the feature vector through a Long Short-term Memory (LSTM) module.
The framework contains two different network variants: (1) Channel-wise
stacking of input data for spatial enrichment, (2) Feature-wise stacking of
features for temporal enrichment. We demonstrate that the proposed approach is
able to achieve reasonably good performance, without data augmentation. In
addition, we also present ablation studies conducted on the framework and
visualizations of what the CNN "sees" when predicting the micro-expression classes.
Comment: Published in the Micro-Expression Grand Challenge 2018, a workshop of the
13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018)
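The two enrichment variants boil down to where the stacking happens: channel-wise on the input before the CNN (spatial enrichment), or feature-wise on per-frame CNN outputs before the LSTM (temporal enrichment). A hedged sketch with illustrative shapes; the actual CNN and LSTM modules are elided:

```python
import numpy as np

def channel_stack(frames, flows):
    """Spatial enrichment: stack raw frames with optical-flow maps
    channel-wise, so the CNN sees an enriched input tensor."""
    # frames: (T, H, W, 3), flows: (T, H, W, 2) -> (T, H, W, 5)
    return np.concatenate([frames, flows], axis=-1)

def feature_stack(cnn_feats_a, cnn_feats_b):
    """Temporal enrichment: concatenate per-frame feature vectors from
    two CNN streams before the sequence is passed to the LSTM."""
    # each: (T, D) -> (T, 2D)
    return np.concatenate([cnn_feats_a, cnn_feats_b], axis=-1)
```

In the full framework each enriched representation would then be consumed frame-by-frame by an LSTM that emits the micro-expression class; only the stacking logic is shown here.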
Artificial Intelligence Tools for Facial Expression Analysis.
Inner emotions show visibly on the human face and are understood as a basic guide to an individual's inner world. It is therefore possible to determine a person's attitudes, and the effects of others' behaviour on their deeper feelings, by examining facial expressions. In real-world applications, machines that interact with people need robust facial expression recognition, which holds advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning.

This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU is the activation of a localized group of facial muscles; AUs occurring in unison constitute a natural facial expression event. Detecting AUs automatically can provide explicit benefits, since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting features (static and dynamic) from both conventional hand-crafted and deep-learning representations for each static image of a video. This confirmed the superior ability of pretrained models, which deliver a clear leap in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases from dynamic sequences, using supervised and unsupervised methods. During this process, the importance of stacking dynamic features on top of static ones was discovered when encoding deep features to learn temporal information, combining the spatial and temporal schemes simultaneously. This study also found that fusing spatial and temporal features yields richer long-term temporal pattern information. Moreover, we hypothesised that using an unsupervised method would enable the learning of invariant information from dynamic textures.
Recently, cutting-edge advances have been achieved by approaches based on Generative Adversarial Networks (GANs). In the second part of this thesis, we propose a model based on an unsupervised DCGAN for facial feature extraction and classification, to achieve the following: the creation of facial expression images under arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the seven static emotion classes in the wild. Thorough cross-database experimentation demonstrates that this approach can improve generalization. Additionally, we showed that the features learnt by the DCGAN are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples.

Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier.
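The disentangling idea in the final part rests on the 3D Morphable Model decomposition, in which a face shape is the mean shape plus separate identity and expression deformations. A minimal sketch follows; the random bases, dimensions, and the least-squares recovery are illustrative assumptions, not the thesis's trained ResNet regressor.

```python
import numpy as np

def synthesize(mean, id_basis, exp_basis, alpha, beta):
    """3D Morphable Model: a face shape is the mean plus identity and
    expression deformations, s = m + U_id @ alpha + U_exp @ beta."""
    return mean + id_basis @ alpha + exp_basis @ beta

def recover_expression(shape, mean, id_basis, exp_basis):
    """Disentangle: solve jointly for (alpha, beta) by least squares over
    the stacked bases, then keep only the expression coefficients beta,
    which carry the identity-free signal used for expression recognition."""
    B = np.hstack([id_basis, exp_basis])
    coeffs, *_ = np.linalg.lstsq(B, shape - mean, rcond=None)
    return coeffs[id_basis.shape[1]:]              # beta only
```

When the bases have independent columns, the expression coefficients of a synthesized shape are recovered exactly, which is what makes the expression parameters a usable identity-invariant input to a back-end classifier.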
Unsupervised Landmark Discovery Using Consistency Guided Bottleneck
We study a challenging problem of unsupervised discovery of object landmarks.
Many recent methods rely on bottlenecks to generate 2D Gaussian heatmaps;
however, these are limited in generating informative heatmaps during training,
presumably due to the lack of effective structural cues. It is also assumed
that all predicted landmarks are semantically relevant, despite there being no
ground-truth supervision. In the current work, we introduce a consistency-guided
bottleneck in an image reconstruction-based pipeline that leverages landmark
consistency, a measure of compatibility with the pseudo-ground truth, to
generate adaptive heatmaps. We propose obtaining pseudo-supervision by forming
landmark correspondences across images. The consistency then modulates the
uncertainty of the discovered landmarks in the generation of adaptive heatmaps
which rank consistent landmarks above their noisy counterparts, providing
effective structural information for improved robustness. Evaluations on five
diverse datasets including MAFL, AFLW, LS3D, Cats, and Shoes demonstrate
excellent performance of the proposed approach compared to the existing
state-of-the-art methods. Our code is publicly available at
https://github.com/MamonaAwan/CGB_ULD.
Comment: Accepted as an oral at BMVC 2023
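The consistency-modulated heatmap idea can be sketched minimally: a landmark's consistency score scales both the sharpness and the weight of its Gaussian heatmap, so consistent landmarks produce confident peaks while noisy ones are down-weighted. The parameter names and the exact modulation below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def adaptive_heatmap(center, consistency, size=16, base_sigma=2.0):
    """2D Gaussian heatmap for one landmark at `center` = (x, y), whose
    sharpness and amplitude are modulated by a consistency score in (0, 1]:
    low consistency -> broad, low-weight blob; high -> sharp, confident peak."""
    sigma = base_sigma / max(consistency, 1e-3)    # noisy landmarks spread out
    ys, xs = np.mgrid[0:size, 0:size]
    g = np.exp(-((xs - center[0])**2 + (ys - center[1])**2) / (2 * sigma**2))
    return consistency * g                         # down-weight noisy landmarks
```

Ranking then falls out for free: summing or max-pooling these maps naturally places consistent landmarks above their noisy counterparts.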