Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation, and manifold learning.
Variance Loss in Variational Autoencoders
In this article, we highlight what appears to be a major issue with Variational
Autoencoders, evinced by extensive experimentation with different network
architectures and datasets: the variance of generated data is significantly
lower than that of training data. Since generative models are usually evaluated
with metrics such as the Frechet Inception Distance (FID) that compare the
distributions of (features of) real versus generated images, the variance loss
typically results in degraded scores. This problem is particularly relevant in
a two stage setting, where we use a second VAE to sample in the latent space of
the first VAE. The reduced variance creates a mismatch between the actual
distribution of latent variables and those generated by the second VAE, which
hinders the beneficial effects of the second stage. Renormalizing the output of
the second VAE towards the expected normal spherical distribution, we obtain a
sudden burst in the quality of generated samples, as also testified by the FID.
Comment: Article accepted at the Sixth International Conference on Machine
Learning, Optimization, and Data Science. July 19-23, 2020 - Certosa di
Pontignano, Siena, Italy
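A minimal sketch of the renormalization idea described in the abstract, under the assumption that "renormalizing towards the expected normal spherical distribution" amounts to a dimension-wise shift and rescale of the second-stage samples (the exact procedure in the paper may differ):

```python
import numpy as np

def renormalize_latents(z):
    """Rescale generated latent codes toward a standard (spherical) normal.

    z: array of shape (n_samples, latent_dim) sampled from the second-stage
    VAE.  Each dimension is shifted to zero mean and scaled to unit variance,
    compensating for the variance loss of the generator.
    """
    mean = z.mean(axis=0, keepdims=True)
    std = z.std(axis=0, keepdims=True)
    return (z - mean) / (std + 1e-8)

# Toy check: latents with a ~40% variance loss recover unit variance.
rng = np.random.default_rng(0)
z = 0.6 * rng.standard_normal((10_000, 8))   # generator variance ~0.36
z_fixed = renormalize_latents(z)
```

The correction is purely post-hoc: no retraining of either VAE is needed, which is consistent with the "sudden burst" in sample quality the abstract reports.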
On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly
In the field of face recognition, Sparse Representation (SR) has received
considerable attention during the past few years. Most of the relevant
literature focuses on holistic descriptors in closed-set identification
applications. The underlying assumption in SR-based methods is that each class
in the gallery has sufficient samples and the query lies on the subspace
spanned by the gallery of the same class. Unfortunately, such an assumption is
easily violated in the more challenging face verification scenario, where an
algorithm is required to determine if two faces (where one or both have not
been seen before) belong to the same person. In this paper, we first discuss
why previous attempts with SR might not be applicable to verification problems.
We then propose an alternative approach to face verification via SR.
Specifically, we propose to use explicit SR encoding on local image patches
rather than the entire face. The obtained sparse signals are pooled via
averaging to form multiple region descriptors, which are then concatenated to
form an overall face descriptor. Due to the deliberate loss of spatial relations
within each region (caused by averaging), the resulting descriptor is robust to
misalignment and various image deformations. Within the proposed framework, we
evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder
Neural Network (SANN), and an implicit probabilistic technique based on
Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and
ChokePoint datasets show that the proposed local SR approach obtains
considerably better and more robust performance than several previous
state-of-the-art holistic SR methods, in both verification and closed-set
identification problems. The experiments also show that l1-minimisation based
encoding has a considerably higher computational cost than the other techniques,
but leads to higher recognition rates.
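The patch-encode/pool/concatenate pipeline can be sketched as below. The grid layout, patch size, and the soft-thresholded correlation (a crude stand-in for proper l1-minimisation) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def local_sr_descriptor(face, dictionary, patch=8, grid=4):
    """Sketch of the local SR pipeline: sparsely encode overlapping patches,
    average-pool the codes within each region of a grid x grid layout, and
    concatenate the region descriptors into one face descriptor.

    face: 2-D grayscale image; dictionary: (patch*patch, n_atoms) with
    unit-norm columns.
    """
    h, w = face.shape
    rh, rw = h // grid, w // grid
    regions = []
    for gy in range(grid):
        for gx in range(grid):
            codes = []
            for y in range(gy * rh, (gy + 1) * rh - patch + 1, patch // 2):
                for x in range(gx * rw, (gx + 1) * rw - patch + 1, patch // 2):
                    p = face[y:y + patch, x:x + patch].ravel()
                    p = p - p.mean()
                    c = dictionary.T @ p                 # correlation with atoms
                    # soft threshold: stand-in for l1-minimisation
                    c = np.sign(c) * np.maximum(np.abs(c) - 0.1, 0.0)
                    codes.append(np.abs(c))
            # average pooling deliberately discards spatial layout in a region
            regions.append(np.mean(codes, axis=0))
    return np.concatenate(regions)
```

Averaging inside each region is what buys the robustness to misalignment the abstract describes: a patch that shifts by a few pixels still contributes the same pooled statistics.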
3D reconstruction of ribcage geometry from biplanar radiographs using a statistical parametric model approach
Rib cage 3D reconstruction is an important prerequisite for thoracic spine modelling, particularly for studies of the deformed thorax in adolescent idiopathic scoliosis. This study proposes a new method for rib cage 3D reconstruction from biplanar radiographs, using a statistical parametric model approach. Simplified parametric models were defined at the hierarchical levels of rib cage surface, rib midline and rib surface, and applied to a database of 86 trunks. The resulting parameter database served to train statistical models, which were used to quickly provide a first estimate of the reconstruction from identifications on both radiographs. This solution was then refined by manual adjustments in order to improve the matching between model and image. Accuracy was assessed by comparison with 29 rib cages from CT scans, in terms of geometrical parameter differences and of line-to-line error distance between the rib midlines. Intra- and inter-observer reproducibility were determined on 20 scoliotic patients. The first estimate (mean reconstruction time of 2 min 30 s) was sufficient to extract the main rib cage global parameters with a 95% confidence interval lower than 7%, 8%, 2% and 4° for rib cage volume, antero-posterior and lateral maximal diameters, and maximal rib hump, respectively. The mean error distance was 5.4 mm (max 35 mm), reduced to 3.6 mm (max 24 mm) after the manual adjustment step (an additional 3 min 30 s). The proposed method will support developments of rib cage finite element modelling and evaluation of clinical outcomes.
This work was funded by the ParisTech BiomecAM chair on subject-specific musculoskeletal modelling, and we express our acknowledgements to the chair founders: Cotrel Foundation, Société Générale, Protéor Company and COVEA consortium. We extend our acknowledgements to Alina Badina for medical imaging data, Alexandre Journé for his advice, and Thomas Joubert for his technical support
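The "statistical model learning" step can be illustrated with a plain PCA shape model; this is a generic assumed formulation (the paper's hierarchical parametric models are more elaborate), with database size matching the 86 trunks mentioned above and an arbitrary parameter count:

```python
import numpy as np

# Assumed toy stand-in for the training database: 86 trunks, each described
# by a vector of 12 geometric parameters (the real parameterisation differs).
rng = np.random.default_rng(2)
database = rng.standard_normal((86, 12))

mean = database.mean(axis=0)
centered = database - mean
# Principal modes of variation via SVD of the centered parameter database.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
modes = vt[:3]                        # keep the 3 dominant modes

def first_estimate(observed):
    """Regularised first estimate: project the parameters identified on the
    radiographs onto the learned modes, constraining the reconstruction to
    plausible shapes seen in the database."""
    b = modes @ (observed - mean)     # mode coefficients
    return mean + modes.T @ b
```

Constraining the estimate to a few dominant modes is what lets the method produce a fast, plausible reconstruction from sparse identifications before the manual refinement step.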
Learning deep dynamical models from image pixels
Modeling dynamical systems is important in many disciplines, e.g., control,
robotics, or neurotechnology. Commonly the state of these systems is not
directly observed, but only available through noisy and potentially
high-dimensional observations. In these cases, system identification, i.e.,
finding the measurement mapping and the transition mapping (system dynamics) in
latent space can be challenging. For linear system dynamics and measurement
mappings efficient solutions for system identification are available. However,
in practical applications, the linearity assumption does not hold, requiring
non-linear system identification techniques. If additionally the observations
are high-dimensional (e.g., images), non-linear system identification is
inherently hard. To address the problem of non-linear system identification
from high-dimensional observations, we combine recent advances in deep learning
and system identification. In particular, we jointly learn a low-dimensional
embedding of the observation by means of deep auto-encoders and a predictive
transition model in this low-dimensional space. We demonstrate that our model
enables learning good predictive models of dynamical systems from pixel
information only.
Comment: 10 pages, 11 figures
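The joint objective (reconstruction term from the auto-encoder plus a prediction term from the latent transition model) can be sketched with a deliberately linear toy; the dimensions, the linear maps, and the one-step loss below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# High-dimensional "pixel" observations mapped to a low-dimensional embedding.
obs_dim, latent_dim = 100, 3
W_enc = rng.standard_normal((latent_dim, obs_dim)) * 0.1   # stand-in encoder
W_dec = rng.standard_normal((obs_dim, latent_dim)) * 0.1   # stand-in decoder
A = np.eye(latent_dim) * 0.9          # latent transition (system dynamics)

def joint_loss(x_t, x_next):
    """One-step joint objective: auto-encoder reconstruction error plus
    prediction error of the transition model in latent space."""
    z_t = W_enc @ x_t                  # embed current observation
    z_next = W_enc @ x_next            # embed next observation
    recon = np.sum((W_dec @ z_t - x_t) ** 2)    # reconstruction term
    pred = np.sum((A @ z_t - z_next) ** 2)      # transition term
    return recon + pred

x_t = rng.standard_normal(obs_dim)
x_next = rng.standard_normal(obs_dim)
loss = joint_loss(x_t, x_next)
```

Training both terms jointly, rather than learning the embedding first and the dynamics afterwards, is the key point of the abstract: it steers the embedding toward coordinates in which the dynamics are predictable.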