
    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
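    The families surveyed above are easiest to see in code. Below is a minimal sketch of one of them, a denoising auto-encoder, in PyTorch; the layer sizes, noise level, and random training batch are illustrative assumptions, not details from the review.

```python
# Minimal denoising auto-encoder sketch (illustrative sizes, not from the review).
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x, noise_std=0.3):
        # Corrupt the input, then reconstruct the clean version; the encoder's
        # hidden code is the learned representation.
        x_noisy = x + noise_std * torch.randn_like(x)
        return self.decoder(self.encoder(x_noisy))

model = DenoisingAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(128, 784)                    # stand-in batch of flattened images
loss = nn.functional.mse_loss(model(x), x)  # reconstruct the uncorrupted input
opt.zero_grad()
loss.backward()
opt.step()
```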

    Variance Loss in Variational Autoencoders

    In this article, we highlight what appears to be a major issue of Variational Autoencoders, evinced from extensive experimentation with different network architectures and datasets: the variance of generated data is significantly lower than that of the training data. Since generative models are usually evaluated with metrics such as the Frechet Inception Distance (FID), which compare the distributions of (features of) real versus generated images, the variance loss typically results in degraded scores. This problem is particularly relevant in a two-stage setting, where a second VAE is used to sample in the latent space of the first VAE. The reduced variance creates a mismatch between the actual distribution of latent variables and those generated by the second VAE, which hinders the beneficial effects of the second stage. By renormalizing the output of the second VAE towards the expected normal spherical distribution, we obtain a sudden burst in the quality of generated samples, as also confirmed by the FID scores.
    Comment: Article accepted at the Sixth International Conference on Machine Learning, Optimization, and Data Science, July 19-23, 2020, Certosa di Pontignano, Siena, Italy.
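    A minimal sketch of the renormalization idea, as we read it from the abstract: rescale the latent samples produced by the second VAE so that their empirical moments match the spherical standard normal expected by the first decoder. The per-dimension moment matching and all names below are our assumptions, not the authors' code.

```python
# Hypothetical renormalization of second-stage latents (names are ours).
import numpy as np

def renormalize_latents(z, eps=1e-8):
    """Shift and scale samples to zero mean and unit variance per dimension."""
    return (z - z.mean(axis=0, keepdims=True)) / (z.std(axis=0, keepdims=True) + eps)

# Toy latents from a second VAE whose variance is too small (the issue above).
z2 = 0.7 * np.random.randn(10000, 64) + 0.1
z2_fixed = renormalize_latents(z2)
print(z2_fixed.mean(), z2_fixed.std())  # ~0.0, ~1.0: matches the spherical prior
# z2_fixed would then be decoded by the first VAE to generate images.
```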

    On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly

    In the field of face recognition, Sparse Representation (SR) has received considerable attention during the past few years. Most of the relevant literature focuses on holistic descriptors in closed-set identification applications. The underlying assumption in SR-based methods is that each class in the gallery has sufficient samples and the query lies on the subspace spanned by the gallery of the same class. Unfortunately, such an assumption is easily violated in the more challenging face verification scenario, where an algorithm is required to determine if two faces (where one or both have not been seen before) belong to the same person. In this paper, we first discuss why previous attempts with SR might not be applicable to verification problems. We then propose an alternative approach to face verification via SR. Specifically, we propose to use explicit SR encoding on local image patches rather than the entire face. The obtained sparse signals are pooled via averaging to form multiple region descriptors, which are then concatenated to form an overall face descriptor. Due to the deliberate loss of spatial relations within each region (caused by averaging), the resulting descriptor is robust to misalignment and various image deformations. Within the proposed framework, we evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder Neural Network (SANN), and an implicit probabilistic technique based on Gaussian Mixture Models. Thorough experiments on the AR, FERET, exYaleB, BANCA and ChokePoint datasets show that the proposed local SR approach obtains considerably better and more robust performance than several previous state-of-the-art holistic SR methods, in both verification and closed-set identification problems. The experiments also show that l1-minimisation based encoding has a considerably higher computational cost than the other techniques, but leads to higher recognition rates.
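    A minimal sketch of the local SR pipeline described above (sparse-code patches, average-pool per region, concatenate), using scikit-learn's OMP-based sparse coder as a stand-in for the encoding techniques the paper evaluates; the patch size, region grid, sparsity level, and random dictionary are illustrative assumptions.

```python
# Hypothetical local sparse-coding face descriptor (parameters are ours).
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.feature_extraction.image import extract_patches_2d

def face_descriptor(face, dictionary, patch_size=8, grid=(4, 4)):
    h, w = face.shape
    rh, rw = h // grid[0], w // grid[1]
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm='omp',
                        transform_n_nonzero_coefs=5)
    region_descs = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            region = face[i*rh:(i+1)*rh, j*rw:(j+1)*rw]
            patches = extract_patches_2d(region, (patch_size, patch_size))
            codes = coder.transform(patches.reshape(len(patches), -1))
            # Average pooling deliberately discards spatial layout within the
            # region, which is what gives robustness to misalignment.
            region_descs.append(codes.mean(axis=0))
    return np.concatenate(region_descs)

# Toy usage: a random 64x64 "face" and a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((128, 64))          # 128 atoms for flattened 8x8 patches
D /= np.linalg.norm(D, axis=1, keepdims=True)
desc = face_descriptor(rng.random((64, 64)), D)
print(desc.shape)                           # (16 regions * 128 atoms,)
```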

    3D reconstruction of ribcage geometry from biplanar radiographs using a statistical parametric model approach

    Rib cage 3D reconstruction is an important prerequisite for thoracic spine modelling, particularly for studies of the deformed thorax in adolescent idiopathic scoliosis. This study proposes a new method for rib cage 3D reconstruction from biplanar radiographs, using a statistical parametric model approach. Simplified parametric models were defined at the hierarchical levels of rib cage surface, rib midline and rib surface, and applied to a database of 86 trunks. The resulting parameter database was used to learn statistical models, which then quickly provide a first estimate of the reconstruction from identifications on both radiographs. This solution was then refined by manual adjustments in order to improve the matching between model and image. Accuracy was assessed by comparison with 29 rib cages from CT scans, in terms of geometrical parameter differences and of line-to-line error distance between the rib midlines. Intra- and inter-observer reproducibility was assessed on 20 scoliotic patients. The first estimate (mean reconstruction time of 2 min 30 s) was sufficient to extract the main global rib cage parameters with a 95% confidence interval lower than 7%, 8%, 2% and 4° for rib cage volume, antero-posterior and lateral maximal diameters, and maximal rib hump, respectively. The mean error distance was 5.4 mm (max 35 mm), down to 3.6 mm (max 24 mm) after the manual adjustment step (an additional 3 min 30 s). The proposed method will improve developments of rib cage finite element modelling and the evaluation of clinical outcomes.
    This work was funded by the ParisTech BiomecAM chair on subject-specific musculoskeletal modelling, and we express our acknowledgements to the chair founders: the Cotrel foundation, Société générale, the Protéor company and the COVEA consortium. We extend our acknowledgements to Alina Badina for medical imaging data, Alexandre Journé for his advice, and Thomas Joubert for his technical support.
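    A loose sketch of how such a statistical parametric model might provide the first estimate, under heavy assumptions: PCA learns a low-dimensional model of the parameter database, and a regression maps the few landmarks identified on the radiographs to the full parameter vector. All shapes and the toy data below are hypothetical; the paper's actual parametrization is hierarchical and more involved.

```python
# Hypothetical first-estimate step of a statistical parametric model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
P_train = rng.standard_normal((86, 200))   # 86 trunks x 200 model parameters (toy)
L_train = P_train[:, :12] + 0.1 * rng.standard_normal((86, 12))  # 12 landmarks (toy)

pca = PCA(n_components=10).fit(P_train)            # statistical model of parameters
reg = LinearRegression().fit(L_train, pca.transform(P_train))

L_new = rng.standard_normal((1, 12))               # identifications on a new patient
first_estimate = pca.inverse_transform(reg.predict(L_new))
# first_estimate would then be refined by manual adjustment on the radiographs.
```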

    Learning deep dynamical models from image pixels

    Modeling dynamical systems is important in many disciplines, e.g., control, robotics, or neurotechnology. Commonly, the state of these systems is not directly observed, but only available through noisy and potentially high-dimensional observations. In these cases, system identification, i.e., finding the measurement mapping and the transition mapping (system dynamics) in latent space, can be challenging. For linear system dynamics and measurement mappings, efficient solutions for system identification are available. However, in practical applications, the linearity assumption does not hold, requiring non-linear system identification techniques. If, additionally, the observations are high-dimensional (e.g., images), non-linear system identification is inherently hard. To address the problem of non-linear system identification from high-dimensional observations, we combine recent advances in deep learning and system identification. In particular, we jointly learn a low-dimensional embedding of the observations by means of deep auto-encoders and a predictive transition model in this low-dimensional space. We demonstrate that our model enables learning good predictive models of dynamical systems from pixel information only.
    Comment: 10 pages, 11 figures
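    A hedged sketch of the joint learning scheme described above: an auto-encoder embeds observations into a low-dimensional latent space while a transition network predicts the next latent state, with both trained on a combined reconstruction and prediction loss. Architecture sizes, the loss weighting, and the random data are assumptions, not the paper's exact setup.

```python
# Hypothetical joint auto-encoder + latent transition model (sizes are ours).
import torch
import torch.nn as nn

obs_dim, latent_dim = 1024, 8
encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, obs_dim))
transition = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))

params = list(encoder.parameters()) + list(decoder.parameters()) + list(transition.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

x_t = torch.rand(32, obs_dim)    # observation (e.g., flattened image) at time t
x_t1 = torch.rand(32, obs_dim)   # observation at time t+1 (toy data here)

z_t, z_t1 = encoder(x_t), encoder(x_t1)
loss = (nn.functional.mse_loss(decoder(z_t), x_t)         # reconstruction loss
        + nn.functional.mse_loss(transition(z_t), z_t1))  # latent prediction loss
opt.zero_grad()
loss.backward()
opt.step()
```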