20 research outputs found

    Efficient Methods for Unsupervised Learning of Probabilistic Models

    In this thesis I develop a variety of techniques to train, evaluate, and sample from intractable and high-dimensional probabilistic models. Abstract exceeds arXiv space limitations -- see PDF

    Online Multi-Stage Deep Architectures for Feature Extraction and Object Recognition

    Multi-stage visual architectures have recently found success in achieving high classification accuracies over image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures are typically composed of one or more layers of preprocessing, feature encoding, and pooling to extract features from raw images. Training these components traditionally relies on large sets of patches extracted from a potentially large image dataset. In this context, high-dimensional feature-space representations are often helpful for obtaining the best classification performance and providing a higher degree of invariance to object transformations. Large datasets with high-dimensional features complicate the implementation of visual architectures in memory-constrained environments. This dissertation constructs online learning replacements for the components within a multi-stage architecture and demonstrates that the proposed replacements (namely fuzzy competitive clustering, an incremental covariance estimator, and a multi-layer neural network) can offer performance competitive with their offline batch counterparts while providing a reduced memory footprint. The online nature of this solution allows for the development of a method for adjusting parameters within the architecture via stochastic gradient descent. Testing over multiple datasets shows the potential benefits of this methodology when appropriate priors on the initial parameters are unknown. Alternatives to batch-based decompositions for a whitening preprocessing stage, which take advantage of natural image statistics and allow simple dictionary learners to work well in the problem domain, are also explored. Expansions of the architecture using additional pooling statistics and multiple layers are presented and indicate that larger codebook sizes are not the only step forward to higher classification accuracies. Experimental results from these expansions further indicate the important role of sparsity and appropriate encodings within multi-stage visual feature extraction architectures.
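One of the proposed online components, the incremental covariance estimator, can be sketched with a Welford-style single-pass update. This is a minimal illustration of the general technique; the class name and interface are hypothetical, not the dissertation's actual implementation:

```python
import numpy as np

class IncrementalCovariance:
    """Online mean/covariance estimate via a Welford-style update.

    Processes one sample at a time, so the full dataset never needs
    to reside in memory -- the property the dissertation exploits.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))  # running sum of deviation outer products

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean      # deviation from the old mean
        self.mean += delta / self.n
        delta2 = x - self.mean     # deviation from the updated mean
        self.M2 += np.outer(delta, delta2)

    def covariance(self):
        # unbiased sample covariance; requires n >= 2 samples
        return self.M2 / (self.n - 1)
```

After streaming all samples through `update`, `covariance()` matches the batch estimate `np.cov` up to floating-point error, with O(d^2) memory independent of the dataset size.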

    Out of equilibrium Statistical Physics of learning

    In the study of hard optimization problems, it is often infeasible to achieve full analytic control over the dynamics of the algorithmic processes that find solutions efficiently. In many cases, a static approach is able to provide considerable insight into the dynamical properties of these algorithms: in fact, the geometrical structures found in the energy landscape can strongly affect the stationary states and the optimal configurations reached by the solvers. In this context, a classical Statistical Mechanics approach, relying on the assumption of the asymptotic realization of a Boltzmann-Gibbs equilibrium, can yield misleading predictions when the studied algorithms comprise stochastic components that effectively drive these processes out of equilibrium. Thus, it becomes necessary to develop some intuition on the relevant features of the studied phenomena and to build an ad hoc Large Deviation analysis, providing a more targeted and richer description of the geometrical properties of the landscape. The present thesis focuses on the study of learning processes in Artificial Neural Networks, with the aim of introducing an out-of-equilibrium statistical physics framework, based on the introduction of a local entropy potential, for supporting and inspiring algorithmic improvements in the field of Deep Learning, and for developing models of neural computation that can carry both biological and engineering interest.
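A local entropy potential of the kind mentioned above is commonly written as follows in the literature on flat-minima and replicated dynamics; the notation here is illustrative, not necessarily the thesis's exact definition. The bare energy E(w) is replaced by a smoothed free energy that counts low-energy configurations in a neighborhood of w:

```latex
% Local entropy: log-weight of low-energy configurations w' near w.
% \beta is an inverse temperature; \gamma controls the locality of the measure.
\Phi(w;\beta,\gamma) \;=\; \frac{1}{\beta}\,
  \log \int \mathrm{d}w'\;
  e^{-\beta E(w') \,-\, \frac{\gamma}{2}\,\lVert w - w' \rVert^{2}}
```

In the limit of large coupling the potential reduces to the bare energy, while moderate values of gamma reward wide, flat minima surrounded by many low-energy configurations, which is the geometrical feature the out-of-equilibrium analysis targets.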

    Stabilizing and Enhancing Learning for Deep Complex and Real Neural Networks

    In this thesis we propose a set of original contributions in the form of three articles on real- and complex-valued deep neural networks. We address both theoretical and practical problems related to their training. The three articles deal with methods designed to remedy the instability observed during network training, notably the notorious problem of vanishing and exploding gradients in the training of deep neural networks. We first propose the design of appropriate training modules, referred to as building blocks, for complex-valued deep neural networks. Our proposal comprises initialization and normalization methods as well as activation functions for the neural units. The designed modules are then used to specify complex-valued deep architectures dedicated to various tasks, including computer vision, music transcription, speech spectrum prediction, signal retrieval, and audio source separation. Finally, we carry out a detailed analysis of the usefulness of the constraining orthogonality hypothesis generally adopted for parameterizing the hidden transition matrix across the layers of real-valued recurrent neural networks.
----------ABSTRACT: This thesis presents a set of original contributions in the form of three chapters on real- and complex-valued deep neural networks. We address both theoretical issues and practical challenges related to the training of both real- and complex-valued neural networks. First, we investigate the design of appropriate building blocks for deep complex-valued neural networks, such as initialization methods, normalization techniques, and elementwise activation functions. We apply our theoretical insights to design building blocks for the construction of deep complex-valued architectures. We use them to perform various tasks in computer vision, music transcription, speech spectrum prediction, signal retrieval, and audio source separation. We also perform an analysis of the usefulness of orthogonality for the hidden transition matrix in a real-valued recurrent neural network. Each of the three chapters is dedicated to methods designed to provide solutions to problems causing training instability, among them the notorious problem of vanishing and exploding gradients during the training of deep neural networks. Throughout this manuscript we show the usefulness of the methods we propose in the context of well-known challenges and clearly identifiable objectives. We provide below a summary of the contributions within each chapter. At present, the vast majority of building blocks, techniques, and architectures for training deep neural networks are based on real-valued computations and representations. However, representations based on complex numbers have started to receive increased attention. Despite their compelling properties, complex-valued deep neural networks have been neglected, due in part to the absence of the building blocks required to design and train this type of network. The lack of such a framework represents a noticeable gap in deep learning tooling.
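As an illustration of one such building block, a complex-valued dense layer followed by the modReLU activation (a phase-preserving activation studied in the complex-valued network literature) can be sketched as below. This is a minimal sketch; the function names and the chosen initialization are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def modrelu(z, b):
    """modReLU: shrink the modulus of a complex input by a bias b
    while preserving its phase; outputs zero where |z| + b <= 0."""
    mag = np.abs(z)
    scale = np.maximum(mag + b, 0.0) / np.maximum(mag, 1e-12)
    return scale * z

def complex_dense(z, W, b_act):
    """A complex-valued affine map followed by modReLU."""
    return modrelu(W @ z, b_act)

# Toy forward pass with a scaled complex Gaussian weight matrix
rng = np.random.default_rng(1)
W = (rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))) / np.sqrt(3)
z = rng.normal(size=3) + 1j * rng.normal(size=3)
out = complex_dense(z, W, b_act=-0.1)
```

The key property is that the nonlinearity acts only on the modulus: wherever the output is nonzero, its phase equals the phase of the pre-activation, which is one way complex-valued architectures retain phase information through depth.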

    Adaptive Methods for Robust Document Image Understanding

    A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding represent a stringent necessity. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we shift our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation, and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions, and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, putting special focus on generality, computational efficiency, and the exploitation of all available sources of information. More specifically, we introduce the following original methods: fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot-metal typeset prints, a theoretically optimal solution for the document binarization problem from the standpoint of both computational complexity and threshold selection, layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm, and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy.
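To make the threshold-selection side of the binarization problem concrete, the classic global method of Otsu is sketched below as a reference point only; it is not the dissertation's own theoretically optimal solution, just the standard baseline that frames the problem:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the grayscale threshold t that maximizes
    between-class variance (equivalently, minimizes intra-class variance)
    over the 256-bin histogram of an 8-bit image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                 # normalized histogram
    omega = np.cumsum(p)                  # P(class 0) for each candidate t
    mu = np.cumsum(p * np.arange(256))    # cumulative first moment
    mu_t = mu[-1]                         # global mean intensity
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)      # empty classes contribute nothing
    return int(np.argmax(sigma_b))
```

For a strongly bimodal document scan (dark ink on light paper), the returned threshold falls between the two intensity modes, separating foreground from background.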

    Probabilistic characterization and synthesis of complex driven systems

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2000. Includes bibliographical references (leaves 194-204). Real-world systems that have characteristic input-output patterns but don't provide access to their internal states are as numerous as they are difficult to model. This dissertation introduces a modeling language for estimating and emulating the behavior of such systems given time series data. As a benchmark test, a digital violin is designed from observing the performance of an instrument. Cluster-weighted modeling (CWM), a mixture density estimator around local models, is presented as a framework for function approximation and for the prediction and characterization of nonlinear time series. The general model architecture and estimation algorithm are presented and extended to system characterization tools such as estimator uncertainty, predictor uncertainty, and the correlation dimension of the data set. Furthermore, a real-time implementation, a hidden Markov architecture, and function approximation under constraints are derived within the framework. CWM is then applied in the context of different problems and data sets, leading to architectures such as cluster-weighted classification, cluster-weighted estimation, and cluster-weighted sampling. Each application relies on a specific data representation, specific pre- and post-processing algorithms, and a specific hybrid of CWM. The third part of this thesis introduces data-driven modeling of acoustic instruments, a novel technique for audio synthesis. CWM is applied along with new sensor technology and various audio representations to estimate models of violin-family instruments. The approach is demonstrated by synthesizing highly accurate violin sounds given off-line input data as well as cello sounds given real-time input data from a cello player. By Bernd Schoner. Ph.D.
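The CWM predictive mean, i.e. local linear experts weighted by Gaussian input-domain responsibilities, can be sketched in one dimension as follows. This is a minimal sketch with hand-set parameters; in the thesis the cluster parameters are fitted by an EM-style estimation algorithm:

```python
import numpy as np

def cwm_predict(x, priors, means, variances, coefs, intercepts):
    """Cluster-weighted prediction at scalar input x.

    Each cluster c carries a Gaussian input domain p(x|c) and a local
    linear expert f_c(x) = coefs[c]*x + intercepts[c]; the prediction
    is the responsibility-weighted mixture of the experts' outputs.
    """
    # Gaussian input likelihoods p(x | c), one per cluster (1-D case)
    lik = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    w = priors * lik
    w = w / w.sum()                    # normalized responsibilities
    f = coefs * x + intercepts         # local linear expert outputs
    return float(w @ f)
```

With two clusters centered at -1 and +1 whose experts are y = -x and y = x respectively, the model reproduces y = |x| away from the cluster boundary, since the responsibility of the nearer cluster dominates.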