100 research outputs found

    Deep Neural Networks for Visual Reasoning, Program Induction, and Text-to-Image Synthesis.

    Deep neural networks excel at pattern recognition, especially in the setting of large-scale supervised learning. A combination of better hardware, more data, and algorithmic improvements has yielded breakthroughs in image classification, speech recognition and other perception problems. The research frontier has shifted towards the weak side of neural networks: reasoning, planning, and (like all machine learning algorithms) creativity. How can we advance along this frontier using the same generic techniques so effective in pattern recognition, i.e., gradient descent with backpropagation? In this thesis I develop neural architectures with new capabilities in visual reasoning, program induction and text-to-image synthesis. I propose two models that disentangle the latent visual factors of variation that give rise to images, and enable analogical reasoning in the latent space. I show how to augment a recurrent network with a memory of programs that enables the learning of compositional structure for more data-efficient and generalizable program induction. Finally, I develop a generative neural network that translates descriptions of birds, flowers and other categories into compelling natural images.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/135763/1/reedscot_1.pd

    Feedforward deep architectures for classification and synthesis

    This thesis by articles makes several contributions to the field of deep learning, with applications to both classification and synthesis of natural images. Specifically, we introduce several new techniques for the construction and training of deep feedforward networks, and present an empirical investigation into dropout, one of the most popular regularization strategies of the last several years. In the first article, we present a novel piecewise linear parameterization of neural networks, maxout, which allows each hidden unit of a neural network to effectively learn its own convex activation function. We demonstrate improvements on several object recognition benchmarks, and empirically investigate the source of these improvements, including an improved synergy with the recently proposed dropout regularization method. In the second article, we further interrogate the dropout algorithm in particular. Focusing on networks of the popular rectified linear units (ReLU), we empirically examine several questions regarding dropout’s remarkable effectiveness as a regularizer, including questions surrounding the fast test-time rescaling trick and the geometric mean it approximates, interpretations as an ensemble as compared with traditional ensembles, and the importance of using a bagging-like criterion for optimization. In the third article, we address a practical problem in industrial-scale application of deep networks for multi-label object recognition, namely improving an existing model’s ability to discriminate between frequently confused classes. We accomplish this by using the network’s own predictions to inform a partitioning of the label space, and augment the network with dedicated discriminative capacity addressing each of the partitions. Finally, in the fourth article, we tackle the problem of fitting implicit generative models of open-domain collections of natural images using the recently introduced Generative Adversarial Networks (GAN) paradigm. We introduce an augmented training procedure which employs a denoising autoencoder, trained in a high-level feature space learned by the discriminator, to guide the generator towards feature encodings which more closely resemble the data. We quantitatively evaluate our findings using the recently proposed Inception score.
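As an illustration of the maxout unit described in the abstract above, the following is a minimal sketch, not the thesis's implementation. It assumes a single unit that takes the maximum over k affine pieces; the function name `maxout` and its argument layout are hypothetical choices made for this example.

```python
def maxout(x, weights, biases):
    """Evaluate one maxout unit: the maximum over k affine pieces w_j . x + b_j.

    x       -- list of input features
    weights -- list of k weight vectors, each the same length as x
    biases  -- list of k scalar biases
    """
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )


# Two pieces with slopes +1 and -1 reproduce the absolute-value function.
abs_like = maxout([-2.0], [[1.0], [-1.0]], [0.0, 0.0])

# Two pieces with slopes 1 and 0 reproduce ReLU.
relu_like = maxout([-2.0], [[1.0], [0.0]], [0.0, 0.0])
```

Because a maximum of affine functions is convex and piecewise linear, different weight settings let the same unit take the shape of different convex activations, which is the sense in which each hidden unit can learn its own activation function.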

    Symmetry and Complexity

    Symmetry and complexity are the focus of a selection of outstanding papers, ranging from pure Mathematics and Physics to Computer Science and Engineering applications. This collection is based around fundamental problems arising from different fields, but all of them share the same task: using symmetry to reduce complexity. In particular, this Issue contains an interesting paper dealing with circular multilevel systems, where the analysis in the frequency domain gives a simple view of the system. Searching for symmetry in fractional oscillators and the analysis of symmetrical nanotubes are also important contributions to this Special Issue. Further papers, dealing with intelligent prognostics of degradation trajectories for rotating machinery in engineering applications or the analysis of Laplacian spectra for categorical product networks, show how interdisciplinary this subject is, ranging from theory to applications. In particular, the papers by Lee, based on the dynamics of trapped solitary waves for special differential equations, demonstrate how theory can help us to handle a practical problem. Although this collection of papers encompasses various fields, particular attention has been paid to the common task of reducing complexity through the search for symmetry.

    Ensemble learning with discrete classifiers on small devices

    Machine learning has become an integral part of everyday life, ranging from applications in AI-powered search queries to (partially) autonomous driving. Many of the advances in machine learning and its application have been possible due to increases in computation power, i.e., by reducing manufacturing sizes while maintaining or even increasing energy consumption. However, 2-3 nm manufacturing is within reach, making further miniaturization increasingly difficult while thermal design power limits are simultaneously reached, rendering entire parts of the chip useless for certain computational loads. In this thesis, we investigate discrete classifier ensembles as a resource-efficient alternative that can be deployed to small devices that only require small amounts of energy. Discrete classifiers are classifiers that can be applied -- and oftentimes also trained -- without the need for costly floating-point operations. Hence, they are ideally suited for deployment to small devices with limited resources. The disadvantage of discrete classifiers is that their predictive performance often lags behind that of their floating-point siblings. Here, the combination of multiple discrete classifiers into an ensemble can help to improve the predictive performance while still having a manageable resource consumption. This thesis studies discrete classifier ensembles from a theoretical point of view, an algorithmic point of view, and a practical point of view. In the theoretical investigation, the bias-variance decomposition and the double-descent phenomenon are examined. The bias-variance decomposition of the mean-squared error is revisited and generalized to an arbitrary twice-differentiable loss function, which serves as a guiding tool throughout the thesis. Similarly, the double-descent phenomenon is -- for the first time -- studied comprehensively in the context of tree ensembles and specifically random forests. Contrary to established literature, the experiments in this thesis indicate that there is no double-descent in random forests. While the training of ensembles is well-studied in the literature, the deployment to small devices is often neglected. Additionally, the training of ensembles on small devices has not been considered much so far. Hence, the algorithmic part of this thesis focuses on the deployment of discrete classifiers and the training of ensembles on small devices. First, a novel combination of ensemble pruning (i.e., removing classifiers from the ensemble) and ensemble refinement (i.e., re-training of classifiers in the ensemble) is presented, which uses a novel proximal gradient descent algorithm to minimize a combined loss function. The resulting algorithm removes unnecessary classifiers from an already trained ensemble while improving the performance of the remaining classifiers at the same time. Second, this algorithm is extended to the more challenging setting of online learning in which the algorithm receives training examples one by one. The resulting shrub ensembles algorithm allows the training of ensembles in an online fashion while maintaining a strictly bounded memory consumption. It outperforms existing state-of-the-art algorithms under resource constraints and offers competitive performance in the general case. Third, this thesis studies the deployment of decision tree ensembles to small devices by optimizing their memory layout. The key insight here is that decision trees have a probabilistic inference time because different observations can take different paths from the root to a leaf. By estimating the probability of visiting a particular node in the tree, one can place it favorably in memory to improve caching behavior and, thus, increase its performance without changing the model. Finally, several real-world applications of tree ensembles and Binarized Neural Networks are presented.
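For reference, the classical bias-variance decomposition of the mean-squared error mentioned in the abstract above can be written as follows. This is the standard textbook form, not the thesis's generalized version. The notation is an assumption made for this sketch: a fixed input $x$, a target $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ that is independent of the training set, and a model $\hat{f}_D$ trained on a random dataset $D$.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
  = \underbrace{\bigl(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

The cross terms vanish because $\varepsilon$ has zero mean and is independent of $D$, and because $\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]$ has zero mean over $D$.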

    A Global Integration Platform for Optimizing Cooperative Modeling and Simultaneous Joint Inversion of Multi-domain Geophysical Data

    This paper reviews the theoretical aspects and the practical issues of different types of geophysical integration approaches. Moreover, it shows how these approaches can be combined and optimized within the same platform. We discuss both cooperative modeling and Simultaneous Joint Inversion (SJI) as complementary methods for integration of multi-domain geophysical data: these data can be collected at the surface (seismic, electromagnetic, gravity) as well as in boreholes (composite well logs). The main intrinsic difficulties of any SJI approach are the high computational requirements, the non-uniqueness of the final models, the proper choice of the relations between the different geophysical domains, and the quantitative evaluation of reliability indicators. To address all these problems efficiently, we propose and describe here a “systemic approach”: the algorithms of modeling and SJI are merged with an integration architecture that permits the selection of workflows and links between different algorithms, the management of data and models coming from different domains, and the smart visualization of partial and final results. This Quantitative Integration System (QUIS) has been implemented in a complex software and hardware platform, comprising many advanced codes working in cooperation and running on powerful computer clusters. The paper is divided into two main parts. First we discuss the theoretical formulation of SJI and the key concepts of the QUIS platform. In the second part we present a synthetic SJI test and a case history of QUIS application to a real exploration problem.

    Kernel Methods for Machine Learning with Life Science Applications


    Assessing, testing, and challenging the computational power of quantum devices

    Randomness is an intrinsic feature of quantum theory. The outcome of any measurement will be random, sampled from a probability distribution that is defined by the measured quantum state. The task of sampling from a prescribed probability distribution therefore seems to be a natural technological application of quantum devices. And indeed, certain random sampling tasks have been proposed to experimentally demonstrate the speedup of quantum over classical computation, so-called “quantum computational supremacy”. In the research presented in this thesis, I investigate the complexity-theoretic and physical foundations of quantum sampling algorithms. Using the theory of computational complexity, I assess the computational power of natural quantum simulators and close loopholes in the complexity-theoretic argument for the classical intractability of quantum samplers (Part I). In particular, I prove anticoncentration for quantum circuit families that give rise to a 2-design and review methods for proving average-case hardness. I present quantum random sampling schemes that are tailored to large-scale quantum simulation hardware but at the same time rise up to the highest standard in terms of their complexity-theoretic underpinning. Using methods from property testing and quantum system identification, I shed light on the question of how and under which conditions quantum sampling devices can be tested or verified in regimes that are not simulable on classical computers (Part II). I present a no-go result that prevents efficient verification of quantum random sampling schemes, as well as approaches by which this no-go result can be circumvented. In particular, I develop fully efficient verification protocols in what I call the measurement-device-dependent scenario, in which single-qubit measurements are assumed to function with high accuracy. Finally, I try to understand the physical mechanisms governing the computational boundary between classical and quantum computing devices by challenging their computational power using tools from computational physics and the theory of computational complexity (Part III). I develop efficiently computable measures of the infamous Monte Carlo sign problem and assess those measures both in terms of their practicability as a tool for alleviating or easing the sign problem and the computational complexity of this task. An overarching theme of the thesis is the quantum sign problem which arises due to destructive interference between paths – an intrinsically quantum effect. The (non-)existence of a sign problem takes on the role of a criterion that delineates the boundary between classical and quantum computing devices. I begin the thesis by identifying the quantum sign problem as a root of the computational intractability of quantum output probabilities. It turns out that the intricate structure of the probability distributions the sign problem gives rise to prohibits their verification from few samples. In an ironic twist, I show that assessing the intrinsic sign problem of a quantum system is again an intractable problem.