107 research outputs found

    Towards computationally efficient neural networks with adaptive and dynamic computations

    Over the past few years, artificial intelligence has advanced considerably, and deep learning, in which deep neural networks are used to loosely emulate the human brain, has contributed significantly to that progress. Deep neural networks can now achieve great success given large amounts of data and sufficient computational resources. Despite this success, their ability to quickly adapt to new concepts, tasks, and environments is quite limited or even non-existent. In this thesis, we are interested in how deep neural networks can become adaptive to continually changing or entirely new circumstances, similarly to human intelligence, and we further introduce adaptive and dynamic architectural modules and meta-learning frameworks to make this happen in computationally efficient ways. The thesis consists of a series of studies proposing methods that use adaptive and dynamic computation to tackle adaptation problems, investigated from different perspectives: task-level, temporal-level, and context-level adaptation. In the first article, we focus on task-level fast adaptation based on a meta-learning framework. More specifically, we investigate the model uncertainty induced by quickly adapting to a new task with only a few examples. This problem is alleviated by combining efficient gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. As an important step towards robust meta-learning, we develop a Bayesian few-shot learning method to prevent task-level overfitting. In the second article, we attempt to improve sequence (i.e., future) prediction by introducing jumpy future prediction based on an adaptive step size. This is a critical ability for an intelligent agent exploring an environment, as it enables efficient option learning and jumpy future imagination. We make this possible by introducing the Hierarchical Recurrent State Space Model (HRSSM), which can discover latent temporal structure (e.g., subsequences) while also modeling its stochastic state transitions hierarchically. Finally, in the last article, we investigate a framework that adaptively captures the global context in image data and further processes the data based on that information. We implement this framework by extracting high-level visual concepts through attention modules and using graph-based reasoning to capture the global context from them. In addition, feature-wise transformations are used to propagate the global context to all local descriptors in an adaptive way.
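    The feature-wise transformations mentioned in the last article are in the spirit of FiLM-style conditioning: each channel of a local feature map is scaled and shifted by parameters predicted from the global context. Below is a minimal sketch, not the thesis's code; the names and dimensions (`context_dim`, `num_channels`) are illustrative.

        import torch
        import torch.nn as nn

        class FiLM(nn.Module):
            """Feature-wise linear modulation: scale and shift each channel
            of a local feature map using a pooled global context vector."""
            def __init__(self, context_dim: int, num_channels: int):
                super().__init__()
                # One affine pair (gamma, beta) per channel, predicted from context.
                self.to_gamma_beta = nn.Linear(context_dim, 2 * num_channels)

            def forward(self, features, context):
                # features: (B, C, H, W) local descriptors; context: (B, context_dim)
                gamma, beta = self.to_gamma_beta(context).chunk(2, dim=-1)
                gamma = gamma[:, :, None, None]   # broadcast over spatial positions
                beta = beta[:, :, None, None]
                return (1 + gamma) * features + beta   # identity when gamma = beta = 0

        film = FiLM(context_dim=128, num_channels=64)
        x = torch.randn(2, 64, 16, 16)   # local feature maps
        ctx = torch.randn(2, 128)        # pooled global context
        out = film(x, ctx)               # same shape, context-modulated

    The (1 + gamma) parameterization keeps the transformation near the identity at initialization, so the global context perturbs rather than overwrites the local descriptors.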

    A Survey of Deep Meta-Learning

    Deep neural networks can achieve great success when presented with large data sets and sufficient computational resources. However, their ability to learn new concepts quickly is quite limited. Meta-learning is one approach to address this issue, by enabling the network to learn how to learn. The exciting field of Deep Meta-Learning advances at great speed, but lacks a unified, insightful overview of current techniques. This work presents just that. After providing the reader with a theoretical foundation, we investigate and summarize key methods, which are categorized into i) metric-, ii) model-, and iii) optimization-based techniques. In addition, we identify the main open challenges, such as performance evaluation on heterogeneous benchmarks and reduction of the computational costs of meta-learning. Comment: Extended version of a book chapter in 'Metalearning: Applications to Automated Machine Learning and Data Mining' (2nd edition, forthcoming).
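    As an illustration of the optimization-based family in the survey's taxonomy, here is a minimal MAML-style inner/outer loop. This is a sketch, not the survey's code: the sinusoid task sampler is a toy stand-in, and it assumes a recent PyTorch with `torch.func`.

        import torch
        import torch.nn as nn

        def sample_task(k=10):
            """Toy sinusoid regression task (illustrative only)."""
            amp = torch.rand(1) * 4.0 + 0.1
            phase = torch.rand(1) * 3.14
            x = torch.rand(2 * k, 1) * 10.0 - 5.0
            y = amp * torch.sin(x + phase)
            return x[:k], y[:k], x[k:], y[k:]   # support and query sets

        def adapt(model, params, x, y, inner_lr=0.01):
            """One inner-loop gradient step on a task's support set."""
            loss = nn.functional.mse_loss(
                torch.func.functional_call(model, params, x), y)
            grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
            return {name: p - inner_lr * g
                    for (name, p), g in zip(params.items(), grads)}

        model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(), nn.Linear(40, 1))
        meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

        for step in range(1000):
            meta_opt.zero_grad()
            for _ in range(4):   # tasks per meta-batch
                xs, ys, xq, yq = sample_task()
                fast = adapt(model, dict(model.named_parameters()), xs, ys)
                # Query loss backpropagates through the adapted parameters,
                # so gradients reach the shared initialization (second order).
                loss_q = nn.functional.mse_loss(
                    torch.func.functional_call(model, fast, xq), yq)
                loss_q.backward()
            meta_opt.step()

    The `create_graph=True` flag is what makes the meta-gradient second-order; dropping it yields the cheaper first-order approximation, one of the computational-cost trade-offs the survey discusses.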

    Deep representations of structures in the 3D-world

    This thesis demonstrates a collection of neural network tools that leverage the structures and symmetries of the 3D-world. We have explored various aspects of a vision system, ranging from relative pose estimation to 3D-part decomposition from 2D images. For any vision system, it is crucial to understand and resolve visual ambiguities in 3D arising from imaging methods. This thesis has shown that leveraging prior knowledge about the structures and symmetries of the 3D-world in neural network architectures yields better representations for ambiguous situations, and helps solve problems that are inherently ill-posed.

    Deep Gaussian Processes: Advances in Models and Inference

    Hierarchical models are certainly in fashion these days. It seems difficult to navigate the field of machine learning without encountering `deep' models of one sort or another. The popularity of the deep learning revolution has been driven by some striking empirical successes, prompting both intense rapture and intense criticism. The criticisms often centre around the lack of model uncertainty, leading to sometimes drastically overconfident predictions. Others point to the lack of a mechanism for incorporating prior knowledge, and the reliance on large datasets. A widely held hope is that a Bayesian approach might overcome these problems. The deep Gaussian process presents a paradigm for building deep models from a Bayesian perspective. A Gaussian process is a prior over functions. A deep Gaussian process uses several Gaussian process functions and combines them hierarchically through composition (that is, the output of one is the input to the next). The deep Gaussian process promises to capture the compositional nature of deep learning while mitigating some of the disadvantages through a Bayesian approach. The thesis develops deep Gaussian process modelling in a number of ways. The model is first interpreted differently from previous work, not as a `hierarchical prior' but as a factorized prior with a hierarchical likelihood. Mean functions are suggested to avoid issues of degeneracy and to aid initialization. The main contribution is a new method of inference that avoids the burden of representing the function values directly, through an application of sparse variational inference. This method scales to arbitrarily large data and is shown to work well in practice through experiments. The use of variational inference recasts (approximate) inference as optimization of Gaussian distributions. This optimization has an exploitable geometry via the natural gradient. The natural gradient is shown to be advantageous for single-layer non-conjugate models, and for the (final layer of a) deep Gaussian process model. Deep Gaussian processes can be a model both for complex associations between variables and for complex marginal distributions of single variables. Incorporating noise in the hierarchy leads to complex marginal distributions through the non-linearities of the mappings at each layer. The inference required for noisy variables cannot be handled with sparse methods, as sparse methods rely on correlations between variables, which are absent for noisy variables. Instead, a more direct approach is developed, using an importance weighted variational scheme.
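    In standard notation, the compositional model and the sparse variational bound the abstract describes can be written as follows; this is the usual doubly stochastic formulation, stated here for orientation, and may differ in details from the thesis's own notation:

        h^{(0)} = x, \qquad
        f^{(\ell)} \sim \mathcal{GP}\big(m_\ell(\cdot),\, k_\ell(\cdot,\cdot)\big), \qquad
        h^{(\ell)} = f^{(\ell)}\big(h^{(\ell-1)}\big), \quad \ell = 1, \dots, L

        \mathcal{L} \;=\; \sum_{n=1}^{N} \mathbb{E}_{q\left(f^{(L)}_n\right)}
        \left[\log p\!\left(y_n \mid f^{(L)}_n\right)\right]
        \;-\; \sum_{\ell=1}^{L} \mathrm{KL}\!\left(q\big(u^{(\ell)}\big) \,\big\Vert\, p\big(u^{(\ell)}\big)\right)

    Each layer carries its own inducing variables u^{(\ell)}, so maximizing \mathcal{L} recasts approximate inference as optimization over Gaussian distributions q(u^{(\ell)}), which is exactly where the natural-gradient geometry mentioned in the abstract applies.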

    Information Bottleneck

    The celebrated information bottleneck (IB) principle of Tishby et al. has recently enjoyed renewed attention due to its application in the area of deep learning. This collection investigates the IB principle in this new context. The individual chapters in this collection:
    • provide novel insights into the functional properties of the IB;
    • discuss the IB principle (and its derivatives) as an objective for training multi-layer machine learning structures such as neural networks and decision trees; and
    • offer a new perspective on neural network learning via the lens of the IB framework.
    Our collection thus contributes to a better understanding of the IB principle specifically for deep learning and, more generally, of information-theoretic cost functions in machine learning. This paves the way toward explainable artificial intelligence.
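    For reference, the IB objective itself is compact: a stochastic encoder p(t | x) is sought that compresses X while preserving information about Y,

        \min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y)

    where T is the bottleneck representation, I(\cdot\,;\cdot) denotes mutual information, and \beta \ge 0 trades compression against prediction. This is the standard statement of Tishby et al.'s principle, included here for orientation.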

    Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction

    When perceiving the world from multiple viewpoints, humans can reason about complete objects in a compositional manner even when an object is fully occluded from certain viewpoints. Meanwhile, humans are able to imagine novel views after observing multiple viewpoints. Recent remarkable advances in multi-view object-centric learning still leave some unresolved problems: 1) the shapes of partially or completely occluded objects cannot be reconstructed well; 2) novel-viewpoint prediction depends on expensive viewpoint annotations rather than on implicit rules in view representations. In this paper, we introduce a time-conditioned generative model for videos. To reconstruct the complete shape of an object accurately, we enhance the disentanglement between the latent representations of objects and views: the latent representations of time-conditioned views are jointly inferred with a Transformer and then fed into a sequential extension of Slot Attention to learn object-centric representations. In addition, Gaussian processes are employed as priors over view latent variables for video generation and novel-view prediction without viewpoint annotations. Experiments on multiple datasets demonstrate that the proposed model can perform object-centric video decomposition, reconstruct the complete shapes of occluded objects, and make novel-view predictions.
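    The Slot Attention module that the paper extends sequentially iterates a simple competitive attention update. Below is a minimal single-frame sketch of the standard step, not the paper's implementation; the dimensions are illustrative, and the per-iteration residual MLP of the original is omitted for brevity.

        import torch
        import torch.nn as nn

        class SlotAttention(nn.Module):
            """Standard Slot Attention update (Locatello et al., 2020)."""
            def __init__(self, num_slots=5, dim=64, iters=3):
                super().__init__()
                self.num_slots, self.dim, self.iters = num_slots, dim, iters
                self.mu = nn.Parameter(torch.randn(1, 1, dim))         # slot init mean
                self.log_sigma = nn.Parameter(torch.zeros(1, 1, dim))  # slot init scale
                self.to_q = nn.Linear(dim, dim, bias=False)
                self.to_k = nn.Linear(dim, dim, bias=False)
                self.to_v = nn.Linear(dim, dim, bias=False)
                self.gru = nn.GRUCell(dim, dim)
                self.norm_in = nn.LayerNorm(dim)
                self.norm_slots = nn.LayerNorm(dim)

            def forward(self, inputs):              # inputs: (B, N, dim) features
                B, N, _ = inputs.shape
                inputs = self.norm_in(inputs)
                k, v = self.to_k(inputs), self.to_v(inputs)
                slots = self.mu + self.log_sigma.exp() * torch.randn(
                    B, self.num_slots, self.dim, device=inputs.device)
                for _ in range(self.iters):
                    q = self.to_q(self.norm_slots(slots))
                    logits = q @ k.transpose(1, 2) / self.dim ** 0.5   # (B, S, N)
                    attn = torch.softmax(logits, dim=1)   # slots compete per input
                    attn = attn / attn.sum(dim=-1, keepdim=True)   # weighted mean
                    updates = attn @ v                                 # (B, S, dim)
                    slots = self.gru(
                        updates.reshape(-1, self.dim),
                        slots.reshape(-1, self.dim)).reshape(B, self.num_slots, self.dim)
                return slots

    The softmax over the slot axis (rather than the input axis) is what forces slots to compete for inputs and thus decompose the scene; the paper's sequential extension carries slot states across frames and places Gaussian-process priors on the view latents, which this per-frame sketch does not cover.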