35 research outputs found

    Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

    Exploration is essential for reinforcement learning (RL). To face the challenges of exploration, we consider a reward-free RL framework that completely separates exploration from exploitation and brings new challenges for exploration algorithms. In the exploration phase, the agent learns an exploratory policy by interacting with a reward-free environment and collects a dataset of transitions by executing the policy. In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment. This framework is suitable for the meta RL setting where there are many reward functions of interest. In the exploration phase, we propose to maximize the Rényi entropy over the state-action space and justify this objective theoretically. The success of using Rényi entropy as the objective results from its encouragement to explore the hard-to-reach state-actions. We further deduce a policy gradient formulation for this objective and design a practical exploration algorithm that can deal with complex environments. In the planning phase, we solve for good policies given arbitrary reward functions using a batch RL algorithm. Empirically, we show that our exploration algorithm is effective and sample efficient, and results in superior policies for arbitrary reward functions in the planning phase. Comment: Accepted by AAAI-2
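To make the exploration objective in the abstract above concrete, here is a minimal sketch of the standard Rényi entropy of a discrete distribution, H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha). It is not the paper's algorithm: the function name `renyi_entropy`, the two example distributions, and the choice alpha = 0.5 are all assumptions made for illustration. With alpha below 1, rarely visited entries contribute relatively more to the sum than they do under Shannon entropy, which is one way to read the abstract's remark about hard-to-reach state-actions.

```python
import math


def renyi_entropy(probs, alpha):
    """Rényi entropy H_alpha(p) = log(sum_i p_i**alpha) / (1 - alpha), in nats."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    support = [p for p in probs if p > 0]
    if abs(alpha - 1.0) < 1e-12:
        # The limit alpha -> 1 recovers the Shannon entropy.
        return -sum(p * math.log(p) for p in support)
    return math.log(sum(p ** alpha for p in support)) / (1.0 - alpha)


# Hypothetical state-action visitation frequencies (illustrative numbers only).
balanced = [0.25, 0.25, 0.25, 0.25]
skewed = [0.97, 0.01, 0.01, 0.01]

for name, dist in (("balanced", balanced), ("skewed", skewed)):
    print(name, renyi_entropy(dist, 0.5), renyi_entropy(dist, 1.0))
```

The uniform distribution attains the maximum value log(n) for every alpha, so pushing this quantity up favors even coverage of the state-action space.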

    Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

    Learning new task-specific skills from a few trials is a fundamental challenge for artificial intelligence. Meta reinforcement learning (meta-RL) tackles this problem by learning transferable policies that support few-shot adaptation to unseen tasks. Despite recent advances in meta-RL, most existing methods require access to the environmental reward function of new tasks to infer the task objective, which is not realistic in many practical applications. To bridge this gap, we study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning. We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback. The agent can adapt to new tasks by querying a human's preference between behavior trajectories instead of using per-step numeric rewards. By extending techniques from information theory, our approach can design query sequences to maximize the information gain from human interactions while tolerating the inherent error of a non-expert human oracle. In experiments, we extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a variety of meta-RL benchmark tasks and demonstrate substantial improvement over baseline algorithms in terms of both feedback efficiency and error tolerance. Comment: Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

    Unsupervised reinforcement learning via state entropy maximization

    Reinforcement Learning (RL) provides a powerful framework to address sequential decision-making problems in which the transition dynamics is unknown or too complex to be represented. The RL approach is based on speculating what is the best decision to make given sample estimates obtained from previous interactions, a recipe that led to several breakthroughs in various domains, ranging from game playing to robotics. Despite their success, current RL methods hardly generalize from one task to another, and achieving the kind of generalization obtained through unsupervised pre-training in non-sequential problems seems unthinkable. Unsupervised RL has recently emerged as a way to improve generalization of RL methods. Just as its non-sequential counterpart, the unsupervised RL framework comprises two phases: an unsupervised pre-training phase, in which the agent interacts with the environment without external feedback, and a supervised fine-tuning phase, in which the agent aims to efficiently solve a task in the same environment by exploiting the knowledge acquired during pre-training. In this thesis, we study unsupervised RL via state entropy maximization, in which the agent makes use of the unsupervised interactions to pre-train a policy that maximizes the entropy of its induced state distribution. First, we provide a theoretical characterization of the learning problem by considering a convex RL formulation that subsumes state entropy maximization. Our analysis shows that maximizing the state entropy in finite trials is inherently harder than RL. Then, we study the state entropy maximization problem from an optimization perspective. In particular, we show that the primal formulation of the corresponding optimization problem can be (approximately) addressed through tractable linear programs. Finally, we provide the first practical methodologies for state entropy maximization in complex domains, both when the pre-training takes place in a single environment and when it takes place in multiple environments.

    Learning Continually Under Changing Data Distributions


    Causally-Inspired Generalizable Deep Learning Methods under Distribution Shifts

    Deep learning methods have achieved remarkable success in various areas of artificial intelligence, due to their powerful distribution-matching capabilities. However, these successes rely heavily on the i.i.d. assumption, i.e., the data distributions in the training and test datasets are the same. As a result, current deep learning methods typically generalize poorly under distribution shift, that is, on test data whose distribution differs from that of the training data. This significantly hinders the application of deep learning methods to real-world scenarios, as the distribution of test data is not always the same as the training distribution in our rapidly evolving world. This thesis aims to discuss how to construct generalizable deep learning methods under distribution shifts. To achieve this, the thesis first models one prediction task as a structural causal model (SCM) which establishes the relationship between variables using directed acyclic graphs. In an SCM, some variables are easily changed across domains while others are not. However, deep learning methods often unintentionally mix invariant variables with easily changed variables, causing the learned model to deviate from the true one and resulting in poor generalization under distribution shift. To remedy this issue, we propose specific algorithms to model such an invariant part of the SCM with deep learning methods, and experimentally show that this helps the trained model generalize well to different distributions of the same task. Last, we further propose to identify and model the variant information in the new test distribution so that we can fully adapt the trained deep learning model accordingly. We show the method can be extended to several practical applications, such as classification under label shift, image translation under semantics shift, robotics control in dynamics generalization and generalizing large language models to visual question-answer tasks.

    Improving Representation Learning for Deep Clustering and Few-shot Learning

    The amounts of data in the world have increased dramatically in recent years, and it is quickly becoming infeasible for humans to label all these data. It is therefore crucial that modern machine learning systems can operate with few or no labels. The introduction of deep learning and deep neural networks has led to impressive advancements in several areas of machine learning. These advancements are largely due to the unprecedented ability of deep neural networks to learn powerful representations from a wide range of complex input signals. This ability is especially important when labeled data is limited, as the absence of a strong supervisory signal forces models to rely more on intrinsic properties of the data and its representations. This thesis focuses on two key concepts in deep learning with few or no labels. First, we aim to improve representation quality in deep clustering - both for single-view and multi-view data. Current models for deep clustering face challenges related to properly representing semantic similarities, which is crucial for the models to discover meaningful clusterings. This is especially challenging with multi-view data, since the information required for successful clustering might be scattered across many views. Second, we focus on few-shot learning, and how geometrical properties of representations influence few-shot classification performance. We find that a large number of recent methods for few-shot learning embed representations on the hypersphere. Hence, we seek to understand what makes the hypersphere a particularly suitable embedding space for few-shot learning. Our work on single-view deep clustering addresses the susceptibility of deep clustering models to find trivial solutions with non-meaningful representations. To address this issue, we present a new auxiliary objective that - when compared to the popular autoencoder-based approach - better aligns with the main clustering objective, resulting in improved clustering performance. Similarly, our work on multi-view clustering focuses on how representations can be learned from multi-view data, in order to make the representations suitable for the clustering objective. Where recent methods for deep multi-view clustering have focused on aligning view-specific representations, we find that this alignment procedure might actually be detrimental to representation quality. We investigate the effects of representation alignment, and provide novel insights on when alignment is beneficial, and when it is not. Based on our findings, we present several new methods for deep multi-view clustering - both alignment and non-alignment-based - that out-perform current state-of-the-art methods. Our first work on few-shot learning aims to tackle the hubness problem, which has been shown to have negative effects on few-shot classification performance. To this end, we present two new methods to embed representations on the hypersphere for few-shot learning. Further, we provide both theoretical and experimental evidence indicating that embedding representations as uniformly as possible on the hypersphere reduces hubness, and improves classification accuracy. Furthermore, based on our findings on hyperspherical embeddings for few-shot learning, we seek to improve the understanding of representation norms. In particular, we ask what type of information the norm carries, and why it is often beneficial to discard the norm in classification models. We answer this question by presenting a novel hypothesis on the relationship between representation norm and the number of a certain class of objects in the image. We then analyze our hypothesis both theoretically and experimentally, presenting promising results that corroborate the hypothesis.

    Learning to Optimize: from Theory to Practice

    Optimization is at the heart of everyday applications, from finding the fastest route for navigation to designing efficient drugs for diseases. The study of optimization algorithms has focused on developing general approaches that do not adapt to specific problem instances. While they enjoy wide applicability, they forgo the potentially useful information embedded in the structure of an instance. Furthermore, as new optimization problems appear, the algorithm development process relies heavily on domain expertise to identify special properties and design methods to exploit them. Such a design philosophy is labor-intensive and difficult to deploy efficiently to a broad range of domain-specific optimization problems, which are becoming ubiquitous in the pursuit of ever more personalized applications. In this dissertation, we consider different hybrid versions of classical optimization algorithms with data-driven techniques. We aim to equip classical algorithms with the ability to adapt their behaviors on the fly based on specific problem instances. A common theme in our approaches is to train the data-driven components on a pre-collected batch of representative problem instances to optimize some performance metrics, e.g., wall-clock time. Varying the integration details, we present several approaches to learning data-driven optimization modules for combinatorial optimization problems and study the corresponding fundamental research questions on policy learning. We provide multiple experimental results to showcase the practicality of our methods, which lead to state-of-the-art performance on some classes of problems.

    Advances in uncertainty modelling : from epistemic uncertainty estimation to generalized generative flow networks

    Decision-making problems often occur under uncertainty, encompassing both aleatoric uncertainty arising from inherent randomness in processes and epistemic uncertainty due to limited knowledge. This thesis explores the concept of uncertainty, a crucial aspect of machine learning and a key factor for rational agents to determine where to allocate their resources for achieving the best possible results. Traditionally, uncertainty is encoded in a posterior distribution, obtained by approximate Bayesian inference techniques. This thesis's first set of contributions revolves around the mathematical properties of generative flow networks, which are probabilistic models over discrete sequences and amortized samplers of unnormalized probability distributions. Generative flow networks find applications in Bayesian inference and can be used for uncertainty estimation. Additionally, they are helpful for search problems in large compositional spaces. Beyond deepening the mathematical framework underlying them, a comparative study with hierarchical variational methods is provided, shedding light on the significant advantages of generative flow networks, both from a theoretical point of view and via diverse experiments. These contributions include a theory extending generative flow networks to continuous or more general spaces, which allows modelling the Bayesian posterior and uncertainty in many interesting settings. The theory is experimentally validated in various domains. This thesis's second line of work is about alternative measures of epistemic uncertainty beyond posterior modelling. The presented method, called Direct Epistemic Uncertainty Estimation (DEUP), overcomes a major shortcoming of approximate Bayesian inference techniques caused by model misspecification. DEUP relies on maintaining a secondary predictor of the errors of the main predictor, from which measures of epistemic uncertainty can be deduced.
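The idea of a secondary predictor of the main predictor's errors can be illustrated with a toy sketch. It is not the thesis's implementation: the one-dimensional data, the least-squares main predictor, the nearest-neighbour error predictor, and all names (`fit_line`, `knn_mean`, `error_estimate`) are assumptions chosen to keep the example dependency-free. As a further simplification, the error predictor is trained on in-sample errors, and the targets are noiseless, so no aleatoric term has to be separated out.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def knn_mean(xs, vals, x, k=5):
    """Average of vals over the k training inputs closest to x."""
    nearest = sorted(zip(xs, vals), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(v for _, v in nearest) / len(nearest)


rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(200)]
ys = [x ** 2 for x in xs]  # noiseless quadratic target: a line is misspecified

# Main predictor: a straight line fitted to the quadratic data.
a, b = fit_line(xs, ys)

# Secondary predictor: models the squared error of the main predictor.
sq_errors = [(a * x + b - y) ** 2 for x, y in zip(xs, ys)]


def error_estimate(x):
    """Predicted squared error of the main predictor at input x."""
    return knn_mean(xs, sq_errors, x)


for x in (0.0, 1.15, 2.0):
    print(x, error_estimate(x))
```

Because the line cannot represent the quadratic, the estimated error is large near the edges of the input range and small where the line happens to cross the curve, which is the kind of signal a posterior over the misspecified linear model would not provide.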

    A Review of Deep Learning Techniques for Speech Processing

    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field

    Information Bottleneck

    The celebrated information bottleneck (IB) principle of Tishby et al. has recently enjoyed renewed attention due to its application in the area of deep learning. This collection investigates the IB principle in this new context. The individual chapters in this collection provide novel insights into the functional properties of the IB; discuss the IB principle (and its derivatives) as an objective for training multi-layer machine learning structures such as neural networks and decision trees; and offer a new perspective on neural network learning via the lens of the IB framework. Our collection thus contributes to a better understanding of the IB principle specifically for deep learning and, more generally, of information-theoretic cost functions in machine learning. This paves the way toward explainable artificial intelligence.
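For readers unfamiliar with the principle, the IB objective is commonly written as the following Lagrangian. The symbols are the usual ones from the IB literature rather than from this listing: X is the input, Y the target, T the compressed representation, and beta > 0 a trade-off parameter. Individual chapters of the collection may use variants of this form.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y),
\qquad \text{subject to the Markov chain } Y \to X \to T .
```

The first term rewards compressing X, the second rewards keeping information about Y, and beta sets how much relevance is traded for compression.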