
    Learning General World Models in a Handful of Reward-Free Deployments

    Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task-agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research. We then present CASCADE, a novel approach for self-supervised exploration in this new setting. CASCADE seeks to learn a world model by collecting data with a population of agents, using an information-theoretic objective inspired by Bayesian Active Learning. CASCADE achieves this by explicitly maximizing the diversity of trajectories sampled by the population through a novel cascading objective. We provide theoretical intuition for CASCADE, showing in a tabular setting that it improves upon naïve approaches that do not account for population diversity. We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter, and the DM Control Suite. Code and videos are available at https://ycxuyingchen.github.io/cascade/
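    A minimal sketch of the cascading idea, assuming each candidate exploration policy can be summarized by a trajectory feature vector: the actual CASCADE objective is information-theoretic and built on the learned world model, whereas this greedy farthest-point selection only illustrates how population diversity can be accumulated one agent at a time.

    ```python
    # Hedged sketch, not the paper's algorithm: greedily build a population of
    # exploration policies so that each new member's trajectory summary is as
    # far as possible from those already selected.
    import numpy as np

    def cascade_select(candidate_feats, num_agents):
        """candidate_feats: (N, D) array, one trajectory feature vector per
        candidate policy. Returns indices of a diverse population of size
        `num_agents`, chosen by cascading farthest-point selection."""
        selected = [int(np.argmax(np.linalg.norm(candidate_feats, axis=1)))]
        while len(selected) < num_agents:
            chosen = np.stack([candidate_feats[i] for i in selected])
            # distance of every candidate to its nearest already-selected member
            dists = np.linalg.norm(
                candidate_feats[:, None, :] - chosen[None, :, :], axis=-1
            ).min(axis=1)
            dists[selected] = -np.inf          # never re-pick a member
            selected.append(int(np.argmax(dists)))
        return selected

    # toy usage: 50 candidate policies, 8-dim trajectory summaries, pick 5
    feats = np.random.default_rng(0).standard_normal((50, 8))
    print(cascade_select(feats, num_agents=5))
    ```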

    Microenvironmental acidosis in carcinogenesis and metastases: new strategies in prevention and therapy


    Higher integrability and stability of (p,q)-quasiminimizers

    Using purely variational methods, we prove local and global higher integrability results for upper gradients of quasiminimizers of a (p,q)-Dirichlet integral with fixed boundary data, assuming the boundary data belong to a slightly better Newtonian space. We also obtain a stability property with respect to the varying exponents p and q. The setting is a doubling metric measure space supporting a Poincaré inequality.
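    For orientation, the following is a sketch of one standard form of a (p,q)-Dirichlet integral and the associated quasiminimizer notion on a metric measure space; the exact functional and admissibility conditions used in the paper (for instance, coefficients on the two terms) may differ.

    ```latex
    % Hedged sketch on a metric measure space (X, d, \mu), written with
    % minimal p-weak upper gradients g_u of u.
    \[
      \mathcal{F}(u,\Omega) = \int_{\Omega} \bigl( g_u^{\,p} + g_u^{\,q} \bigr)\, d\mu ,
      \qquad 1 < p \le q < \infty .
    \]
    % u is then a K-quasiminimizer (K \ge 1) if, for every admissible
    % perturbation \varphi with compact support in \Omega,
    \[
      \int_{\{\varphi \ne 0\}} \bigl( g_u^{\,p} + g_u^{\,q} \bigr)\, d\mu
      \le K \int_{\{\varphi \ne 0\}} \bigl( g_{u+\varphi}^{\,p} + g_{u+\varphi}^{\,q} \bigr)\, d\mu .
    \]
    ```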

    Geometrically coupled monte carlo sampling

    Monte Carlo sampling in high-dimensional, low-sample settings is important in many machine learning tasks. We improve current methods for sampling in Euclidean spaces by avoiding independence and instead considering ways to couple samples. We show fundamental connections to optimal transport theory, leading to novel sampling algorithms and providing new theoretical grounding for existing strategies. We compare our new strategies against prior methods for improving sample efficiency, including quasi-Monte Carlo, by studying discrepancy. We explore our findings empirically and observe benefits of our sampling schemes for reinforcement learning and generative modelling.
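    As a toy illustration of why coupling samples can help, the sketch below compares i.i.d. Gaussian sampling against antithetic pairs, the simplest non-independent coupling; the couplings derived in the paper come from optimal transport and are considerably more general.

    ```python
    # Hedged illustration only: antithetic coupling versus i.i.d. sampling for
    # estimating E[f(X)] with X ~ N(0, 1). Coupling each sample with its mirror
    # cancels the odd part of f and typically reduces estimator variance.
    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(x) + 0.1 * x**2      # hypothetical integrand

    def iid_estimate(n):
        x = rng.standard_normal(n)
        return f(x).mean()

    def antithetic_estimate(n):
        x = rng.standard_normal(n // 2)
        return 0.5 * (f(x) + f(-x)).mean()    # same total number of evaluations

    iid = [iid_estimate(1000) for _ in range(500)]
    cpl = [antithetic_estimate(1000) for _ in range(500)]
    print("i.i.d. estimator std:     ", np.std(iid))
    print("antithetic estimator std: ", np.std(cpl))
    ```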

    Using bifurcations for diversity in differentiable games

    Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR is designed for conservative gradient systems (i.e. settings involving a single loss function), where it branches at saddles, the only relevant bifurcation points. We generalize this idea to non-conservative, multi-agent gradient systems by identifying new types of bifurcation points and proposing a method to follow eigenvectors with complex eigenvalues. We give theoretical motivation for our method, denoted Game Ridge Rider (GRR), by leveraging machinery from the field of dynamical systems. Finally, we motivate our method empirically by constructing novel toy problems in which we can visualize new phenomena, and by finding diverse solutions in the iterated prisoners' dilemma, where existing methods often converge to a single solution mode.
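    The sketch below only illustrates the object GRR operates on, assuming a simple bilinear two-player zero-sum game: the Jacobian of the joint gradient field has purely imaginary eigenvalues, the non-conservative situation that motivates following eigenvectors with complex eigenvalues. It is not the full eigenvector-following procedure.

    ```python
    # Hedged sketch: a bilinear game where player 1 minimizes x*y and player 2
    # minimizes -x*y. The joint gradient field is v(x, y) = (y, -x), whose
    # Jacobian has eigenvalues ±i, i.e. purely rotational dynamics with no
    # saddle for ordinary Ridge Rider to branch on.
    import numpy as np

    def game_jacobian(x, y):
        # Jacobian of v(x, y) = (y, -x); constant for this bilinear game,
        # so the arguments are unused here.
        return np.array([[0.0, 1.0],
                         [-1.0, 0.0]])

    eigvals, eigvecs = np.linalg.eig(game_jacobian(0.0, 0.0))
    print("eigenvalues:", eigvals)   # [0.+1.j, 0.-1.j]
    ```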