878 research outputs found

    Importance Weighted Adversarial Nets for Partial Domain Adaptation

    This paper proposes an importance-weighted, adversarial-nets-based method for unsupervised domain adaptation, specifically for partial domain adaptation, where the target domain has fewer classes than the source domain. Previous domain adaptation methods generally assume identical label spaces, such that reducing the distribution divergence leads to feasible knowledge transfer. However, this assumption no longer holds in a more realistic scenario that requires adaptation from a larger and more diverse source domain to a smaller target domain with fewer classes. This paper extends adversarial-nets-based domain adaptation and proposes a novel adversarial-nets-based partial domain adaptation method that identifies the source samples potentially belonging to outlier classes and, at the same time, reduces the shift of shared classes between domains.

    Wasserstein Distance Guided Representation Learning for Domain Adaptation

    Domain adaptation aims at generalizing a high-performance learner on a target domain by utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations while the learned representations should also be discriminative in prediction. To learn such representations, domain adaptation frameworks usually include a domain invariant representation learning approach to measure and reduce the domain discrepancy, as well as a discriminator for classification. Inspired by Wasserstein GAN, in this paper we propose a novel approach to learn domain invariant feature representations, namely Wasserstein Distance Guided Representation Learning (WDGRL). WDGRL utilizes a neural network, denoted by the domain critic, to estimate the empirical Wasserstein distance between the source and target samples and optimizes the feature extractor network to minimize the estimated Wasserstein distance in an adversarial manner. The theoretical advantages of Wasserstein distance for domain adaptation lie in its gradient property and promising generalization bound. Empirical studies on common sentiment and image classification adaptation datasets demonstrate that our proposed WDGRL outperforms the state-of-the-art domain invariant representation learning approaches. Comment: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018)
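    The critic-based estimate described in this abstract can be illustrated with a minimal sketch. It assumes the standard dual form used by Wasserstein GAN, where the distance is approximated by the gap between the critic's mean output on source features and on target features. The names `empirical_wasserstein` and `linear_critic` are hypothetical, the critic is a fixed linear map rather than a trained network, and the Lipschitz constraint that a real critic needs is not enforced here.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_critic(weights: Vector) -> Callable[[Vector], float]:
    """Hypothetical stand-in for the domain critic: a fixed linear map.

    A real critic would be a trained neural network kept (approximately)
    1-Lipschitz; a unit-norm weight vector satisfies that trivially.
    """
    def critic(features: Vector) -> float:
        return sum(w * h for w, h in zip(weights, features))
    return critic


def empirical_wasserstein(
    critic: Callable[[Vector], float],
    source_feats: Sequence[Vector],
    target_feats: Sequence[Vector],
) -> float:
    """Mean critic output on source minus mean critic output on target.

    In the adversarial setup, the critic is trained to maximize this value
    and the feature extractor is trained to minimize it.
    """
    src_mean = sum(critic(h) for h in source_feats) / len(source_feats)
    tgt_mean = sum(critic(h) for h in target_feats) / len(target_feats)
    return src_mean - tgt_mean


# Toy features standing in for feature-extractor outputs (assumed data).
source = [[1.0, 0.0], [3.0, 0.0]]
target = [[0.0, 0.0], [0.0, 2.0]]
estimate = empirical_wasserstein(linear_critic([1.0, 0.0]), source, target)
```

    With these toy inputs the critic only reads the first coordinate, so the estimate is the difference between the two domains' mean first coordinate.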

    Replay-Guided Adversarial Environment Design

    Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR⊥, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR⊥ improves the performance of PAIRED, from which it inherited its theoretical framework.
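    The idea of not updating the policy on uncurated levels can be sketched as a single loop step. This is an illustrative simplification, not the paper's algorithm: the function name `robust_plr_step`, the callbacks `new_level`, `score_fn` and `train_fn`, and the greedy "replay the highest-scoring level" rule are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, Hashable


def robust_plr_step(
    buffer: Dict[Hashable, float],
    p_replay: float,
    rng: random.Random,
    new_level: Callable[[], Hashable],
    score_fn: Callable[[Hashable], float],
    train_fn: Callable[[Hashable], None],
) -> str:
    """Run one iteration and report which branch was taken.

    Replay branch: pick a curated level from the buffer, update the policy
    on it, then refresh its score.
    Explore branch: sample a fresh random level and only score it; the
    policy is NOT updated on this uncurated level.
    """
    if buffer and rng.random() < p_replay:
        # Greedy choice is an assumption; a real sampler would be stochastic.
        level = max(buffer, key=buffer.get)
        train_fn(level)
        buffer[level] = score_fn(level)
        return "replay"
    level = new_level()
    buffer[level] = score_fn(level)  # evaluation only, no gradient step
    return "explore"
```

    Setting `p_replay` to 0 means the agent never trains, and setting it to 1 with a non-empty buffer means it trains only on levels already in the buffer; intermediate values trade exploration of new levels against training on curated ones.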

    Rethinking Domain Generalization: Discriminability and Generalizability

    Domain generalization (DG) endeavors to develop robust models that possess strong generalizability while preserving excellent discriminability. Nonetheless, pivotal DG techniques tend to improve the feature generalizability by learning domain-invariant representations, inadvertently overlooking the feature discriminability. On the one hand, the simultaneous attainment of generalizability and discriminability of features presents a complex challenge, often entailing inherent contradictions. This challenge becomes particularly pronounced when domain-invariant features manifest reduced discriminability owing to the inclusion of unstable factors, i.e., spurious correlations. On the other hand, prevailing domain-invariant methods can be categorized as category-level alignment, susceptible to discarding indispensable features possessing substantial generalizability and narrowing intra-class variations. To surmount these obstacles, we rethink DG from a new perspective that concurrently imbues features with formidable discriminability and robust generalizability, and present a novel framework, namely, Discriminative Microscopic Distribution Alignment (DMDA). DMDA incorporates two core components: Selective Channel Pruning (SCP) and Micro-level Distribution Alignment (MDA). Concretely, SCP attempts to curtail redundancy within neural networks, prioritizing stable attributes conducive to accurate classification. This approach alleviates the adverse effect of spurious domain invariance and amplifies the feature discriminability. Besides, MDA accentuates micro-level alignment within each class, going beyond mere category-level alignment. This strategy accommodates sufficient generalizable features and facilitates within-class variations. Extensive experiments on four benchmark datasets corroborate the efficacy of our method.

    Game Theory Solutions in Sensor-Based Human Activity Recognition: A Review

    Human Activity Recognition (HAR) automatically identifies human activities from sensor data and has numerous applications in healthcare, sports, security, and human-computer interaction. Despite significant advances in HAR, critical challenges still exist. Game theory has emerged as a promising solution to address these challenges in machine learning problems including HAR. However, there is a lack of research on applying game theory solutions to HAR problems. This review paper explores the potential of game theory as a solution for HAR tasks, and bridges the gap between game theory and HAR research by suggesting novel game-theoretic approaches for HAR problems. The contributions of this work include exploring how game theory can improve the accuracy and robustness of HAR models, investigating how game-theoretic concepts can optimize recognition algorithms, and comparing the game-theoretic approaches with existing HAR methods. The objective is to provide insights into the potential of game theory as a solution for sensor-based HAR, and to contribute to the development of more accurate and efficient recognition systems in future research.

    Learning Curricula in Open-Ended Worlds

    Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing over simulated environments is insufficient, as it requires making arbitrary distributional assumptions and can be combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which aim to produce such open-ended processes. Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent's capabilities. Through extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that achieve more general intelligence by continually generating and mastering additional challenges of their own design. Comment: PhD dissertation

    Learning Curricula in Open-Ended Worlds

    Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing across a large space of simulated environments is insufficient, as it requires making arbitrary distributional assumptions, and as the design space grows, it can become combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which seeks to enable such an open-ended process via a principled approach for gradually improving the robustness and generality of the learning agent. Given a potentially open-ended environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent’s capabilities. Through both extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that approach general intelligence, a long sought-after ambition of artificial intelligence research, by continually generating and mastering additional challenges of their own design.

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.