878 research outputs found
Importance Weighted Adversarial Nets for Partial Domain Adaptation
This paper proposes an importance weighted adversarial nets-based method for
unsupervised domain adaptation, specific for partial domain adaptation where
the target domain has less number of classes compared to the source domain.
Previous domain adaptation methods generally assume the identical label spaces,
such that reducing the distribution divergence leads to feasible knowledge
transfer. However, such an assumption is no longer valid in a more realistic
scenario that requires adaptation from a larger and more diverse source domain
to a smaller target domain with less number of classes. This paper extends the
adversarial nets-based domain adaptation and proposes a novel adversarial
nets-based partial domain adaptation method to identify the source samples that
are potentially from the outlier classes and, at the same time, reduce the
shift of shared classes between domains
Wasserstein Distance Guided Representation Learning for Domain Adaptation
Domain adaptation aims at generalizing a high-performance learner on a target
domain via utilizing the knowledge distilled from a source domain which has a
different but related data distribution. One solution to domain adaptation is
to learn domain invariant feature representations while the learned
representations should also be discriminative in prediction. To learn such
representations, domain adaptation frameworks usually include a domain
invariant representation learning approach to measure and reduce the domain
discrepancy, as well as a discriminator for classification. Inspired by
Wasserstein GAN, in this paper we propose a novel approach to learn domain
invariant feature representations, namely Wasserstein Distance Guided
Representation Learning (WDGRL). WDGRL utilizes a neural network, denoted by
the domain critic, to estimate empirical Wasserstein distance between the
source and target samples and optimizes the feature extractor network to
minimize the estimated Wasserstein distance in an adversarial manner. The
theoretical advantages of Wasserstein distance for domain adaptation lie in its
gradient property and promising generalization bound. Empirical studies on
common sentiment and image classification adaptation datasets demonstrate that
our proposed WDGRL outperforms the state-of-the-art domain invariant
representation learning approaches.Comment: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI
2018
Replay-Guided Adversarial Environment Design
Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR
⊥
, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR
⊥
improves the performance of PAIRED, from which it inherited its theoretical framework
Rethinking Domain Generalization: Discriminability and Generalizability
Domain generalization (DG) endeavors to develop robust models that possess
strong generalizability while preserving excellent discriminability.
Nonetheless, pivotal DG techniques tend to improve the feature generalizability
by learning domain-invariant representations, inadvertently overlooking the
feature discriminability. On the one hand, the simultaneous attainment of
generalizability and discriminability of features presents a complex challenge,
often entailing inherent contradictions. This challenge becomes particularly
pronounced when domain-invariant features manifest reduced discriminability
owing to the inclusion of unstable factors, \emph{i.e.,} spurious correlations.
On the other hand, prevailing domain-invariant methods can be categorized as
category-level alignment, susceptible to discarding indispensable features
possessing substantial generalizability and narrowing intra-class variations.
To surmount these obstacles, we rethink DG from a new perspective that
concurrently imbues features with formidable discriminability and robust
generalizability, and present a novel framework, namely, Discriminative
Microscopic Distribution Alignment (DMDA). DMDA incorporates two core
components: Selective Channel Pruning~(SCP) and Micro-level Distribution
Alignment (MDA). Concretely, SCP attempts to curtail redundancy within neural
networks, prioritizing stable attributes conducive to accurate classification.
This approach alleviates the adverse effect of spurious domain invariance and
amplifies the feature discriminability. Besides, MDA accentuates micro-level
alignment within each class, going beyond mere category-level alignment. This
strategy accommodates sufficient generalizable features and facilitates
within-class variations. Extensive experiments on four benchmark datasets
corroborate the efficacy of our method
Game Theory Solutions in Sensor-Based Human Activity Recognition: A Review
The Human Activity Recognition (HAR) tasks automatically identify human
activities using the sensor data, which has numerous applications in
healthcare, sports, security, and human-computer interaction. Despite
significant advances in HAR, critical challenges still exist. Game theory has
emerged as a promising solution to address these challenges in machine learning
problems including HAR. However, there is a lack of research work on applying
game theory solutions to the HAR problems. This review paper explores the
potential of game theory as a solution for HAR tasks, and bridges the gap
between game theory and HAR research work by suggesting novel game-theoretic
approaches for HAR problems. The contributions of this work include exploring
how game theory can improve the accuracy and robustness of HAR models,
investigating how game-theoretic concepts can optimize recognition algorithms,
and discussing the game-theoretic approaches against the existing HAR methods.
The objective is to provide insights into the potential of game theory as a
solution for sensor-based HAR, and contribute to develop a more accurate and
efficient recognition system in the future research directions
Learning Curricula in Open-Ended Worlds
Deep reinforcement learning (RL) provides powerful methods for training
optimal sequential decision-making agents. As collecting real-world
interactions can entail additional costs and safety risks, the common paradigm
of sim2real conducts training in a simulator, followed by real-world
deployment. Unfortunately, RL agents easily overfit to the choice of simulated
training environments, and worse still, learning ends when the agent masters
the specific set of simulated environments. In contrast, the real world is
highly open-ended, featuring endlessly evolving environments and challenges,
making such RL approaches unsuitable. Simply randomizing over simulated
environments is insufficient, as it requires making arbitrary distributional
assumptions and can be combinatorially less likely to sample specific
environment instances that are useful for learning. An ideal learning process
should automatically adapt the training environment to maximize the learning
potential of the agent over an open-ended task space that matches or surpasses
the complexity of the real world. This thesis develops a class of methods
called Unsupervised Environment Design (UED), which aim to produce such
open-ended processes. Given an environment design space, UED automatically
generates an infinite sequence or curriculum of training environments at the
frontier of the learning agent's capabilities. Through extensive empirical
studies and theoretical arguments founded on minimax-regret decision theory and
game theory, the findings in this thesis show that UED autocurricula can
produce RL agents exhibiting significantly improved robustness and
generalization to previously unseen environment instances. Such autocurricula
are promising paths toward open-ended learning systems that achieve more
general intelligence by continually generating and mastering additional
challenges of their own design.Comment: PhD dissertatio
Learning Curricula in Open-Ended Worlds
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real-world is highly open-ended—featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing across a large space of simulated environments is insufficient, as it requires making arbitrary distributional assumptions, and as the design space grows, it can become combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which seeks to enable such an open-ended process via a principled approach for gradually improving the robustness and generality of the learning agent. Given a potentially open-ended environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent’s capabilities. Through both extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that approach general intelligence—a long sought-after ambition of artificial intelligence research—by continually generating and mastering additional challenges of their own design
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work
- …