187 research outputs found
Meta-Reinforcement Learning via Exploratory Task Clustering
Meta-reinforcement learning (meta-RL) aims to quickly solve new tasks by
leveraging knowledge from prior tasks. However, previous studies often assume a
single-mode, homogeneous task distribution, ignoring possible structured
heterogeneity among tasks. Leveraging such structures can better facilitate
knowledge sharing among related tasks and thus improve sample efficiency. In
this paper, we explore the structured heterogeneity among tasks via clustering
to improve meta-RL. We develop a dedicated exploratory policy to discover task
structures via divide-and-conquer. The knowledge of the identified clusters
helps to narrow the search space of task-specific information, leading to more
sample-efficient policy adaptation. Experiments on various MuJoCo tasks show
that the proposed method effectively unravels cluster structures in both rewards
and state dynamics, demonstrating strong advantages over a set of
state-of-the-art baselines. Comment: 22 pages
When Are Linear Stochastic Bandits Attackable?
We study adversarial attacks on linear stochastic bandits: by manipulating
the rewards, an adversary aims to control the behaviour of the bandit
algorithm. Perhaps surprisingly, we first show that some attack goals can never
be achieved. This is in sharp contrast to context-free stochastic bandits, and
is intrinsically due to the correlation among arms in linear stochastic
bandits. Motivated by this finding, this paper studies the attackability of a
k-armed linear bandit environment. We first provide a complete necessary and
sufficient characterization of attackability based on the geometry of the
arms' context vectors. We then propose a two-stage attack method against LinUCB
and Robust Phase Elimination. The method first assesses whether the given
environment is attackable; and if yes, it poisons the rewards to force the
algorithm to pull a target arm a linear number of times at only sublinear cost.
Numerical experiments further validate the effectiveness and cost-efficiency of
the proposed attack method. Comment: 27 pages, 3 figures, ICML 2022
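The contrast with context-free bandits can be illustrated with a toy reward-poisoning attack on UCB, where (unlike the correlated linear case) any target arm can be forced. This is an illustrative sketch, not the paper's two-stage method, and all constants are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.9, 0.5, 0.2])   # true arm means; target is the worst arm
target = 2
T = 2000

counts = np.zeros(3)
sums = np.zeros(3)
attack_cost = 0.0                    # total reward perturbation spent

for t in range(1, T + 1):
    if 0 in counts:
        arm = int(np.argmin(counts))  # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.normal(means[arm], 0.1)
    if arm != target:                 # attacker drags non-target rewards down
        attack_cost += 1.0
        reward -= 1.0
    counts[arm] += 1
    sums[arm] += reward

print(counts[target] / T)             # fraction of pulls on the target arm
```

Because arms are independent here, lowering the other arms' observed rewards always works; in a linear bandit the arms' rewards are tied together through a shared parameter vector, which is exactly why some targets are provably unattackable.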
Multi-Objective Intrinsic Reward Learning for Conversational Recommender Systems
Conversational Recommender Systems (CRS) actively elicit user preferences to
generate adaptive recommendations. Mainstream reinforcement learning-based CRS
solutions heavily rely on handcrafted reward functions, which may not be
aligned with user intent in CRS tasks. Therefore, the design of task-specific
rewards is critical to facilitate CRS policy learning, which remains largely
under-explored in the literature. In this work, we propose a novel approach to
address this challenge by learning intrinsic rewards from interactions with
users. Specifically, we formulate intrinsic reward learning as a
multi-objective bi-level optimization problem. The inner level optimizes the
CRS policy augmented by the learned intrinsic rewards, while the outer level
drives the intrinsic rewards to optimize two CRS-specific objectives:
maximizing the success rate and minimizing the number of turns to reach a
successful recommendation in conversations. To evaluate the effectiveness of
our approach, we conduct extensive experiments on three public CRS benchmarks.
The results show that our algorithm significantly improves CRS performance by
exploiting informative learned intrinsic rewards. Comment: 11 pages
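The bi-level structure can be sketched on a scalar toy problem. The quadratic objectives and finite-difference outer gradient below are stand-ins for the paper's CRS policy learning and meta-gradients, purely for illustration:

```python
def inner_solve(w, steps=100, lr=0.1):
    """Inner level: maximize extrinsic + w * intrinsic via gradient ascent."""
    x = 0.0
    for _ in range(steps):
        # extrinsic reward -(x-3)^2 plays the role of the sparse task reward;
        # -w*(x-2)^2 is a hypothetical learned intrinsic shaping term.
        grad = -2 * (x - 3) - 2 * w * (x - 2)
        x += lr * grad
    return x

def outer_objective(w):
    """Outer level: the true task metric, evaluated at the inner solution."""
    x = inner_solve(w)
    return -(x - 3) ** 2      # e.g. success rate peaks at x = 3

# Outer update on the intrinsic-reward weight w via finite differences.
w, eps, lr = 1.0, 1e-4, 0.5
for _ in range(50):
    g = (outer_objective(w + eps) - outer_objective(w - eps)) / (2 * eps)
    w += lr * g

print(round(w, 2), round(inner_solve(w), 2))
```

The outer loop learns to shrink the (here deliberately misaligned) intrinsic term because it hurts the true objective, mirroring how the paper's outer level drives intrinsic rewards toward the CRS-specific success and turn-count objectives.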
Reversible Action Design for Combinatorial Optimization with Reinforcement Learning
Combinatorial optimization problem (COP) over graphs is a fundamental
challenge in optimization. Reinforcement learning (RL) has recently emerged as
a new framework to tackle these problems and has demonstrated promising
results. However, most RL solutions construct the solution incrementally in a
greedy manner, which inevitably imposes an unnecessary dependency on action
sequences and requires many problem-specific designs. We propose a general RL
framework that not only exhibits state-of-the-art empirical performance but
also generalizes to a wide class of COPs. Specifically, we define state as a
solution to a problem instance and action as a perturbation to this solution.
We utilize graph neural networks (GNN) to extract latent representations for
given problem instances for state-action encoding, and then apply deep
Q-learning to obtain a policy that gradually refines the solution by flipping
or swapping vertex labels. Experiments are conducted on Maximum k-Cut and the
Traveling Salesman Problem, and performance improvements are achieved against
a set of learning-based and heuristic baselines.
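The state/action design can be illustrated on a tiny Max-Cut instance: a state is a complete cut assignment, an action flips one vertex's side, and the reward is the resulting change in cut value. A plain greedy policy stands in for the learned GNN + Q-learning agent here (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
W = rng.integers(0, 2, size=(n, n))
W = np.triu(W, 1)
W = W + W.T                                     # random unweighted graph

def cut_value(x, W):
    """Cut value of assignment x in {0,1}^n."""
    return sum(W[i, j] for i in range(len(x)) for j in range(i + 1, len(x))
               if x[i] != x[j])

x = rng.integers(0, 2, size=n)                  # state: a full solution
improved = True
while improved:                                 # apply flip actions until no gain
    improved = False
    for v in range(n):
        y = x.copy()
        y[v] ^= 1                               # action: flip vertex v
        if cut_value(y, W) > cut_value(x, W):   # reward = cut improvement
            x, improved = y, True

print(cut_value(x, W))
```

Because every flip is reversible, the agent is never locked in by early decisions the way greedy incremental construction is, which is the motivation stated above.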
Extract interaction detection methods from the biological literature
Abstract

Background: Considerable efforts have been made to extract protein-protein interactions from the biological literature, but little work has been done on the extraction of interaction detection methods. It is crucial to annotate the detection methods in the literature, since different detection methods confer different degrees of reliability on the reported interactions. However, the diversity of method mentions in the literature makes automatic extraction quite challenging.

Results: In this article, we develop a generative topic model, the Correlated Method-Word model (CMW model), to extract the detection methods from the literature. In the CMW model, we formulate the correlation between the different methods and related words in a probabilistic framework in order to infer the potential methods from a given document. Applying the model to a corpus of 5319 full-text documents annotated by the MINT and IntAct databases, we observe promising results that outperform the best result reported in the BioCreative II challenge evaluation.

Conclusion: These results show that the CMW model overcomes the issues caused by the diversity of method mentions and properly captures the in-depth correlations between the detection methods and related words. Its performance over the baseline methods confirms that the model's dependence assumptions are reasonable and that the model is well suited for practical processing.
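The inference idea, scoring a document's words under each method's word distribution, can be sketched naive-Bayes style. The vocabulary, methods, and probabilities below are hypothetical, and this is far simpler than the CMW model's correlated topic structure:

```python
import numpy as np

vocab = ["two-hybrid", "coimmunoprecipitation", "yeast", "antibody", "pull-down"]
methods = {
    # hypothetical word probabilities per detection method (rows sum to 1)
    "yeast two-hybrid": np.array([0.45, 0.05, 0.35, 0.05, 0.10]),
    "co-IP":            np.array([0.05, 0.45, 0.05, 0.35, 0.10]),
}

def infer_method(doc_words):
    """Pick the method whose word distribution best explains the document."""
    idx = [vocab.index(w) for w in doc_words if w in vocab]
    scores = {m: np.log(p[idx]).sum() for m, p in methods.items()}
    return max(scores, key=scores.get)

print(infer_method(["yeast", "two-hybrid", "pull-down"]))
```

The CMW model goes further by modeling correlations between methods and words jointly rather than assuming per-method independence, which is what lets it cope with the diversity of method mentions.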
Learning from Crowds by Modeling Common Confusions
Crowdsourcing provides a practical way to obtain large amounts of labeled
data at a low cost. However, the annotation quality of annotators varies
considerably, which imposes new challenges in learning a high-quality model
from the crowdsourced annotations. In this work, we provide a new perspective
to decompose annotation noise into common noise and individual noise and
differentiate the source of confusion based on instance difficulty and
annotator expertise on a per-instance-annotator basis. We realize this new
crowdsourcing model with an end-to-end learning solution comprising two types
of noise adaptation layers: one is shared across annotators to capture their
commonly shared confusions, and the other is specific to each annotator to
capture individual confusion. To recognize the source of noise in each
annotation, we use an auxiliary network to choose between the two noise
adaptation layers with respect to both instances and annotators. Extensive
experiments on both synthesized and real-world benchmarks demonstrate the
effectiveness of our proposed common noise adaptation solution. Comment:
Accepted by AAAI 2021
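The decomposition into common and individual noise can be sketched with fixed confusion matrices. In the paper these are learned noise adaptation layers and the gate is an auxiliary network; everything below is hand-set for illustration:

```python
import numpy as np

common_T = np.array([[0.8, 0.1, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.1, 0.1, 0.8]])   # confusion shared by all annotators
indiv_T = {
    # hypothetical per-annotator confusion matrix
    "ann_a": np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]]),
}

def observed_label_dist(p_true, annotator, gate):
    """gate in [0,1]: auxiliary weight on common vs. individual noise
    for this instance-annotator pair (soft version of the paper's choice)."""
    T = gate * common_T + (1 - gate) * indiv_T[annotator]
    return p_true @ T        # push the true-class posterior through the noise

p = np.array([0.7, 0.2, 0.1])            # classifier's true-class posterior
dist = observed_label_dist(p, "ann_a", gate=0.5)
print(dist.round(3))
```

Since both transition matrices are row-stochastic, any convex combination is too, so the predicted annotation distribution stays a valid distribution while the gate attributes each annotation's noise to the common or individual source.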