An Overview of Natural Language State Representation for Reinforcement Learning
A suitable state representation is a fundamental part of the learning process
in Reinforcement Learning. In various tasks, the state can either be described
by natural language or be natural language itself. This survey outlines the
strategies used in the literature to build natural language state
representations. We appeal for more linguistically interpretable and grounded
representations, careful justification of design decisions and evaluation of
the effectiveness of different approaches.
Comment: Accepted to the ICML 2020 Workshop on Language in Reinforcement
Learning (LaReL). 4 pages
BanditRank: Learning to Rank Using Contextual Bandits
We propose an extensible deep learning method that uses reinforcement
learning to train neural networks for offline ranking in information retrieval
(IR). We call our method BanditRank as it treats ranking as a contextual bandit
problem. In the domain of learning to rank for IR, current deep learning models
are trained on objective functions different from the measures they are
evaluated on. Since most evaluation measures are discrete quantities, they
cannot be leveraged by directly using gradient descent algorithms without an
approximation. BanditRank bridges this gap by directly optimizing a
task-specific measure, such as mean average precision (MAP), using gradient
descent. Specifically, a contextual bandit whose action is to rank input
documents is trained using a policy gradient algorithm to directly maximize the
reward. The reward can be a single measure, such as MAP, or a combination of
several measures. The notion of ranking is also inherent in BanditRank, similar
to current listwise approaches. To evaluate the effectiveness of
BanditRank, we conducted a series of experiments on datasets related to three
different tasks: web search, community question answering, and factoid question answering.
We found that it performs better than state-of-the-art methods when applied to
the question answering datasets. On the web search dataset, we found that
BanditRank performed better than four strong listwise baselines including
LambdaMART, AdaRank, ListNet and Coordinate Ascent.
Comment: 9 pages
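
The BanditRank abstract describes training a ranking policy with a policy gradient so that a discrete measure such as MAP serves directly as the reward. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a linear scorer in place of a neural network, a Plackett-Luce sampling policy, a single query with binary relevance labels, and a moving-average baseline. All function names, the toy data, and the hyperparameters are hypothetical.

```python
import math
import random


def average_precision(ranking, relevance):
    """Average precision of a ranking (list of doc indices) under binary labels."""
    hits, total = 0, 0.0
    for position, doc in enumerate(ranking, start=1):
        if relevance[doc]:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def score(weights, features):
    """Linear score; an assumed stand-in for a neural scoring model."""
    return sum(w * x for w, x in zip(weights, features))


def sample_ranking(weights, docs, rng):
    """Sample a ranking from a Plackett-Luce policy (assumed policy form).

    Returns the ranking and the gradient of its log-probability w.r.t. weights.
    """
    remaining = list(range(len(docs)))
    ranking = []
    grad = [0.0] * len(weights)
    while remaining:
        scores = [score(weights, docs[i]) for i in remaining]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        z = sum(exps)
        probs = [e / z for e in exps]
        pick = rng.choices(range(len(remaining)), weights=probs)[0]
        chosen = remaining[pick]
        # Gradient of log-softmax: chosen features minus expected features.
        for k in range(len(weights)):
            expected = sum(p * docs[i][k] for p, i in zip(probs, remaining))
            grad[k] += docs[chosen][k] - expected
        ranking.append(chosen)
        remaining.pop(pick)
    return ranking, grad


def train(docs, relevance, steps=2000, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline; the reward is average precision."""
    rng = random.Random(seed)
    weights = [0.0] * len(docs[0])
    baseline = 0.0
    for _ in range(steps):
        ranking, grad = sample_ranking(weights, docs, rng)
        reward = average_precision(ranking, relevance)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        weights = [w + lr * advantage * g for w, g in zip(weights, grad)]
    return weights


def greedy_ranking(weights, docs):
    """Deterministic ranking at test time: sort documents by descending score."""
    return sorted(range(len(docs)), key=lambda i: score(weights, docs[i]), reverse=True)


if __name__ == "__main__":
    # Toy query: one feature that happens to coincide with relevance.
    docs = [[1.0], [0.0], [1.0], [0.0]]
    relevance = [1, 0, 1, 0]
    learned = train(docs, relevance)
    print(greedy_ranking(learned, docs), learned)
```

The reward enters only as a scalar multiplier on the log-probability gradient, so the measure itself never needs to be differentiable. Under the same assumptions, a combination of several measures could be substituted by changing the line that computes `reward`.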