Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network
In this paper, we present a two-stage model for multi-hop question answering.
The first stage is a hierarchical graph network, which reasons over a
multi-hop question and can capture different levels of granularity
using the natural structure of documents (i.e., paragraphs, questions, sentences, and entities).
The reasoning process is converted to a node classification task (i.e., over
paragraph nodes and sentence nodes). The second stage is a language model
fine-tuning task. In short, stage one uses a graph neural network to select and
concatenate supporting sentences into one paragraph, and stage two finds the answer
span in the language model fine-tuning paradigm.
Comment: the experience result is not as good as I excep
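The hand-off between the two stages can be illustrated with a minimal sketch. It assumes stage one has already produced a per-sentence support probability; the function name `build_stage_two_context`, the `threshold` parameter, and its default value are hypothetical and do not come from the paper.

```python
from typing import Sequence


def build_stage_two_context(
    sentences: Sequence[str],
    scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> str:
    """Keep sentences classified as supporting and join them into one paragraph."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    # Preserve the original sentence order so the paragraph stays readable.
    selected = [s for s, p in zip(sentences, scores) if p >= threshold]
    return " ".join(selected)
```

The resulting paragraph would then be passed, together with the question, to the fine-tuned language model that extracts the answer span.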
RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
Exploration in sparse reward environments remains one of the key challenges
of model-free reinforcement learning. Instead of solely relying on extrinsic
rewards provided by the environment, many state-of-the-art methods use
intrinsic rewards to encourage exploration. However, we show that existing
methods fall short in procedurally-generated environments where an agent is
unlikely to visit a state more than once. We propose a novel type of intrinsic
reward which encourages the agent to take actions that lead to significant
changes in its learned state representation. We evaluate our method on multiple
challenging procedurally-generated tasks in MiniGrid, as well as on tasks with
high-dimensional observations used in prior work. Our experiments demonstrate
that this approach is more sample efficient than existing exploration methods,
particularly for procedurally-generated MiniGrid environments. Furthermore, we
analyze the learned behavior as well as the intrinsic reward received by our
agent. In contrast to previous approaches, our intrinsic reward does not
diminish during the course of training and it rewards the agent substantially
more for interacting with objects that it can control.
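As a rough illustration of rewarding "significant changes in the learned state representation", the sketch below measures the Euclidean distance between the embeddings of two consecutive states. This is an assumption about how such a change could be quantified; the abstract does not give the exact formula, and the embeddings here are plain lists standing in for the output of a learned encoder.

```python
import math
from typing import Sequence


def impact_reward(phi_s: Sequence[float], phi_next: Sequence[float]) -> float:
    """Intrinsic reward as the L2 distance between consecutive state embeddings.

    phi_s and phi_next are placeholders for a learned encoder's outputs.
    """
    if len(phi_s) != len(phi_next):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(phi_s, phi_next)))
```

An action that leaves the embedding unchanged earns zero reward under this sketch, while an action that moves it far earns more.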
Pretraining in Deep Reinforcement Learning: A Survey
The past few years have seen rapid progress in combining reinforcement
learning (RL) with deep learning. Various breakthroughs ranging from games to
robotics have spurred the interest in designing sophisticated RL algorithms and
systems. However, the prevailing workflow in RL is to learn tabula rasa, which
may incur computational inefficiency. This precludes continuous deployment of
RL algorithms and potentially excludes researchers without large-scale
computing resources. In many other areas of machine learning, the pretraining
paradigm has been shown to be effective in acquiring transferable knowledge, which
can be utilized for a variety of downstream tasks. Recently, we saw a surge of
interest in Pretraining for Deep RL with promising results. However, much of
the research has been based on different experimental settings. Due to the
nature of RL, pretraining in this field is faced with unique challenges and
hence requires new design principles. In this survey, we seek to systematically
review existing works in pretraining for deep reinforcement learning, provide a
taxonomy of these methods, discuss each sub-field, and bring attention to open
problems and future directions.
Depth uncertainty in neural networks
Existing methods for estimating uncertainty in deep learning tend to require multiple forward passes, making them unsuitable for applications where computational resources are limited. To solve this, we perform probabilistic reasoning over the depth of neural networks. Different depths correspond to subnetworks which share weights and whose predictions are combined via marginalisation, yielding model uncertainty. By exploiting the sequential structure of feed-forward networks, we are able to both evaluate our training objective and make predictions with a single forward pass. We validate our approach on real-world regression and image classification tasks. Our approach provides uncertainty calibration, robustness to dataset shift, and accuracies competitive with more computationally expensive baselines.
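The marginalisation step can be sketched as a weighted average of per-depth predictions, i.e. p(y | x) = sum over d of q(d) * p(y | x, d). The function name and the list-based inputs below are illustrative assumptions; in the setting described above, the per-depth class probabilities would come from intermediate layers of a single forward pass.

```python
from typing import List, Sequence


def marginalise_over_depth(
    per_depth_probs: Sequence[Sequence[float]],
    depth_weights: Sequence[float],
) -> List[float]:
    """Combine class probabilities from each depth, weighted by a distribution over depths."""
    if len(per_depth_probs) != len(depth_weights):
        raise ValueError("need exactly one weight per depth")
    if abs(sum(depth_weights) - 1.0) > 1e-9:
        raise ValueError("depth weights must sum to 1")
    n_classes = len(per_depth_probs[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(per_depth_probs, depth_weights):
        if len(probs) != n_classes:
            raise ValueError("every depth must predict the same number of classes")
        for c, p in enumerate(probs):
            combined[c] += weight * p
    return combined
```

When the depths disagree, the combined distribution is flatter than any single confident prediction, which is one way the approach can express model uncertainty.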