Distributed Evaluations: Ending Neural Point Metrics
With the rise of neural models across the field of information retrieval,
numerous publications have incrementally pushed the envelope of performance for
a multitude of IR tasks. However, these networks often sample data in random
order, are initialized randomly, and their success is determined by a single
evaluation score. These issues are aggravated by neural models achieving
incremental improvements from previous neural baselines, leading to multiple
near state of the art models that are difficult to reproduce and quickly become
deprecated. As neural methods are starting to be incorporated into low resource
and noisy collections that further exacerbate this issue, we propose evaluating
neural models both over multiple random seeds and a set of hyperparameters
within distance of the chosen configuration for a given metric.
Comment: ACM SIGIR - LND4IR Worksho
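The proposal in this abstract, reporting a distribution of scores rather than a single number, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `train_and_evaluate` is a hypothetical stand-in for a real training run, the toy scoring formula is invented, and the learning-rate neighborhood is arbitrary. It is not the authors' evaluation code.

```python
import random
import statistics


def train_and_evaluate(seed, learning_rate):
    """Hypothetical stand-in for a full training run; returns a metric score."""
    rng = random.Random(seed)
    # Toy score (invented): peaks near learning_rate = 0.01, plus seed noise.
    return 0.70 - 50.0 * (learning_rate - 0.01) ** 2 + rng.gauss(0.0, 0.01)


def score_distribution(seeds, learning_rates):
    """Collect the metric over every (seed, nearby hyperparameter) pair."""
    scores = [train_and_evaluate(s, lr) for s in seeds for lr in learning_rates]
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Five seeds crossed with three learning rates around the chosen value.
summary = score_distribution(seeds=range(5), learning_rates=[0.008, 0.01, 0.012])
print(summary)
```

Reporting the mean, spread, and range together makes it visible when a claimed improvement is smaller than the variation caused by the seed alone.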
Hacking Google reCAPTCHA v3 using Reinforcement Learning
We present a Reinforcement Learning (RL) methodology to bypass Google
reCAPTCHA v3. We formulate the problem as a grid world where the agent learns
how to move the mouse and click on the reCAPTCHA button to receive a high
score. We study the performance of the agent when we vary the cell size of the
grid world and show that the performance drops when the agent takes big steps
toward the goal. Finally, we used a divide and conquer strategy to defeat the
reCAPTCHA system for any grid resolution. Our proposed method achieves a
success rate of 97.4% on a 100x100 grid and 96.7% on a 1000x1000 screen
resolution.
Comment: Accepted for the Conference on Reinforcement Learning and Decision Making (RLDM) 201
Changing Model Behavior at Test-Time Using Reinforcement Learning
Machine learning models are often used at test-time subject to constraints
and trade-offs not present at training-time. For example, a computer vision
model operating on an embedded device may need to perform real-time inference,
or a translation model operating on a cell phone may wish to bound its average
compute time in order to be power-efficient. In this work we describe a
mixture-of-experts model and show how to change its test-time resource-usage on
a per-input basis using reinforcement learning. We test our method on a small
MNIST-based example.
Comment: Submitted to ICLR 2017 Workshop Trac
Learnings Options End-to-End for Continuous Action Tasks
We present new results on learning temporally extended actions for
continuous tasks, using the options framework (Sutton et al. [1999b], Precup
[2000]). In order to achieve this goal we work with the option-critic
architecture (Bacon et al. [2017]) using a deliberation cost and train it with
proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy
gradient. Results on Mujoco domains are promising, but lead to interesting
questions about when a given option should be used, an issue directly connected to
the use of initiation sets.
Parameter Space Noise for Exploration
Deep reinforcement learning (RL) methods generally engage in exploratory
behavior through noise injection in the action space. An alternative is to add
noise directly to the agent's parameters, which can lead to more consistent
exploration and a richer set of behaviors. Methods such as evolutionary
strategies use parameter perturbations, but discard all temporal structure in
the process and require significantly more samples. Combining parameter noise
with traditional RL methods makes it possible to get the best of both worlds. We
demonstrate that both off- and on-policy methods benefit from this approach
through experimental comparison of DQN, DDPG, and TRPO on high-dimensional
discrete action environments as well as continuous control tasks. Our results
show that RL with parameter noise learns more efficiently than traditional RL
with action space noise and evolutionary strategies individually.
Comment: Updated to camera-ready ICLR submissio
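The contrast between action-space noise and parameter-space noise described in this abstract can be sketched as follows. This is an illustrative toy, not the authors' method: the linear `tanh` policy, the Gaussian noise, the `sigma` values, and the choice to perturb the weights once and then reuse them are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)


def policy(weights, observation):
    """Tiny deterministic linear policy (hypothetical): action = tanh(W @ obs)."""
    return np.tanh(weights @ observation)


def act_with_action_noise(weights, observation, sigma):
    """Action-space noise: fresh noise is added to the action on every call."""
    action = policy(weights, observation)
    return action + rng.normal(0.0, sigma, size=action.shape)


def perturb_parameters(weights, sigma):
    """Parameter-space noise: perturb the weights once, then act deterministically."""
    return weights + rng.normal(0.0, sigma, size=weights.shape)


weights = rng.normal(size=(2, 4))
obs = rng.normal(size=4)

# Parameter noise: the same observation always maps to the same action
# for as long as the perturbed weights are kept.
noisy_weights = perturb_parameters(weights, sigma=0.1)
a1 = policy(noisy_weights, obs)
a2 = policy(noisy_weights, obs)

# Action noise: the same observation yields a different action each time.
b1 = act_with_action_noise(weights, obs, sigma=0.1)
b2 = act_with_action_noise(weights, obs, sigma=0.1)

print("parameter noise consistent:", np.allclose(a1, a2))
print("action noise consistent:", np.allclose(b1, b2))
```

In this sketch the perturbed policy stays a deterministic function of the observation, which is one way to read the abstract's claim of more consistent exploration.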
Analyzing Language Learned by an Active Question Answering Agent
We analyze the language learned by an agent trained with reinforcement
learning as a component of the ActiveQA system [Buck et al., 2017]. In
ActiveQA, question answering is framed as a reinforcement learning task in
which an agent sits between the user and a black box question-answering system.
The agent learns to reformulate the user's questions to elicit the optimal
answers. It probes the system with many versions of a question that are
generated via a sequence-to-sequence question reformulation model, then
aggregates the returned evidence to find the best answer. This process is an
instance of machine-machine communication. The question reformulation
model must adapt its language to increase the quality of the answers returned,
matching the language of the question answering system. We find that the agent
does not learn transformations that align with semantic intuitions but
discovers through learning classical information retrieval techniques such as
tf-idf re-weighting and stemming.
Comment: Emergent Communication Workshop, NIPS 201
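For readers unfamiliar with the tf-idf re-weighting named in this abstract, here is a minimal sketch of the classical weighting scheme. It is a textbook formulation (relative term frequency times the log of inverse document frequency), not the ActiveQA agent's learned behavior; the function name `tf_idf`, the whitespace tokenizer, and the example questions are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


questions = [
    "who wrote the iliad",
    "who painted the mona lisa",
    "where is the louvre",
]
weights = tf_idf(questions)
print(weights[0])
```

Terms that occur in every question, such as "the", receive zero weight, while rare and therefore discriminative terms such as "iliad" are weighted most heavily.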
Memory Augmented Self-Play
Self-play is an unsupervised training procedure which enables the
reinforcement learning agents to explore the environment without requiring any
external rewards. We augment the self-play setting by providing an external
memory where the agent can store experience from the previous tasks. This
enables the agent to come up with more diverse self-play tasks resulting in
faster exploration of the environment. The agent pretrained in the memory
augmented self-play setting easily outperforms the agent pretrained in
the no-memory self-play setting.
Blindfold Baselines for Embodied QA
We explore blindfold (question-only) baselines for Embodied Question
Answering. The EmbodiedQA task requires an agent to answer a question by
intelligently navigating in a simulated environment, gathering necessary visual
information only through first-person vision before finally answering.
Consequently, a blindfold baseline which ignores the environment and visual
information is a degenerate solution, yet we show through our experiments on
the EQAv1 dataset that a simple question-only baseline achieves
state-of-the-art results on the EmbodiedQA task in all cases except when the
agent is spawned extremely close to the object.
Comment: NIPS 2018 Visually-Grounded Interaction and Language (ViGilL) Worksho
Independently Controllable Features
Finding features that disentangle the different causes of variation in real
data is a difficult task that has nonetheless received considerable attention
in static domains like natural images. Interactive environments, in which an
agent can deliberately take actions, offer an opportunity to tackle this task
better, because the agent can experiment with different actions and observe
their effects. We introduce the idea that in interactive environments, latent
factors that control the variation in observed data can be identified by
figuring out what the agent can control. We propose a naive method to find
factors that explain or measure the effect of the actions of a learner, and
test it in illustrative experiments.
Comment: RLDM submissio
Look, Investigate, and Classify: A Deep Hybrid Attention Method for Breast Cancer Classification
One issue with computer based histopathology image analysis is that the size
of the raw image is usually very large. Taking the raw image as input to the
deep learning model would be computationally expensive while resizing the raw
image to low resolution would incur information loss. In this paper, we present
a novel deep hybrid attention approach to breast cancer classification. It
first adaptively selects a sequence of coarse regions from the raw image by a
hard visual attention algorithm, and then for each such region it is able to
investigate the abnormal parts based on a soft-attention mechanism. A recurrent
network is then built to make decisions to classify the image region and also
to predict the location of the image region to be investigated at the next time
step. As the region selection process is non-differentiable, we optimize the
whole network through a reinforcement approach to learn an optimal policy to
classify the regions. Based on this novel Look, Investigate and Classify
approach, we only need to process a fraction of the pixels in the raw image,
resulting in significant savings in computational resources without sacrificing
performance. Our approach is evaluated on a public breast cancer
histopathology database, where it demonstrates superior performance to the
state-of-the-art deep learning approaches, achieving around 96% classification
accuracy while only 15% of raw pixels are used.
Comment: Accepted to ISBI'1