70,974 research outputs found
Improving Search through A3C Reinforcement Learning based Conversational Agent
We develop a reinforcement learning based search assistant which can assist
users through a set of actions and sequence of interactions to enable them
realize their intent. Our approach caters to subjective search where the user
is seeking digital assets such as images which is fundamentally different from
the tasks which have objective and limited search modalities. Labeled
conversational data is generally not available in such search tasks and
training the agent through human interactions can be time consuming. We propose
a stochastic virtual user which impersonates a real user and can be used to
sample user behavior efficiently to train the agent which accelerates the
bootstrapping of the agent. We develop A3C algorithm based context preserving
architecture which enables the agent to provide contextual assistance to the
user. We compare the A3C agent with Q-learning and evaluate its performance on
average rewards and state values it obtains with the virtual user in validation
episodes. Our experiments show that the agent learns to achieve higher rewards
and better states.Comment: 17 pages, 7 figure
Describing Videos by Exploiting Temporal Structure
Recent progress in using recurrent neural networks (RNNs) for image
description has motivated the exploration of their application for video
description. However, while images are static, working with videos requires
modeling their dynamic temporal structure and then properly integrating that
information into a natural language description. In this context, we propose an
approach that successfully takes into account both the local and global
temporal structure of videos to produce descriptions. First, our approach
incorporates a spatial temporal 3-D convolutional neural network (3-D CNN)
representation of the short temporal dynamics. The 3-D CNN representation is
trained on video action recognition tasks, so as to produce a representation
that is tuned to human motion and behavior. Second we propose a temporal
attention mechanism that allows to go beyond local temporal modeling and learns
to automatically select the most relevant temporal segments given the
text-generating RNN. Our approach exceeds the current state-of-art for both
BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on
a new, larger and more challenging dataset of paired video and natural language
descriptions.Comment: Accepted to ICCV15. This version comes with code release and
supplementary materia
- …