Classical Structured Prediction Losses for Sequence to Sequence Learning
There has been much recent work on training neural attention models at the
sequence-level using either reinforcement learning-style methods or by
optimizing the beam. In this paper, we survey a range of classical objective
functions that have been widely used to train linear models for structured
prediction and apply them to neural sequence to sequence models. Our
experiments show that these losses can perform surprisingly well by slightly
outperforming beam search optimization in a like for like setup. We also report
new state-of-the-art results on both IWSLT'14 German-English translation and
Gigaword abstractive summarization. On the larger WMT'14 English-French
translation task, sequence-level training achieves 41.5 BLEU which is on par
with the state of the art.
Comment: 10 pages, NAACL 201
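One classical objective of this family, expected risk (minimum risk training), can be sketched as follows. This is a minimal illustration, not the paper's implementation: model scores over an n-best list (e.g. from beam search) are renormalised with a softmax, and each candidate's cost, such as 1 − sentence-BLEU, is weighted by its probability. The `expected_risk` helper and its example inputs are hypothetical.

```python
import math

def expected_risk(scores, costs):
    """Expected-risk (minimum risk training) objective over an n-best list:
    renormalise model scores with a softmax and weight each candidate's
    cost (e.g. 1 - sentence BLEU) by its resulting probability."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    return sum(p * c for p, c in zip(probs, costs))

# toy n-best list: the model ranks the zero-cost candidate highest
risk = expected_risk([2.0, 1.0, 0.0], [0.0, 0.5, 1.0])
```

Minimising this loss shifts probability mass toward low-cost candidates; if the model instead ranked the worst candidate highest, the same costs would yield a larger risk.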
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
This paper provides a unified account of two schools of thinking in
information retrieval modelling: the generative retrieval focusing on
predicting relevant documents given a query, and the discriminative retrieval
focusing on predicting relevancy given a query-document pair. We propose a game
theoretical minimax game to iteratively optimise both models. On one hand, the
discriminative model, aiming to mine signals from labelled and unlabelled data,
provides guidance to train the generative model towards fitting the underlying
relevance distribution over documents given the query. On the other hand, the
generative model, acting as an attacker to the current discriminative model,
generates difficult examples for the discriminative model in an adversarial way
by minimising its discrimination objective. With the competition between these
two models, we show that the unified framework takes advantage of both schools
of thinking: (i) the generative model learns to fit the relevance distribution
over documents via the signals from the discriminative model, and (ii) the
discriminative model is able to exploit the unlabelled data selected by the
generative model to achieve a better estimation for document ranking. Our
experimental results have demonstrated significant performance gains of as much
as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
applications including web search, item recommendation, and question answering.
Comment: 12 pages; appendix added
Optimizing Neural Architecture Search using Limited GPU Time in a Dynamic Search Space: A Gene Expression Programming Approach
Efficient identification of people and objects, segmentation of regions of
interest, and extraction of relevant data from images, text, audio, and video
have advanced considerably in recent years, with deep learning methods,
combined with recent improvements in computational resources, contributing
greatly to this progress. Despite this outstanding potential, developing
efficient architectures and modules requires expert knowledge and a large
amount of available compute time. In this paper, we propose an
evolutionary-based neural architecture search approach for the efficient
discovery of convolutional models in a dynamic search space, within only 24
GPU hours. With its efficient search environment and phenotype representation,
Gene Expression Programming is adapted for the generation of the network's
cells. Despite the limited GPU time and broad search space, our proposal
achieved results comparable to the state of the art set by manually designed
convolutional networks and NAS-generated ones, even beating similarly
constrained evolutionary-based NAS works. The best cells in different runs
achieved stable results, with a mean error of 2.82% on the CIFAR-10 dataset
(where the best model achieved an error of 2.67%) and 18.83% on CIFAR-100
(best model at 18.16%). For ImageNet in the mobile setting, our
best model achieved top-1 and top-5 errors of 29.51% and 10.37%, respectively.
Although evolutionary-based NAS works have been reported to require a
considerable amount of GPU time for architecture search, our approach obtained
promising results in little time, encouraging further experiments in
evolutionary-based NAS to improve search and network representation.
Comment: Accepted for presentation at the IEEE Congress on Evolutionary
Computation (IEEE CEC) 202
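The evolutionary search loop behind such approaches can be sketched as follows. This is a minimal sketch under loud assumptions: `toy_fitness` is a made-up stand-in for actually training and validating each candidate cell (the step that consumes the GPU hours), and the flat list-of-operations genome only loosely echoes GEP's linear genotype.

```python
import random

random.seed(1)

OPS = ["conv3x3", "conv5x5", "sep3x3", "maxpool", "identity"]
GENOME_LEN = 8

def toy_fitness(genome):
    # hypothetical proxy for trained-cell accuracy: pretend separable
    # convolutions help and pooling beyond one layer hurts
    return genome.count("sep3x3") - max(0, genome.count("maxpool") - 1)

def mutate(genome, rate=0.2):
    # point mutation: each position is resampled with probability `rate`
    return [random.choice(OPS) if random.random() < rate else g for g in genome]

def evolve(pop_size=20, generations=30):
    pop = [[random.choice(OPS) for _ in range(GENOME_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=toy_fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=toy_fitness)

best = evolve()
```

The budget constraint from the abstract would enter here as a wall-clock cutoff on the generation loop rather than a fixed generation count.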
Deep Active Learning for Dialogue Generation
We propose an online, end-to-end, neural generative conversational model for
open-domain dialogue. It is trained using a unique combination of offline
two-phase supervised learning and online human-in-the-loop active learning.
While most existing research proposes offline supervision or hand-crafted
reward functions for online reinforcement, we devise a novel interactive
learning mechanism based on Hamming-diverse beam search for response generation
and one-character user-feedback at each step. Experiments show that our model
inherently promotes the generation of semantically relevant and interesting
responses, and can be used to train agents with customized personas, moods and
conversational styles.
Comment: Accepted at 6th Joint Conference on Lexical and Computational
Semantics (*SEM) 2017 (previously titled "Online Sequence-to-Sequence Active
Learning for Open-Domain Dialogue Generation" on arXiv)
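One decoding step of the Hamming-diverse beam search mentioned above can be sketched like this (a toy illustration with made-up token scores; the function and its inputs are hypothetical, not the paper's code): each beam group picks its best next token after subtracting a penalty proportional to how many earlier groups already chose that token at this step.

```python
def hamming_diverse_step(group_scores, diversity_strength):
    """One step of group-wise diverse beam search with a Hamming penalty:
    group g's score for token t is its model score minus
    diversity_strength times the count of earlier groups that picked t."""
    counts = {}
    picks = []
    for scores in group_scores:
        best = max(scores,
                   key=lambda t: scores[t] - diversity_strength * counts.get(t, 0))
        picks.append(best)
        counts[best] = counts.get(best, 0) + 1
    return picks

# three groups sharing the same toy next-token scores
scores = {"yes": 2.0, "ok": 1.5, "sure": 1.0}
diverse = hamming_diverse_step([scores, scores, scores], 1.5)
```

With the penalty at 0 every group greedily repeats the top token; raising it forces later groups onto distinct, still high-scoring alternatives, which is what surfaces varied responses for the one-character user feedback.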