Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks
The key challenge of multi-domain translation lies in simultaneously encoding
both the general knowledge shared across domains and the particular knowledge
distinctive to each domain within a unified model. Previous work shows that a
standard neural machine translation (NMT) model trained on mixed-domain data
captures the shared general knowledge well but misses the domain-specific
knowledge. In response, we augment the NMT model with additional
domain transformation networks to transform the general representations to
domain-specific representations, which are subsequently fed to the NMT decoder.
To guarantee the knowledge transformation, we also propose two complementary
supervision signals by leveraging the power of knowledge distillation and
adversarial learning. Experimental results on several language pairs, covering
both balanced and unbalanced multi-domain translation, demonstrate the
effectiveness and universality of the proposed approach. Encouragingly, the
proposed unified model achieves comparable results with the fine-tuning
approach that requires multiple models to preserve the particular knowledge.
Further analyses reveal that the domain transformation networks successfully
capture the domain-specific knowledge as expected.
Comment: AAAI 202
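The core idea of the abstract above — routing a shared "general" encoder representation through a per-domain transformation network before decoding — can be sketched as follows. The single-layer shape, the tanh non-linearity, and all names and toy values are assumptions for illustration, not the paper's actual architecture.

```python
import math

def linear(x, W, b):
    """y = W @ x + b for plain-Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]

def domain_transform(general_repr, domain, transforms):
    """Map a shared encoder representation to a domain-specific one by
    applying the transformation network registered for `domain`."""
    W, b = transforms[domain]
    return [math.tanh(v) for v in linear(general_repr, W, b)]

# Toy 2-d example with one hypothetical transform per domain.
transforms = {
    "news":    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "medical": ([[0.0, 1.0], [1.0, 0.0]], [0.1, -0.1]),
}
h = [0.5, -0.25]  # the "general" representation shared across domains
h_news = domain_transform(h, "news", transforms)
h_med = domain_transform(h, "medical", transforms)
```

The same shared vector `h` yields different decoder inputs per domain, which is the behaviour the paper's supervision signals (distillation and adversarial learning) are meant to enforce.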
Large Margin Neural Language Model
We propose a large margin criterion for training neural language models.
Conventionally, neural language models are trained by minimizing perplexity
(PPL) on grammatical sentences. However, we demonstrate that PPL may not be the
best metric to optimize in some tasks, and further propose a large margin
formulation. The proposed method aims to enlarge the margin between the "good"
and "bad" sentences in a task-specific sense. It is trained end-to-end and can
be widely applied to tasks that involve re-scoring of generated text. Compared
with minimum-PPL training, our method yields up to a 1.1-point WER reduction for
speech recognition and a 1.0-point BLEU increase for machine translation.
Comment: 9 pages. Accepted as a long paper in EMNLP201
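The large-margin criterion described above can be illustrated with a hinge-style pairwise loss: it is zero once a task-specifically "good" sentence out-scores a "bad" one by at least the margin, and positive otherwise. This is a sketch of the general idea under assumed scalar sentence scores; the paper's exact formulation may differ.

```python
def large_margin_loss(score_good, score_bad, margin=1.0):
    """Hinge-style large-margin loss over a (good, bad) sentence pair:
    penalize only when the good sentence fails to beat the bad one
    by at least `margin`."""
    return max(0.0, margin - (score_good - score_bad))

# Satisfied pair: good already out-scores bad by more than the margin.
zero = large_margin_loss(2.0, 0.5)
# Violated pair: gap of 0.2 is short of the margin by 0.8.
penalty = large_margin_loss(1.0, 0.8)
```

In a re-scoring setting, such pairs would come from hypotheses in an n-best list, ranked by the task metric (e.g. WER or BLEU) rather than by PPL.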
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible carriers of prior knowledge that can be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(understood in its broad sense) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains, and compositionality.
Comment: 46 pages, 8 figures. Published in the Journal of Artificial Intelligence Research
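The "meaning conflation deficiency" discussed in the survey above can be made concrete with a toy example: a single word vector blends all meanings of an ambiguous word, while per-sense vectors keep them apart and allow context-based disambiguation. All vectors and the `%`-delimited sense keys below are hypothetical illustrations, not any particular inventory's convention.

```python
def dot(a, b):
    """Dot product as a crude similarity between toy vectors."""
    return sum(x * y for x, y in zip(a, b))

# One vector per word conflates meanings ...
word_vec = {"bank": [0.5, 0.5]}  # finance and river senses mixed together
# ... while one vector per sense keeps them distinct.
sense_vecs = {
    "bank%finance": [1.0, 0.0],
    "bank%river":   [0.0, 1.0],
}

def nearest_sense(context_vec, senses):
    """Disambiguate by picking the sense vector most similar to the context."""
    return max(senses, key=lambda s: dot(senses[s], context_vec))

finance_reading = nearest_sense([0.9, 0.1], sense_vecs)
river_reading = nearest_sense([0.1, 0.9], sense_vecs)
```

The conflated `word_vec["bank"]` is equidistant from both contexts, which is exactly the deficiency the word-to-sense transition addresses.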
Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification
Implicit discourse relation classification is highly challenging due to the
lack of connectives as strong linguistic cues, which motivates the use of
annotated implicit connectives to improve recognition. We propose a feature
imitation framework in which an implicit relation network is driven to learn
from another neural network with access to connectives, and thus encouraged to
extract similarly salient features for accurate classification. We develop an
adversarial model to enable an adaptive imitation scheme through competition
between the implicit network and a rival feature discriminator. Our method
effectively transfers discriminability of connectives to the implicit features,
and achieves state-of-the-art performance on the PDTB benchmark.
Comment: To appear in ACL201
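The adversarial imitation scheme above can be sketched as a pair of opposing objectives: a discriminator learns to separate connective-aware (explicit) features from implicit ones, while the implicit network is rewarded for fooling it. The binary cross-entropy form and scalar discriminator outputs below are assumptions for illustration, not the paper's exact losses.

```python
import math

def discriminator_loss(d_explicit, d_implicit, eps=1e-9):
    """Binary cross-entropy for the rival discriminator: score
    connective-aware features high and implicit features low."""
    return -(math.log(d_explicit + eps) + math.log(1.0 - d_implicit + eps))

def imitation_loss(d_implicit, eps=1e-9):
    """The implicit network's adversarial term: it improves when the
    discriminator mistakes its features for connective-aware ones."""
    return -math.log(d_implicit + eps)

# A confident discriminator (0.9 vs 0.1) incurs less loss than a
# confused one (0.5 vs 0.5); a convincing implicit feature (score
# near 1) makes the imitation loss small.
sharp = discriminator_loss(0.9, 0.1)
confused = discriminator_loss(0.5, 0.5)
```

Training alternates between the two losses, driving the implicit features toward the salience of the connective-informed ones.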