73,211 research outputs found
Bird-Eye Transformers for Text Generation Models
Transformers have become an indispensable module for text generation models
since their great success in machine translation. Previous works attribute
the~success of transformers to the query-key-value dot-product attention, which
provides a robust inductive bias by the fully connected token graphs. However,
we found that self-attention has a severe limitation. When predicting the
(i+1)-th token, self-attention only takes the i-th token as an information
collector, and it tends to give a high attention weight to those tokens similar
to itself. Therefore, most of the historical information that occurred before
the i-th token is not taken into consideration. Based on this observation, in
this paper, we propose a new architecture, called bird-eye transformer(BET),
which goes one step further to improve the performance of transformers by
reweighting self-attention to encourage it to focus more on important
historical information. We have conducted experiments on multiple text
generation tasks, including machine translation (2 datasets) and language
models (3 datasets). These experimental~results show that our proposed model
achieves a better performance than the baseline transformer architectures
on~all~datasets. The code is released at:
\url{https://sites.google.com/view/bet-transformer/home}
Memory-Based Lexical Acquisition and Processing
Current approaches to computational lexicology in language technology are
knowledge-based (competence-oriented) and try to abstract away from specific
formalisms, domains, and applications. This results in severe complexity,
acquisition and reusability bottlenecks. As an alternative, we propose a
particular performance-oriented approach to Natural Language Processing based
on automatic memory-based learning of linguistic (lexical) tasks. The
consequences of the approach for computational lexicology are discussed, and
the application of the approach on a number of lexical acquisition and
disambiguation tasks in phonology, morphology and syntax is described.Comment: 18 page
LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task
This paper presents the LIG-CRIStAL submission to the shared Automatic Post-
Editing task of WMT 2017. We propose two neural post-editing models: a
monosource model with a task-specific attention mechanism, which performs
particularly well in a low-resource scenario; and a chained architecture which
makes use of the source sentence to provide extra context. This latter
architecture manages to slightly improve our results when more training data is
available. We present and discuss our results on two datasets (en-de and de-en)
that are made available for the task.Comment: keywords: neural post-edition, attention model
Automatic Accuracy Prediction for AMR Parsing
Abstract Meaning Representation (AMR) represents sentences as directed,
acyclic and rooted graphs, aiming at capturing their meaning in a machine
readable format. AMR parsing converts natural language sentences into such
graphs. However, evaluating a parser on new data by means of comparison to
manually created AMR graphs is very costly. Also, we would like to be able to
detect parses of questionable quality, or preferring results of alternative
systems by selecting the ones for which we can assess good quality. We propose
AMR accuracy prediction as the task of predicting several metrics of
correctness for an automatically generated AMR parse - in absence of the
corresponding gold parse. We develop a neural end-to-end multi-output
regression model and perform three case studies: firstly, we evaluate the
model's capacity of predicting AMR parse accuracies and test whether it can
reliably assign high scores to gold parses. Secondly, we perform parse
selection based on predicted parse accuracies of candidate parses from
alternative systems, with the aim of improving overall results. Finally, we
predict system ranks for submissions from two AMR shared tasks on the basis of
their predicted parse accuracy averages. All experiments are carried out across
two different domains and show that our method is effective.Comment: accepted at *SEM 201
- …