LINSPECTOR: Multilingual Probing Tasks for Word Representations
Despite an ever-growing number of word representation models introduced for a large number of languages, there is no standardized technique for gaining insight into what these models capture. Such insights would help the community estimate downstream task performance and design more informed neural architectures, while avoiding extensive experimentation that requires substantial computational resources not all researchers have access to. A recent development in NLP is to use simple
classification tasks, also called probing tasks, that test for a single
linguistic feature such as part-of-speech. Existing studies mostly focus on
exploring the linguistic information encoded by the continuous representations
of English text. However, from a typological perspective, morphologically poor English is rather an outlier: the information that English encodes through word order and function words is often expressed at the morphological level in other languages. To address this, we introduce 15 type-level probing tasks, such as case marking, possession, word length, morphological tag count, and pseudoword identification, for 24 languages. We present a reusable methodology for the creation and evaluation of such tests in a multilingual setting. We then present
experiments on several multilingual word embedding models, in which we relate probing task performance for a diverse set of languages to five classic NLP tasks: POS tagging, dependency parsing, semantic role labeling, named entity recognition, and natural language inference. We find that a number of probing tests correlate significantly and positively with the downstream tasks, especially for morphologically rich languages. We show that
our tests can be used to explore word embeddings or black-box neural models for linguistic cues in a multilingual setting.
Comment: Demo is available from: https://linspector.ukp.informatik.tu-darmstadt.de
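As a rough illustration of what such a type-level probing task looks like (not the authors' released code), the sketch below fits a simple classifier that tries to recover a single morphological feature, here case, from pre-trained word vectors; the embedding table and labels are random placeholders.

```python
# A minimal sketch of a type-level probing task: train a simple classifier
# to predict a single linguistic feature (grammatical case) from word
# vectors. The vocabulary, embeddings, and labels are stand-ins for real
# data such as fastText vectors with UniMorph case annotations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder embedding table and type-level case labels.
vocab = [f"word{i}" for i in range(1000)]
embeddings = {w: rng.normal(size=300) for w in vocab}
labels = {w: rng.choice(["nom", "acc", "dat", "gen"]) for w in vocab}

X = np.stack([embeddings[w] for w in vocab])
y = np.array([labels[w] for w in vocab])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A deliberately simple probe: if a linear classifier can recover the
# feature, the information is (linearly) present in the vectors.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"case probing accuracy: {probe.score(X_te, y_te):.3f}")
```

With real embeddings, accuracy well above chance indicates that the vectors encode the probed feature; with the random placeholders here, the probe stays near chance, as expected.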
Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks
It is now established that modern neural language models can be successfully
trained on multiple languages simultaneously without changes to the underlying
architecture, providing an easy way to adapt a variety of NLP models to
low-resource languages. But what kind of knowledge is really shared among
languages within these models? Does multilingual training mostly lead to an
alignment of the lexical representation spaces or does it also enable the
sharing of purely grammatical knowledge? In this paper, we dissect different forms of cross-lingual transfer and identify their main determining factors, using a variety of models and probing tasks. We find that exposing our language
models to a related language does not always increase grammatical knowledge in
the target language, and that optimal conditions for lexical-semantic transfer
may not be optimal for syntactic transfer.
Comment: v2: added acknowledgements; 9 pages, single column, 6 figures
Sentence Embeddings for Russian NLU
We investigate the performance of sentence embedding models on several tasks for the Russian language. In our comparison, we include tasks such as multiple-choice question answering, next sentence prediction, and paraphrase identification. We employ FastText embeddings as a baseline and compare them to ELMo and BERT embeddings. We conduct two series of experiments, using both
unsupervised (i.e., based on a similarity measure only) and supervised approaches for the tasks. Finally, we present datasets for multiple-choice question answering and next sentence prediction in Russian.
Comment: to appear in AIST201
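A minimal sketch of the unsupervised setting described above: score a sentence pair by the cosine similarity of mean-pooled word vectors and threshold it for paraphrase identification. The hash-based pseudo-embeddings and the 0.5 cutoff are placeholders for real FastText/ELMo/BERT representations and a tuned threshold.

```python
# Unsupervised paraphrase identification via similarity only: embed both
# sentences, compare with cosine similarity, threshold the score. The
# word_vec() below is a deterministic stand-in for a real embedding lookup.
import hashlib
import numpy as np

def word_vec(word: str, dim: int = 300) -> np.ndarray:
    seed = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def sentence_vec(sentence: str) -> np.ndarray:
    # Mean pooling over word vectors as a simple sentence representation.
    return np.mean([word_vec(w) for w in sentence.lower().split()], axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = "кошка сидит на ковре"
s2 = "кот сидит на ковре"
sim = cosine(sentence_vec(s1), sentence_vec(s2))
print(f"similarity={sim:.3f} -> paraphrase: {sim > 0.5}")
```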
ParsBERT: Transformer-based Model for Persian Language Understanding
The surge of pre-trained language models has opened a new era in the field of Natural Language Processing (NLP) by allowing us to build powerful language models. Among these models, Transformer-based models such as BERT have become increasingly popular due to their state-of-the-art performance. However, these models are usually focused on English, leaving other languages to multilingual models with limited resources. This paper proposes a monolingual BERT for the Persian language (ParsBERT), which achieves state-of-the-art performance compared to other architectures and multilingual models. Also, since the amount of data available for NLP tasks in Persian is very limited, we compose a massive dataset for different NLP tasks as well as for pre-training the model.
ParsBERT obtains higher scores on all datasets, both existing and newly composed, and improves the state of the art by outperforming both multilingual BERT and other prior work on Sentiment Analysis, Text Classification, and Named Entity Recognition tasks.
Comment: 10 pages, 5 figures, 7 tables; table 7 corrected and some refs related to table
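For context, a monolingual checkpoint like this is typically used through the Hugging Face transformers API, as in the sketch below; the model identifier is our assumption of the published checkpoint name, not something stated in the abstract.

```python
# A minimal sketch of feature extraction with a monolingual BERT such as
# ParsBERT via Hugging Face transformers.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "HooshvareLab/bert-base-parsbert-uncased"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "ما در هوش مصنوعی پیشرفت می‌کنیم"  # a Persian example sentence
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# [CLS] vector as a simple sentence representation, e.g., for fine-tuning
# on sentiment analysis, text classification, or NER.
sentence_vec = outputs.last_hidden_state[:, 0, :]
print(sentence_vec.shape)  # torch.Size([1, 768])
```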
Machine Translation Evaluation with Neural Networks
We present a framework for machine translation evaluation using neural
networks in a pairwise setting, where the goal is to select the better
translation from a pair of hypotheses, given the reference translation. In this
framework, lexical, syntactic and semantic information from the reference and
the two hypotheses is embedded into compact distributed vector representations,
and fed into a multi-layer neural network that models nonlinear interactions
between each of the hypotheses and the reference, as well as between the two
hypotheses. We experiment with the benchmark datasets from the WMT Metrics shared task, on which we obtain the best results published so far with the basic network configuration. We also perform a series of experiments to analyze
and understand the contribution of the different components of the network. We
evaluate variants and extensions, including fine-tuning of the semantic
embeddings, and sentence-based representations modeled with convolutional and
recurrent neural networks. In summary, the proposed framework is flexible and
generalizable, allows for efficient learning and scoring, and provides an MT
evaluation metric that correlates with human judgments, and is on par with the
state of the art.
Comment: Machine Translation, Reference-based MT Evaluation, Deep Neural Networks, Distributed Representation of Texts, Textual Similarity
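A minimal PyTorch sketch of the pairwise setup described above: vector representations of the reference and the two hypotheses are concatenated and fed to a small MLP that outputs the probability that the first hypothesis is the better translation. The layer sizes are illustrative guesses, and random tensors stand in for the real lexical, syntactic, and semantic features.

```python
# Pairwise MT evaluation sketch: given embedded (reference, hyp1, hyp2)
# triples, a small MLP models nonlinear interactions among them and
# predicts which hypothesis is better.
import torch
import torch.nn as nn

class PairwiseMTEvaluator(nn.Module):
    def __init__(self, dim: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * dim, 128),  # interactions across all three inputs
            nn.Tanh(),
            nn.Linear(128, 1),        # logit for P(hyp1 better than hyp2)
        )

    def forward(self, ref, h1, h2):
        x = torch.cat([ref, h1, h2], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

model = PairwiseMTEvaluator()
ref, h1, h2 = (torch.randn(4, 100) for _ in range(3))  # batch of 4 pairs
print(model(ref, h1, h2))  # probabilities that hyp1 is the better one
```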
Tensorized Embedding Layers for Efficient Model Compression
The embedding layers that transform input words into real-valued vectors are key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous, which precludes their deployment in resource-limited settings. We introduce a novel way of parametrizing embedding layers based on the Tensor Train (TT) decomposition, which allows compressing the model significantly at the cost of a negligible drop, or even a slight gain, in performance. We evaluate our method on a wide range of natural language processing benchmarks and analyze the trade-off between performance and compression ratio across architectures, from MLPs to LSTMs and Transformers.
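To make the compression arithmetic concrete, here is a minimal NumPy sketch of a TT-matrix embedding lookup: the vocabulary size and embedding dimension are factorized (10,000 = 25·20·20, 512 = 8·8·8), and each embedding row is reconstructed from three small cores. The shapes and TT-ranks are illustrative assumptions, not the paper's settings.

```python
# TT-matrix embedding sketch: instead of a (10000, 512) weight matrix,
# store three small cores of shape (r_{k-1}, v_k, d_k, r_k) and rebuild
# any row on demand.
import numpy as np

def tt_embedding_row(cores, word_index, vocab_shape):
    # Decompose the flat word index into mixed-radix digits (i_1..i_k).
    idx = []
    for v in reversed(vocab_shape):
        idx.append(word_index % v)
        word_index //= v
    idx = idx[::-1]
    # Contract the cores along the vocab digits to obtain one row.
    row = np.ones((1, 1))  # shape (1, r_0) with r_0 = 1
    for core, i_k in zip(cores, idx):
        slice_k = core[:, i_k, :, :]                    # (r_{k-1}, d_k, r_k)
        r_prev, d_k, r_k = slice_k.shape
        row = row @ slice_k.reshape(r_prev, d_k * r_k)  # grow output dims
        row = row.reshape(-1, r_k)
    return row.reshape(-1)

vocab_shape, dim_shape = (25, 20, 20), (8, 8, 8)  # 10,000 words, 512 dims
ranks = (1, 16, 16, 1)
rng = np.random.default_rng(0)
cores = [rng.normal(size=(ranks[k], vocab_shape[k], dim_shape[k], ranks[k + 1]))
         for k in range(3)]

row = tt_embedding_row(cores, 1234, vocab_shape)
full = np.prod(vocab_shape) * np.prod(dim_shape)
tt = sum(c.size for c in cores)
print(row.shape, f"compression: {full / tt:.0f}x")  # (512,) ~110x
```

With these toy ranks, the three cores hold about 47k parameters versus 5.12M for the dense matrix, roughly a 110x reduction; the rank controls the accuracy/compression trade-off.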
Conditional Generators of Words Definitions
We explore the recently introduced definition modeling technique, which evaluates different distributed vector representations of words by modeling their dictionary definitions. In this work, we study the problem of word ambiguity in definition modeling and propose a possible solution that employs latent variable modeling and soft attention mechanisms. Our quantitative and qualitative evaluation and analysis of the model show that taking word ambiguity and polysemy into account leads to performance improvements.
Comment: Accepted as a conference paper at ACL 201
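A minimal PyTorch sketch of the latent-variable idea (omitting the soft attention component): the decoder is conditioned on the defined word's embedding together with a latent sense variable sampled via the reparameterization trick. All layer sizes and the exact wiring are illustrative assumptions, not the paper's architecture.

```python
# Latent-variable definition generator sketch: a latent "sense" z, inferred
# from the defined word's vector, disambiguates which definition to decode.
import torch
import torch.nn as nn

class LatentDefinitionDecoder(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, latent_dim=32, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Inference net: parameters of q(z | word).
        self.to_mu = nn.Linear(emb_dim, latent_dim)
        self.to_logvar = nn.Linear(emb_dim, latent_dim)
        # Decoder input: previous token embedding + [word vector; sense z].
        self.rnn = nn.LSTM(2 * emb_dim + latent_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, word_ids, def_ids):
        w = self.embed(word_ids)                               # (B, E)
        mu, logvar = self.to_mu(w), self.to_logvar(w)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparam.
        steps = self.embed(def_ids)                            # (B, T, E)
        cond = torch.cat([w, z], dim=-1).unsqueeze(1).expand(-1, steps.size(1), -1)
        h, _ = self.rnn(torch.cat([steps, cond], dim=-1))
        return self.out(h), mu, logvar  # logits + terms for the KL penalty

model = LatentDefinitionDecoder()
logits, mu, logvar = model(torch.tensor([7]), torch.tensor([[1, 2, 3]]))
print(logits.shape)  # torch.Size([1, 3, 5000])
```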
Synchronous Bidirectional Neural Machine Translation
Existing approaches to neural machine translation (NMT) generate the target
language sequence token by token from left to right. However, this kind of
unidirectional decoding framework cannot make full use of the target-side
future contexts which can be produced in a right-to-left decoding direction,
and thus suffers from the issue of unbalanced outputs. In this paper, we introduce a synchronous bidirectional neural machine translation (SB-NMT) model that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new
algorithm that enables synchronous bidirectional decoding in a single model.
Then, we present an interactive decoding model in which left-to-right (right-to-left) generation not only depends on its previously generated outputs but also relies on future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on
large-scale NIST Chinese-English, WMT14 English-German, and WMT18
Russian-English translation tasks. Experimental results demonstrate that our
model achieves significant improvements over the strong Transformer model by
3.92, 1.49 and 1.04 BLEU points respectively, and obtains the state-of-the-art
performance on the Chinese-English and English-German translation tasks.
Comment: Published in TACL 2019; 15 pages, 9 figures, 9 tables
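As a toy illustration of the decoding scheme (not the paper's actual model), the sketch below steps a left-to-right and a right-to-left hypothesis in lockstep, with each direction's next-token scores conditioned on the other's partial output; score_next is a stub standing in for a trained SB-NMT scorer.

```python
# Toy synchronous bidirectional greedy decoding: both directions extend
# their hypotheses together, each seeing the other's prefix.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def score_next(own_prefix, other_prefix, direction):
    # Stub: a real SB-NMT model would attend over the source sentence and
    # over *both* partial translations here.
    random.seed(hash((tuple(own_prefix), tuple(other_prefix), direction)))
    return {tok: random.random() for tok in VOCAB}

def synchronous_decode(max_len=6):
    l2r, r2l = [], []
    for _ in range(max_len):
        scores_f = score_next(l2r, r2l, "l2r")  # conditioned on r2l future
        scores_b = score_next(r2l, l2r, "r2l")  # conditioned on l2r future
        tok_f = max(scores_f, key=scores_f.get)
        tok_b = max(scores_b, key=scores_b.get)
        if tok_f == "<eos>" and tok_b == "<eos>":
            break
        if tok_f != "<eos>":
            l2r.append(tok_f)
        if tok_b != "<eos>":
            r2l.append(tok_b)
    # The r2l hypothesis is generated back-to-front; reverse for reading.
    return l2r, list(reversed(r2l))

fwd, bwd = synchronous_decode()
print("l2r:", " ".join(fwd))
print("r2l:", " ".join(bwd))
```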
Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension
Although Vietnamese is the 17th most widely spoken native language in the world, there are few research studies on Vietnamese machine reading comprehension (MRC), the task of understanding a text and answering questions about it. One reason is the lack of high-quality benchmark datasets for this task. In this work, we construct a dataset consisting of 2,783 pairs of multiple-choice questions and answers based on 417 Vietnamese texts commonly used to teach reading comprehension to elementary school pupils. In addition, we propose a lexical-based MRC method that utilizes
semantic similarity measures and external knowledge sources to analyze
questions and extract answers from the given text. We compare the performance
of the proposed model with several baseline lexical-based and neural
network-based models. Our proposed method achieves an accuracy of 61.81%, which is 5.51% higher than the best baseline model. We also measure human performance on our dataset and find a large gap between machine and human performance, indicating that significant progress can still be made on this task. The dataset is freely available on our website for research purposes.
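A minimal sketch of the lexical core of such an approach: rank each answer choice by word overlap between the question-plus-choice string and the passage. The paper additionally uses semantic similarity measures and external knowledge sources; the toy data below is invented for illustration.

```python
# Lexical-based multiple-choice MRC sketch: pick the choice whose combined
# (question + choice) wording overlaps most with the passage.
def overlap_score(text: str, passage_words: set) -> float:
    words = set(text.lower().split())
    return len(words & passage_words) / max(len(words), 1)

def answer(passage: str, question: str, choices: list) -> str:
    passage_words = set(passage.lower().split())
    return max(choices,
               key=lambda c: overlap_score(question + " " + c, passage_words))

passage = "Lan goes to school by bike every morning with her brother."
question = "How does Lan go to school?"
choices = ["by bus", "by bike", "on foot", "by car"]
print(answer(passage, question, choices))  # -> "by bike"
```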
Analysis Methods in Neural Language Processing: A Survey
The field of natural language processing has seen impressive progress in
recent years, with neural network models replacing many of the traditional
systems. A plethora of new models have been proposed, many of which are thought
to be opaque compared to their feature-rich counterparts. This has led
researchers to analyze, interpret, and evaluate neural networks in novel and
more fine-grained ways. In this survey paper, we review analysis methods in
neural language processing, categorize them according to prominent research
trends, highlight existing limitations, and point to potential directions for
future work.
Comment: Version including the supplementary materials (3 tables), also available at https://boknilev.github.io/nlp-analysis-method