LINSPECTOR: Multilingual Probing Tasks for Word Representations
Despite an ever-growing number of word representation models introduced for a large number of languages, there is no standardized technique for gaining insight into what these models capture. Such insights would help the community estimate downstream task performance and design more informed neural architectures, while avoiding extensive experimentation that requires substantial computational resources not all researchers have access to. A recent development in NLP is to use simple
classification tasks, also called probing tasks, that test for a single
linguistic feature such as part-of-speech. Existing studies mostly focus on
exploring the linguistic information encoded by the continuous representations
of English text. However, from a typological perspective, morphologically poor English is rather an outlier: the information that English encodes through word order and function words is often expressed at the morphological level in other languages. To address this, we introduce 15 type-level probing tasks, such as case marking, possession, word length, morphological tag count, and pseudoword identification, for 24 languages. We present a reusable methodology for the creation and evaluation of such tests in a multilingual setting. We then present
experiments on several multilingual word embedding models, in which we relate probing task performance for a diverse set of languages to five classic NLP tasks: POS tagging, dependency parsing, semantic role labeling, named entity recognition, and natural language inference. We find that a number of probing tests correlate significantly and positively with the downstream tasks, especially for morphologically rich languages. We show that
our tests can be used to explore word embeddings or black-box neural models for linguistic cues in a multilingual setting.
Comment: Demo is available from: https://linspector.ukp.informatik.tu-darmstadt.de
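As a rough illustration of what such a type-level probing task looks like (not the authors' released code), the sketch below fits a simple classifier that tries to recover a single morphological feature, here case, from pre-trained word vectors; the embedding table and labels are random placeholders.

```python
# A minimal sketch of a type-level probing task: train a simple classifier
# to predict a single linguistic feature (grammatical case) from word
# vectors. The vocabulary, embeddings, and labels are stand-ins for real
# data such as fastText vectors with UniMorph case annotations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder embedding table and type-level case labels.
vocab = [f"word{i}" for i in range(1000)]
embeddings = {w: rng.normal(size=300) for w in vocab}
labels = {w: rng.choice(["nom", "acc", "dat", "gen"]) for w in vocab}

X = np.stack([embeddings[w] for w in vocab])
y = np.array([labels[w] for w in vocab])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A deliberately simple probe: if a linear classifier can recover the
# feature, the information is (linearly) present in the vectors.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"case probing accuracy: {probe.score(X_te, y_te):.3f}")
```

With real embeddings, accuracy well above chance indicates that the vectors encode the probed feature; with the random placeholders here, the probe stays near chance, as expected.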
Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks
It is now established that modern neural language models can be successfully
trained on multiple languages simultaneously without changes to the underlying
architecture, providing an easy way to adapt a variety of NLP models to
low-resource languages. But what kind of knowledge is really shared among
languages within these models? Does multilingual training mostly lead to an
alignment of the lexical representation spaces or does it also enable the
sharing of purely grammatical knowledge? In this paper, we dissect different forms of cross-lingual transfer and identify their main determining factors, using a variety of models and probing tasks. We find that exposing our language
models to a related language does not always increase grammatical knowledge in
the target language, and that optimal conditions for lexical-semantic transfer
may not be optimal for syntactic transfer.
Comment: v2: added acknowledgements; 9 pages, single column, 6 figures
Sentence Embeddings for Russian NLU
We investigate the performance of sentence embedding models on several tasks for the Russian language. In our comparison, we include tasks such as multiple-choice question answering, next sentence prediction, and paraphrase identification. We employ FastText embeddings as a baseline and compare them to ELMo and BERT embeddings. We conduct two series of experiments, using both
unsupervised (i.e., based on a similarity measure only) and supervised approaches for the tasks. Finally, we present datasets for multiple-choice question answering and next sentence prediction in Russian.
Comment: to appear in AIST201
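A minimal sketch of the unsupervised setting described above: score a sentence pair by the cosine similarity of mean-pooled word vectors and threshold it for paraphrase identification. The hash-based pseudo-embeddings and the 0.5 cutoff are placeholders for real FastText/ELMo/BERT representations and a tuned threshold.

```python
# Unsupervised paraphrase identification via similarity only: embed both
# sentences, compare with cosine similarity, threshold the score. The
# word_vec() below is a deterministic stand-in for a real embedding lookup.
import hashlib
import numpy as np

def word_vec(word: str, dim: int = 300) -> np.ndarray:
    seed = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def sentence_vec(sentence: str) -> np.ndarray:
    # Mean pooling over word vectors as a simple sentence representation.
    return np.mean([word_vec(w) for w in sentence.lower().split()], axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = "кошка сидит на ковре"
s2 = "кот сидит на ковре"
sim = cosine(sentence_vec(s1), sentence_vec(s2))
print(f"similarity={sim:.3f} -> paraphrase: {sim > 0.5}")
```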
ParsBERT: Transformer-based Model for Persian Language Understanding
The surge of pre-trained language models has opened a new era in the field of Natural Language Processing (NLP) by allowing us to build powerful language models. Among these models, Transformer-based models such as BERT have become increasingly popular due to their state-of-the-art performance. However, these models are usually focused on English, leaving other languages to multilingual models with limited resources. This paper proposes a monolingual BERT for the Persian language (ParsBERT), which achieves state-of-the-art performance compared to other architectures and multilingual models. Also, since the amount of data available for NLP tasks in Persian is very limited, we compose a massive dataset for different NLP tasks as well as for pre-training the model.
ParsBERT obtains higher scores on all datasets, both existing and newly composed, and improves the state of the art by outperforming both multilingual BERT and other prior work on Sentiment Analysis, Text Classification, and Named Entity Recognition tasks.
Comment: 10 pages, 5 figures, 7 tables; table 7 corrected and some refs related to table
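For context, a monolingual checkpoint like this is typically used through the Hugging Face transformers API, as in the sketch below; the model identifier is our assumption of the published checkpoint name, not something stated in the abstract.

```python
# A minimal sketch of feature extraction with a monolingual BERT such as
# ParsBERT via Hugging Face transformers.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "HooshvareLab/bert-base-parsbert-uncased"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "ما در هوش مصنوعی پیشرفت می‌کنیم"  # a Persian example sentence
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# [CLS] vector as a simple sentence representation, e.g., for fine-tuning
# on sentiment analysis, text classification, or NER.
sentence_vec = outputs.last_hidden_state[:, 0, :]
print(sentence_vec.shape)  # torch.Size([1, 768])
```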
Machine Translation Evaluation with Neural Networks
We present a framework for machine translation evaluation using neural
networks in a pairwise setting, where the goal is to select the better
translation from a pair of hypotheses, given the reference translation. In this
framework, lexical, syntactic and semantic information from the reference and
the two hypotheses is embedded into compact distributed vector representations,
and fed into a multi-layer neural network that models nonlinear interactions
between each of the hypotheses and the reference, as well as between the two
hypotheses. We experiment with the benchmark datasets from the WMT Metrics shared task, on which we obtain the best results published so far with the basic network configuration. We also perform a series of experiments to analyze
and understand the contribution of the different components of the network. We
evaluate variants and extensions, including fine-tuning of the semantic
embeddings, and sentence-based representations modeled with convolutional and
recurrent neural networks. In summary, the proposed framework is flexible and
generalizable, allows for efficient learning and scoring, and provides an MT
evaluation metric that correlates with human judgments, and is on par with the
state of the art.
Comment: Machine Translation, Reference-based MT Evaluation, Deep Neural Networks, Distributed Representation of Texts, Textual Similarity
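A minimal PyTorch sketch of the pairwise setup described above: vector representations of the reference and the two hypotheses are concatenated and fed to a small MLP that outputs the probability that the first hypothesis is the better translation. The layer sizes are illustrative guesses, and random tensors stand in for the real lexical, syntactic, and semantic features.

```python
# Pairwise MT evaluation sketch: given embedded (reference, hyp1, hyp2)
# triples, a small MLP models nonlinear interactions among them and
# predicts which hypothesis is better.
import torch
import torch.nn as nn

class PairwiseMTEvaluator(nn.Module):
    def __init__(self, dim: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * dim, 128),  # interactions across all three inputs
            nn.Tanh(),
            nn.Linear(128, 1),        # logit for P(hyp1 better than hyp2)
        )

    def forward(self, ref, h1, h2):
        x = torch.cat([ref, h1, h2], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

model = PairwiseMTEvaluator()
ref, h1, h2 = (torch.randn(4, 100) for _ in range(3))  # batch of 4 pairs
print(model(ref, h1, h2))  # probabilities that hyp1 is the better one
```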
Tensorized Embedding Layers for Efficient Model Compression
The embedding layers that transform input words into real-valued vectors are key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous, which precludes their deployment in resource-limited settings. We introduce a novel way of parametrizing embedding layers based on the Tensor Train (TT) decomposition, which allows compressing the model significantly at the cost of a negligible drop, or even a slight gain, in performance. We evaluate our method on a wide range of natural language processing benchmarks and analyze the trade-off between performance and compression ratio across architectures, from MLPs to LSTMs and Transformers.
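To make the compression arithmetic concrete, here is a minimal NumPy sketch of a TT-matrix embedding lookup: the vocabulary size and embedding dimension are factorized (10,000 = 25·20·20, 512 = 8·8·8), and each embedding row is reconstructed from three small cores. The shapes and TT-ranks are illustrative assumptions, not the paper's settings.

```python
# TT-matrix embedding sketch: instead of a (10000, 512) weight matrix,
# store three small cores of shape (r_{k-1}, v_k, d_k, r_k) and rebuild
# any row on demand.
import numpy as np

def tt_embedding_row(cores, word_index, vocab_shape):
    # Decompose the flat word index into mixed-radix digits (i_1..i_k).
    idx = []
    for v in reversed(vocab_shape):
        idx.append(word_index % v)
        word_index //= v
    idx = idx[::-1]
    # Contract the cores along the vocab digits to obtain one row.
    row = np.ones((1, 1))  # shape (1, r_0) with r_0 = 1
    for core, i_k in zip(cores, idx):
        slice_k = core[:, i_k, :, :]                    # (r_{k-1}, d_k, r_k)
        r_prev, d_k, r_k = slice_k.shape
        row = row @ slice_k.reshape(r_prev, d_k * r_k)  # grow output dims
        row = row.reshape(-1, r_k)
    return row.reshape(-1)

vocab_shape, dim_shape = (25, 20, 20), (8, 8, 8)  # 10,000 words, 512 dims
ranks = (1, 16, 16, 1)
rng = np.random.default_rng(0)
cores = [rng.normal(size=(ranks[k], vocab_shape[k], dim_shape[k], ranks[k + 1]))
         for k in range(3)]

row = tt_embedding_row(cores, 1234, vocab_shape)
full = np.prod(vocab_shape) * np.prod(dim_shape)
tt = sum(c.size for c in cores)
print(row.shape, f"compression: {full / tt:.0f}x")  # (512,) ~110x
```

With these toy ranks, the three cores hold about 47k parameters versus 5.12M for the dense matrix, roughly a 110x reduction; the rank controls the accuracy/compression trade-off.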
Conditional Generators of Words Definitions
We explore the recently introduced definition modeling technique, which evaluates different distributed vector representations of words by modeling their dictionary definitions. In this work, we study the problem of word ambiguity in definition modeling and propose a possible solution that employs latent variable modeling and soft attention mechanisms. Our quantitative and qualitative evaluation and analysis of the model show that taking word ambiguity and polysemy into account leads to performance improvements.
Comment: Accepted as a conference paper at ACL 201
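A minimal PyTorch sketch of the latent-variable idea (omitting the soft attention component): the decoder is conditioned on the defined word's embedding together with a latent sense variable sampled via the reparameterization trick. All layer sizes and the exact wiring are illustrative assumptions, not the paper's architecture.

```python
# Latent-variable definition generator sketch: a latent "sense" z, inferred
# from the defined word's vector, disambiguates which definition to decode.
import torch
import torch.nn as nn

class LatentDefinitionDecoder(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, latent_dim=32, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Inference net: parameters of q(z | word).
        self.to_mu = nn.Linear(emb_dim, latent_dim)
        self.to_logvar = nn.Linear(emb_dim, latent_dim)
        # Decoder input: previous token embedding + [word vector; sense z].
        self.rnn = nn.LSTM(2 * emb_dim + latent_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, word_ids, def_ids):
        w = self.embed(word_ids)                               # (B, E)
        mu, logvar = self.to_mu(w), self.to_logvar(w)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparam.
        steps = self.embed(def_ids)                            # (B, T, E)
        cond = torch.cat([w, z], dim=-1).unsqueeze(1).expand(-1, steps.size(1), -1)
        h, _ = self.rnn(torch.cat([steps, cond], dim=-1))
        return self.out(h), mu, logvar  # logits + terms for the KL penalty

model = LatentDefinitionDecoder()
logits, mu, logvar = model(torch.tensor([7]), torch.tensor([[1, 2, 3]]))
print(logits.shape)  # torch.Size([1, 3, 5000])
```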
Synchronous Bidirectional Neural Machine Translation
Existing approaches to neural machine translation (NMT) generate the target
language sequence token by token from left to right. However, this kind of
unidirectional decoding framework cannot make full use of the target-side
future contexts which can be produced in a right-to-left decoding direction,
and thus suffers from the issue of unbalanced outputs. In this paper, we introduce a synchronous bidirectional neural machine translation (SB-NMT) model that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new
algorithm that enables synchronous bidirectional decoding in a single model.
Then, we present an interactive decoding model in which left-to-right (right-to-left) generation not only depends on its previously generated outputs but also relies on future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on
large-scale NIST Chinese-English, WMT14 English-German, and WMT18
Russian-English translation tasks. Experimental results demonstrate that our
model achieves significant improvements over the strong Transformer model by
3.92, 1.49 and 1.04 BLEU points respectively, and obtains the state-of-the-art
performance on the Chinese-English and English-German translation tasks.
Comment: Published in TACL 2019; 15 pages, 9 figures, 9 tables
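As a toy illustration of the decoding scheme (not the paper's actual model), the sketch below steps a left-to-right and a right-to-left hypothesis in lockstep, with each direction's next-token scores conditioned on the other's partial output; score_next is a stub standing in for a trained SB-NMT scorer.

```python
# Toy synchronous bidirectional greedy decoding: both directions extend
# their hypotheses together, each seeing the other's prefix.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def score_next(own_prefix, other_prefix, direction):
    # Stub: a real SB-NMT model would attend over the source sentence and
    # over *both* partial translations here.
    random.seed(hash((tuple(own_prefix), tuple(other_prefix), direction)))
    return {tok: random.random() for tok in VOCAB}

def synchronous_decode(max_len=6):
    l2r, r2l = [], []
    for _ in range(max_len):
        scores_f = score_next(l2r, r2l, "l2r")  # conditioned on r2l future
        scores_b = score_next(r2l, l2r, "r2l")  # conditioned on l2r future
        tok_f = max(scores_f, key=scores_f.get)
        tok_b = max(scores_b, key=scores_b.get)
        if tok_f == "<eos>" and tok_b == "<eos>":
            break
        if tok_f != "<eos>":
            l2r.append(tok_f)
        if tok_b != "<eos>":
            r2l.append(tok_b)
    # The r2l hypothesis is generated back-to-front; reverse for reading.
    return l2r, list(reversed(r2l))

fwd, bwd = synchronous_decode()
print("l2r:", " ".join(fwd))
print("r2l:", " ".join(bwd))
```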
Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension
Although Vietnamese is the 17th most widely spoken native language in the world, there are few research studies on Vietnamese machine reading comprehension (MRC), the task of understanding a text and answering questions about it. One reason is the lack of high-quality benchmark datasets for this task. In this work, we construct a dataset consisting of 2,783 pairs of multiple-choice questions and answers based on 417 Vietnamese texts commonly used to teach reading comprehension to elementary school pupils. In addition, we propose a lexical-based MRC method that utilizes
semantic similarity measures and external knowledge sources to analyze
questions and extract answers from the given text. We compare the performance
of the proposed model with several baseline lexical-based and neural
network-based models. Our proposed method achieves an accuracy of 61.81%, which is 5.51% higher than the best baseline model. We also measure human performance on our dataset and find a large gap between machine and human performance, indicating that significant progress can still be made on this task. The dataset is freely available on our website for research purposes.
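A minimal sketch of the lexical core of such an approach: rank each answer choice by word overlap between the question-plus-choice string and the passage. The paper additionally uses semantic similarity measures and external knowledge sources; the toy data below is invented for illustration.

```python
# Lexical-based multiple-choice MRC sketch: pick the choice whose combined
# (question + choice) wording overlaps most with the passage.
def overlap_score(text: str, passage_words: set) -> float:
    words = set(text.lower().split())
    return len(words & passage_words) / max(len(words), 1)

def answer(passage: str, question: str, choices: list) -> str:
    passage_words = set(passage.lower().split())
    return max(choices,
               key=lambda c: overlap_score(question + " " + c, passage_words))

passage = "Lan goes to school by bike every morning with her brother."
question = "How does Lan go to school?"
choices = ["by bus", "by bike", "on foot", "by car"]
print(answer(passage, question, choices))  # -> "by bike"
```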
Analysis Methods in Neural Language Processing: A Survey
The field of natural language processing has seen impressive progress in
recent years, with neural network models replacing many of the traditional
systems. A plethora of new models have been proposed, many of which are thought
to be opaque compared to their feature-rich counterparts. This has led
researchers to analyze, interpret, and evaluate neural networks in novel and
more fine-grained ways. In this survey paper, we review analysis methods in
neural language processing, categorize them according to prominent research
trends, highlight existing limitations, and point to potential directions for
future work.
Comment: Version including the supplementary materials (3 tables), also available at https://boknilev.github.io/nlp-analysis-method