6 research outputs found

Cross-Lingual Zero Pronoun Resolution

Author: Aloraini A
Poesio M
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)
Publication venue: ELRA and the Association for Computational Linguistics
Publication date: 31/05/2020

In languages like Arabic, Chinese, Italian, Japanese, Korean, Portuguese, Spanish, and many others, predicate arguments in certain syntactic positions are not realized instead of being realized as overt pronouns, and are thus called zero- or null-pronouns. Identifying and resolving such omitted arguments is crucial to machine translation, information extraction and other NLP tasks, but depends heavily on semantic coherence and lexical relationships. We propose a BERT-based cross-lingual model for zero pronoun resolution, and evaluate it on the Arabic and Chinese portions of OntoNotes 5.0. As far as we know, ours is the first neural model of zero-pronoun resolution for Arabic; and our model also outperforms the state-of-the-art for Chinese. In the paper we also evaluate BERT feature extraction and fine-tune models on the task, and compare them with our model. We also report on an investigation of BERT layers indicating which layer encodes the most suitable representation for the task. Our code is available at https://github.com/amaloraini/cross-lingual-Z

Queen Mary Research Online

A Cluster Ranking Model for Full Anaphora Resolution

Author: Poesio M
Uma A
Yu J
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)
Publication venue: European Language Resources Association (ELRA)
Publication date: 31/05/2020

Anaphora resolution (coreference) systems designed for the CONLL 2012 dataset typically cannot handle key aspects of the full anaphora resolution task such as the identification of singletons and of certain types of non-referring expressions (e.g., expletives), as these aspects are not annotated in that corpus. However, the recently released CRAC 2018 Shared Task and Phrase Detectives (PD) datasets can now be used for that purpose. In this paper, we introduce an architecture to simultaneously identify non-referring expressions (including expletives, predicative NPs, and other types) and build coreference chains, including singletons. Our cluster-ranking system uses an attention mechanism to determine the relative importance of the mentions in the same cluster. Additional classifiers are used to identify singletons and non-referring markables. Our contributions are as follows. First of all, we report the first result on the CRAC data using system mentions; our result is 5.8% better than the shared task baseline system, which used gold mentions. Our system also outperforms the best-reported system on PD by up to 5.3%. Second, we demonstrate that the availability of singleton clusters and non-referring expressions can lead to substantially improved performance on non-singleton clusters as well. Third, we show that despite our model not being designed specifically for the CONLL data, it achieves a very competitive result

Queen Mary Research Online

Neural Mention Detection

Author: Bohnet B
Poesio M
Yu J
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)
Publication venue: ELRA and the Association for Computational Linguistics
Publication date: 31/05/2020

Mention detection is an important preprocessing step for annotation and interpretation in applications such as NER and coreference resolution, but few stand-alone neural models able to handle the full range of mentions have been proposed. In this work, we propose and compare three neural network-based approaches to mention detection. The first approach is based on the mention detection part of a state-of-the-art coreference resolution system; the second uses ELMO embeddings together with a bidirectional LSTM and a biaffine classifier; the third approach uses the recently introduced BERT model. Our best model (using a biaffine classifier) achieves gains of up to 1.8 percentage points on mention recall when compared with a strong baseline in a HIGH RECALL coreference annotation setting. The same model achieves improvements of up to 5.3 and 6.2 p.p. when compared with the best-reported mention detection F1 on the CONLL and CRAC coreference data sets respectively in a HIGH F1 annotation setting. We then evaluate our models for coreference resolution by using mentions predicted by our best model in state-of-the-art coreference systems. The enhanced model achieved absolute improvements of up to 1.7 and 0.7 p.p. when compared with our strong baseline systems (pipeline system and end-to-end system) respectively. For nested NER, the evaluation of our model on the GENIA corpora shows that our model matches or outperforms state-of-the-art models despite not being specifically designed for this task

Queen Mary Research Online

Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers

Author: François Thomas
Gala Nuria
Javourey-Drevet Ludivine
Tack Anaïs
Ziegler Johannes
LREC 2020
Publication venue: HAL CCSD
Publication date: 01/01/2020

In this paper, we present a new parallel corpus addressed to researchers, teachers, and speech therapists interested in text simplification as a means of alleviating difficulties in children learning to read. The corpus is composed of excerpts drawn from 79 authentic literary (tales, stories) and scientific (documentary) texts commonly used in French schools for children aged between 7 and 9 years old. The excerpts were manually simplified at the lexical, morpho-syntactic, and discourse levels in order to propose a parallel corpus for reading tests and for the development of automatic text simplification tools. A sample of 21 poor-reading and dyslexic children with an average reading delay of 2.5 years read a portion of the corpus. The transcripts of reading errors were integrated into the corpus with the goal of identifying lexical difficulty in the target population. By means of statistical testing, we provide evidence that the manual simplifications significantly reduced reading errors, highlighting that the words targeted for simplification were not only well-chosen but also substituted with substantially easier alternatives. The entire corpus is available for consultation through a web interface and available on demand for research purposes

HAL AMU

DIAL UCLouvain

Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers

Author: François Thomas
Gala Nuria
Javourey-Drevet Ludivine
Tack Anaïs
Ziegler Johannes
LREC 2020
Publication venue: European Language Resources Association
Publication date: 01/01/2020

In this paper, we present a new parallel corpus addressed to researchers, teachers, and speech therapists interested in text simplification as a means of alleviating difficulties in children learning to read. The corpus is composed of excerpts drawn from 79 authentic literary (tales, stories) and scientific (documentary) texts commonly used in French schools for children aged between 7 and 9 years old. The excerpts were manually simplified at the lexical, morpho-syntactic, and discourse levels in order to propose a parallel corpus for reading tests and for the development of automatic text simplification tools. A sample of 21 poor-reading and dyslexic children with an average reading delay of 2.5 years read a portion of the corpus. The transcripts of reading errors were integrated into the corpus with the goal of identifying lexical difficulty in the target population. By means of statistical testing, we provide evidence that the manual simplifications significantly reduced reading errors, highlighting that the words targeted for simplification were not only well-chosen but also substituted with substantially easier alternatives. The entire corpus is available for consultation through a web interface and available on demand for research purposes

DIAL UCLouvain

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection

Author: Christopher D. Manning
Daniel Zeman
de Marneffe Marie-Catherine
Filip Ginter
Francis Tyers
Jan Hajič
Joakim Nivre
Sampo Pyysalo
Sebastian Schuster
Proceedings of 12th International Conference on Language Resources and Evaluation (LREC 2020).
Publication venue: Charles Univ Prague, Prague, Czech Republic.
Publication date: 01/01/2020

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers. In this paper, we describe version 2 of the guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages

arXiv.org e-Print Archive

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

DIAL UCLouvain