

Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies

Author: Bosco Cristina
Maria Simi
Simonetta Montemagni
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2014

Stanford Dependencies (SD) are nowadays a de facto standard for dependency annotation. The goal of this paper is to explore the pros and cons of different strategies for generating SD-annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank, or MIDT+), whose output was automatically converted to SD, with the results of the parser trained directly on ISDT. Experiments carried out to test the reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependency repertoire, whose output can be easily converted to SD, is slightly higher than that of a parser trained directly on ISDT. A non-negligible advantage of the first strategy is that semi-automatic extensions of the training resource are easier to carry out consistently with a reduced dependency tag set. Preliminary experiments on generating the collapsed and propagated SD representation are also reported.

Archivio della Ricerca - Università di Pisa

Institutional Research Information System University of Turin

Bootstrapping enhanced universal dependencies for Italian

Author: Montemagni Simonetta
Simi Maria
Publication venue: CEUR-WS
Publication date: 01/01/2018

The paper presents an extension of the Italian Universal Dependencies Treebank with an "enhanced" representation level (e-IUDT), aimed at simplifying the information extraction process. The modules developed to semi-automatically build e-IUDT were delexicalized to perform cross-language enhancements: preliminary experiments in this direction led to promising results.

Crossref

Archivio della Ricerca - Università di Pisa

Coordinate constructions in English enhanced universal dependencies: analysis and computational modeling

Author: Friedrich Annemarie
Grünewald Stefan
Piccirilli Prisca
Publication date: 01/01/2021

In this paper, we address the representation of coordinate constructions in Enhanced Universal Dependencies (UD), where relevant dependency links are propagated from conjunction heads to other conjuncts. English treebanks for enhanced UD have been created from gold basic dependencies using a heuristic rule-based converter, which propagates only core arguments. With the aim of determining which set of links should be propagated from a semantic perspective, we create a large-scale dataset of manually edited syntax graphs. We identify several systematic errors in the original data and propose to also propagate adjuncts. We observe high inter-annotator agreement for this semantic annotation task. Using our new manually verified dataset, we perform the first principled comparison of rule-based and (partially novel) machine-learning-based methods for conjunction propagation for English. We show that learning propagation rules is more effective than hand-designing heuristic rules. When using automatic parses, our neural graph-parser-based edge predictor outperforms the currently predominant pipelines, which combine a basic-layer tree parser with converters.
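
The propagation step described in this abstract can be sketched in a few lines of Python. This is a minimal sketch, assuming a sentence given as basic UD tokens; the `Token` tuple, the `propagate_conjuncts` function, and the `CORE_ARGS` set are hypothetical names introduced here. Each later conjunct inherits the incoming edge of the first conjunct, and core arguments of the first conjunct are copied to conjuncts that lack them. It imitates the kind of heuristic converter the abstract describes, not the authors' learned models, and it keeps plain `conj` labels where enhanced UD would use subtyped labels such as `conj:and`.

```python
from typing import NamedTuple

# Assumption: only core arguments are copied to later conjuncts, mirroring the
# heuristic converter behaviour described above; real converters differ in detail.
CORE_ARGS = {"nsubj", "obj", "iobj"}


class Token(NamedTuple):
    id: int      # 1-based token index
    form: str
    head: int    # 0 means the artificial ROOT
    deprel: str  # basic UD relation


def propagate_conjuncts(tokens: list[Token]) -> set[tuple[int, int, str]]:
    """Return enhanced edges (head, dependent, relation) after conjunct propagation."""
    # Start from the basic tree: every basic edge is also an enhanced edge.
    edges = {(t.head, t.id, t.deprel) for t in tokens}
    by_id = {t.id: t for t in tokens}
    for t in tokens:
        if t.deprel != "conj":
            continue
        # In basic UD, every later conjunct attaches to the first conjunct.
        first = by_id[t.head]
        # 1) Shared governor: the conjunct inherits the first conjunct's incoming edge.
        edges.add((first.head, t.id, first.deprel))
        # 2) Shared dependents: copy core arguments the conjunct does not have itself.
        own = {d.deprel for d in tokens if d.head == t.id}
        for d in tokens:
            if d.head == first.id and d.deprel in CORE_ARGS and d.deprel not in own:
                edges.add((t.id, d.id, d.deprel))
    return edges


# "Mary sang and danced": Mary should also be the subject of "danced".
sentence = [
    Token(1, "Mary", 2, "nsubj"),
    Token(2, "sang", 0, "root"),
    Token(3, "and", 4, "cc"),
    Token(4, "danced", 2, "conj"),
]
for edge in sorted(propagate_conjuncts(sentence)):
    print(edge)
```

Adding adjunct relations such as `obl` or `advmod` to `CORE_ARGS` would roughly mimic the proposal to also propagate adjuncts, though a blanket rule like this is exactly the kind of hand-designed heuristic that the abstract reports being outperformed by learned propagation.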

arXiv.org e-Print Archive

OPUS Augsburg

Natural Language Processing Resources for Finnish. Corpus Development in the General and Clinical Domains

Author: Haverinen Katri
Publication venue: Turku Centre for Computer Science
Publication date: 04/09/2014

Transferred from Doria

UTUPub

Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

Author: Filip Ginter
Jenna Kanerva
Joakim Nivre
Maria Simi
Paola Marongiu
Sebastian Schuster
Simonetta Montemagni
Publication date: 28/10/2022

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies. We apply a rule-based system developed for English and a data-driven system trained on Finnish to Swedish and Italian. We find that both systems are accurate enough to bootstrap enhanced dependencies in existing UD treebanks. In the case of Italian, results are even on par with those of a prototype language-specific system.

UTUPub

Enhancing Universal Dependency Treebanks: A Case Study

Author: Filip Ginter
Jenna Kanerva
Joakim Nivre
Maria Simi
Paola Marongiu
Sebastian Schuster
Simonetta Montemagni
Publication date: 01/01/2018

Crossref

Serveur académique lausannois

Archivio della Ricerca - Università di Pisa

Open Access Repository

RobertNLP at the IWPT 2020 shared task: surprisingly simple enhanced UD parsing for English

Author: Friedrich Annemarie
Grünewald Stefan
Publication date: 01/01/2020

This paper presents our system at the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies. Using a biaffine classifier architecture (Dozat and Manning, 2017) that operates directly on fine-tuned RoBERTa embeddings, our parser generates enhanced UD graphs by predicting the best dependency label (or the absence of a dependency) for each pair of tokens in the sentence. We address label sparsity by replacing lexical items in relations with placeholders at prediction time and later retrieving them from the parse in a rule-based fashion. In addition, we enforce structural graph constraints using a simple set of heuristics. On the English blind test data, our system achieves very high parsing accuracy, ranking 1st out of 10 with an ELAS F1 score of 88.94%.
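
The "best label or no dependency" decision for every token pair can be illustrated with a small decoder. This is a minimal sketch, assuming the classifier has already produced a score for every (head, dependent, label) triple, with an extra `_none_` class meaning "no arc". The function name `decode_graph`, the score layout, and the single fallback heuristic (every token gets at least one head) are assumptions for illustration, not the system's actual code or its full set of structural constraints.

```python
from typing import Sequence

NONE_LABEL = "_none_"  # hypothetical name for the "no dependency" class


def decode_graph(scores: Sequence[Sequence[Sequence[float]]],
                 labels: Sequence[str]) -> set[tuple[int, int, str]]:
    """Decode an enhanced graph from per-pair label scores.

    scores[h][d][k] is the score of labels[k] on the arc h -> d; index 0 is the
    artificial ROOT. In a biaffine parser these scores would come from the
    classifier over contextual embeddings (not shown here).
    """
    n = len(scores)
    none_idx = labels.index(NONE_LABEL)
    label_ids = range(len(labels))
    edges = set()
    for d in range(1, n):
        # Independent decision per pair: keep the arc only if a real label wins.
        for h in range(n):
            if h == d:
                continue
            best = max(label_ids, key=lambda k: scores[h][d][k])
            if best != none_idx:
                edges.add((h, d, labels[best]))
        # Example heuristic: every token needs at least one head, so fall back
        # to the highest-scoring real (head, label) pair when none was predicted.
        if not any(dep == d for _, dep, _ in edges):
            head, lab = max(((hh, k) for hh in range(n) if hh != d
                             for k in label_ids if k != none_idx),
                            key=lambda hk: scores[hk[0]][d][hk[1]])
            edges.add((head, d, labels[lab]))
    return edges
```

Because each pair is decided independently, a token can receive several heads, which is what enhanced UD graphs require; the price is that well-formedness (e.g. at least one head per token) has to be restored afterwards by rules like the fallback above.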

OPUS Augsburg

Crossref

RobertNLP at the IWPT 2021 shared task: simple enhanced UD parsing for 17 languages

Author: Friedrich Annemarie
Grünewald Stefan
Oertel Frederik Tobias
Publication date: 01/01/2021

This paper presents our multilingual dependency parsing system as used in the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. Our system consists of an unfactorized biaffine classifier that operates directly on fine-tuned XLM-R embeddings and generates enhanced UD graphs by predicting the best dependency label (or the absence of a dependency) for each pair of tokens. To avoid sparsity issues resulting from lexicalized dependency labels, we replace lexical items in relations with placeholders at training and prediction time, later retrieving them from the parse via a hybrid rule-based/machine-learning system. In addition, we use model ensembling at prediction time. Our system achieves high parsing accuracy on the blind test data, ranking 3rd out of 9 with an average ELAS F1 score of 86.97.
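
The placeholder trick for lexicalized labels can be sketched as a pair of functions: one strips the lexical subtype before training, and one restores it after parsing by reading the marker word attached to the dependent in the basic tree. This is a purely rule-based sketch under simplifying assumptions; the placeholder string `[LEX]`, the relation-to-marker table, and the function names are hypothetical, and the 2021 system described above uses a hybrid rule-based/machine-learning retriever instead.

```python
from typing import NamedTuple

PLACEHOLDER = "[LEX]"  # hypothetical placeholder string, not the paper's actual token

# Simplified assumption: which child relation in the basic tree carries the
# lexical subtype of each enhanced relation (e.g. obl:in <- case "in").
MARKER_REL = {"obl": "case", "nmod": "case", "advcl": "mark",
              "acl": "mark", "conj": "cc"}
# Subtypes that are grammatical rather than lexical and are left untouched.
NON_LEXICAL = {"relcl", "tmod", "npmod", "poss", "pass", "agent"}


class Tok(NamedTuple):
    id: int
    lemma: str
    head: int
    deprel: str  # basic UD relation


def delexicalize(label: str) -> str:
    """Training-time step: 'obl:in' -> 'obl:[LEX]', so all prepositions share one class."""
    base, sep, sub = label.partition(":")
    if sep and base in MARKER_REL and sub not in NON_LEXICAL:
        return f"{base}:{PLACEHOLDER}"
    return label


def relexicalize(label: str, dep: int, tokens: list[Tok]) -> str:
    """Post-processing step: fill the placeholder from the dependent's marker words."""
    base, _, sub = label.partition(":")
    if sub != PLACEHOLDER or base not in MARKER_REL:
        return label
    markers = [t.lemma.lower() for t in tokens
               if t.head == dep and t.deprel == MARKER_REL[base]]
    # Several marker children are joined with "_"; multiword markers whose later
    # words attach via "fixed" are not handled in this sketch. With no marker,
    # fall back to the bare relation.
    return f"{base}:{'_'.join(markers)}" if markers else base


# "sit on the mat": the predicted label obl:[LEX] is restored as obl:on.
toks = [Tok(1, "sit", 0, "root"), Tok(2, "on", 4, "case"),
        Tok(3, "the", 4, "det"), Tok(4, "mat", 1, "obl")]
print(relexicalize(delexicalize("obl:on"), 4, toks))
```

Collapsing every preposition or conjunction into one placeholder class shrinks the label inventory the classifier must learn, which is the sparsity problem the abstract targets; the cost is that the lexical part must be recovered from the tree afterwards.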

OPUS Augsburg

From raw text to enhanced universal dependencies: The parsing shared task at IWPT 2021

Author: Bouma Gosse
Seddah Djame
Zeman Dan
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2021

We describe the second IWPT shared task on end-to-end parsing from raw text to Enhanced Universal Dependencies. We provide details about the evaluation metrics and the datasets used for training and evaluation. We compare the approaches taken by participating teams and discuss the results of the shared task, also in comparison with the first edition of this task.
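
ELAS, the main metric in these shared tasks, is an F1 score over labeled enhanced dependencies. The sketch below computes it for a single sentence, assuming the predicted and gold tokens are already aligned one-to-one and edges are (head, dependent, label) triples. The official scorer additionally aligns system tokens with gold tokens, since the task starts from raw text, and computes scores over whole test sets rather than single sentences; the function name `elas_f1` is hypothetical.

```python
from typing import Iterable

Edge = tuple[int, int, str]  # (head, dependent, relation)


def elas_f1(gold: Iterable[Edge], pred: Iterable[Edge]) -> float:
    """F1 over labeled enhanced edges, assuming gold and predicted tokens are aligned."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)  # an edge counts only if head AND label match
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold = {(2, 1, "nsubj"), (0, 2, "root"), (4, 1, "nsubj")}
pred = {(2, 1, "nsubj"), (0, 2, "root"), (4, 1, "obj")}
print(elas_f1(gold, pred))
```

Using F1 rather than plain attachment accuracy matters here because an enhanced graph may give a token several heads, so the number of predicted and gold edges can differ.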

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen