25 research outputs found

PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies

Author: Alberto Lavelli
Bosco Cristina
Fabio Tamburini
Mazzei Alessandro
Sanguinetti Manuela
Publication venue: ELRA
Publication date: 01/01/2018

Institutional Research Information System University of Turin

Paris and Stanford at EPE 2017: Downstream Evaluation of Graph-based Dependency Representations

Author: Candito Marie
Manning Christopher
Sagot Benoît
Schuster Sebastian
Seddah Djamé
Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 20/09/2017

We describe the STANFORD-PARIS and PARIS-STANFORD submissions to the 2017 Extrinsic Parser Evaluation (EPE) Shared Task. The purpose of this shared task was to evaluate dependency graphs on three downstream tasks. Through our submissions, we evaluated the usability of several representations derived from English Universal Dependencies (UD), as well as the Stanford Dependencies (SD), Predicate Argument Structure (PAS), and DM representations. We further compared two parsing strategies: directly parsing to graph-based dependency representations and a two-stage process of first parsing to surface syntax trees and then applying rule-based augmentations to obtain the final graphs. Overall, our systems performed very well and our submissions ranked first and third. In our analysis, we find that the two-stage parsing process leads to better downstream performance, and that enhanced UD, a graph-based representation, consistently outperforms basic UD, a strict surface syntax representation, suggesting an advantage of enriched representations for downstream tasks.

INRIA a CCSD electronic archive server

Hal-Diderot

When Collaborative Treebank Curation Meets Graph Grammars: Arborator With a Grew Back-End

Author: Courtin Marine
Gerdes Kim
Guibon Gaël
Guillaume Bruno
Publication venue: HAL CCSD
Publication date: 11/05/2020

In this paper we present Arborator-Grew, a collaborative annotation tool for treebank development. Arborator-Grew combines the features of two preexisting tools: Arborator and Grew. Arborator is a widely used collaborative graphical online dependency treebank annotation tool. Grew is a tool for graph querying and rewriting specialized in structures needed in NLP, i.e. syntactic and semantic dependency trees and graphs. Grew also has an online version, Grew-match, where all Universal Dependencies treebanks in their classical, deep and surface-syntactic flavors can be queried. Arborator-Grew is a complete redevelopment and modernization of Arborator, replacing its own internal database storage by a new Grew API, which adds a powerful query tool to Arborator's existing treebank creation and correction features. This includes complex access control for parallel expert and crowd-sourced annotation, tree comparison visualization, and various exercise modes for teaching and training of annotators. Arborator-Grew opens up new paths of collectively creating, updating, maintaining, and curating syntactic treebanks and semantic graph banks.

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

MultiLexNorm: A Shared Task on Multilingual Lexical Normalization

Author: Baldwin T
Caselli T
Ljubešić N
Mahendra R
Muller B
Plank B
Ramponi A
Roncal ISV
Sidorenko W
van der Goot R
Zubiaga A
Çetinoğlu Ö
Çolakoğlu T
Publication venue: Workshop on Noisy User-Generated Text
Publication date: 01/01/2021

Lexical normalization is the task of transforming an utterance into its standardized form. This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation. Such variation is typical for social media on which information is shared in a multitude of ways, including diverse languages and code-switching. Since the seminal work of Han and Baldwin (2011) a decade ago, lexical normalization has attracted attention in English and multiple other languages. However, there exists a lack of a common benchmark for comparison of systems across languages with a homogeneous data and evaluation setup. The MULTILEXNORM shared task sets out to fill this gap. We provide the largest publicly available multilingual lexical normalization benchmark including 12 language variants. We propose a homogenized evaluation setup with both intrinsic and extrinsic evaluation. As extrinsic evaluation, we use dependency parsing and part-of-speech tagging with adapted evaluation metrics (a-LAS, a-UAS, and a-POS) to account for alignment discrepancies. The shared task hosted at W-NUT 2021 attracted 9 participants and 18 submissions. The results show that neural normalization systems outperform the previous state-of-the-art system by a large margin. Downstream parsing and part-of-speech tagging performance is positively affected but to varying degrees, with improvements of up to 1.72 a-LAS, 0.85 a-UAS, and 1.54 a-POS for the winning system

Queen Mary Research Online

MultiLexNorm: A Shared Task on Multilingual Lexical Normalization

Author: Baldwin Timothy
Caselli Tommaso
Ljubešić Nikola
Mahendra Rahmad
Muller Benjamin
Plank Barbara
Ramponi Alan
San Vicente Roncal Iñaki
Sidorenko Wladimir
van der Goot Rob
Zubiaga Arkaitz
Çetinoğlu Özlem
Çolakoğlu Talha
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2021

Lexical normalization is the task of transforming an utterance into its standardized form. This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation. Such variation is typical for social media on which information is shared in a multitude of ways, including diverse languages and code-switching. Since the seminal work of Han and Baldwin (2011) a decade ago, lexical normalization has attracted attention in English and multiple other languages. However, there exists a lack of a common benchmark for comparison of systems across languages with a homogeneous data and evaluation setup. The MultiLexNorm shared task sets out to fill this gap. We provide the largest publicly available multilingual lexical normalization benchmark including 13 language variants. We propose a homogenized evaluation setup with both intrinsic and extrinsic evaluation. As extrinsic evaluation, we use dependency parsing and part-of-speech tagging with adapted evaluation metrics (a-LAS, a-UAS, and a-POS) to account for alignment discrepancies. The shared task hosted at W-NUT 2021 attracted 9 participants and 18 submissions. The results show that neural normalization systems outperform the previous state-of-the-art system by a large margin. Downstream parsing and part-of-speech tagging performance is positively affected but to varying degrees, with improvements of up to 1.72 a-LAS, 0.85 a-UAS, and 1.54 a-POS for the winning system

Proceedings - University of Groningen

University of Groningen

Archivio della ricerca - Fondazione Bruno Kessler

ARTS repository - University of Groningen

The IT University of Copenhagen's Repository

Dissertations of the University of Groningen

Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

Author: Alessandro Mazzei
Elena Cabrio
Fabio Tamburini
Publication venue: OpenEdition
Publication date: 01/01/2018

On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

Directory of Open Access Books (DOAB)

Semantic Role Labeling in Portuguese: Improving the State of the Art with Transfer Learning and BERT-based Models

Author: Ana Sofia Medeiros Oliveira
Publication date: 09/11/2020

Repositório Aberto da Universidade do Porto