6 research outputs found

Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

Author: Fei Hao
Ji Donghong
Zhang Meishan
Publication date: 01/01/2020

Much research has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available, as they are for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, it is still challenging to obtain competitive performance. Cross-lingual SRL is one promising way to address the problem, and it has achieved great advances with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on Universal Proposition Bank show that the translation-based method is highly effective, and that the automatic pseudo datasets can improve target-language SRL performance significantly. Comment: Accepted at ACL 202

arXiv.org e-Print Archive

Crossref

A Surface-Syntactic UD Treebank for Naija

Author: Caron Bernard
Courtin Marine
Gerdes Kim
Kahane Sylvain
Publication venue: HAL CCSD
Publication date: 28/08/2019

This paper presents a syntactic treebank for spoken Naija, an English pidgincreole that is rapidly spreading across Nigeria. The syntactic annotation is developed in the Surface-Syntactic Universal Dependency annotation scheme (SUD) (Gerdes et al., 2018) and automatically converted into UD. We present the workflow of the treebank development for this under-resourced language. A crucial step in the syntactic analysis of a spoken language consists in manually adding markup to the transcription, indicating the segmentation into major syntactic units and their internal structure. We show that this so-called "macrosyntactic" markup improves parsing results. We also study some iconic syntactic phenomena that clearly distinguish Naija from English.

Improving Surface-syntactic Universal Dependencies (SUD): surface-syntactic relations and deep syntactic features

Author: Gerdes Kim
Guillaume Bruno
Kahane Sylvain
Perrier Guy
Publication venue: HAL CCSD
Publication date: 26/08/2019

SUD is an annotation scheme for syntactic dependency treebanks that is nearly isomorphic to UD (Universal Dependencies). Unlike UD, it is based on syntactic criteria (favoring functional heads), and its relations are defined on distributional and functional bases. In this paper, we recall and specify the general principles underlying SUD, present the updated set of SUD relations, discuss the central question of MWEs, and introduce an orthogonal layer of deep-syntactic features converted from the deep-syntactic part of the UD scheme.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

A Surface-Syntactic UD Treebank for Naija

Author: Caron Bernard
Courtin Marine
Gerdes Kim
Kahane Sylvain
Publication venue: HAL CCSD
Publication date: 01/01/2019

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Y a-t-il une relation entre la valence (pleine) et la synonymie?

Author: Banyś Wiesław
Publication venue: 'University of Silesia in Katowice'
Publication date: 01/01/2019

The article is devoted to the analysis of the possible relationships between (full) valency and synonymy. We first present a very short overview of positions on valency, then present the position of researchers who see a relationship between valency (full, i.e., not distinguishing arguments from adjuncts and treating them all together as arguments) and synonymy. The article lays out the reasoning that, since a more frequent word appears in more contexts than a less frequent word, the more frequent word tends to have more meanings and therefore more synonyms, and, being more polysemous, it has a greater number of full valency frames. It is shown that this hypothesis is not sufficiently precise: it is the word, as a form, that can be considered polysemous, but the word itself cannot have synonyms; only a particular meaning of the polysemous word can have them. The analyses were therefore not subtle enough to identify any relationship, if there is one, between the two phenomena. Moreover, the results of the analyses based on this insufficiently precise starting point did not demonstrate a significant correlation, let alone a dependency, between the two phenomena. The Kendall coefficient, which measures ordinal association, was estimated at 0.18 for the material analysed (on a scale from –1 to +1). It is pointed out at the end that, from the fact that the distinction between arguments and adjuncts is often subtle and sometimes difficult to make, one cannot conclude that there is no difference between them and that the problem does not in fact exist, nor is it a reason to stop searching for satisfactory criteria for differentiating the two categories or to stop applying consistently those at our disposal, namely semantic implication.

Repozytorium Uniwersytetu Śląskiego RE-BUŚ