Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus
Much research has been devoted to semantic role labeling (SRL), which is
crucial for natural language understanding. Supervised approaches have achieved
impressive performance when large-scale corpora are available, as they are for
resource-rich languages such as English. For low-resource languages with no
annotated SRL dataset, however, it is still challenging to obtain competitive
performance. Cross-lingual SRL is one promising way to address the problem,
and it has advanced considerably with the help of model transfer and
annotation projection. In this paper, we propose a novel alternative based on
corpus translation, constructing high-quality training datasets for the target
languages from the source gold-standard SRL annotations. Experimental results
on the Universal Proposition Bank show that the translation-based method is highly
effective, and that the automatically constructed pseudo datasets improve
target-language SRL performance significantly.
Comment: Accepted at ACL 202
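The abstract names annotation projection as one existing route to cross-lingual SRL, and a translated corpus also needs some way of carrying source labels onto each translated sentence; the abstract does not say how the authors do this. As a generic illustration only, and not the paper's procedure, the sketch below projects token-level role labels through word alignments. The function name project_labels, the label strings, and the (source index, target index) alignment format are hypothetical, and the alignment is assumed to come from an external word aligner.

```python
from typing import List, Tuple


def project_labels(
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project token-level SRL labels from a source sentence onto its translation.

    Hypothetical helper, for illustration only.
    src_labels: one label per source token ("O" = no role), e.g. ["A0", "V", "O", "A1"].
    alignment:  (source_index, target_index) pairs, assumed to come from a word aligner.
    tgt_len:    number of tokens in the target (translated) sentence.
    """
    tgt_labels = ["O"] * tgt_len  # unaligned target tokens stay unlabeled
    # Sorting makes conflicts deterministic: the leftmost aligned source label wins.
    for s, t in sorted(alignment):
        if src_labels[s] != "O" and tgt_labels[t] == "O":
            tgt_labels[t] = src_labels[s]
    return tgt_labels


# "John ate an apple" -> "Jean a mangé une pomme" ("ate" aligned to both "a" and "mangé")
print(project_labels(["A0", "V", "O", "A1"], [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)], 5))
# ['A0', 'V', 'V', 'O', 'A1']
```

The one-to-many alignment of "ate" shows a typical weakness of plain projection: the French auxiliary "a" inherits the predicate label, which is the kind of noise a careful corpus-construction step would have to filter.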
A Surface-Syntactic UD Treebank for Naija
This paper presents a syntactic treebank for spoken Naija, an English-based pidgin-creole that is rapidly spreading across Nigeria. The syntactic annotation is developed in the Surface-Syntactic Universal Dependencies annotation scheme (SUD) (Gerdes et al., 2018) and automatically converted into UD. We present the workflow of the treebank development for this under-resourced language. A crucial step in the syntactic analysis of a spoken language consists of manually adding markup to the transcription, indicating the segmentation into major syntactic units and their internal structure. We show that this so-called "macrosyntactic" markup improves parsing results. We also study some iconic syntactic phenomena that clearly distinguish Naija from English.
Improving Surface-syntactic Universal Dependencies (SUD): surface-syntactic relations and deep syntactic features
SUD is an annotation scheme for syntactic dependency treebanks that is nearly isomorphic to UD (Universal Dependencies). Unlike UD, it is based on syntactic criteria (favoring functional heads), and its relations are defined on distributional and functional grounds. In this paper, we recall and specify the general principles underlying SUD, present the updated set of SUD relations, discuss the central question of multiword expressions (MWEs), and introduce an orthogonal layer of deep-syntactic features converted from the deep-syntactic part of the UD scheme.
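The difference in head choice between SUD and UD is easiest to see with auxiliaries: in SUD the auxiliary heads the lexical verb, while in UD the lexical verb is the head and the auxiliary attaches to it as aux. The following is a minimal sketch of that single head swap, assuming SUD's comp:aux relation and CoNLL-U style 1-based head indices. The Token class and the sud_aux_to_ud function are hypothetical; real SUD-to-UD conversion covers many more constructions and also renames relations (e.g. SUD subj vs. UD nsubj), which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    form: str
    head: int    # 1-based index of the governor; 0 = root (CoNLL-U convention)
    deprel: str


def sud_aux_to_ud(tokens: List[Token]) -> List[Token]:
    """Promote each lexical verb over its auxiliary (SUD comp:aux -> UD aux).

    Hypothetical, simplified sketch: only the comp:aux relation is handled,
    and relation labels other than aux are left unchanged.
    """
    out = [Token(t.form, t.head, t.deprel) for t in tokens]  # do not mutate input
    for i, tok in enumerate(out, start=1):
        if tok.deprel != "comp:aux":
            continue
        aux_id = tok.head
        aux = out[aux_id - 1]
        # The lexical verb takes over the auxiliary's attachment in the tree.
        tok.head, tok.deprel = aux.head, aux.deprel
        # Other dependents of the auxiliary (e.g. its subject) move to the verb.
        for j, other in enumerate(out, start=1):
            if other.head == aux_id and j != i:
                other.head = i
        # The auxiliary now depends on the lexical verb.
        aux.head, aux.deprel = i, "aux"
    return out


sentence = [Token("she", 2, "subj"), Token("has", 0, "root"), Token("left", 2, "comp:aux")]
for t in sud_aux_to_ud(sentence):
    print(t.form, t.head, t.deprel)
# she 3 subj
# has 3 aux
# left 0 root
```

In the SUD input both "she" and "left" depend on the functional head "has"; after the swap "left" is the root and both "she" and "has" depend on it, as in UD.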
Is There a Relationship between (Full) Valency and Synonymy? (Y a-t-il une relation entre la valence (pleine) et la synonymie?)
The article analyses the possible relationships between (full) valency and synonymy. We first
present a very short overview of positions on valency, then present the position of researchers who
see a relationship between valency (full, i.e., not distinguishing arguments from adjuncts and
treating them all as arguments) and synonymy. The article sets out the underlying hypothesis: since a
more frequent word appears in more contexts than a less frequent word, it tends to have more
meanings and therefore more synonyms, and, being more polysemous, it also has a greater number of
full valency frames. The article shows that this hypothesis is not sufficiently precise: it is the word,
as a form, that can be polysemous, but the form itself cannot have synonyms; only a particular meaning
of a polysemous word can. The analyses were therefore not fine-grained enough to identify any
relationship that may exist between the two phenomena. At the same time, the analyses based on this
imprecise starting point did not demonstrate a significant correlation, let alone a dependency,
between the two phenomena: the Kendall rank correlation coefficient, which measures ordinal
association on a scale from −1 to +1, was estimated at 0.18 for the analysed material.
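The abstract does not say which variant of Kendall's coefficient was used. For reference, the tie-corrected variant, tau-b, is τ_b = (C − D) / √((n₀ − n₁)(n₀ − n₂)), where C and D count concordant and discordant pairs, n₀ = n(n − 1)/2 is the number of pairs, and n₁ and n₂ count the pairs tied in the first and the second variable. Ties matter for data of this kind, because counts such as numbers of synonyms or valency frames repeat often. Below is a minimal sketch of that formula by direct pair counting; the function name is illustrative, and the input numbers are invented, not the article's data.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b by O(n^2) pair counting; returns nan if either variable is constant."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1        # pair tied in x (n1)
            if dy == 0:
                ties_y += 1        # pair tied in y (n2)
            if dx * dy > 0:
                concordant += 1    # both variables order the pair the same way
            elif dx * dy < 0:
                discordant += 1    # the variables order the pair oppositely
    n0 = n * (n - 1) // 2          # total number of pairs
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Invented synonym counts vs. full-valency-frame counts for five hypothetical words.
print(round(kendall_tau_b([3, 1, 4, 2, 5], [2, 2, 3, 1, 4]), 3))
# 0.738
```

For real analyses, scipy.stats.kendalltau computes the same tau-b variant by default and also reports a p-value, which is what a claim of (non-)significance would rest on.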
Finally, the article points out that the fact that the distinction between arguments and adjuncts is
often subtle and sometimes difficult to draw does not justify concluding that there is no difference
between them or that the problem does not actually exist. Nor does it justify abandoning the search
for satisfactory elements and criteria for distinguishing the two categories, or failing to apply
consistently the criteria already at our disposal, namely semantic implication.