
    Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

    Much research has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available, as they are for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, obtaining competitive performance remains challenging. Cross-lingual SRL is one promising way to address this problem, and it has advanced considerably with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, which constructs high-quality training datasets for the target languages from the gold-standard SRL annotations of the source language. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective and that the automatically built pseudo datasets can significantly improve target-language SRL performance. Comment: Accepted at ACL 202
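Building target-language training data from source-side gold annotations requires carrying labels across languages in some way. Annotation projection, which the abstract names as an established route to cross-lingual SRL, does this through word alignments. The sketch below illustrates plain alignment-based projection; it is a generic technique, not the authors' translation-based method, and the function name `project_labels`, the span encoding (token index pairs, end exclusive), and the alignment format (a list of (source index, target index) pairs) are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical span encoding: (start, end) token indices, end exclusive.
Span = Tuple[int, int]


def project_labels(
    src_labels: Dict[Span, str],
    alignment: List[Tuple[int, int]],
) -> Dict[Span, str]:
    """Project labelled source spans onto target tokens via word alignment.

    Each source span maps to the smallest contiguous target span covering
    every target token aligned to a token inside it. Spans with no aligned
    target token are dropped.
    """
    # Index alignment links by source token for quick lookup.
    src_to_tgt: Dict[int, List[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, []).append(t)

    projected: Dict[Span, str] = {}
    for (start, end), label in src_labels.items():
        targets = [t for s in range(start, end) for t in src_to_tgt.get(s, [])]
        if not targets:
            continue  # unaligned span: nothing to project
        # If two source spans land on the same target span, the later one wins.
        projected[(min(targets), max(targets) + 1)] = label
    return projected


# "John ate the apple" -> "John a mangé la pomme" ("ate" aligns to "a mangé").
print(project_labels(
    {(0, 1): "A0", (1, 2): "V", (2, 4): "A1"},
    [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)],
))
```

Spans whose aligned target tokens are non-contiguous are widened to cover the gaps, and colliding projections overwrite each other; a practical pipeline would likely need to filter or repair such cases.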

    A Surface-Syntactic UD Treebank for Naija

    This paper presents a syntactic treebank for spoken Naija, an English-based pidgin-creole that is rapidly spreading across Nigeria. The syntactic annotation is developed in the Surface-Syntactic Universal Dependencies (SUD) annotation scheme (Gerdes et al., 2018) and automatically converted into UD. We present the treebank development workflow for this under-resourced language. A crucial step in the syntactic analysis of a spoken language consists of manually adding markup to the transcription that indicates the segmentation into major syntactic units and their internal structure. We show that this so-called "macrosyntactic" markup improves parsing results. We also study some iconic syntactic phenomena that clearly distinguish Naija from English.

    Improving Surface-syntactic Universal Dependencies (SUD): surface-syntactic relations and deep syntactic features

    SUD is an annotation scheme for syntactic dependency treebanks that is nearly isomorphic to UD (Universal Dependencies). Unlike UD, it is based on syntactic criteria (favoring functional heads), and its relations are defined on distributional and functional bases. In this paper, we recall and specify the general principles underlying SUD, present the updated set of SUD relations, discuss the central question of multiword expressions (MWEs), and introduce an orthogonal layer of deep-syntactic features converted from the deep-syntactic part of the UD scheme.
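The contrast between UD's content heads and SUD's functional heads can be made concrete with auxiliaries. The following is a simplified, hypothetical sketch of a single conversion step, assuming a toy token format rather than CoNLL-U and covering only `aux` and `nsubj`. It is not the official SUD conversion, which handles many more constructions and also renames other relations (for example, objects). The SUD labels `subj` and `comp:aux` are assumed here to follow the SUD relation inventory; `Token` and `promote_auxiliaries` are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Token:
    id: int       # 1-based position in the sentence
    form: str
    head: int     # 0 marks the root
    deprel: str


def promote_auxiliaries(tokens: List[Token]) -> List[Token]:
    """Simplified UD-to-SUD step: make each auxiliary the head of its verb.

    The auxiliary takes over the verb's head and relation, the verb becomes a
    'comp:aux' dependent of the auxiliary, and an 'nsubj' of that verb is
    reattached to the auxiliary as 'subj'. Only one auxiliary per verb is
    handled; all other tokens are left unchanged.
    """
    by_id = {t.id: t for t in tokens}
    aux_of = {t.head: t.id for t in tokens if t.deprel == "aux"}  # verb id -> aux id
    aux_ids = set(aux_of.values())
    out = []
    for t in tokens:
        if t.id in aux_ids:
            verb = by_id[t.head]
            out.append(replace(t, head=verb.head, deprel=verb.deprel))
        elif t.id in aux_of:
            out.append(replace(t, head=aux_of[t.id], deprel="comp:aux"))
        elif t.deprel == "nsubj" and t.head in aux_of:
            out.append(replace(t, head=aux_of[t.head], deprel="subj"))
        else:
            out.append(t)
    return out


# UD: "eaten" is the root, "has" is its aux. SUD-style: "has" becomes the root.
ud = [Token(1, "She", 3, "nsubj"), Token(2, "has", 3, "aux"), Token(3, "eaten", 0, "root")]
for tok in promote_auxiliaries(ud):
    print(tok)
```

Dependents other than the subject (such as an object) stay on the lexical verb, which is the intended behaviour for this toy step.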

    Is There a Relationship between (Full) Valency and Synonymy? [Y a-t-il une relation entre la valence (pleine) et la synonymie?]

    The article analyses the possible relationships between (full) valency and synonymy. We first give a very brief overview of positions on valency and then present the position of researchers who see a relationship between valency (full valency, i.e., not distinguishing arguments from adjuncts but treating them all as arguments) and synonymy. According to this position, since a more frequent word appears in more contexts than a less frequent one, it tends to have more meanings and therefore more synonyms, and, being more polysemous, it also has a greater number of full valency frames. The article shows that this hypothesis is not sufficiently precise: it is the word as a form that can be considered polysemous, but a form cannot itself have synonyms; only a particular meaning of a polysemous word can. The analyses therefore could not be fine-grained enough to identify a relationship between the two phenomena, if one exists. Nonetheless, analyses based on this imprecise starting point did not demonstrate a significant correlation, let alone a dependency, between the two phenomena: Kendall's coefficient, which measures ordinal association on a scale from -1 to +1, was estimated at 0.18 for the analysed material. Finally, the article argues that the fact that arguments and adjuncts are often subtle and sometimes difficult to distinguish does not justify concluding that there is no difference between them or that the problem does not exist, nor does it justify abandoning the search for satisfactory criteria to differentiate the two categories or failing to apply consistently the criteria already at our disposal, namely semantic implication.
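The abstract reports a Kendall coefficient of 0.18 without saying which variant was used. As a hedged illustration, the sketch below computes Kendall's tau-b, which corrects for ties; ties are likely when comparing small integer counts such as numbers of synonyms and of valency frames per word. The function name `kendall_tau_b` is hypothetical, and it is a plain O(n²) pairwise version. For real analyses, `scipy.stats.kendalltau`, whose default variant is tau-b, can serve as a cross-check.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: ordinal association in [-1, +1], corrected for ties.

    tau_b = (C - D) / sqrt((C + D + T_x) * (C + D + T_y)), where C and D count
    concordant and discordant pairs, and T_x / T_y count pairs tied only in
    x / only in y. Pairs tied in both are ignored.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: excluded from numerator and denominator
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined (fewer than two items, or x or y constant)")
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 2, 3, 4], [1, 1, 2, 3]))
```

With no ties, tau-b reduces to the simpler tau-a, (C - D) divided by the total number of pairs.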