
    A Framework for Understanding the Role of Morphology in Universal Dependency Parsing

    This paper presents a simple framework for characterizing morphological complexity and how it encodes syntactic information. In particular, we propose a new measure of morphosyntactic complexity in terms of governor-dependent preferential attachment that explains parsing performance. Through experiments on dependency parsing with data from Universal Dependencies (UD), we show that representations derived from morphological attributes deliver substantial parsing performance improvements over standard word-form embeddings when trained on the same datasets. We also show that the new morphosyntactic complexity measure is predictive of the gains in parsing scores that morphological attributes provide over plain forms, making it a tool to distinguish languages that use morphology as a syntactic marker from others.

    Statistical parsing of morphologically rich languages (SPMRL): what, how and whither

    The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at the word level. There is ample evidence that applying readily available statistical parsing models to such languages can cause serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that, despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state of affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to identify solutions shared across languages. The overarching analysis offers directions for future investigations.

    Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

    Much research effort has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available, as for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, competitive performance remains challenging to obtain. Cross-lingual SRL is one promising way to address this problem and has advanced considerably with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective and that the automatically constructed pseudo datasets significantly improve target-language SRL performance. Comment: Accepted at ACL 202

    Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing

    Released only a year ago as outputs of a research project (“Parsing Web 2.0 Sentences”, supported in part by a TUBİTAK 1001 grant (No. 112E276) and part of the ICT COST Action PARSEME (IC1207)), IMST and IWT are currently the most comprehensive Turkish dependency treebanks in the literature. This article introduces the final states of our treebanks, as well as a newly integrated hierarchical categorization of multiheaded dependencies and their organization in a dedicated deep dependency layer in the treebanks. It also presents the adaptation of recent studies on standardizing multiword expression and named entity annotation schemes for Turkish, the integration of benchmark annotations into the dependency layers of our treebanks, and the mapping of the treebanks to the latest Universal Dependencies (v2.0) standard, ensuring further compliance with rising universal annotation trends. In addition to significantly boosting the universal recognition of Turkish treebanks, our recent efforts have improved their syntactic parsing performance (up to 77.8%/82.8% LAS and 84.0%/87.9% UAS for IMST/IWT, respectively). The final states of the treebanks are expected to be better suited to different natural language processing tasks, such as named entity recognition, multiword expression detection, transfer-based machine translation, semantic parsing, and semantic role labeling.
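    LAS and UAS, the scores quoted above, are standard attachment metrics: UAS is the percentage of tokens whose predicted head is correct, and LAS additionally requires the predicted dependency label to match. The following is a minimal sketch, assuming gold and predicted analyses are given as parallel lists of (head, label) pairs per token; the function name `attachment_scores` is hypothetical. Evaluation scripts differ in details such as punctuation handling and relation subtypes, so this sketch simply counts every token and compares labels verbatim; it is not the evaluation setup used in the article.

```python
from typing import List, Tuple

# Each token is represented as (head_index, dependency_label); head 0 denotes the root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens, punctuation included."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    head_correct = 0
    label_correct = 0
    for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred):
        if g_head == p_head:
            head_correct += 1
            # LAS counts a token only if both the head and the label match.
            if g_rel == p_rel:
                label_correct += 1
    n = len(gold)
    return 100.0 * head_correct / n, 100.0 * label_correct / n


# Toy example: 4 tokens, 3 correct heads, 2 of which also carry the correct label.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f} LAS={las:.1f}")  # UAS=75.0 LAS=50.0
```

    Because LAS requires a correct head as a precondition, LAS can never exceed UAS, which is consistent with each LAS figure above being lower than the corresponding UAS figure.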

    Morphological features of the Irish universal dependency treebank

    The Universal Dependencies Project (Nivre, [9]; Nivre et al., [10]) is an ongoing effort towards creating a set of harmonised dependency treebanks that are annotated and structured according to universal guidelines. This paper reports on the addition of morphological features to the Irish Universal Dependencies Treebank (IUDT). Our feature set subscribes to the feature inventory of the UD Project and has been mapped from Irish morpho-syntactic tags, the output of a Finite State Morphological Analyser for Irish (Uí Dhonnchadha and van Genabith [16]). Irish, a Celtic language, has some relatively unusual morphological features that require language-specific labels not covered by the universal feature set. In this paper, we summarise the Irish-specific features that we have added to this set by explaining the linguistic properties that each of them describes. We also report on the first parsing experiments using the IUDT, assessing the effect that the inclusion of morphological features has on parsing accuracy.
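    In UD treebanks such as the IUDT, morphological features are stored in the FEATS column (the sixth of ten tab-separated columns) of the CoNLL-U format, as `Feature=Value` pairs separated by `|`, with `_` for tokens that have no features and commas separating multiple values of one feature. The sketch below shows one way to read that column; the helper names `parse_feats` and `feature_tokens` are hypothetical, the example token line is illustrative rather than taken from the IUDT, and flattening features into `Feature=Value` symbols is only one way a parser might consume them, not necessarily the setup used in the paper.

```python
from typing import Dict, List


def parse_feats(feats: str) -> Dict[str, List[str]]:
    """Parse a CoNLL-U FEATS field into {feature: [values]}.

    '_' marks an empty field; pairs are separated by '|', and a feature
    with several values lists them separated by ','.
    """
    if feats == "_":
        return {}
    parsed: Dict[str, List[str]] = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        parsed[name] = values.split(",")
    return parsed


def feature_tokens(feats: str) -> List[str]:
    """Flatten FEATS into 'Feature=Value' symbols, e.g. as extra parser input."""
    return [f"{name}={value}"
            for name, values in parse_feats(feats).items()
            for value in values]


# An illustrative CoNLL-U token line (10 tab-separated columns); FEATS is column 6.
line = "1\tdog\tdog\tNOUN\tNN\tNumber=Sing\t2\tnsubj\t_\t_"
feats = line.split("\t")[5]
print(parse_feats(feats))  # {'Number': ['Sing']}
print(feature_tokens("Case=Nom|Gender=Fem,Masc|Number=Sing"))
# ['Case=Nom', 'Gender=Fem', 'Gender=Masc', 'Number=Sing']
```

    Language-specific features of the kind described above would pass through this code unchanged, since the sketch does not check names against the universal inventory.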