
    Reflexive pronouns in Spanish Universal Dependencies

    In this paper, we argue that in current Universal Dependencies treebanks, the annotation of Spanish reflexives is an unsolved problem, which clearly affects the accuracy and consistency of current parsers. We evaluate different proposals for fine-tuning the various categories and discuss the remaining open issues. We believe that the solution to these issues could lie in a multi-layered way of annotating the characteristics, combining annotation of the dependency relation and of the so-called token features, rather than in expanding the number of categories on one layer. We apply this proposal to the v2.5 Spanish UD AnCora treebank and provide a categorized conversion table that can be run with a Python script.
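The abstract does not reproduce the conversion table itself. As a rough illustration of the multi-layered idea (one change to DEPREL and one to FEATS, applied together rather than multiplying relation labels), the Python sketch below rewrites a single CoNLL-U token line. The category labels and their (DEPREL, FEATS) values are hypothetical placeholders, not the published table, and `convert_token` is an illustrative helper, not the authors' script.

```python
# Hypothetical conversion table: category label -> (DEPREL, features to set).
# Labels and values are illustrative placeholders, not the published table.
CONVERSION_TABLE: dict[str, tuple[str, dict[str, str]]] = {
    "reflexive_object": ("obj", {"Reflex": "Yes"}),
    "inherent_reflexive": ("expl:pv", {"Reflex": "Yes"}),
    "reflexive_passive": ("expl:pass", {"Reflex": "Yes"}),
    "impersonal": ("expl:impers", {"Reflex": "Yes"}),
}


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS value such as 'Case=Acc|Person=3' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def format_feats(feats: dict[str, str]) -> str:
    """Serialize features in case-insensitive alphabetical order, '_' if empty."""
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{key}={value}" for key, value in ordered)


def convert_token(line: str, category: str) -> str:
    """Rewrite FEATS (column 6) and DEPREL (column 8) of one CoNLL-U token line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")
    deprel, new_feats = CONVERSION_TABLE[category]
    feats = parse_feats(cols[5])
    feats.update(new_feats)  # feature layer
    cols[5] = format_feats(feats)
    cols[7] = deprel  # dependency-relation layer
    return "\t".join(cols)
```

For example, converting the toy line `3\tse\tse\tPRON\t_\tPerson=3\t4\tobj\t_\t_` with the hypothetical category `reflexive_passive` keeps `Person=3`, adds `Reflex=Yes`, and relabels the relation as `expl:pass`.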

    Rediscovering Greenberg's Word Order Universals in UD

    This paper discusses an empirical refoundation of selected Greenbergian word order universals based on a data analysis of the Universal Dependencies project. The nature of the data we work on allows us to extract rich details for testing well-known typological universals and therefore constitutes a valuable basis for validating Greenberg's universals. Our results show that we can refine some Greenbergian universals in a more empirical and accurate way by means of a data-driven typological analysis.
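A minimal Python sketch, assuming plain CoNLL-U input, of the kind of count such a data-driven analysis can start from: for a given relation it tallies whether dependents follow their head (head-initial, e.g. VO order for `obj`) or precede it (head-final, e.g. OV). It only illustrates how UD trees expose word order; it is not the procedure used in the paper, and `read_tokens` and `head_direction_counts` are hypothetical helper names.

```python
from collections import Counter
from collections.abc import Iterator


def read_tokens(conllu: str) -> Iterator[list[tuple[int, int, str]]]:
    """Yield one list of (id, head, deprel) triples per sentence, skipping
    comments, multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '5.1')."""
    sentence: list[tuple[int, int, str]] = []
    for line in conllu.splitlines():
        if not line.strip():  # blank line closes a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append((int(cols[0]), int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def head_direction_counts(conllu: str, relation: str) -> Counter:
    """Count dependents of a universal relation (subtypes folded in, so
    'obl:tmod' counts as 'obl') that follow or precede their head."""
    counts: Counter = Counter()
    for sentence in read_tokens(conllu):
        for tok_id, head, deprel in sentence:
            if deprel.split(":")[0] != relation or head == 0:
                continue
            counts["head-initial" if tok_id > head else "head-final"] += 1
    return counts
```

Comparing, say, the `obj` ratio with the `case` ratio per treebank is one simple way to probe a Greenberg-style correlation between verb-object order and adposition order.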

    Variation in Universal Dependencies annotation: A token-based typological case study on adpossessive constructions

    In this paper we present a method for identifying and analyzing adnominal possessive constructions in 66 Universal Dependencies treebanks. We classify adpossessive constructions in terms of their morphological type (locus of marking) and present a workflow for detecting and analyzing them typologically. Based on a preliminary evaluation, the algorithm works fairly reliably on adpossessive constructions that are morphologically marked. However, it performs rather poorly on adpossessive constructions that are not marked morphologically, so-called zero-marked constructions, because these constructions are difficult to identify with the current annotation. We also discuss different types of annotation variation across treebanks for the same language and across treebanks of closely related languages. The research focuses on one well-circumscribed and universal construction in the hope of generating more interest in using UD for cross-linguistic comparison and of contributing to yet more consistent annotation of constructions in the UD annotation scheme.
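The abstract does not spell out the detection workflow. As a hedged sketch, assuming the possessor has already been identified as a dependent of the possessed noun, locus of marking can be approximated from UD morphological features: possessor features such as `Person[psor]` on the head suggest head marking, while `Case=Gen` on the possessor (or an adposition attached to it as a `case` dependent) suggests dependent marking. The function name and the boolean flag are hypothetical simplifications, not the authors' algorithm.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS value such as 'Case=Gen|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def locus_of_marking(head_feats: str, dep_feats: str,
                     dep_has_case_child: bool = False) -> str:
    """Classify a (possessed head noun, possessor dependent) pair as
    'head', 'dependent', 'double' or 'zero' marking (simplified heuristic)."""
    head = parse_feats(head_feats)
    dep = parse_feats(dep_feats)
    # Head marking: possessor agreement on the possessed noun, e.g. Person[psor].
    head_marked = any(key.endswith("[psor]") for key in head)
    # Dependent marking: genitive on the possessor, or an attached adposition.
    dep_marked = dep.get("Case") == "Gen" or dep_has_case_child
    if head_marked and dep_marked:
        return "double"
    if head_marked:
        return "head"
    if dep_marked:
        return "dependent"
    return "zero"
```

Anything with no detectable marker falls through to `zero`, which mirrors the difficulty the abstract reports: to a feature-based heuristic, a genuinely zero-marked construction and a marker the annotation does not encode look the same.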

    Toward the morpho-syntactic annotation of an Old English corpus with universal dependencies

    The aim of this article is to take the first steps toward the compilation of an Old English treebank compatible with the framework of Universal Dependencies (UD). Such a treebank will comprise morphological and syntactic annotation of Old English texts adequate for cross-linguistic comparison, diachronic analysis and natural language processing. The article therefore engages in four tasks: (i) identifying the Old English exponents of UD lexical categories; (ii) selecting the Old English exponents of UD morphological features; (iii) finding the areas of Old English morphology that require token indexing in the UD format; and (iv) checking the relevance of the universal set of dependency relations. The data have been extracted from ParCorOEv2, an open-access annotated Old English-English parallel corpus. The main conclusions are that the annotation format calls for two additional fields (gloss and morphological relatedness) and that enhanced dependencies are required to account for some syntactic phenomena.
    Martín Arista, J. (2022). Toward the morpho-syntactic annotation of an Old English corpus with universal dependencies. Revista de Lingüística y Lenguas Aplicadas, 17, 85-97. https://doi.org/10.4995/rlyla.2022.16787
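One possible way to carry the two additional fields proposed for Old English (gloss and morphological relatedness) is the CoNLL-U Plus format, which declares its column set in a `# global.columns` comment. The sketch below assumes that route; the column names `OE:GLOSS` and `OE:MORPHREL` and the reader function are hypothetical, not taken from the article.

```python
def read_conllu_plus(text: str) -> list[dict[str, str]]:
    """Map every token line to {column name: value} using the
    '# global.columns' header of a CoNLL-U Plus file."""
    columns = None
    tokens = []
    for line in text.splitlines():
        if line.startswith("# global.columns ="):
            # e.g. '# global.columns = ID FORM ... MISC OE:GLOSS OE:MORPHREL'
            columns = line.split("=", 1)[1].split()
        elif line.strip() and not line.startswith("#"):
            if columns is None:
                raise ValueError("missing '# global.columns' header")
            values = line.split("\t")
            if len(values) != len(columns):
                raise ValueError(f"expected {len(columns)} columns, got {len(values)}")
            tokens.append(dict(zip(columns, values)))
    return tokens
```

Keeping the ten standard columns first means ordinary UD tools still see familiar data, while the extra columns stay explicitly named rather than being packed into MISC.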

    Harmony, Head Proximity, and the Near Parallels between Nominal and Clausal Linkers

    This paper puts forward a notion of harmonic word order that leads to a new generalisation over the presence or absence of disharmony: specific functional heads must cross-linguistically obey this notion of harmony absolutely, while for other categories the presence of harmony is simply a tendency. The difference between the two classes is defined by semantics. This approach allows us both to draw certain parallels between restrictions on word order in nominals and in clauses, and furthermore to explain why other expected parallels should fail to be realised completely, specifically as regards differences in the distribution of relative clauses in the NP and complement clauses in the sentence. Syntactically independent relative clause markers and subordinating complementisers share a striking restriction as regards ordering: relative clause markers are always initial in postnominal relative clauses, and final in prenominal relative clauses (Andrews 1975; Downing 1978; Lehmann 1984; Keenan 1985; De Vries 2002, 2005); similarly, initial subordinating Cs only appear in postverbal complement clauses, while final subordinating Cs are only possible where the complement clause is preverbal (Bayer 1996, 1997, 1999; Kayne 2000). In this paper, I provide new evidence from eighty genetically and geographically diverse languages of a third category sharing precisely the same restriction: linkers in the complex NP. These are syntactically independent, semantically vacuous heads, serving to mark the presence of a relationship between a noun and any kind of phrasal dependent (Rubin 2002; Den Dikken and Singhapreecha 2004; Philip 2009). The class of linkers in the NP therefore includes the ezafe in Indo-Iranian, the associative marker -a in Bantu, as well as purely functional adpositions such as of in English. Like relative clause markers and subordinating Cs, the linker always intervenes linearly between the superordinate head (the noun) and the subordinate dependent. 
Crucially, relative clause markers, subordinating Cs, and linkers in the NP form a natural class: they are syntactically independent, semantically vacuous words serving purely to mark the presence of a relationship between head and dependent. Any member of this class is a ‘linker’. I propose a theory of disharmony whereby linearisation rules targeting heads with specified semantics can require such heads to appear in a prominent position, either initial or final, irrespective of the general headedness of the language. Linkers, being semantically vacuous, are of course impervious to such rules; they will therefore always conform to the harmonic, or optimal, word order. I propose a theory of harmony whereby the optimal word order is determined by the interaction of three independently motivated harmonic word order constraints: Head Proximity (adapted from Rijkhoff 1984, 1986, cf. Head-Final Filter, Williams 1982), the preference for uniformity in headedness (initial or final), and the preference for clausal dependents to appear in final position (Dryer 1980, 1992). Where the three constraints compete, it is always Head Proximity that takes precedence. I show that the distribution of all three types of linker is fully captured by this proposal. Moreover, this theory of ordering also accounts for another well observed near parallel between clauses and nominals, as well as its exceptions. 
This concerns a left-right asymmetry in the distribution of clausal dependents: while in OV languages complement clauses appear with near equal frequency in both preverbal and postverbal position, in VO languages they are found uniquely in postverbal position (Dryer 1980; Hawkins 1994; Dryer 2009); similarly, in OV languages relative clauses are distributed relatively evenly between prenominal and postnominal position, whereas in VO languages they are almost always postnominal, with very few exceptions (Mallinson & Blake 1981; Hawkins 1983, 1990; Lehmann 1984; Keenan 1985; Dryer 1992, 2007, 2008; De Vries 2005). The theory predicts these exceptions to be permitted only in languages that are rigidly N-final. Hawkins’ (1983) Noun Modifier Hierarchy suggests that this prediction is borne out; apparent exceptions (cf. Dryer 2008) are found to be underlyingly N-final.

    Building an endangered language resource in the classroom: Universal dependencies for Kakataibo

    In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates. Then, we describe the general details of the treebank and the language-specific considerations implemented for the proposed annotation. We finally conduct some experiments on part-of-speech tagging and syntactic dependency parsing. We focus on monolingual and transfer learning settings, where we study the impact of a Shipibo-Konibo treebank, another Panoan language resource.

    Cognitive processing, language typology, and variation

    Linguistic typological preferences have often been linked to cognitive processing preferences, but usually without recourse to typologically relevant experiments on cognitive processing. This article reviews experimental work on the possible parallels between preferences in cognitive processing and language typology. I summarize the main theoretical accounts of the processing‐typology connection and show that typological distributions arise diachronically from preferred paths of language change, which may be affected by the degree to which alternative structures are preferred (e.g., easier) in acquisition or usage. The surveyed experimental evidence shows that considerable support exists for many linguistic universals to reflect preferences in cognitive processing. Artificial language learning experiments emerge as a promising method for researching the processing‐typology connection, as long as their limitations are taken into account. I further show that social and cultural differences in cognition may have an effect on typological distributions and that, to account for this variation, a multidisciplinary approach to the processing‐typology connection has to be developed. Lastly, since the body of experimental research does not adequately represent the linguistic diversity of the world's languages, it remains an urgent task for the field to better account for this diversity in future work.

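    Los pronombres reflexivos en las Universal Dependencies en español: desde la anotación hacia el análisis morfosintáctico automático (Reflexive pronouns in Spanish Universal Dependencies: from annotation to automatic morphosyntactic analysis)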

    In this follow-up article to Degraeuwe and Goethals (2020), we present the annotation scheme used to reannotate the 7298 potentially reflexive pronouns included in the Universal Dependencies Spanish AnCora v2.6 treebank, which resulted in significant modifications for the “Case” feature (100% changed) and the dependency relations (87% changed). Next, we evaluate the performance of spaCy v3.2.2 and Stanza v1.3.0 (both trained on AnCora v2.8, and thus based on our reannotations) on the AnCora v2.8 test set, which yielded weighted F1 scores of up to 0.88 and 0.98 for the “Case” and “Reflex” features, respectively, and of up to 0.71 for the dependency relations. Finally, the error analysis of the spaCy results underlines the (generalisation) potential of the model, but also reveals some of the remaining issues in the automatic morphosyntactic analysis of reflexive pronouns in Spanish, such as determining whether expletive relations denote an impersonal, passive or inherently reflexive use.
    This research has been carried out as part of a PhD fellowship on the IVESS project (file number 11D3921N), funded by the Research Foundation – Flanders (FWO).
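The reported scores are weighted F1. To make that averaging concrete, here is a small pure-Python sketch: per-label F1 is weighted by the label's frequency in the gold data, so frequent labels (for example the most common `Case` values) dominate the overall score. This follows the usual definition (as in scikit-learn's `average="weighted"`), but the abstract does not say which implementation the authors used, and the labels below are toy data.

```python
from collections import Counter


def weighted_f1(gold: list[str], pred: list[str]) -> float:
    """Support-weighted mean of per-label F1 over the labels that occur in gold."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)
    total = len(gold)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / count
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += count / total * f1  # weight by gold frequency of the label
    return score
```

With gold `["Acc", "Acc", "Dat", "Dat"]` and predictions `["Acc", "Dat", "Dat", "Dat"]`, the per-label F1 values are 2/3 and 0.8, each weighted by 0.5, giving about 0.733.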

    Dependencies in language: On the causal ontology of linguistic systems

    Dependency is a fundamental concept in the analysis of linguistic systems. The many if-then statements offered in typology and grammar-writing imply a causally real notion of dependency that is central to the claim being made, usually with reference to widely varying timescales and types of processes. But despite the importance of the concept of dependency in our work, its nature is seldom defined or made explicit. This book brings together experts on language, representing descriptive linguistics, language typology, functional/cognitive linguistics, cognitive science, research on gesture and other semiotic systems, developmental psychology, psycholinguistics, and linguistic anthropology to address the following question: What kinds of dependencies exist among language-related systems, and how do we define and explain them in natural, causal terms?