2,872 research outputs found

    Toward the morpho-syntactic annotation of an Old English corpus with universal dependencies

    Get PDF
    [EN] The aim of this article is to take the first steps toward the compilation of a treebank of Old English compatible with the framework of Universal Dependencies (UD). Such a treebank will comprise morphological and syntactic annotation of Old English texts adequate for cross-linguistic comparison, diachronic analysis and natural language processing. The article, therefore, engages in four tasks: (i) identifying the Old English exponents of UD lexical categories; (ii) selecting the Old English exponents of UD morphological features; (iii) finding the areas of Old English morphology that require token indexing in the UD format; and (iv) checking on the relevance of the universal set of dependency relations. The data have been extracted from ParCorOEv2, an open access annotated parallel corpus Old English-English. The main conclusions are that the annotation format calls for two additional fields (gloss and morphological relatedness) and that enhanced dependencies are required in order to account for some syntactic phenomena.Martín Arista, J. (2022). Toward the morpho-syntactic annotation of an Old English corpus with universal dependencies. Revista de Lingüística y Lenguas Aplicadas. 17:85-97. https://doi.org/10.4995/rlyla.2022.16787OJS85971

    Attempto - From Specifications in Controlled Natural Language towards Executable Specifications

    Full text link
    Deriving formal specifications from informal requirements is difficult since one has to take into account the disparate conceptual worlds of the application domain and of software development. To bridge the conceptual gap we propose controlled natural language as a textual view on formal specifications in logic. The specification language Attempto Controlled English (ACE) is a subset of natural language that can be accurately and efficiently processed by a computer, but is expressive enough to allow natural usage. The Attempto system translates specifications in ACE into discourse representation structures and into Prolog. The resulting knowledge base can be queried in ACE for verification, and it can be executed for simulation, prototyping and validation of the specification.Comment: 15 pages, compressed, uuencoded Postscript, to be presented at EMISA Workshop 'Naturlichsprachlicher Entwurf von Informationssystemen - Grundlagen, Methoden, Werkzeuge, Anwendungen', May 28-30, 1996, Ev. Akademie Tutzin

    Usage-based and emergentist approaches to language acquisition

    Get PDF
    It was long considered to be impossible to learn grammar based on linguistic experience alone. In the past decade, however, advances in usage-based linguistic theory, computational linguistics, and developmental psychology changed the view on this matter. So-called usage-based and emergentist approaches to language acquisition state that language can be learned from language use itself, by means of social skills like joint attention, and by means of powerful generalization mechanisms. This paper first summarizes the assumptions regarding the nature of linguistic representations and processing. Usage-based theories are nonmodular and nonreductionist, i.e., they emphasize the form-function relationships, and deal with all of language, not just selected levels of representations. Furthermore, storage and processing is considered to be analytic as well as holistic, such that there is a continuum between children's unanalyzed chunks and abstract units found in adult language. In the second part, the empirical evidence is reviewed. Children's linguistic competence is shown to be limited initially, and it is demonstrated how children can generalize knowledge based on direct and indirect positive evidence. It is argued that with these general learning mechanisms, the usage-based paradigm can be extended to multilingual language situations and to language acquisition under special circumstances

    Content Differences in Syntactic and Semantic Representations

    Full text link
    Syntactic analysis plays an important role in semantic parsing, but the nature of this role remains a topic of ongoing debate. The debate has been constrained by the scarcity of empirical comparative studies between syntactic and semantic schemes, which hinders the development of parsing methods informed by the details of target schemes and constructions. We target this gap, and take Universal Dependencies (UD) and UCCA as a test case. After abstracting away from differences of convention or formalism, we find that most content divergences can be ascribed to: (1) UCCA's distinction between a Scene and a non-Scene; (2) UCCA's distinction between primary relations, secondary ones and participants; (3) different treatment of multi-word expressions, and (4) different treatment of inter-clause linkage. We further discuss the long tail of cases where the two schemes take markedly different approaches. Finally, we show that the proposed comparison methodology can be used for fine-grained evaluation of UCCA parsing, highlighting both challenges and potential sources for improvement. The substantial differences between the schemes suggest that semantic parsers are likely to benefit downstream text understanding applications beyond their syntactic counterparts.Comment: NAACL-HLT 2019 camera read

    Statistical parsing of morphologically rich languages (SPMRL): what, how and whither

    Get PDF
    The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state-of-affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations

    A novel dependency-based evaluation metric for machine translation

    Get PDF
    Automatic evaluation measures such as BLEU (Papineni et al. (2002)) and NIST (Doddington (2002)) are indispensable in the development of Machine Translation (MT) systems, because they allow MT developers to conduct frequent, fast, and cost-effective evaluations of their evolving translation models. However, most of the automatic evaluation metrics rely on a comparison of word strings, measuring only the surface similarity of the candidate and reference translations, and will penalize any divergence. In effect,a candidate translation expressing the source meaning accurately and fluently will be given a low score if the lexical and syntactic choices it contains, even though perfectly legitimate, are not present in at least one of the references. Necessarily, this score would differ from a much more favourable human judgment that such a translation would receive. This thesis presents a method that automatically evaluates the quality of translation based on the labelled dependency structure of the sentence, rather than on its surface form. Dependencies abstract away from the some of the particulars of the surface string realization and provide a more "normalized" representation of (some) syntactic variants of a given sentence. The translation and reference files are analyzed by a treebank-based, probabilistic Lexical-Functional Grammar (LFG) parser (Cahill et al. (2004)) for English, which produces a set of dependency triples for each input. The translation set is compared to the reference set, and the number of matches is calculated, giving the precision, recall, and f-score for that particular translation. The use of WordNet synonyms and partial matching during the evaluation process allows for adequate treatment of lexical variation, while employing a number of best parses helps neutralize the noise introduced during the parsing stage. The dependency-based method is compared against a number of other popular MT evaluation metrics, including BLEU, NIST, GTM (Turian et al. (2003)), TER (Snover et al. (2006)), and METEOR (Banerjee and Lavie (2005)), in terms of segment- and system-level correlations with human judgments of fluency and adequacy. We also examine whether it shows bias towards statistical MT models. The comparison of the dependency-based method with other evaluation metrics is then extended to languages other than English: French, German, Spanish, and Japanese, where we apply our method to dependencies generated by Microsoft's NLPWin analyzer (Corston-Oliver and Dolan (1999); Heidorn (2000)) as well as, in the case of the Spanish data, those produced by the treebank-based, probabilistic LFG parser of Chrupa la and van Genabith (2006a,b)
    corecore