    Modelling the formation of phonotactic restrictions across the mental lexicon

    Experimental data show that adult learners of an artificial language with a phonotactic restriction learned this restriction better when trained on word types (e.g. presented with 80 different words twice each) than when trained on word tokens (e.g. presented with 40 different words four times each) (Hamann & Ernestus submitted). These findings support Pierrehumbert’s (2003) observation that phonotactic co-occurrence restrictions are formed across lexical entries, since only lexical levels of representation can be sensitive to type frequencies.
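    To make the two training regimes concrete, here is a minimal sketch of how type- and token-matched training sequences could be constructed; the word forms and the function name are invented for illustration and are not taken from the study.

```python
import random

def make_training_sequence(words, n_types, n_repetitions, seed=0):
    """Build a training sequence containing n_types distinct words,
    each presented n_repetitions times, in random order."""
    rng = random.Random(seed)
    chosen = rng.sample(words, n_types)
    sequence = chosen * n_repetitions
    rng.shuffle(sequence)
    return sequence

# Hypothetical artificial-language lexicon (placeholder forms).
lexicon = ["w%03d" % i for i in range(200)]

# Type condition: 80 different words, twice each (160 trials).
type_condition = make_training_sequence(lexicon, 80, 2)

# Token condition: 40 different words, four times each (160 trials).
token_condition = make_training_sequence(lexicon, 40, 4)

assert len(type_condition) == len(token_condition) == 160
```

    Both conditions contain the same number of tokens (160 trials), so any difference in learning tracks the number of distinct types, which is the point of the manipulation.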

    Learning OT constraint rankings using a maximum entropy model

    A weakness of standard Optimality Theory is its inability to account for grammar…
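    For context, the standard maximum entropy formulation of such a model takes the following form (a sketch of the usual notation, assumed here rather than copied from the paper): each constraint C_i receives a real-valued weight w_i, and the probability assigned to candidate y for input x is

```latex
P(y \mid x) \;=\;
  \frac{\exp\!\bigl(-\sum_i w_i\, f_i(y, x)\bigr)}
       {\sum_{y' \in \mathrm{Gen}(x)} \exp\!\bigl(-\sum_i w_i\, f_i(y', x)\bigr)}
```

    where f_i(y, x) counts the violations of constraint C_i incurred by candidate y. Fitting the weights by maximizing the likelihood of attested forms replaces Optimality Theory’s strict constraint ranking with weighted, gradient constraint interaction.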

    A Machine learning approach to POS tagging

    We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part-of-Speech Tagging). The learning process is supervised and obtains a language model oriented to resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible tagger based on relaxation labelling. Along these lines, we describe a tagger that is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. We also address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation.
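    As a rough illustration of the decision-tree side of this approach, the sketch below trains a tree on simple contextual features; the toy corpus and feature set are invented, and scikit-learn’s tree learner stands in for the paper’s own induction algorithm.

```python
# Minimal decision-tree POS-tagging sketch (illustrative only).
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Toy tagged corpus: one (word, tag) pair per token.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

def features(sentence, i):
    """Contextual features for the token at position i."""
    word = sentence[i][0]
    return {
        "word": word,
        "prev_word": sentence[i - 1][0] if i > 0 else "<s>",
        "next_word": sentence[i + 1][0] if i < len(sentence) - 1 else "</s>",
        "suffix2": word[-2:],
    }

X = [features(s, i) for s in corpus for i in range(len(s))]
y = [tag for s in corpus for _, tag in s]

vec = DictVectorizer()
tagger = DecisionTreeClassifier().fit(vec.fit_transform(X), y)

test = [("the", None), ("dog", None), ("sleeps", None)]
print(tagger.predict(vec.transform([features(test, i) for i in range(len(test))])))
```

    In the paper’s setup the learned trees are additionally translated into constraints for a relaxation-labelling tagger; this sketch stops at direct tree-based disambiguation.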

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing “deep” hand-crafted wide-coverage parsers with “shallow” treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert the tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments of Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep, constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
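    The evaluation metric at work here is easy to state concretely: parser output and gold standard are both reduced to sets of dependency triples, and precision, recall, and f-score are computed over those sets. A minimal sketch follows, with invented triples; the actual gold standards use richer relation labels.

```python
def dependency_prf(gold, predicted):
    """Precision, recall, f-score over (head, relation, dependent) triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gold = {("saw", "subj", "John"), ("saw", "obj", "Mary"), ("saw", "adj", "yesterday")}
pred = {("saw", "subj", "John"), ("saw", "obj", "Mary"), ("Mary", "adj", "yesterday")}

print("P=%.2f R=%.2f F=%.2f" % dependency_prf(gold, pred))  # P=0.67 R=0.67 F=0.67
```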

    Treebank-based acquisition of LFG parsing resources for French

    Motivated by the expense in time and other resources of producing hand-crafted grammars, there has been increasing interest in automatically acquiring wide-coverage grammars from treebanks for natural language processing. In particular, recent years have seen growing interest in automatically acquired deep resources that can represent information absent from simple CFG-type treebanks and that are considered to produce more language-neutral linguistic representations, such as dependency syntactic trees. As is often the case in early pioneering work on natural language processing, English provided the focus of the first efforts towards acquiring deep-grammar resources, followed by successful treatments of, for example, German, Japanese, Chinese and Spanish. However, no comparable large-scale automatically acquired deep-grammar resources have been obtained for French to date. The goal of this paper is to present the application of treebank-based grammar acquisition to the case of French. We show that, with modest changes to the established parsing architectures, encouraging results can be obtained for French, with a best dependency structure f-score of 86.73%.

    Semantics as a gateway to language

    This paper presents an account of semantics as a system that integrates conceptual representations into language. I define the semantic system as an interface level of the conceptual system CS that translates conceptual representations into a format that is accessible to language. The analysis I put forward does not treat the make-up of this level as idiosyncratic, but subsumes it under a unified notion of linguistic interfaces. This allows us to understand core aspects of the linguistic-conceptual interface as an instance of a general pattern underlying the correlation of linguistic and non-linguistic structures. In doing so, the model aims to provide a broader perspective on the distinction between, and the interaction of, conceptual and linguistic processes, and on the correlation of semantic and syntactic structures.

    RDF(S)/XML Linguistic Annotation of Semantic Web Pages

    Although, with the Semantic Web initiative, much research on the semantic annotation of web pages has already been done by AI researchers, linguistic text annotation, including semantic annotation, was originally developed in Corpus Linguistics, and its results have been somewhat neglected by AI. …
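    As a rough illustration of the kind of RDF-based linguistic annotation at issue, the sketch below attaches a few linguistic properties to a web-page fragment using rdflib and serializes the result as RDF/XML; the ling: vocabulary and the URIs are invented for illustration and do not reflect the paper’s actual schema.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

LING = Namespace("http://example.org/ling#")  # hypothetical annotation vocabulary

g = Graph()
g.bind("ling", LING)

# A fragment of a (hypothetical) web page to be annotated.
fragment = URIRef("http://example.org/pages/page1#sentence3")
g.add((fragment, RDF.type, LING.Sentence))
g.add((fragment, LING.text, Literal("The parser was evaluated on the WSJ corpus.")))
g.add((fragment, LING.mainVerb, Literal("evaluate")))
g.add((fragment, LING.voice, Literal("passive")))

print(g.serialize(format="xml"))  # RDF/XML rendering of the annotation
```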

    Introduction

    This chapter will motivate why it is useful to consider the topic of derivations and filtering in more detail. We will argue against the popular belief that the minimalist program and optimality theory are incompatible theories, in the sense that the former places the explanatory burden on the generative device (the computational system) whereas the latter places it on the filtering device (the OT evaluator). Although this belief may be correct insofar as it describes existing tendencies, we will argue that minimalist and optimality-theoretic approaches normally adopt more or less the same global architecture of grammar: both assume that a generator defines a set S of potentially well-formed expressions that can be generated on the basis of a given input, and that there is an evaluator that selects the expressions from S that are actually grammatical in a given language L. For this reason, we believe it is a high priority to investigate the role of the two components in more detail, in the hope that this will provide a better understanding of the differences and similarities between the two approaches. We will conclude this introduction with a brief review of the studies collected in this book.
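    The shared architecture can be made concrete with a small sketch: a generator Gen maps an input to a candidate set S, and an OT-style evaluator selects the member of S whose violation profile is best under the ranked constraints. The input, candidates, and constraints below are invented toy examples (a final-devoicing pattern), not drawn from the chapter.

```python
def gen(underlying_form):
    """Toy generator: a fixed candidate set S for the input."""
    return ["bad", "bat", "ba"]

# Constraints in ranked order (highest first), each counting violations.
constraints = [
    ("Max", lambda inp, cand: max(len(inp) - len(cand), 0)),        # no deletion
    ("*VoicedCoda", lambda inp, cand: int(cand.endswith("d"))),     # markedness
    ("Ident(voice)", lambda inp, cand: sum(a != b for a, b in zip(inp, cand))),
]

def evaluate(underlying_form):
    """Pick the candidate whose violation profile is best under
    strict domination (lexicographic comparison of violation counts)."""
    def profile(cand):
        return tuple(c(underlying_form, cand) for _, c in constraints)
    return min(gen(underlying_form), key=profile)

print(evaluate("bad"))  # -> "bat": coda devoicing wins under this ranking
```

    Swapping the evaluator for a different selection regime (e.g. economy conditions on derivations) while keeping the generator fixed is one way to see the two frameworks as variants of the same generate-and-filter design.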

    The diachronic emergence of retroflex segments in three languages

    The present study shows that, though retroflex segments can be considered articulatorily marked, there are perceptual reasons why languages introduce this class into their phoneme inventory. This observation is illustrated with the diachronic developments of retroflexes in Norwegian (North Germanic), Nyawaygi (Australian) and Minto-Nenana (Athapaskan). The developments in these three languages are modelled in a perceptually oriented phonological theory, since traditional articulatorily based features cannot deal with such processes.