
    Developmental constraints on learning artificial grammars with fixed, flexible and free word order

    Human learning, although highly flexible and efficient, is constrained in ways that facilitate or impede the acquisition of certain systems of information. Some such constraints, active during infancy and childhood, have been proposed to account for the apparent ease with which typically developing children acquire language. In a series of experiments, we investigated the role of developmental constraints on learning artificial grammars with a distinction between shorter and relatively frequent words (‘function words,’ F-words) and longer and less frequent words (‘content words,’ C-words). We constructed four finite-state grammars in which the order of F-words, relative to C-words, was either fixed (F-words always occupied the same positions in a string), flexible (every F-word always followed a C-word), or free. We exposed adults (N = 84) and kindergarten children (N = 100) to strings from each of these artificial grammars, and we assessed their ability to recognize strings with the same structure but a different vocabulary. Adults were better at recognizing strings when regularities were available (i.e., the fixed and flexible order grammars), while children were better at recognizing strings from the grammars consistent with the attested distribution of function and content words in natural languages (i.e., the flexible and free order grammars). These results provide evidence for a link between developmental constraints on learning and linguistic typology.
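
    As a rough illustration of the three ordering regimes (a sketch with invented vocabularies, not the authors' actual stimuli), the following Python snippet generates strings from toy fixed-, flexible- and free-order grammars:

        import random

        # Hypothetical toy vocabularies; the study's real stimuli differ.
        F_WORDS = ["fip", "tog"]                       # short, frequent
        C_WORDS = ["balindu", "kerasemo", "mofitalu"]  # long, infrequent

        def fixed_order(n_slots=6, f_positions=(0, 3)):
            # F-words always occupy the same string positions.
            return [random.choice(F_WORDS) if i in f_positions
                    else random.choice(C_WORDS) for i in range(n_slots)]

        def flexible_order(n_units=4, p_f=0.6):
            # Every F-word immediately follows a C-word, but not every
            # C-word takes one, so F-word positions vary across strings.
            out = []
            for _ in range(n_units):
                out.append(random.choice(C_WORDS))
                if random.random() < p_f:
                    out.append(random.choice(F_WORDS))
            return out

        def free_order(n_slots=6, p_f=1 / 3):
            # No positional regularity: each slot is F or C at random.
            return [random.choice(F_WORDS) if random.random() < p_f
                    else random.choice(C_WORDS) for _ in range(n_slots)]

        for g in (fixed_order, flexible_order, free_order):
            print(g.__name__, ":", " ".join(g()))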

    Implicit learning of recursive context-free grammars

    Context-free grammars are fundamental for the description of linguistic syntax. However, most artificial grammar learning experiments have explored the learning of simpler finite-state grammars, while studies exploring context-free grammars have not assessed awareness and implicitness. This paper explores the implicit learning of context-free grammars employing features of hierarchical organization, recursive embedding and long-distance dependencies. The grammars also featured the distinction between left- and right-branching structures, as well as between centre- and tail-embedding, both distinctions found in natural languages. People acquired unconscious knowledge of relations between grammatical classes even for dependencies over long distances, in ways that went beyond learning simpler relations (e.g. n-grams) between individual words. The structural distinctions drawn from linguistics also proved important, as performance was greater for tail-embedding than for centre-embedding structures. The results suggest the plausibility of implicit learning of complex context-free structures, which model some features of natural languages. They support the relevance of artificial grammar learning for probing mechanisms of language learning, and challenge existing theories and computational models of implicit learning.
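
    To make the structural contrast concrete, here is a small Python sketch (with invented word classes, not the study's materials) that generates matched-pair dependencies under centre-embedding, where dependencies nest, versus tail-embedding, where each dependency closes before the next begins:

        import random

        # Hypothetical word classes; each A-class word must be matched
        # by its B-class partner (a long-distance dependency when nested).
        A_CLASS = ["bi", "ka", "lo"]
        B_CLASS = ["mo", "ne", "pu"]
        PARTNER = dict(zip(A_CLASS, B_CLASS))

        def centre_embedded(depth):
            # Nested dependencies: a1 a2 ... b2 b1 (mirror recursion).
            heads = [random.choice(A_CLASS) for _ in range(depth)]
            return heads + [PARTNER[a] for a in reversed(heads)]

        def tail_embedded(depth):
            # Right-branching: each a_i b_i pair closes immediately.
            out = []
            for _ in range(depth):
                a = random.choice(A_CLASS)
                out += [a, PARTNER[a]]
            return out

        print(" ".join(centre_embedded(3)))  # e.g. ka bi lo pu mo ne
        print(" ".join(tail_embedded(3)))    # e.g. ka ne lo pu bi mo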

    On the relationship between the LL(k) and LR(k) grammars

    In the literature, various proofs of the inclusion of the class of LL(k) grammars in the class of LR(k) grammars can be found. Some of these proofs are incorrect; others are informal or semi-formal, or contain flaws. Those that are correct are less straightforward than the proof demonstrated here.
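
    A standard textbook example of why the inclusion is strict (not drawn from the paper itself): the grammar S -> A a | B b, A -> c, B -> c is LR(1) but not LL(1), since both alternatives of S begin with 'c'. The sketch below computes the clashing FIRST sets:

        # Toy grammar: LR(1) but not LL(1).  An LR parser can defer the
        # A/B choice until the a/b after 'c' is seen; an LL(1) parser
        # must commit to an alternative on seeing 'c' alone.
        GRAMMAR = {
            "S": [["A", "a"], ["B", "b"]],
            "A": [["c"]],
            "B": [["c"]],
        }

        def first(symbol):
            # FIRST set for this epsilon-free, non-left-recursive toy.
            if symbol not in GRAMMAR:                 # terminal
                return {symbol}
            return set().union(*(first(rhs[0]) for rhs in GRAMMAR[symbol]))

        # LL(1) needs the alternatives' FIRST sets pairwise disjoint.
        print([first(rhs[0]) for rhs in GRAMMAR["S"]])  # [{'c'}, {'c'}]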

    Syntactic variation and diglossia in French

    The present article addresses syntactic variation within French, and is an example of a relatively recent shift in attitude towards variation in this language. It considers the status of the variation with respect to the mental grammars of speakers, in particular in the light of Massot’s work suggesting that contemporary metropolitan France is characterised by diglossia, that is, by a community of speakers with two (in this case massively overlapping but not entirely identical) ‘French’ grammars which co-exist in their minds, one stylistically marked High, the other Low. The article reviews one particular instance of variation and argues that Massot’s model needs to be revised in order to account for the particular phenomenon of surface forms which can be generated by both putative grammars but which have a different linguistic status in each.

    Practical experiments with regular approximation of context-free languages

    Several methods are discussed that construct a finite automaton given a context-free grammar, including both methods that lead to subsets and methods that lead to supersets of the original context-free language. Some of these methods of regular approximation are new, and others are presented here in a more refined form than in the existing literature. Practical experiments with the different methods of regular approximation are performed for spoken-language input: hypotheses from a speech recognizer are filtered through a finite automaton.
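
    One of the simplest superset approximations (a crude baseline for illustration, not one of the paper's refined methods) keeps only terminal-bigram information from the grammar: a string is accepted if its first and last terminals and every adjacent terminal pair can occur somewhere in a derivation. A Python sketch for epsilon-free grammars:

        from itertools import product

        # Toy epsilon-free grammar for { a^n b^n : n >= 1 }.
        GRAMMAR = {"S": [["a", "S", "b"], ["a", "b"]]}
        TERMS = {"a", "b"}

        def firsts_lasts():
            # Fixed-point computation of FIRST/LAST terminal sets.
            first = {n: set() for n in GRAMMAR}
            last = {n: set() for n in GRAMMAR}
            changed = True
            while changed:
                changed = False
                for n, rules in GRAMMAR.items():
                    for rhs in rules:
                        fi = {rhs[0]} if rhs[0] in TERMS else first[rhs[0]]
                        la = {rhs[-1]} if rhs[-1] in TERMS else last[rhs[-1]]
                        if not (fi <= first[n] and la <= last[n]):
                            first[n] |= fi
                            last[n] |= la
                            changed = True
            return first, last

        def approx_accepts(s):
            # Bigram check: accepts a regular superset of L(GRAMMAR),
            # so no context-free stack is needed at recognition time.
            first, last = firsts_lasts()
            bigrams = set()
            for rules in GRAMMAR.values():
                for rhs in rules:
                    for x, y in zip(rhs, rhs[1:]):
                        lx = {x} if x in TERMS else last[x]
                        fy = {y} if y in TERMS else first[y]
                        bigrams |= set(product(lx, fy))
            return (bool(s) and s[0] in first["S"] and s[-1] in last["S"]
                    and all(p in bigrams for p in zip(s, s[1:])))

        print(approx_accepts("aabb"))   # True: in the language
        print(approx_accepts("aabbb"))  # True: superset overshoot
        print(approx_accepts("ba"))     # False: rejected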

    Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning

    Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply to both the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotated data).
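
    In the supervised setting, empirical risk minimization with log-loss over a probabilistic grammar has a closed form: relative-frequency estimates of the rule probabilities maximize likelihood and hence minimize the empirical risk. A minimal sketch with a hypothetical two-tree treebank:

        import math
        from collections import Counter, defaultdict

        # Hypothetical annotated corpus: each tree is the list of CFG
        # rules (parent, children) used in its derivation.
        treebank = [
            [("S", ("NP", "VP")), ("NP", ("she",)), ("VP", ("runs",))],
            [("S", ("NP", "VP")), ("NP", ("he",)), ("VP", ("runs",))],
        ]

        # Relative-frequency estimation = maximum likelihood = the
        # minimizer of the empirical log-loss in the supervised case.
        rule_count = Counter(r for tree in treebank for r in tree)
        parent_count = defaultdict(int)
        for (parent, _), c in rule_count.items():
            parent_count[parent] += c
        prob = {r: c / parent_count[r[0]] for r, c in rule_count.items()}

        def log_loss(tree):
            # Negative log-probability of one annotated derivation.
            return -sum(math.log(prob[r]) for r in tree)

        risk = sum(map(log_loss, treebank)) / len(treebank)
        print(prob[("NP", ("she",))])   # 0.5
        print(round(risk, 3))           # 0.693 (= log 2)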

    Splittability of bilexical context-free grammars is undecidable

    Bilexical context-free grammars (2-LCFGs) have proved to be accurate models for statistical natural language parsing. Existing dynamic programming algorithms used to parse sentences under these models have a running time of O(|w|^4), where w is the input string. A 2-LCFG is splittable if the left arguments of a lexical head are always independent of the right arguments, and vice versa. When a 2-LCFG is splittable, parsing time can be asymptotically improved to O(|w|^3). Testing this property is therefore of central interest for parsing efficiency. In this article, however, we show the negative result that splittability of 2-LCFGs is undecidable.
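
    The speed-up that splittability buys can be illustrated with Eisner-style split-head dependency parsing, the classic instance of the idea: because a head's left and right dependents are collected independently, every chart item keeps its head at a span edge and needs only two positions, giving O(n^3). This is a sketch of that general technique with hypothetical arc scores, not the article's own construction:

        def eisner_best_score(score):
            # Best projective dependency tree in O(n^3).  C[i][j][d] is
            # the best 'complete' half-span; I[i][j][d] is the best span
            # whose outermost arc (i->j if d==1, j->i if d==0) is still
            # collecting dependents.
            n = len(score)
            NEG = float("-inf")
            C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
            I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
            for i in range(n):
                C[i][i][0] = C[i][i][1] = 0.0
            for length in range(1, n):
                for i in range(n - length):
                    j = i + length
                    # Attach: join two complete half-spans with a new arc.
                    best = max(C[i][k][1] + C[k + 1][j][0]
                               for k in range(i, j))
                    I[i][j][1] = best + score[i][j]   # arc i -> j
                    I[i][j][0] = best + score[j][i]   # arc j -> i
                    # Complete: absorb a finished dependent's half-span.
                    C[i][j][1] = max(I[i][k][1] + C[k][j][1]
                                     for k in range(i + 1, j + 1))
                    C[i][j][0] = max(C[i][k][0] + I[k][j][0]
                                     for k in range(i, j))
            return C[0][n - 1][1]   # token 0 acts as the root

        # Hypothetical arc scores for a 3-token input (0 = root).
        S = [[0, 2, 1], [0, 0, 3], [0, 1, 0]]
        print(eisner_best_score(S))   # 5.0: arcs 0->1 and 1->2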