2,941 research outputs found

Implicit learning of recursive context-free grammars

Author: Johan J. Bolhuis
Martin Rohrmeier
Qiufang Fu
Zoltan Dienes
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Context-free grammars are fundamental for the description of linguistic syntax. However, most artificial grammar learning experiments have explored learning of simpler finite-state grammars, while studies exploring context-free grammars have not assessed awareness and implicitness. This paper explores the implicit learning of context-free grammars employing features of hierarchical organization, recursive embedding and long-distance dependencies. The grammars also featured the distinction between left- and right-branching structures, as well as between centre- and tail-embedding, both distinctions found in natural languages. People acquired unconscious knowledge of relations between grammatical classes even for dependencies over long distances, in ways that went beyond learning simpler relations (e.g. n-grams) between individual words. The structural distinctions drawn from linguistics also proved important, as performance was greater for tail-embedding than for centre-embedding structures. The results suggest the plausibility of implicit learning of complex context-free structures, which model some features of natural languages. They support the relevance of artificial grammar learning for probing mechanisms of language learning and challenge existing theories and computational models of implicit learning.
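The contrast between centre- and tail-embedding mentioned in this abstract can be illustrated with a minimal sketch. The two toy grammars below (S -> a S b | a b and S -> a b S | a b) and the function names are assumptions chosen for illustration; they are not the grammars used in the study.

```python
def centre_embedded(depth, level=1):
    """Toy rule S -> a S b | a b (hypothetical): each pair wraps the next one."""
    if depth == 0:
        return []
    return [f"a{level}"] + centre_embedded(depth - 1, level + 1) + [f"b{level}"]


def tail_embedded(depth, level=1):
    """Toy rule S -> a b S | a b (hypothetical): each pair closes before the next opens."""
    if depth == 0:
        return []
    return [f"a{level}", f"b{level}"] + tail_embedded(depth - 1, level + 1)


def longest_dependency(sequence):
    """Largest distance, in tokens, between a paired a_i and b_i."""
    position = {token: index for index, token in enumerate(sequence)}
    pairs = [t[1:] for t in sequence if t.startswith("a")]
    return max(position[f"b{i}"] - position[f"a{i}"] for i in pairs)


if __name__ == "__main__":
    for build in (centre_embedded, tail_embedded):
        seq = build(3)
        print(build.__name__, " ".join(seq), "longest dependency:", longest_dependency(seq))
```

In the centre-embedded string the outermost pair is separated by every other token, so the dependency length grows with depth; in the tail-embedded string each pair stays adjacent regardless of depth.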

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Institute of Psychology,Chinese Academy Of Sciences

PubMed Central

Sussex Research Online

FigShare

Language Acquisition in Computers

Author: Belzner Megan
Colin-Ellerin Sean
Roman Jorge H.
Publication date: 01/01/2012

This project explores the nature of language acquisition in computers, guided by techniques similar to those used in children. While existing natural language processing methods are limited in scope and understanding, our system aims to gain an understanding of language from first principles and hence minimal initial input. The first portion of our system was implemented in Java and is focused on understanding the morphology of language using bigrams. We use frequency distributions and differences between them to define and distinguish languages. English and French texts were analyzed to determine a difference threshold of 55 before the texts are considered to be in different languages, and this threshold was verified using Spanish texts. The second portion of our system focuses on gaining an understanding of the syntax of a language using a recursive method. The program uses one of two possible methods to analyze given sentences based on either sentence patterns or surrounding words. Both methods have been implemented in C++. The program is able to understand the structure of simple sentences and learn new words. In addition, we have provided some suggestions regarding future work and potential extensions of the existing program. Comment: 39 pages, 10 figures and 6 tables
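The idea of comparing bigram frequency distributions can be sketched as follows. This is a rough illustration only: it assumes letter bigrams within words and an L1 (sum of absolute differences) measure over relative frequencies, neither of which is stated in the abstract. The authors' system was written in Java, and the scale of their threshold of 55 is not given, so no threshold is applied here.

```python
from collections import Counter


def bigram_distribution(text):
    """Relative frequencies of letter bigrams, counted within words (assumed unit)."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        counts.update(letters[i:i + 2] for i in range(len(letters) - 1))
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()} if total else {}


def distribution_difference(p, q):
    """L1 distance between two distributions (assumed measure); ranges from 0 to 2."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


if __name__ == "__main__":
    english = bigram_distribution("the quick brown fox jumps over the lazy dog")
    french = bigram_distribution("le renard brun saute par dessus le chien paresseux")
    print(distribution_difference(english, french))
```

A decision rule would compare this difference against a cut-off calibrated on texts of known language, which is the role the reported threshold plays in the original system.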

arXiv.org e-Print Archive

CiteSeerX

Modelling the acquisition of syntactic categories

Author: Gobet F
Pine J M
Publication venue: 'Informa UK Limited'
Publication date: 01/01/1997

This research represents an attempt to model the child’s acquisition of syntactic categories. A computational model, based on the EPAM theory of perception and learning, is developed. The basic assumptions are that (1) syntactic categories are actively constructed by the child using distributional learning abilities; and (2) cognitive constraints in learning rate and memory capacity limit these learning abilities. We present simulations of the syntax acquisition of a single subject, where the model learns to build up multi-word utterances by scanning a sample of the speech addressed to the subject by his mother.

CiteSeerX

Brunel University Research Archive

The acquisition of questions with long-distance dependencies

Author: Anna Theakston
Caroline Rowland
Chomsky Noam
de Villiers Jill
Ewa Dąbrowska
Lust
Philip William
Thornton Rosalind
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2009

A number of researchers have claimed that questions and other constructions with long-distance dependencies (LDDs) are acquired relatively early, by age 4 or even earlier, in spite of their complexity. Analysis of LDD questions in the input available to children suggests that they are extremely stereotypical, raising the possibility that children learn lexically specific templates such as WH do you think S-GAP? rather than general rules of the kind postulated in traditional linguistic accounts of this construction. We describe three elicited imitation experiments with children aged from 4;6 to 6;9 and adult controls. Participants were asked to repeat prototypical questions (i.e., questions which match the hypothesised template), unprototypical questions (which depart from it in several respects) and declarative counterparts of both types of interrogative sentences. The children performed significantly better on the prototypical variants of both constructions, even when both variants contained exactly the same lexical material, while adults showed prototypicality effects for LDD questions only. These results suggest that a general declarative complementation construction emerges quite late in development (after age 6), and that even adults rely on lexically specific templates for LDD questions.

Northumbria Research Link

Crossref

The University of Manchester - Institutional Repository

Radboud Repository

MPG.PuRe

Questions with long-distance dependencies: a usage-based perspective

Author: Dabrowska Ewa
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/08/2008

Attested questions with long-distance dependencies (e.g., What do you think you’re doing?) tend to be quite stereotypical: the matrix clause usually consists of a WH word, the auxiliary do or did, the pronoun you, and the verb think or say, with no other elements; and they virtually never contain more than one subordinate clause. This has led some researchers in the usage-based framework (Dąbrowska 2004; Verhagen 2005) to hypothesise that speakers’ knowledge about such constructions is best explained in terms of relatively specific, low-level templates rather than general rules that apply “across the board”. The research reported here was designed to test this hypothesis and alternative hypotheses derived from rule-based theories.

Northumbria Research Link

Identifying idiolect in forensic authorship attribution: an n-gram textbite approach

Author: Johnson A
Wright D
Publication venue: Faculdade de Letras da Universidade do Porto
Publication date: 01/01/2014

Forensic authorship attribution is concerned with identifying authors of disputed or anonymous documents, which are potentially evidential in legal cases, through the analysis of linguistic clues left behind by writers. The forensic linguist “approaches this problem of questioned authorship from the theoretical position that every native speaker has their own distinct and individual version of the language [...], their own idiolect” (Coulthard, 2004: 31). However, given the difficulty in empirically substantiating a theory of idiolect, there is growing concern in the field that it remains too abstract to be of practical use (Kredens, 2002; Grant, 2010; Turell, 2010). Stylistic, corpus, and computational approaches to text, however, are able to identify repeated collocational patterns, or n-grams, two to six word chunks of language, similar to the popular notion of soundbites: small segments of no more than a few seconds of speech that journalists are able to recognise as having news value and which characterise the important moments of talk. The soundbite offers an intriguing parallel for authorship attribution studies, with the following question arising: looking at any set of texts by any author, is it possible to identify ‘n-gram textbites’, small textual segments that characterise that author’s writing, providing DNA-like chunks of identifying material?
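As a rough illustration of what extracting two- to six-word n-grams from a set of texts might look like, the sketch below collects word n-grams and keeps those that recur across several texts. The tokenisation rule, the function names, and the `min_texts` cut-off are assumptions made for this example; the abstract does not describe the authors' actual procedure.

```python
import re
from collections import Counter


def word_ngrams(text, n_min=2, n_max=6):
    """Set of word n-grams (as tuples) with n_min <= n <= n_max."""
    tokens = re.findall(r"[a-z']+", text.lower())  # simple tokeniser (assumption)
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams


def recurring_ngrams(texts, min_texts=2):
    """N-grams that appear in at least min_texts of the given texts."""
    document_frequency = Counter()
    for text in texts:
        document_frequency.update(word_ngrams(text))  # each text counts once per n-gram
    return {gram for gram, count in document_frequency.items() if count >= min_texts}


if __name__ == "__main__":
    emails = [
        "Please find attached the report",
        "Please find attached my notes",
        "See you soon",
    ]
    for gram in sorted(recurring_ngrams(emails)):
        print(" ".join(gram))
```

Whether such recurring chunks actually distinguish one author from another would still need to be checked against texts by other writers, which this sketch does not attempt.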

Nottingham Trent Institutional Repository (IRep)

White Rose Research Online

Better text compression from fewer lexical n-grams

Author: Lorenz Michelle
Smith Tony C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

Word-based context models for text compression have the capacity to outperform simpler character-based models, but are generally unattractive because of inherent problems with exponential model growth and corresponding data sparseness. These ill effects can be mitigated in an adaptive lossless compression scheme by modelling syntactic and semantic lexical dependencies independently.

Research Commons@Waikato

Parsing coordinations

Author: Hinrichs Erhard
Klett Eva
Kübler Sandra
Maier Wolfgang
Publication date: 05/05/2009

The present paper is concerned with statistical parsing of constituent structures in German. The paper presents four experiments that aim at improving parsing performance of coordinate structures: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser by gold scopes for any conjunct, 3) reranking the parser output for all possible scopes for conjuncts that are permissible with regard to clause structure. Experiment 4 reranks a combination of parses from experiments 1 and 3. The experiments presented show that n-best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses results in an increase in F-score from 69.76 for the baseline to 74.69. While the F-score is similar to that of the first experiment (n-best parsing and reranking), the first experiment results in higher recall (75.48% vs. 73.69%) and the third one in higher precision (75.43% vs. 73.26%). Combining the two methods yields the best result, with an F-score of 76.69.

Hochschulschriftenserver - Universität Frankfurt am Main