231 research outputs found

    Structured Access in Sentence Comprehension

    This thesis is concerned with the nature of memory access during the construction of long-distance dependencies in online sentence comprehension. In recent years, an intense focus on the computational challenges posed by long-distance dependencies has proven to be illuminating with respect to the characteristics of the architecture of the human sentence processor, suggesting a tight link between general memory access procedures and sentence processing routines (Lewis & Vasishth 2005; Lewis, Vasishth, & Van Dyke 2006; Wagers, Lau & Phillips 2009). The present thesis builds upon this line of research, and its primary aim is to motivate and defend the hypothesis that the parser accesses linguistic memory in an essentially structured fashion for certain long-distance dependencies. In order to make this case, I focus on the processing of reflexive and agreement dependencies, and ask whether or not non-structural information such as morphological features is used to gate memory access during syntactic comprehension. Evidence from eight experiments in a range of methodologies in English and Chinese is brought to bear on this question, providing arguments from interference effects and time-course effects that primarily syntactic information is used to access linguistic memory in the construction of certain long-distance dependencies. The experimental evidence for structured access is compatible with a variety of architectural assumptions about the parser, and I present one implementation of this idea in a parser based on the ACT-R memory architecture. In the context of such a content-addressable model of memory, the claim of structured access is equivalent to the claim that only syntactic cues are used to query memory. I argue that structured access reflects an optimal parsing strategy in the context of a noisy, interference-prone cognitive architecture: abstract structural cues are favored over lexical feature cues for certain structural dependencies in order to minimize memory interference in online processing.

    CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania

    CLIFF is the Computational Linguists' Feedback Forum. We are a group of students and faculty who gather once a week to hear a presentation and discuss work currently in progress. The 'feedback' in the group's name is important: we are interested in sharing ideas, in discussing ongoing research, and in bringing together work done by the students and faculty in Computer Science and other departments. However, there are only so many presentations which we can have in a year. We felt that it would be beneficial to have a report which would have, in one place, short descriptions of the work in Natural Language Processing at the University of Pennsylvania. This report, then, is a collection of abstracts from both faculty and graduate students, in Computer Science, Psychology and Linguistics. We want to stress the close ties between these groups, as one of the things that we pride ourselves on here at Penn is the communication among different departments and the inter-departmental work. Rather than try to summarize the varied work currently underway at Penn, we suggest reading the abstracts to see how the students and faculty themselves describe their work. The report illustrates the diversity of interests among the researchers here, as well as explaining the areas of common interest. In addition, since it was our intent to put together a document that would be useful both inside and outside of the university, we hope that this report will explain to everyone some of what we are about.

    The compilation of a sample PFR Chinese corpus of Skeleton-parsed sentences

    The approach taken in this paper for the construction of a treebank is inspired by the skeleton parsing approach. From the PFR Chinese Corpus, a sample text of some 100,000 word tokens was chosen for the production of the treebank. A clear account of the 17 non-terminal constituents that are defined and instantiated in the corpus texts will be provided in a parsing scheme. A set of parsing guidelines on practical issues that arise when mapping parses onto sentences in applying the parsing scheme will also be considered. The major difficulties encountered in the course of skeleton parsing are also discussed, as they illuminate some of the peculiarities of the Chinese language. The conclusion is an evaluation of the success of the treebank compilation.

    The CoreGram project: theoretical linguistics, theory development and verification


    An experimental approach to linguistic representation


    A robust unification-based parser for Chinese natural language processing.

    Chan Shuen-ti Roy. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 168-175). Abstracts in English and Chinese.

    Contents:
    Chapter 1. Introduction
    1.1. The nature of natural language processing
    1.2. Applications of natural language processing
    1.3. Purpose of study
    1.4. Organization of this thesis
    Chapter 2. Organization and methods in natural language processing
    2.1. Organization of natural language processing system
    2.2. Methods employed
    2.3. Unification-based grammar processing
    2.3.1. Generalized Phrase Structure Grammar (GPSG)
    2.3.2. Head-driven Phrase Structure Grammar (HPSG)
    2.3.3. Common drawbacks of UBGs
    2.4. Corpus-based processing
    2.4.1. Drawback of corpus-based processing
    Chapter 3. Difficulties in Chinese language processing and its related works
    3.1. A glance at the history
    3.2. Difficulties in syntactic analysis of Chinese
    3.2.1. Writing system of Chinese causes segmentation problem
    3.2.2. Words serving multiple grammatical functions without inflection
    3.2.3. Word order of Chinese
    3.2.4. The Chinese grammatical word
    3.3. Related works
    3.3.1. Unification grammar processing approach
    3.3.2. Corpus-based processing approach
    3.4. Restatement of goal
    Chapter 4. SERUP: Statistical-Enhanced Robust Unification Parser
    Chapter 5. Step One: automatic preprocessing
    5.1. Segmentation of lexical tokens
    5.2. Conversion of date, time and numerals
    5.3. Identification of new words
    5.3.1. Proper nouns - Chinese names
    5.3.2. Other proper nouns and multi-syllabic words
    5.4. Defining smallest parsing unit
    5.4.1. The Chinese sentence
    5.4.2. Breaking down the paragraphs
    5.4.3. Implementation
    Chapter 6. Step Two: grammar construction
    6.1. Criteria in choosing a UBG model
    6.2. The grammar in details
    6.2.1. The PHON feature
    6.2.2. The SYN feature
    6.2.3. The SEM feature
    6.2.4. Grammar rules and features principles
    6.2.5. Verb phrases
    6.2.6. Noun phrases
    6.2.7. Prepositional phrases
    6.2.8. "Ba2" and "Bei4" constructions
    6.2.9. The terminal node S
    6.2.10. Summary of phrasal rules
    6.2.11. Morphological rules
    Chapter 7. Step Three: resolving structural ambiguities
    7.1. Sources of ambiguities
    7.2. The traditional practices: an illustration
    7.3. Deficiency of current practices
    7.4. A new point of view: Wu (1999)
    7.5. Improvement over Wu (1999)
    7.6. Conclusion on semantic features
    Chapter 8. Implementation, performance and evaluation
    8.1. Implementation
    8.2. Performance and evaluation
    8.2.1. The test set
    8.2.2. Segmentation of lexical tokens
    8.2.3. New word identification
    8.2.4. Parsing unit segmentation
    8.2.5. The grammar
    8.3. Overall performance of SERUP
    Chapter 9. Conclusion
    9.1. Summary of this thesis
    9.2. Contribution of this thesis
    9.3. Future work
    References
    Appendix I
    Appendix II
    Appendix III

    Treebank-based acquisition of Chinese LFG resources for parsing and generation

    This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena and (in cooperation with PARC) develop a gold-standard dependency-bank of Chinese f-structures for evaluation. Based on the Penn Chinese Treebank, I design and implement two architectures for inducing Chinese LFG resources, one annotation-based and the other dependency conversion-based. I then apply the f-structure acquisition algorithm together with external, state-of-the-art parsers to parse new text into "proto" f-structures. In order to convert "proto" f-structures into "proper" f-structures or deep dependencies, I present a novel Non-Local Dependency (NLD) recovery algorithm using subcategorisation frames and f-structure paths linking antecedents and traces in NLDs extracted from the automatically-built LFG f-structure treebank. Based on the grammars extracted from the f-structure annotated treebank, I develop a PCFG-based chart generator and a new n-gram based pure dependency generator to realise Chinese sentences from LFG f-structures. The work reported in this thesis is the first effort to scale treebank-based, probabilistic Chinese LFG resources from proof-of-concept research to unrestricted, real text. Although this thesis concentrates on Chinese and LFG, many of the methodologies, e.g. the acquisition of predicate-argument structures, NLD resolution and the PCFG- and dependency n-gram-based generation models, are largely language and formalism independent and should generalise to diverse languages as well as to labelled bilexical dependency representations other than LFG.

    A preliminary bibliography on focus

    [I]n its present form, the bibliography contains approximately 1100 entries. Bibliographical work is never complete, and the present one is still modest in a number of respects. It is not annotated, and it still contains a lot of mistakes and inconsistencies. It has nevertheless reached a stage which justifies considering the possibility of making it available to the public. The first step towards this is its pre-publication in the form of this working paper. […] The bibliography is less complete for earlier years. For works before 1970, the bibliographies of Firbas and Golkova 1975 and Tyl 1970 may be consulted, which have not been included here.

    THE COMPUTATION OF VERB-ARGUMENT RELATIONS IN ONLINE SENTENCE COMPREHENSION

    Understanding how verbs are related to their arguments in real time is critical to building a theory of online language comprehension. This dissertation investigates the incremental processing of verb-argument relations with three interrelated approaches that use the event-related potential (ERP) methodology. First, although previous studies on verb-argument computations have mainly focused on relating nouns to simple events denoted by a simple verb, here I show that, by investigating compound verbs, I can dissociate the timing of the subcomputations involved in argument role assignment. A set of ERP experiments in Mandarin comparing the processing of resultative compounds (Kid bit-broke lip: the kid bit his lip such that it broke) and coordinate compounds (Store owner hit-scolded employee: the store owner hit and scolded an employee) provides evidence for processing delays associated with verbs instantiating the causality relation (breaking-BY-biting) relative to the coordinate relation (hitting-AND-scolding). Second, I develop an extension of classic ERP work on the detection of argument role-reversals (the millionaire that the servant fired) that allows me to determine the temporal stages by which argument relations are computed, from argument identification to thematic roles. Our evidence supports a three-stage model where an initial word association stage is followed by a second stage where arguments of a verb are identified, and only at a later stage does the parser start to consider argument roles. Lastly, I investigate the extent to which native language (L1) subcategorization knowledge can interfere with second language (L2) processing of verb-argument relations, by examining the ERP responses to sentences with verbs that have mismatched subcategorization constraints in L1 Mandarin and L2 English (“My sister listened the music”). The results support my hypothesis that L1 subcategorization knowledge is difficult for L2 speakers to override online, as they show some sensitivity to subcategorization violations in offline responses but not in ERPs. These data indicate that computing verb-argument relations requires accessing lexical syntax, which is vulnerable to L1 interference in L2. Together, these three ERP studies allow us to begin to put together a full model of the sub-processes by which verb-argument relations are constructed in real time in L1 and L2.