14 research outputs found

    Data-Oriented Language Processing. An Overview

    During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. We then review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995).
    Comment: 34 pages, Postscript
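    The mechanism described in the abstract (build analyses by combining corpus fragments, then rank analyses by fragment frequencies) can be sketched in a few lines. The toy fragment corpus, bracketed notation, and derivation below are illustrative inventions, not data from the paper; the probability model follows the standard DOP relative-frequency idea, where a fragment's probability is its count divided by the total count of fragments sharing its root label.

```python
from collections import Counter

# Toy corpus of subtree fragments, each written as (root label, bracketed body).
fragments = [
    ("S", "(S (NP she) (VP saw))"),
    ("S", "(S NP VP)"),
    ("S", "(S NP VP)"),
    ("NP", "(NP she)"),
    ("VP", "(VP saw)"),
    ("VP", "(VP saw)"),
]

counts = Counter(fragments)
root_totals = Counter(root for root, _ in fragments)

def fragment_prob(root, body):
    """Relative frequency of a fragment among fragments with the same root label."""
    return counts[(root, body)] / root_totals[root]

def derivation_prob(derivation):
    """A derivation's probability is the product of its fragments' probabilities."""
    p = 1.0
    for frag in derivation:
        p *= fragment_prob(*frag)
    return p

# One derivation of "she saw": start from the (S NP VP) fragment and
# substitute the NP and VP fragments at its open leaves.
deriv = [("S", "(S NP VP)"), ("NP", "(NP she)"), ("VP", "(VP saw)")]
print(derivation_prob(deriv))  # (2/3) * 1 * 1 = 0.6666666666666666
```

    A full DOP model would sum the probabilities of all derivations yielding the same parse tree to score an analysis; the sketch shows only a single derivation.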

    Evaluation of the NLP Components of the OVIS2 Spoken Dialogue System

    The NWO Priority Programme Language and Speech Technology is a 5-year research programme aiming at the development of spoken language information systems. In the Programme, two alternative natural language processing (NLP) modules are developed in parallel: a grammar-based (conventional, rule-based) module and a data-oriented (memory-based, stochastic, DOP) module. In order to compare the NLP modules, a formal evaluation has been carried out three years after the start of the Programme. This paper describes the evaluation procedure and the evaluation results. The grammar-based component performs much better than the data-oriented one in this comparison.
    Comment: Proceedings of CLIN 9

    Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience

    Jackendoff (2002) posed four challenges that linguistic combinatoriality and rules of language present to theories of brain function. The essence of these problems is the question of how to neurally instantiate the rapid construction and transformation of the compositional structures that are typically taken to be the domain of symbolic processing. He contended that typical connectionist approaches fail to meet these challenges and that the dialogue between linguistic theory and cognitive neuroscience will be relatively unproductive until the importance of these problems is widely recognised and the challenges answered by some technical innovation in connectionist modelling. This paper claims that a little-known family of connectionist models (Vector Symbolic Architectures) is able to meet Jackendoff's challenges.
    Comment: This is a slightly updated version of the paper presented at the Joint International Conference on Cognitive Science, 13-17 July 2003, University of New South Wales, Sydney, Australia. 6 pages
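    As a rough illustration of what a Vector Symbolic Architecture does, here is a minimal sketch in the style of Plate's Holographic Reduced Representations, one member of the VSA family: role-filler pairs are bound by circular convolution, superposed into a single vector, and queried by approximate unbinding. The dimensionality, vector names, and encoded sentence are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4096  # high dimensionality keeps random vectors nearly orthogonal

def randvec():
    # Unit-expected-norm random vector, the standard HRR choice.
    return rng.normal(0.0, 1.0 / np.sqrt(D), D)

def bind(a, b):
    # Circular convolution, computed via the FFT.
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

def unbind(s, a):
    # Convolve with the involution of `a` (its approximate inverse),
    # which amounts to circular correlation.
    return bind(s, np.concatenate(([a[0]], a[:0:-1])))

# Encode "Pat is the agent, Lee the patient" as a superposition of bindings.
agent, patient, pat, lee = randvec(), randvec(), randvec(), randvec()
sentence = bind(agent, pat) + bind(patient, lee)

# Query: who is the agent?  Unbinding yields a noisy version of `pat`,
# which a clean-up memory would identify by dot-product similarity.
noisy = unbind(sentence, agent)
sims = {name: float(v @ noisy) for name, v in [("pat", pat), ("lee", lee)]}
print(max(sims, key=sims.get))  # "pat" wins by a clear margin
```

    The point of the sketch is that composition (binding plus superposition) and decomposition (unbinding) are fixed-cost vector operations over distributed representations, which is the property the paper offers in answer to Jackendoff's challenges.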

    Robust Grammatical Analysis for Spoken Dialogue Systems

    We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar, and a model for robust parsing which combines linguistic and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate processing of spoken input.
    Comment: Accepted for JNL

    Habeant Corpus—they should have the body. Tools learners have the right to use

    With the advent of fast, powerful, cheap and accessible computer tools, the use of corpora has exploded in the last 20 years. In the field of language learning, however, their use is mainly restricted to researchers, course writers and teachers, while the benefits to the learner are largely second-hand: rare is the teacher who allows a class direct access to corpus methodology. This paper argues that there is no reason not to trust at least advanced learners with corpus tools, and that there are significant advantages to encouraging a hands-on approach.
    After outlining the rationale underpinning this approach, we describe an English course where learners are required to apply corpus techniques to an existing corpus or one of their own devising. We then go on to describe our students’ own productions, using only corpus techniques and tools used by the learners themselves, all freely available on the internet and requiring minimal training.

    The Baby project: processing character patterns in textual representations of language.

    This thesis describes an investigation into a proposed theory of AI. The theory postulates that a machine can be programmed to predict aspects of human behaviour by selecting and processing stored, concrete examples of previously experienced patterns of behaviour. Its validity is tested in the domain of natural language. Externalisations that model the resulting theory of NLP entail fuzzy components, and fuzzy formalisms may exhibit inaccuracy and/or over-productivity. A research strategy is developed to investigate this aspect of the theory. The strategy includes two experimental hypotheses designed to test (1) whether the model can process simple language interaction, and (2) the effect of fuzzy processes on such language interaction. The experimental design requires three implementations, each with a progressively greater degree of fuzziness in its processes. They are respectively named NonfuzzBabe, CorrBabe and FuzzBabe. NonfuzzBabe is used to test the first hypothesis and all three implementations are used to test the second hypothesis. A system description is presented for NonfuzzBabe. Testing the first hypothesis provides results that show NonfuzzBabe is able to process simple language interaction. A system description for CorrBabe and FuzzBabe is presented. Testing the second hypothesis provides results that show a positive correlation between the degree of fuzziness of the processes and improved simple language performance. FuzzBabe's ability to process more complex language interaction is then investigated and model-intrinsic limitations are found. Research to overcome this problem is designed to illustrate the potential of externalisations of the theory, and is conducted less rigorously than the earlier parts of this investigation. Augmenting FuzzBabe to include fuzzy evaluation of non-pattern elements of interaction is hypothesised as a possible solution; the term FuzzyBaby was coined for the augmented implementation.
    Results of a pilot study designed to measure FuzzyBaby's reading comprehension are given. Little research has been conducted that investigates NLP by the fuzzy processing of concrete patterns in language. Consequently, it is proposed that this research contributes to the intellectual disciplines of NLP and AI in general.

    An evolutionary algorithm approach to poetry generation

    Institute for Communicating and Collaborative Systems
    Poetry is a unique artifact of the human language faculty, with its defining feature being a strong unity between content and form. Contrary to the opinion that the automatic generation of poetry is a relatively easy task, we argue that it is in fact an extremely difficult task that requires intelligence, world and linguistic knowledge, and creativity. We propose a model of poetry generation as a state space search problem, where a goal state is a text that satisfies the three properties of meaningfulness, grammaticality, and poeticness. We argue that almost all existing work on poetry generation only properly addresses a subset of these properties. In designing a computational approach for solving this problem, we draw upon the wealth of work in natural language generation (NLG). Although the emphasis of NLG research is on the generation of informative texts, recent work has highlighted the need for more flexible models, which can be cast as one end of a spectrum of search sophistication, where the opposing end is the deterministically goal-directed planning of traditional NLG. We propose satisfying the properties of poetry through the application to NLG of evolutionary algorithms (EAs), a well-studied heuristic search method. MCGONAGALL is our implemented instance of this approach. We use a linguistic representation based on Lexicalized Tree Adjoining Grammar (LTAG) that we argue is appropriate for EA-based NLG. Several genetic operators are implemented, ranging from baseline operators based on LTAG syntactic operations to heuristic semantic goal-directed operators. Two evaluation functions are implemented: one that measures the isomorphism between a solution’s stress pattern and a target metre using the edit distance algorithm, and one that measures the isomorphism between a solution’s propositional semantics and a target semantics using structural similarity metrics.
    We conducted an empirical study using MCGONAGALL to test the validity of employing EAs in solving the search problem, and to test whether our evaluation functions adequately capture the notions of semantic and metrical faithfulness. We conclude that our use of EAs offers an innovative approach to flexible NLG, as demonstrated by its successful application to the poetry generation task.
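    The metre-matching evaluation function named above can be sketched with the standard Levenshtein edit distance between a candidate line's stress pattern and a target metre. MCGONAGALL's actual evaluation function is more elaborate; the stress strings and the distance-to-fitness mapping below are illustrative assumptions only.

```python
def edit_distance(candidate, target):
    """Levenshtein distance between two stress strings, via dynamic programming."""
    m, n = len(candidate), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if candidate[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a syllable
                          d[i][j - 1] + 1,         # insert a syllable
                          d[i - 1][j - 1] + cost)  # substitute a stress value
    return d[m][n]

# 'w' = weak syllable, 's' = stressed syllable. Target: iambic tetrameter.
target = "wswswsws"
print(edit_distance("wswswsws", target))  # 0: metrically perfect
print(edit_distance("wswsws", target))    # 2: one foot short
# One (hypothetical) way to map distance into a [0, 1] fitness for the EA:
fitness = 1.0 / (1.0 + edit_distance("wswsws", target))
```

    An EA can then select candidate texts by this score, so that lines whose stress patterns drift from the target metre are gradually weeded out of the population.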

    Contextually-Dependent Lexical Semantics

    Institute for Communicating and Collaborative Systems
    This thesis is an investigation of phenomena at the interface between syntax, semantics, and pragmatics, with the aim of arguing for a view of semantic interpretation as lexically driven yet contextually dependent. I examine regular, generative processes which operate over the lexicon to induce verbal sense shifts, and discuss the interaction of these processes with the linguistic or discourse context. I concentrate on phenomena where only an interaction between all three linguistic knowledge sources can explain the constraints on verb use: conventionalised lexical semantic knowledge constrains productive syntactic processes, while pragmatic reasoning is both constrained by and constrains the potential interpretations given to certain verbs. The phenomena which are closely examined are the behaviour of PP sentential modifiers (specifically dative and directional PPs) with respect to the lexical semantic representation of the verb phrases they modify, resultative constructions, and logical metonymy. The analysis is couched in terms of a lexical semantic representation drawing on Davis (1995), Jackendoff (1983, 1990), and Pustejovsky (1991, 1995) which aims to capture “linguistically relevant” components of meaning. The representation is shown to have utility for modelling the interaction between the syntactic form of an utterance and its meaning. I introduce a formalisation of the representation within the framework of Head-Driven Phrase Structure Grammar (Pollard and Sag 1994), and rely on the model of discourse coherence proposed by Lascarides and Asher (1992), Discourse in Commonsense Entailment. I furthermore discuss the implications of the contextual dependency of semantic interpretation for lexicon design and computational processing in Natural Language Understanding systems.