Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production work with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
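To make the frequency-based estimation concrete, here is a minimal sketch of the simplest DOP instantiation: a derivation's probability is the product of the relative frequencies of the fragments it combines. The fragment inventory and counts below are invented for illustration, not taken from any actual treebank.

```python
from collections import Counter

# Toy fragment inventory: (root label, fragment) -> corpus count.
# These fragments and their counts are illustrative assumptions.
fragments = Counter({
    ("S", "(S NP VP)"): 10,
    ("NP", "(NP she)"): 4,
    ("NP", "(NP Det N)"): 6,
    ("VP", "(VP V NP)"): 5,
    ("VP", "(VP sleeps)"): 5,
})

def fragment_prob(root, frag):
    """Relative frequency of a fragment among all fragments with the same root."""
    total = sum(c for (r, _), c in fragments.items() if r == root)
    return fragments[(root, frag)] / total

def derivation_prob(derivation):
    """Probability of a derivation: the product of its fragments' relative frequencies."""
    p = 1.0
    for root, frag in derivation:
        p *= fragment_prob(root, frag)
    return p

# One derivation of "she sleeps": 1.0 * 0.4 * 0.5 = 0.2
d = [("S", "(S NP VP)"), ("NP", "(NP she)"), ("VP", "(VP sleeps)")]
print(derivation_prob(d))  # 0.2
```

Note that in a full DOP model the probability of a parse sums over all derivations that yield it; the sketch above scores a single derivation only.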
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).
Comment: 34 pages, Postscript
Evaluation of the NLP Components of the OVIS2 Spoken Dialogue System
The NWO Priority Programme Language and Speech Technology is a 5-year
research programme aiming at the development of spoken language information
systems. In the Programme, two alternative natural language processing (NLP)
modules are developed in parallel: a grammar-based (conventional, rule-based)
module and a data-oriented (memory-based, stochastic, DOP) module. In order to
compare the NLP modules, a formal evaluation has been carried out three years
after the start of the Programme. This paper describes the evaluation procedure
and the evaluation results. The grammar-based component performs much better
than the data-oriented one in this comparison.
Comment: Proceedings of CLIN 9
Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience
Jackendoff (2002) posed four challenges that linguistic combinatoriality and
rules of language present to theories of brain function. The essence of these
problems is the question of how to neurally instantiate the rapid construction
and transformation of the compositional structures that are typically taken to
be the domain of symbolic processing. He contended that typical connectionist
approaches fail to meet these challenges and that the dialogue between
linguistic theory and cognitive neuroscience will be relatively unproductive
until the importance of these problems is widely recognised and the challenges
answered by some technical innovation in connectionist modelling. This paper
claims that a little-known family of connectionist models (Vector Symbolic
Architectures) is able to meet Jackendoff's challenges.
Comment: This is a slightly updated version of the paper presented at the
Joint International Conference on Cognitive Science, 13-17 July 2003,
University of New South Wales, Sydney, Australia. 6 pages
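As a concrete illustration of what such architectures provide, the sketch below uses circular convolution binding from Holographic Reduced Representations (one member of the VSA family) to bind a role vector to a filler and recover the filler from the composite trace. The role and filler names, the dimensionality, and the random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024  # high dimensionality makes independent random vectors nearly orthogonal

def rand_vec():
    # Elements drawn with variance 1/n so vectors have unit expected length.
    return rng.normal(0.0, 1.0 / np.sqrt(n), n)

def bind(a, b):
    """Circular convolution: the HRR binding operation, computed via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(trace, cue):
    """Convolve the trace with the involution of the cue to approximately recover the filler."""
    involution = np.roll(cue[::-1], 1)
    return bind(involution, trace)

# Bind the role `agent` to the filler `mary`, then query the trace with the role.
agent, mary, john = rand_vec(), rand_vec(), rand_vec()
trace = bind(agent, mary)
retrieved = unbind(trace, agent)

# Clean up by comparing the noisy result against candidate fillers.
sims = {name: float(retrieved @ v) for name, v in [("mary", mary), ("john", john)]}
print(max(sims, key=sims.get))  # recovers "mary"
```

Because binding and superposition keep composite vectors the same fixed size, structures can be built and transformed with simple vector algebra, which is the substance of the paper's response to Jackendoff.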
Robust Grammatical Analysis for Spoken Dialogue Systems
We argue that grammatical analysis is a viable alternative to concept
spotting for processing spoken input in a practical spoken dialogue system. We
discuss the structure of the grammar, and a model for robust parsing which
combines linguistic sources of information and statistical sources of
information. We discuss test results suggesting that grammatical processing
allows fast and accurate processing of spoken input.
Comment: Accepted for JNL
Habeant Corpus: they should have the body. Tools learners have the right to use
With the advent of fast, powerful, cheap and accessible computer tools, the use of corpora has exploded in the last 20 years. In the field of language learning, however, their use is mainly restricted to researchers, course writers and teachers, while the benefits to the learner are largely second-hand: rare is the teacher who allows a class direct access to corpus methodology. This paper argues that there is no reason not to trust at least advanced learners with corpus tools, and that there are significant advantages to encouraging a hands-on approach.
After outlining the rationale underpinning this approach, we describe an English course in which learners are required to apply corpus techniques to an existing corpus or one of their own devising. We then go on to describe our students' own productions, using only corpus techniques and tools used by the learners themselves, all freely available on the internet and requiring minimal training.
The Baby project: processing character patterns in textual representations of language.
This thesis describes an investigation into a proposed theory of AI. The theory postulates that a machine can be programmed to predict aspects of human behaviour by selecting and processing stored, concrete examples of previously experienced patterns of behaviour. Validity is tested in the domain of natural language. Externalisations that model the resulting theory of NLP entail fuzzy components. Fuzzy formalisms may exhibit inaccuracy and/or over-productivity. A research strategy is developed to investigate this aspect of the theory. The strategy includes two experimental hypotheses designed to test 1) whether the model can process simple language interaction, and 2) the effect of fuzzy processes on such language interaction. The experimental design requires three implementations, each with a progressively greater degree of fuzziness in its processes, named NonfuzzBabe, CorrBabe and FuzzBabe respectively. NonfuzzBabe is used to test the first hypothesis, and all three implementations are used to test the second. A system description is presented for NonfuzzBabe. Testing the first hypothesis yields results showing that NonfuzzBabe is able to process simple language interaction. A system description for CorrBabe and FuzzBabe is presented. Testing the second hypothesis yields results showing a positive correlation between the degree of fuzziness of a system's processes and improved simple language performance. FuzzBabe's ability to process more complex language interaction is then investigated, and model-intrinsic limitations are found. Research to overcome this problem is designed to illustrate the potential of externalising the theory, and is conducted less rigorously than the earlier parts of this investigation. Augmenting FuzzBabe to include fuzzy evaluation of non-pattern elements of interaction is hypothesised as a possible solution; the term FuzzyBaby was coined for the augmented implementation. Results of a pilot study designed to measure FuzzyBaby's reading comprehension are given. Little research has investigated NLP through the fuzzy processing of concrete patterns in language; consequently, it is proposed that this research contributes to the disciplines of NLP and AI in general.
An evolutionary algorithm approach to poetry generation
Institute for Communicating and Collaborative Systems
Poetry is a unique artifact of the human language faculty, with its defining feature being a
strong unity between content and form. Contrary to the opinion that the automatic generation
of poetry is a relatively easy task, we argue that it is in fact an extremely difficult task that
requires intelligence, world and linguistic knowledge, and creativity.
We propose a model of poetry generation as a state space search problem, where a goal state is
a text that satisfies the three properties of meaningfulness, grammaticality, and poeticness.
We argue that almost all existing work on poetry generation only properly addresses a subset
of these properties.
In designing a computational approach for solving this problem, we draw upon the wealth of
work in natural language generation (NLG). Although the emphasis of NLG research is on the
generation of informative texts, recent work has highlighted the need for more flexible models
which can be cast as one end of a spectrum of search sophistication, where the opposing end
is the deterministically goal-directed planning of traditional NLG. We propose satisfying the
properties of poetry through the application to NLG of evolutionary algorithms (EAs), a well-studied heuristic search method.
MCGONAGALL is our implemented instance of this approach. We use a linguistic representation
based on Lexicalized Tree Adjoining Grammar (LTAG) that we argue is appropriate for
EA-based NLG. Several genetic operators are implemented, ranging from baseline operators
based on LTAG syntactic operations to heuristic semantic goal-directed operators. Two evaluation
functions are implemented: one that measures the isomorphism between a solution's
stress pattern and a target metre using the edit distance algorithm, and one that measures the
isomorphism between a solution's propositional semantics and a target semantics using structural
similarity metrics.
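The metrical evaluation function can be sketched as follows; the stress-pattern encoding ('0' unstressed, '1' stressed) and the normalisation of the edit distance into a [0, 1] fitness score are assumptions for illustration, not the thesis's exact formulation.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming over a single row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution or match
    return dp[-1]

def metre_score(stress, target):
    """Fitness in [0, 1]; 1.0 means the candidate's stress pattern matches the target metre."""
    return 1.0 - edit_distance(stress, target) / max(len(stress), len(target))

target = "01010101"  # iambic tetrameter
print(metre_score("01010101", target))  # 1.0 (perfect match)
print(metre_score("0101011", target))   # 0.875 (one edit away)
```

An EA can use such a score directly as (one component of) its fitness function when selecting candidate texts for the next generation.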
We conducted an empirical study using MCGONAGALL to test the validity of employing EAs
in solving the search problem, and to test whether our evaluation functions adequately capture
the notions of semantic and metrical faithfulness. We conclude that our use of EAs offers
an innovative approach to flexible NLG, as demonstrated by its successful application to the
poetry generation task.
Contextually-Dependent Lexical Semantics
Institute for Communicating and Collaborative Systems
This thesis is an investigation of phenomena at the interface between syntax, semantics,
and pragmatics, with the aim of arguing for a view of semantic interpretation as lexically driven
yet contextually dependent. I examine regular, generative processes which operate
over the lexicon to induce verbal sense shifts, and discuss the interaction of these processes
with the linguistic or discourse context. I concentrate on phenomena where only an interaction
between all three linguistic knowledge sources can explain the constraints on verb
use: conventionalised lexical semantic knowledge constrains productive syntactic processes,
while pragmatic reasoning is both constrained by and constrains the potential interpretations
given to certain verbs. The phenomena which are closely examined are the behaviour of
PP sentential modifiers (specifically dative and directional PPs) with respect to the lexical
semantic representation of the verb phrases they modify, resultative constructions, and logical
metonymy.
The analysis is couched in terms of a lexical semantic representation drawing on Davis
(1995), Jackendoff (1983, 1990), and Pustejovsky (1991, 1995), which aims to capture "linguistically
relevant" components of meaning. The representation is shown to have utility for
modelling the interaction between the syntactic form of an utterance and its meaning.
I introduce a formalisation of the representation within the framework of Head Driven
Phrase Structure Grammar (Pollard and Sag 1994), and rely on the model of discourse
coherence proposed by Lascarides and Asher (1992), Discourse in Commonsense Entailment.
I furthermore discuss the implications of the contextual dependency of semantic interpretation
for lexicon design and computational processing in Natural Language Understanding
systems.