
    Data-Oriented Language Processing. An Overview

    During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. We then review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995). Comment: 34 pages, PostScript
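
    To make the mechanism concrete, here is a minimal Python sketch of DOP-style disambiguation, assuming a toy treebank and restricting fragments to depth-1 rules (full DOP uses subtrees of arbitrary depth); it is an illustration of the idea, not any of the cited implementations.

```python
# Minimal, illustrative DOP-style scorer (toy sketch, not the cited systems).
# Fragments here are depth-1 rules only; real DOP uses subtrees of any depth.
from collections import Counter
from math import prod

# Toy treebank: trees as nested tuples (label, child, ...); leaves are strings.
treebank = [
    ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars"))),
    ("S", ("NP", "stars"), ("VP", ("V", "appeared"))),
]

def rules(tree):
    """Yield depth-1 fragments of a tree as (parent, children-labels)."""
    label, *children = tree
    kids = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, kids)
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

counts = Counter(r for t in treebank for r in rules(t))
root_totals = Counter()
for (parent, _), n in counts.items():
    root_totals[parent] += n

def prob(tree):
    """Probability of an analysis: product of fragment relative frequencies."""
    return prod(counts[r] / root_totals[r[0]] for r in rules(tree))

# Disambiguation: among candidate analyses of an input, pick the most probable.
candidates = [
    ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars"))),
    ("S", ("NP", "she"), ("VP", ("V", "appeared"))),
]
best = max(candidates, key=prob)
```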

    Automata Tutor v3

    Computer science class enrollments have rapidly risen in the past decade. With current class sizes, standard approaches to grading and providing personalized feedback are no longer possible, and new techniques become both feasible and necessary. In this paper, we present the third version of Automata Tutor, a tool for helping teachers and students in large courses on automata and formal languages. The second version of Automata Tutor supported automatic grading and feedback for finite-automata constructions and has already been used by thousands of users in dozens of countries. This new version of Automata Tutor supports automated grading and feedback generation for a greatly extended variety of new problems, including problems that ask students to create regular expressions, context-free grammars, pushdown automata and Turing machines corresponding to a given description, and problems about converting between equivalent models - e.g., from regular expressions to nondeterministic finite automata. Moreover, for several problems, this new version also enables teachers and students to automatically generate new problem instances. We also present the results of a survey run on a class of 950 students, which shows very positive results about the usability and usefulness of the tool.
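
    The abstract does not describe the grading internals, but automated grading of finite-automata constructions reduces to deciding whether a student's DFA is language-equivalent to a reference DFA. A minimal sketch of one standard decision procedure, breadth-first search over the product automaton, follows; the DFA encoding is a hypothetical choice made for the example.

```python
# Sketch: DFA equivalence via product-automaton search (one standard technique;
# not necessarily Automata Tutor's implementation).
from collections import deque

def equivalent(dfa_a, dfa_b, alphabet):
    """Each DFA is (start_state, accepting_set, transition_dict), where
    transition_dict maps (state, symbol) -> state. Hypothetical encoding."""
    start_a, acc_a, delta_a = dfa_a
    start_b, acc_b, delta_b = dfa_b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if (p in acc_a) != (q in acc_b):
            return False          # reachable pair disagrees: languages differ
        for s in alphabet:
            nxt = (delta_a[(p, s)], delta_b[(q, s)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True                   # no disagreeing reachable pair exists

# Example: both DFAs accept strings over {0,1} with an even number of 1s.
a = (0, {0}, {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0})
b = (0, {0}, {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0})
assert equivalent(a, b, ["0", "1"])
```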

    Combining semantic and syntactic structure for language modeling

    Structured language models for speech recognition have been shown to remedy the weaknesses of n-gram models. All current structured language models are, however, limited in that they do not take into account dependencies between non-headwords. We show that non-headword dependencies contribute to a significantly improved word error rate, and that a data-oriented parsing model trained on semantically and syntactically annotated data can exploit these dependencies. This paper also contains the first DOP model trained by means of a maximum likelihood reestimation procedure, which solves some of the theoretical shortcomings of previous DOP models. Comment: 4 pages
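
    The reestimation procedure itself is not spelled out in this abstract; as a hedged illustration of what maximum likelihood reestimation of fragment weights can look like, the toy EM loop below alternates between weighting each candidate derivation of a sentence by its current probability and renormalizing expected fragment counts per root label. The sentences, derivations, and fragment names are hypothetical placeholders, not the paper's procedure.

```python
# Toy EM reestimation of fragment probabilities (illustrative only; the paper's
# actual procedure is not reproduced here). A derivation is a list of fragments;
# each fragment is (root_label, fragment_id). All data below is hypothetical.
from collections import defaultdict
from math import prod

# Each sentence is ambiguous: a list of candidate derivations.
sentences = [
    [[("S", "f1"), ("NP", "f2")], [("S", "f3")]],
    [[("S", "f1"), ("NP", "f4")], [("S", "f3")]],
]

fragments = {f for s in sentences for d in s for f in d}
p = {f: 1.0 / len(fragments) for f in fragments}   # uniform initialization

for _ in range(20):
    expected = defaultdict(float)
    for derivations in sentences:
        scores = [prod(p[f] for f in d) for d in derivations]
        z = sum(scores)
        for d, sc in zip(derivations, scores):
            for f in d:
                expected[f] += sc / z      # E-step: posterior-weighted counts
    root_tot = defaultdict(float)
    for (root, _), c in expected.items():
        root_tot[root] += c
    p = {f: c / root_tot[f[0]] for f, c in expected.items()}  # M-step
```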

    An improved parser for data-oriented lexical-functional analysis

    We present an LFG-DOP parser which uses fragments from LFG-annotated sentences to parse new sentences. Experiments with the Verbmobil and Homecentre corpora show that (1) Viterbi n-best search performs about 100 times faster than Monte Carlo search while both achieve the same accuracy; (2) the DOP hypothesis, which states that parse accuracy increases with increasing fragment size, is confirmed for LFG-DOP; (3) LFG-DOP's relative frequency estimator performs worse than a discounted frequency estimator; and (4) LFG-DOP significantly outperforms Tree-DOP if evaluated on tree structures only. Comment: 8 pages
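
    Regarding point (3), the abstract does not specify the discounting scheme, so the sketch below uses simple absolute discounting as a stand-in: each seen fragment count is reduced by a constant before normalization, reserving probability mass for unseen fragments. The counts and the discount value are hypothetical.

```python
# Sketch: relative frequency vs. absolute-discounted frequency estimation for
# fragments sharing a root label. The discount value is a hypothetical choice;
# the paper's actual discounting scheme is not specified in this abstract.
from collections import Counter

counts = Counter({"frag_a": 8, "frag_b": 1, "frag_c": 1})  # toy fragment counts
total = sum(counts.values())

relative = {f: c / total for f, c in counts.items()}       # relative frequency

d = 0.5                                    # absolute discount per seen fragment
discounted = {f: (c - d) / total for f, c in counts.items()}
reserved = d * len(counts) / total         # mass held back for unseen fragments

# Relative frequency overfits rare fragments; discounting reserves probability
# mass (here 0.15) that can be redistributed to fragments absent from the corpus.
```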

    Streaming algorithms for language recognition problems

    We study the complexity of the following problems in the streaming model.

    Membership testing for DLIN: we show that every language in DLIN can be recognised by a randomized one-pass O(log n) space algorithm with inverse polynomial one-sided error, and by a deterministic p-pass O(n/p) space algorithm. We show that these algorithms are optimal.

    Membership testing for LL(k): for languages generated by LL(k) grammars with a bound of r on the number of nonterminals at any stage in the left-most derivation, we show that membership can be tested by a randomized one-pass O(r log n) space algorithm with inverse polynomial (in n) one-sided error.

    Membership testing for DCFL: we show that randomized algorithms as efficient as the ones described above for DLIN and LL(k) (which are subclasses of DCFL) cannot exist for all of DCFL: there is a language in VPL (a subclass of DCFL) for which any randomized p-pass algorithm with error bounded by ε < 1/2 must use Ω(n/p) space.

    Degree sequence problem: we study the problem of determining, given a sequence d_1, d_2, ..., d_n and a graph G, whether the degree sequence of G is precisely d_1, d_2, ..., d_n. We give a randomized one-pass O(log n) space algorithm with inverse polynomial one-sided error probability. We show that our algorithms are optimal.

    Our randomized algorithms are based on the recent work of Magniez et al. [MMN09]; our lower bounds are obtained by considering related communication complexity problems.
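
    A primitive that underlies many such small-space randomized streaming algorithms is polynomial fingerprinting: two multisets {a_i} and {b_i} are equal exactly when the polynomials ∏(x − a_i) and ∏(x − b_i) coincide, which can be tested at a random point modulo a prime in O(log n) space with inverse polynomial one-sided error. The sketch below illustrates that general primitive, not the paper's specific algorithms.

```python
# Sketch: one-pass multiset-equality fingerprint, the kind of O(log n)-space
# randomized primitive underlying such streaming algorithms (illustrative;
# not the specific algorithms of the paper).
import random

P = (1 << 61) - 1          # a large prime; false-positive prob <= length / P

def fingerprint(stream, x):
    """Evaluate prod_{a in stream} (x - a) mod P in a single pass."""
    fp = 1
    for a in stream:
        fp = fp * (x - a) % P
    return fp

def multisets_equal(stream_a, stream_b):
    x = random.randrange(P)              # same random point for both streams
    return fingerprint(stream_a, x) == fingerprint(stream_b, x)

# Example: degree-sequence style check, comparing a claimed sequence against
# the multiset of degrees actually observed.
claimed = [2, 1, 1]
observed = [1, 2, 1]
assert multisets_equal(claimed, observed)
```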

    Discontinuity and asymmetry in phrase structure grammars
