Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production work with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).
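The processing scheme described above lends itself to a compact illustration.
Below is a minimal Python sketch of the DOP1-style core: extract all subtree
fragments from a toy treebank and score a fragment by its relative frequency
among fragments with the same root label. The Tree class, toy corpus, and
helper names are illustrative assumptions, not code from the paper:

```python
from collections import Counter
from itertools import product

class Tree:
    """Toy phrase-structure tree (illustrative assumption)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)
    def __repr__(self):
        if not self.children:
            return self.label
        return f"({self.label} {' '.join(map(repr, self.children))})"

def fragments(tree):
    """Yield every fragment rooted at this node: at each child we either
    cut (leaving a frontier non-terminal) or keep expanding into it."""
    child_options = []
    for child in tree.children:
        if child.children:
            child_options.append([Tree(child.label)] + list(fragments(child)))
        else:
            child_options.append([child])  # lexical leaves are always kept
    for combo in product(*child_options):
        yield Tree(tree.label, combo)

def all_fragments(tree):
    """Fragments rooted at every non-terminal node of the tree."""
    if tree.children:
        yield from fragments(tree)
        for child in tree.children:
            yield from all_fragments(child)

# Toy corpus of two previously analysed utterances.
corpus = [
    Tree("S", [Tree("NP", [Tree("she")]),
               Tree("VP", [Tree("V", [Tree("saw")]),
                           Tree("NP", [Tree("it")])])]),
    Tree("S", [Tree("NP", [Tree("he")]),
               Tree("VP", [Tree("V", [Tree("left")])])]),
]

counts, root_totals = Counter(), Counter()
for t in corpus:
    for f in all_fragments(t):
        counts[repr(f)] += 1
        root_totals[f.label] += 1

def fragment_prob(f):
    """Relative frequency among fragments sharing the same root label."""
    return counts[repr(f)] / root_totals[f.label]

print(fragment_prob(Tree("NP", [Tree("she")])))  # 1/3: one of three NP fragments
```

A derivation built by combining such fragments is then assigned the product of
its fragments' probabilities, and disambiguation selects the analysis whose
derivations carry the most probability mass.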
Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by "Viterbi training." We show that solving, and even approximating, Viterbi training for PCFGs is NP-hard. We motivate the use of uniform-at-random initialization for Viterbi EM as an optimal initializer in the absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.
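To make the setting concrete, here is a small Python sketch of hard
("Viterbi") EM for a toy PCFG in Chomsky normal form, using the
uniform-at-random initialization the abstract advocates. The grammar,
sentences, and smoothing constant are illustrative assumptions; the paper
concerns the hardness of this optimization, not any particular implementation:

```python
import math
from collections import defaultdict

# Toy CNF grammar: binary rules (A, ("B", "C")) and lexical rules (A, ("word",)).
rules = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("she",)), ("NP", ("it",)),
    ("V", ("saw",)), ("V", ("likes",)),
]
sentences = [["she", "saw", "it"], ["she", "likes", "it"]]

lhs_size = defaultdict(int)
for lhs, _ in rules:
    lhs_size[lhs] += 1

# Uniform-at-random initialization: equal weight to every rule sharing an LHS.
weights = {(lhs, rhs): 1.0 / lhs_size[lhs] for lhs, rhs in rules}

def viterbi_parse(sent, w):
    """CKY Viterbi chart: best[(i, j, A)] = (logprob, rule, split-or-None)."""
    n, best = len(sent), {}
    for i, word in enumerate(sent):
        for lhs, rhs in rules:
            if rhs == (word,):
                lp = math.log(w[(lhs, rhs)])
                if (i, i + 1, lhs) not in best or lp > best[(i, i + 1, lhs)][0]:
                    best[(i, i + 1, lhs)] = (lp, (lhs, rhs), None)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, rhs in rules:
                    if len(rhs) == 2 and (i, k, rhs[0]) in best and (k, j, rhs[1]) in best:
                        lp = (math.log(w[(lhs, rhs)]) + best[(i, k, rhs[0])][0]
                              + best[(k, j, rhs[1])][0])
                        if (i, j, lhs) not in best or lp > best[(i, j, lhs)][0]:
                            best[(i, j, lhs)] = (lp, (lhs, rhs), k)
    return best

def rule_counts(sent, best):
    """Read rule uses off the Viterbi parse by following backpointers."""
    counts = defaultdict(int)
    def walk(i, j, a):
        _, rule, k = best[(i, j, a)]
        counts[rule] += 1
        if k is not None:
            walk(i, k, rule[1][0]); walk(k, j, rule[1][1])
    if (0, len(sent), "S") in best:
        walk(0, len(sent), "S")
    return counts

for _ in range(5):  # hard (Viterbi) EM: parse, then re-estimate from counts
    total = defaultdict(int)
    for sent in sentences:
        for rule, c in rule_counts(sent, viterbi_parse(sent, weights)).items():
            total[rule] += c
    lhs_total = defaultdict(int)
    for (lhs, rhs), c in total.items():
        lhs_total[lhs] += c
    eps = 1e-6  # smoothing keeps unused rules off zero weight
    weights = {(lhs, rhs): (total[(lhs, rhs)] + eps)
               / (lhs_total[lhs] + eps * lhs_size[lhs])
               for lhs, rhs in rules}
```

Because each iteration maximizes over derivations rather than marginalizing,
the procedure can stall in poor local optima, which is what makes the
initialization question studied in the abstract consequential.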
Inducing Compact but Accurate Tree-Substitution Grammars
Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far out-performs a standard PCFG.
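The non-parametric ingredient can be sketched with the Dirichlet-process
posterior predictive that such models typically use: a fragment's probability
combines its observed count with a size-penalising base distribution, so
frequently reused, compact fragments win out. The concentration parameter,
geometric base distribution, and all names below are illustrative assumptions,
not the paper's exact model:

```python
from collections import Counter

alpha = 1.0    # DP concentration: small alpha => strong preference for reuse
p_stop = 0.8   # per-node stopping probability in the toy base distribution

def base_prob(num_nodes):
    """Toy P0, geometric in fragment size: large fragments are a priori rare."""
    return p_stop * (1 - p_stop) ** (num_nodes - 1)

counts, totals = Counter(), Counter()  # fragment counts; totals per root label

def predictive(frag, root, size):
    """CRP posterior predictive: (n_frag + alpha * P0) / (n_root + alpha)."""
    return (counts[frag] + alpha * base_prob(size)) / (totals[root] + alpha)

def observe(frag, root):
    counts[frag] += 1
    totals[root] += 1

f = "(NP (DT the) (NN dog))"
print(predictive(f, "NP", 3))   # prior only: ~0.032
observe(f, "NP"); observe(f, "NP")
print(predictive(f, "NP", 3))   # after two reuses: ~0.68, reuse dominates
```

This rich-get-richer dynamic is what drives the estimate toward a compact
grammar rather than the over-fitted all-fragments estimates of earlier DOP work.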
Polynomial Learnability and Locality of Formal Grammars
We apply a complexity-theoretic notion of feasible learnability called polynomial learnability to the evaluation of grammatical formalisms for linguistic description. We show that a novel, nontrivial constraint on the degree of locality of grammars allows not only context-free languages but also a rich class of mildly context-sensitive languages to be polynomially learnable. We discuss possible implications of this result for the theory of natural language acquisition.