22 research outputs found

Combining semantic and syntactic structure for language modeling

Author: Bod Rens
Publication date: 01/01/2000

Structured language models for speech recognition have been shown to remedy the weaknesses of n-gram models. All current structured language models are, however, limited in that they do not take into account dependencies between non-headwords. We show that non-headword dependencies contribute to significantly improved word error rate, and that a data-oriented parsing model trained on semantically and syntactically annotated data can exploit these dependencies. This paper also contains the first DOP model trained by means of a maximum likelihood reestimation procedure, which solves some of the theoretical shortcomings of previous DOP models. Comment: 4 pages

arXiv.org e-Print Archive

CiteSeerX

International Migration, Integration and Social Cohesion online publications

Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing

Authors: Cohen Shay B.; Gómez-Rodríguez Carlos; Satta Giorgio
Publication date: 01/01/2012

We present a novel technique to remove spurious ambiguity from transition systems for dependency parsing. Our technique chooses a canonical sequence of transition operations (computation) for a given dependency tree. It can be applied to a large class of bottom-up transition systems, including, for instance, Nivre (2004) and Attardi (2006).

arXiv.org e-Print Archive

Edinburgh Research Explorer

Probabilistic parsing

Authors: Nederhof Mark Jan; Satta Giorgio
Publication venue: Springer
Publication date: 06/01/2011

Postprint

St Andrews Research Repository

Learning tree patterns for syntactic parsing

Author: Hócza András
Publication date: 01/01/2006

This paper presents a method for parsing Hungarian texts using a machine learning approach. The method collects the initial grammar for a learner from an annotated corpus with the help of tree shapes. The PGS algorithm, an improved version of the RGLearn algorithm, was developed and applied to learning tree patterns with various phrase types described by regular expressions. The method also calculates the probability values of the learned tree patterns. The syntactic parser for the learned grammar uses the Viterbi algorithm to search quickly for the most probable derivation of a sentence. The results were built into an information extraction pipeline.

University of Szeged

Grasp: Randomised Semiring Parsing

Author: Aziz W.
Publication venue: Walter de Gruyter GmbH
Publication date: 01/10/2015

We present a suite of algorithms for inference tasks over (finite and infinite) context-free sets. For generality and clarity, we have chosen the framework of semiring parsing with support for the most common semirings (e.g. Forest, Viterbi, k-best and Inside). We see parsing from the more general viewpoint of weighted deduction allowing for arbitrary weighted finite-state input and provide implementations of both bottom-up (CKY-inspired) and top-down (Earley-inspired) algorithms. We focus on approximate inference by Monte Carlo methods and provide implementations of ancestral sampling and slice sampling. In principle, sampling methods can deal with models whose independence assumptions are weaker than what is feasible by standard dynamic programming. We envision applications such as monolingual constituency parsing, synchronous parsing, context-free models of reordering for machine translation, and machine translation decoding.

Crossref

Directory of Open Access Journals

International Migration, Integration and Social Cohesion online publications

UvA-DARE
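
The semiring parsing framework named in the Grasp abstract above can be illustrated with a small sketch. This is not code from Grasp: the Semiring class, the cky function, and the toy grammar (S -> S S with weight 0.4, S -> a with weight 0.6) are all hypothetical names and values chosen for illustration. It shows one bottom-up, CKY-style chart routine over a grammar in Chomsky normal form that computes either the Inside score or the Viterbi score, depending only on which semiring is passed in.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    """A semiring over floats (hypothetical helper, not Grasp's API)."""
    zero: float                             # identity for plus
    one: float                              # identity for times
    plus: Callable[[float, float], float]   # combines alternative derivations
    times: Callable[[float, float], float]  # combines parts of one derivation


# Inside: sum over all derivations. Viterbi: score of the best derivation.
INSIDE = Semiring(0.0, 1.0, lambda a, b: a + b, lambda a, b: a * b)
VITERBI = Semiring(0.0, 1.0, max, lambda a, b: a * b)

# Toy grammar in Chomsky normal form (assumed for illustration only).
LEXICAL = {("S", "a"): 0.6}        # (lhs, word) -> weight
BINARY = {("S", "S", "S"): 0.4}    # (lhs, left, right) -> weight


def cky(words, lexical, binary, sr, start="S"):
    """Return the semiring value of all `start` derivations of `words`."""
    n = len(words)
    # chart[i, j][A] holds the value of nonterminal A over words[i:j].
    chart = defaultdict(lambda: defaultdict(lambda: sr.zero))

    # Width-1 spans come from lexical rules.
    for i, word in enumerate(words):
        for (lhs, rule_word), weight in lexical.items():
            if rule_word == word:
                cell = chart[i, i + 1]
                cell[lhs] = sr.plus(cell[lhs], weight)

    # Wider spans combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, left, right), weight in binary.items():
                    score = sr.times(
                        weight, sr.times(chart[i, k][left], chart[k, j][right])
                    )
                    cell = chart[i, j]
                    cell[lhs] = sr.plus(cell[lhs], score)

    return chart[0, n][start]


if __name__ == "__main__":
    sentence = "a a a".split()
    print("inside :", cky(sentence, LEXICAL, BINARY, INSIDE))
    print("viterbi:", cky(sentence, LEXICAL, BINARY, VITERBI))
```

With this toy grammar the string "a a a" has two derivations of equal weight, so the Inside value (about 0.069) is twice the Viterbi value (about 0.035). The sampling methods and finite-state input described in the abstract are outside the scope of this sketch.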

Data-Oriented Language Processing. An Overview

Authors: Bod Rens; Scha Remko
Publication date: 01/01/1996

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production works with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. Then we review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995). Comment: 34 pages, Postscript

arXiv.org e-Print Archive

CiteSeerX

A Distributed Inflection Model for Translating into Morphologically Rich Languages

Authors: Bisazza A.; Monz C.; Tran K.
Publication venue: Association for Machine Translation in the Americas
Publication date: 01/01/2015

International Migration, Integration and Social Cohesion online publications

UvA-DARE