Probabilistic parsing

Author: Nederhof Mark Jan
Satta Giorgio
Publication venue: Springer
Publication date: 06/01/2011

Postprint

St Andrews Research Repository

Multiple Context-Free Tree Grammars: Lexicalization and Characterization

Author: Engelfriet Joost
Maletti Andreas
Maneth Sebastian
Publication date: 11/07/2017

Multiple (simple) context-free tree grammars are investigated, where "simple" means "linear and nondeleting". Every multiple context-free tree grammar that is finitely ambiguous can be lexicalized; i.e., it can be transformed into an equivalent one (generating the same tree language) in which each rule of the grammar contains a lexical symbol. Due to this transformation, the rank of the nonterminals increases at most by 1, and the multiplicity (or fan-out) of the grammar increases at most by the maximal rank of the lexical symbols; in particular, the multiplicity does not increase when all lexical symbols have rank 0. Multiple context-free tree grammars have the same tree generating power as multi-component tree adjoining grammars (provided the latter can use a root-marker). Moreover, every multi-component tree adjoining grammar that is finitely ambiguous can be lexicalized. Multiple context-free tree grammars have the same string generating power as multiple context-free (string) grammars and admit polynomial-time parsing algorithms. A tree language can be generated by a multiple context-free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer. Multiple context-free tree grammars can be used as a synchronous translation device. Comment: 78 pages, 13 figures

arXiv.org e-Print Archive

Leiden University Scholarly Publications

Probabilistic Constraint Logic Programming

Author: Riezler Stefan
Publication date: 11/11/1997

This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm is an extension of the improved iterative scaling algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied by suitable approximation methods when applied to large data spaces. Furthermore, we present an approach for searching for most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications. Comment: 35 pages, uses sfbart.cl

arXiv.org e-Print Archive

CiteSeerX

Exploring the N-th Dimension of Language

Author: Mondal Prakash
Publication venue: Instituto Politécnico Nacional (IPN)
Publication date: 01/01/2010

This paper aims to explore the hidden fundamental computational property of natural language, a property so elusive that every attempt to characterize it has ultimately failed. Earlier, natural language was thought to be context-free. However, it was gradually realized that this does not hold much water, given that a range of natural language phenomena have been found to be of such non-context-free character that they have almost scuttled plans to brand natural language context-free. So it has been suggested that natural language is mildly context-sensitive and to some extent context-free. In all, it seems that the issue of the exact computational property has not yet been resolved. Against this background, it will be proposed that this exact computational property of natural language is perhaps the N-th dimension of language, if what we mean by dimension is nothing but a universal (computational) property of natural language.

CogPrints Cognitive Sciences Eprint Archive

An automata characterisation for multiple context-free languages

Author: AV Aho
D Scott
H Seki
I Guessarian
L Herrmann
MP Schützenberger
N Chomsky
Publication date: 23/09/2016

We introduce tree stack automata as a new class of automata with storage and identify a restricted form of tree stack automata that recognises exactly the multiple context-free languages. Comment: This is an extended version of a paper with the same title accepted at the 20th International Conference on Developments in Language Theory (DLT 2016)

arXiv.org e-Print Archive

Crossref

Parsing With Lexicalized Tree Adjoining Grammar

Author: Joshi Aravind K
Schabes Yves
Publication venue: ScholarlyCommons
Publication date: 01/02/1990

Most current linguistic theories give lexical accounts of several phenomena that used to be considered purely syntactic. The information put in the lexicon is thereby increased in both amount and complexity: see, for example, lexical rules in LFG (Kaplan and Bresnan, 1983), GPSG (Gazdar, Klein, Pullum and Sag, 1985), HPSG (Pollard and Sag, 1987), Combinatory Categorial Grammars (Steedman, 1987), Karttunen's version of Categorial Grammar (Karttunen 1986, 1988), some versions of GB theory (Chomsky 1981), and Lexicon-Grammars (Gross 1984). We would like to take this fact into account while defining a formalism. We therefore explore the view that syntactic rules are not separated from lexical items. We say that a grammar is lexicalized (Schabes, Abeillé and Joshi, 1988) if it consists of: (1) a finite set of structures each associated with lexical items; each lexical item will be called the anchor of the corresponding structure; the structures define the domain of locality over which constraints are specified; (2) an operation or operations for composing the structures. The notion of anchor is closely related to the word associated with a functor-argument category in Categorial Grammars. Categorial Grammars (as used for example by Steedman, 1987) are 'lexicalized' according to our definition since each basic category has a lexical item associated with it.

ScholarlyCommons@Penn

Descriptional Succinctness of Some Grammatical Formalisms for Natural Language

Author: Palis Michael A
Shende Sunil
Publication venue: ScholarlyCommons
Publication date: 25/10/1990

We investigate the problem of describing languages compactly in different grammatical formalisms for natural languages. In particular, the problem is studied from the point of view of some newly developed natural language formalisms like linear control grammars (LCGs) and tree adjoining grammars (TAGs); these formalisms not only generate non-context-free languages that capture a wide variety of syntactic phenomena found in natural language, but also have computationally efficient polynomial-time recognition algorithms. We prove that the formalisms enjoy the property of unbounded succinctness over the family of context-free grammars, i.e., they are, in general, able to provide more compact representations of natural languages than standard context-free grammars.

ScholarlyCommons@Penn

Two characterisation results of multiple context-free grammars and their application to parsing

Author: Denkinger Tobias
Publication date: 20/02/2020

In the first part of this thesis, a Chomsky-Schützenberger characterisation and an automaton characterisation of multiple context-free grammars are proved. Furthermore, a framework for approximation of automata with storage is described. The second part develops each of the three theoretical results into a parsing algorithm

Technische Universität Dresden: Qucosa

Syntactic phrase-based statistical machine translation

Author: Hassan Hany
Hearne Mary
Sima'an Khalil
Way Andy
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Phrase-based statistical machine translation (PBSMT) systems represent the dominant approach in MT today. However, unlike systems in other paradigms, it has proven difficult to date to incorporate syntactic knowledge in order to improve translation quality. This paper improves on recent research which uses 'syntactified' target language phrases, by incorporating supertags as constraints to better resolve parse tree fragments. In addition, we do not impose any sentence-length limit, and using a log-linear decoder, we outperform a state-of-the-art PBSMT system by over 1.3 BLEU points (or 3.51% relative) on the NIST 2003 Arabic-English test corpus

Crossref

Irish Universities

DCU Online Research Access Service

International Migration, Integration and Social Cohesion online publications