A syntactified direct translation model with linear-time decoding
Recent syntactic extensions of statistical translation models work with a synchronous context-free or tree-substitution grammar extracted from an automatically parsed parallel corpus. The decoders accompanying these extensions typically exceed quadratic time complexity. This paper extends the Direct Translation Model 2 (DTM2) with syntax while maintaining linear-time decoding. We employ a linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar
(CCG). As every input word is processed, the local parsing decisions resolve ambiguity eagerly, by selecting a single
supertag–operator pair for extending the dependency parse incrementally. Alongside translation features extracted from
the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our empirical experiments show that our model significantly
outperforms the state-of-the-art DTM2 system.
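To make the decoding regime concrete, here is a minimal sketch of eager, linear-time incremental decoding: at each input word, a scorer ranks candidate supertag-operator pairs against the partial dependency structure, and exactly one pair is committed, with no backtracking. The supertag inventory, operator set, and scorer below are illustrative assumptions, not the paper's actual model.

    # Minimal sketch of eager incremental supertag-operator decoding.
    # The category inventory, operator set, and scorer are illustrative
    # placeholders, not the actual DTM2/CCG components.
    SUPERTAGS = ["NP", "N", "NP/N", "S\\NP", "(S\\NP)/NP"]
    OPERATORS = ["attach-left", "attach-right", "shift"]

    def score(word, supertag, operator, parse):
        # Stand-in for a trained discriminative scorer; arbitrary
        # pseudo-scores keep the sketch runnable.
        return hash((word, supertag, operator, len(parse))) % 100

    def parse_eagerly(words):
        parse = []  # entries: (word, supertag, head index or None)
        for i, word in enumerate(words):
            # Resolve ambiguity eagerly: commit to the single best
            # supertag-operator pair for this word.
            st, op = max(
                ((s, o) for s in SUPERTAGS for o in OPERATORS),
                key=lambda p: score(word, p[0], p[1], parse),
            )
            if op == "attach-left" and parse:
                parse.append((word, st, i - 1))   # word modifies its left neighbour
            elif op == "attach-right" and parse and parse[i - 1][2] is None:
                parse[i - 1] = (parse[i - 1][0], parse[i - 1][1], i)
                parse.append((word, st, None))    # left neighbour now depends on word
            else:
                parse.append((word, st, None))    # shift: open a new fragment
        return parse

Each word triggers a constant amount of work over a fixed candidate set, so the whole pass is linear in sentence length.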
Lexicalized semi-incremental dependency parsing
Even leaving aside concerns of cognitive plausibility,
incremental parsing is appealing for applications such
as speech recognition and machine translation because
it could allow for incorporating syntactic features into
the decoding process without blowing up the search
space. Yet, incremental parsing is often associated
with greedy parsing decisions and intolerable loss of
accuracy. Would the use of lexicalized grammars provide
a new perspective on incremental parsing? In this paper we explore incremental left-to-right dependency parsing using a lexicalized grammatical formalism that works with lexical categories (supertags) and a small set of combinatory operators. A strictly incremental parser would conduct only a single pass over the input, use no lookahead and make only local decisions at every word. We show that such a parser suffers heavy loss of accuracy. Instead, we explore
the utility of a two-pass approach that incrementally
builds a dependency structure by first assigning a supertag
to every input word and then selecting an incremental
operator that allows assembling every supertag with the dependency structure built so far to its left. We instantiate this idea in different models that allow
a trade-off between aspects of full incrementality
and performance, and explore the differences between
these models empirically. Our exploration shows that
a semi-incremental (two-pass), linear-time parser that
employs fixed and limited look-ahead exhibits an appealing
balance between the efficiency advantages of incrementality and the achieved accuracy. Surprisingly, whether decisions are taken locally or globally matters very little for the accuracy of this linear-time parser. Such a parser fits seamlessly with the currently dominant finite-state decoders for machine translation.
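Procedurally, the two-pass scheme separates sequence labelling from structure building: pass one assigns a supertag to every word, and pass two walks left to right, choosing for each word an operator that attaches its supertag to the structure built so far, consulting only a fixed, limited lookahead window. The toy tagger, operator classifier, and window size in the sketch below are assumptions for illustration.

    # Sketch of the two-pass, semi-incremental scheme: supertagging first,
    # then left-to-right operator selection with fixed, limited lookahead.
    # `tag` and `choose_operator` stand in for trained classifiers.
    LOOKAHEAD = 2  # fixed window of upcoming words visible in pass 2

    def tag(words):
        # Pass 1: one supertag per word (toy placeholder tagger).
        return ["NP" if w[0].isupper() else "S\\NP" for w in words]

    def choose_operator(word, supertag, arcs, window):
        # Pass 2 classifier (toy placeholder): attach to the previous
        # word when any structure exists, else start a new fragment.
        return "attach-left" if arcs else "shift"

    def parse_two_pass(words):
        supertags = tag(words)                  # pass 1: full sweep
        arcs = []                               # dependency arcs built so far
        for i, (word, st) in enumerate(zip(words, supertags)):
            window = words[i + 1 : i + 1 + LOOKAHEAD]
            op = choose_operator(word, st, arcs, window)
            if op == "attach-left":
                arcs.append((word, i - 1))      # arc to the left neighbour
            else:
                arcs.append((word, None))       # open a new fragment
        return arcs

    print(parse_two_pass("John saw Mary".split()))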
A syntactic language model based on incremental CCG parsing
Syntactically-enriched language models (parsers) constitute a promising component in applications such as machine translation and speech recognition. To maintain a useful level of accuracy, existing parsers are non-incremental and must span a combinatorially growing space of possible structures as every input word is processed. This prohibits their incorporation into standard linear-time decoders. In this paper, we present an incremental, linear-time dependency parser based on Combinatory Categorial Grammar (CCG) and classification techniques. We devise a deterministic transform of CCGbank canonical derivations into incremental ones, and train our parser on this data. We discover that a cascaded, incremental version provides an appealing balance between efficiency and accuracy.
Graph Interpolation Grammars: a Rule-based Approach to the Incremental Parsing of Natural Languages
Graph Interpolation Grammars (GIGs) are a declarative formalism with an operational
semantics. Their goal is to emulate salient features of the human parser, and
notably incrementality. The parsing process defined by GIGs incrementally
builds a syntactic representation of a sentence as each successive lexeme is
read. A GIG rule specifies a set of parse configurations that trigger its
application and an operation to perform on a matching configuration. Rules are
partly context-sensitive; furthermore, they are reversible, meaning that their
operations can be undone, which allows the parsing process to be
nondeterministic. These two factors confer enough expressive power on the formalism for parsing natural languages.
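The interplay of triggering configurations and reversible operations can be pictured as rule application with backtracking: when no rule applies to the remaining input, the parser rewinds its most recent choice and tries an alternative, which is what makes the process nondeterministic. The rule representation below is an illustrative abstraction, not the actual graph-interpolation machinery.

    # Illustrative abstraction of parsing with reversible rules: each rule
    # checks a triggering configuration and returns a new configuration;
    # because configurations are kept immutable, a failed application is
    # trivially undone by falling back to the previous configuration,
    # giving the nondeterministic (backtracking) behaviour described above.

    def rule_scan_a(config):
        tokens, stack = config
        if tokens and tokens[0] == "a":
            return (tokens[1:], stack + ["A"])   # trigger: next token is 'a'
        return None

    def rule_reduce(config):
        tokens, stack = config
        if len(stack) >= 2 and stack[-2:] == ["A", "A"]:
            return (tokens, stack[:-2] + ["S"])  # trigger: two A's on top
        return None

    RULES = [rule_scan_a, rule_reduce]

    def parse(config, history=()):
        tokens, stack = config
        if not tokens and stack == ["S"]:
            return history                       # success: one complete parse
        for rule in RULES:
            nxt = rule(config)
            if nxt is not None:
                result = parse(nxt, history + (rule.__name__,))
                if result is not None:
                    return result
                # implicit undo: discard nxt and try the next rule
        return None

    print(parse((["a", "a"], [])))
    # -> ('rule_scan_a', 'rule_scan_a', 'rule_reduce')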
On empirical cumulant generating functions of code lengths for individual sequences
We consider the problem of lossless compression of individual sequences using
finite-state (FS) machines, from the perspective of the best achievable
empirical cumulant generating function (CGF) of the code length, i.e., the
normalized logarithm of the empirical average of the exponentiated code length.
Since the probabilistic CGF is minimized in terms of the Rényi entropy of the
source, one of the motivations of this study is to derive an
individual-sequence analogue of the Rényi entropy, in the same way that the
FS compressibility is the individual-sequence counterpart of the Shannon
entropy. We consider the CGF of the code length both from the perspective of
fixed-to-variable (F-V) length coding and the perspective of
variable-to-variable (V-V) length coding, where the latter turns out to yield a
better result, one that coincides with the FS compressibility. We also extend our
results to compression with side information, available at both the encoder and
decoder. In this case, the V-V version no longer coincides with the FS
compressibility, but results in a different complexity measure.
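For concreteness, the quantities named above can be written out; the following is a hedged reconstruction from the abstract's wording, so the paper's exact definitions may differ in normalization and base. The empirical CGF is the normalized log of an empirical exponential average of code lengths; its probabilistic counterpart is Campbell's classical result tying the optimal exponentiated average code length to the Rényi entropy.

    % Hedged reconstruction; the paper's exact definitions may differ.
    % Empirical CGF of the code length for an individual sequence of
    % length n, parsed into m phrases with code lengths \ell_1,...,\ell_m:
    \[
      \Lambda_n(\lambda) \;=\; \frac{1}{n}\,
      \log_2\!\Bigl( \frac{1}{m} \sum_{i=1}^{m} 2^{\lambda \ell_i} \Bigr),
      \qquad \lambda > 0 .
    \]
    % Probabilistic counterpart (Campbell's theorem): over prefix codes,
    % and ignoring integer-length constraints, the exponentiated average
    % code length is minimized at the R\'enyi entropy of order
    % \alpha = 1/(1+\lambda):
    \[
      \min_{\ell}\, \frac{1}{\lambda}\,
      \log_2 \sum_{x} P(x)\, 2^{\lambda \ell(x)}
      \;=\; H_{\alpha}(P)
      \;=\; \frac{1}{1-\alpha}\, \log_2 \sum_{x} P(x)^{\alpha},
      \qquad \alpha = \frac{1}{1+\lambda} .
    \]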
Graph Interpolation Grammars as Context-Free Automata
A derivation step in a Graph Interpolation Grammar has the effect of scanning
an input token. This feature, which aims at emulating the incrementality of the
natural parser, restricts the formal power of GIGs. This contrasts with the
fact that the derivation mechanism involves a context-sensitive device similar
to tree adjunction in TAGs. The combined effect of input-driven derivation and
restricted context-sensitiveness would conceivably be unfortunate if it turned
out that Graph Interpolation Languages did not subsume Context-Free Languages
while being partially context-sensitive. This report sets about examining
relations between CFGs and GIGs, and shows that GILs are a proper superclass of
CFLs. It also brings out a strong equivalence between CFGs and GIGs for the
class of CFLs. Thus, it lays the basis for meaningfully investigating the
amount of context-sensitiveness supported by GIGs, but leaves this
investigation for further research.