A syntactic language model based on incremental CCG parsing
Syntactically enriched language models (parsers) constitute a promising component in applications such as machine translation and speech recognition. To maintain a useful level of accuracy, existing parsers are non-incremental and must span a combinatorially growing space of possible structures as every input word is processed. This prohibits their incorporation into standard linear-time decoders. In this paper, we present an incremental, linear-time dependency parser based on Combinatory Categorial Grammar (CCG) and classification techniques. We devise a deterministic transform of CCGbank canonical derivations into incremental ones, and train our parser on this data. We discover that a cascaded, incremental version provides an appealing balance between efficiency and accuracy.
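To make concrete what an incremental, linear-time CCG derivation looks like, here is a minimal Python sketch, assuming a toy encoding of categories and only three combination steps (forward application, forward composition, and type-raising followed by composition). It is not the paper's parser or its CCGbank transform: the operator names and the fixed try-in-order policy are hypothetical stand-ins for a trained classifier. The sketch folds the supertags left to right into a single category for the whole prefix, doing a bounded amount of work per word.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Functor:
    """A complex CCG category: `result` seeking `arg` to the right ("/") or left ("\\")."""
    result: "Cat"
    slash: str
    arg: "Cat"


Cat = Union[str, Functor]  # atomic categories are plain strings such as "S" or "NP"


def show(c: Cat) -> str:
    return c if isinstance(c, str) else f"({show(c.result)}{c.slash}{show(c.arg)})"


def forward_apply(x: Cat, y: Cat) -> Optional[Cat]:
    # X/Y  Y  =>  X
    if isinstance(x, Functor) and x.slash == "/" and x.arg == y:
        return x.result
    return None


def forward_compose(x: Cat, y: Cat) -> Optional[Cat]:
    # X/Y  Y/Z  =>  X/Z
    if (isinstance(x, Functor) and x.slash == "/"
            and isinstance(y, Functor) and y.slash == "/" and x.arg == y.result):
        return Functor(x.result, "/", y.arg)
    return None


def type_raise(x: Cat) -> Cat:
    # T  =>  S/(S\T)
    return Functor("S", "/", Functor("S", "\\", x))


# Hypothetical operator inventory, tried in a fixed order (a stand-in for a classifier).
OPERATORS: Dict[str, Callable[[Cat, Cat], Optional[Cat]]] = {
    "apply": forward_apply,
    "compose": forward_compose,
    "raise+compose": lambda prefix, cat: forward_compose(type_raise(prefix), cat),
}


def parse_incrementally(supertags: List[Cat]) -> Tuple[Cat, List[Tuple[str, str]]]:
    """Combine each new supertag with the category of the prefix built so far."""
    state = supertags[0]
    trace = [("start", show(state))]
    for cat in supertags[1:]:
        for name, op in OPERATORS.items():
            combined = op(state, cat)
            if combined is not None:
                state = combined
                trace.append((name, show(state)))
                break
        else:
            raise ValueError(f"no operator combines {show(state)} with {show(cat)}")
    return state, trace


# "John likes Mary" with supertags NP, (S\NP)/NP, NP
NP = "NP"
final, trace = parse_incrementally([NP, Functor(Functor("S", "\\", NP), "/", NP), NP])
for step in trace:
    print(*step)
```

For "John likes Mary" the prefix goes from NP to (S/NP) through type-raising and composition, and the object then completes S. Type-raising is what lets subject and verb combine before the object is seen, which is the kind of left-branching derivation an incremental transform of canonical derivations has to produce.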
A syntactified direct translation model with linear-time decoding
Recent syntactic extensions of statistical translation models work with a synchronous context-free or tree-substitution grammar extracted from an automatically parsed parallel corpus. The decoders accompanying these extensions typically exceed quadratic time complexity. This paper extends the Direct Translation Model 2 (DTM2) with syntax while maintaining linear-time decoding. We employ a linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar (CCG). As every input word is processed, the local parsing decisions resolve ambiguity eagerly by selecting a single supertag–operator pair for extending the dependency parse incrementally. Alongside translation features extracted from the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our empirical experiments show that our model significantly outperforms the state-of-the-art DTM2 system.
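The eager decision step can be sketched as an argmax over candidate supertag–operator pairs at each word, with no backtracking. The lexicon, operator names, and weights below are hypothetical toy values standing in for a trained model; the scoring function uses one history feature (the previously chosen operator) to echo the idea of features drawn from the incremental derivation itself.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (supertag, operator)

# Hypothetical toy lexicon and operator inventory.
LEXICON: Dict[str, List[str]] = {
    "John": ["NP"],
    "likes": ["(S\\NP)/NP", "S\\NP"],
    "Mary": ["NP"],
}
OPERATORS = ["start", "apply", "raise+compose", "backward-apply"]

# Hypothetical weights keyed by (previous operator, supertag, operator).
WEIGHTS: Dict[Tuple[Optional[str], str, str], float] = {
    (None, "NP", "start"): 1.0,
    ("start", "(S\\NP)/NP", "raise+compose"): 2.0,
    ("start", "S\\NP", "backward-apply"): 0.5,
    ("raise+compose", "NP", "apply"): 1.5,
}


def toy_candidates(word: str) -> List[Pair]:
    # Every supertag of the word paired with every operator.
    return [(tag, op) for tag in LEXICON[word] for op in OPERATORS]


def toy_score(history: List[Pair], pair: Pair) -> float:
    # History feature: the operator chosen for the previous word.
    prev_op = history[-1][1] if history else None
    return WEIGHTS.get((prev_op, pair[0], pair[1]), 0.0)


def eager_decode(words: Sequence[str],
                 candidates: Callable[[str], List[Pair]],
                 score: Callable[[List[Pair], Pair], float]) -> List[Pair]:
    """Commit to one (supertag, operator) pair per word, never revisiting earlier choices."""
    history: List[Pair] = []
    for word in words:
        best = max(candidates(word), key=lambda pair: score(history, pair))
        history.append(best)
    return history


print(eager_decode(["John", "likes", "Mary"], toy_candidates, toy_score))
```

Because the candidate set per word is bounded, the loop does a bounded amount of work per word, so decoding time grows linearly with sentence length.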
Paracompositionality, MWEs and Argument Substitution
Multi-word expressions, verb-particle constructions, idiomatically combining phrases, and phrasal idioms have something in common: not all of their elements contribute to the argument structure of the predicate implicated by the expression. Radically lexicalized theories of grammar that avoid string-, term-, logical form-, and tree-writing, and categorial grammars that avoid the wrap operation, make predictions about the categories involved in verb-particles and phrasal idioms. They may require singleton types, which can only substitute for one value, not just for one kind of value. These types are asymmetric: they can be arguments only. They also narrowly constrain the kind of semantic value that can correspond to such syntactic categories. Idiomatically combining phrases do not subcategorize for singleton types, and they exploit another locally computable and compositional property of a correspondence: that every syntactic expression can project its head word. Such MWEs can be seen as empirically realized categorial possibilities, rather than lacunae in a theory of lexicalizable syntactic categories.
Comment: accepted version (pre-final) for 23rd Formal Grammar Conference, August 2018, Sofi
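The contrast between ordinary argument slots, singleton types, and head-word projection can be illustrated with a toy Python encoding. This is an illustrative assumption, not the paper's formalism: the classes `Kind`, `Singleton`, and `Headed` are hypothetical, and the sketch compares only surface strings, category labels, and head words, leaving aside the constraints on semantic values that the abstract mentions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Expr:
    """A syntactic expression: its string, its category, and the head word it projects."""
    form: str
    cat: str
    head: str


@dataclass(frozen=True)
class Kind:
    """An ordinary argument slot: any expression of category `cat` fits."""
    cat: str


@dataclass(frozen=True)
class Singleton:
    """A singleton argument slot: exactly one expression (one value) fits."""
    form: str


@dataclass(frozen=True)
class Headed:
    """A slot that inspects only the head word projected by an expression of category `cat`."""
    cat: str
    head: str


Slot = Union[Kind, Singleton, Headed]


def accepts(slot: Slot, arg: Expr) -> bool:
    if isinstance(slot, Kind):
        return arg.cat == slot.cat          # one kind of value
    if isinstance(slot, Singleton):
        return arg.form == slot.form        # one value
    return arg.cat == slot.cat and arg.head == slot.head  # projected head word


the_bucket = Expr("the bucket", "NP", "bucket")
the_ball = Expr("the ball", "NP", "ball")
some_strings = Expr("some strings", "NP", "strings")

kick_literal = Kind("NP")               # "kick the ball"
kick_idiom = Singleton("the bucket")    # phrasal idiom "kick the bucket"
pull_idiom = Headed("NP", "strings")    # idiomatically combining "pull (some) strings"

print(accepts(kick_literal, the_ball), accepts(kick_idiom, the_ball), accepts(pull_idiom, some_strings))
```

A singleton slot admits exactly one argument, whereas the idiomatically combining case only checks the projected head word, so a modified variant such as "some strings" still fits.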
Lexicalized semi-incremental dependency parsing
Even leaving aside concerns of cognitive plausibility, incremental parsing is appealing for applications such as speech recognition and machine translation because it could allow for incorporating syntactic features into the decoding process without blowing up the search space. Yet, incremental parsing is often associated with greedy parsing decisions and intolerable loss of accuracy. Would the use of lexicalized grammars provide a new perspective on incremental parsing? In this paper we explore incremental left-to-right dependency parsing using a lexicalized grammatical formalism that works with lexical categories (supertags) and a small set of combinatory operators. A strictly incremental parser would conduct only a single pass over the input, use no lookahead, and make only local decisions at every word. We show that such a parser suffers a heavy loss of accuracy. Instead, we explore the utility of a two-pass approach that incrementally builds a dependency structure by first assigning a supertag to every input word and then selecting an incremental operator that allows assembling every supertag with the dependency structure built so far to its left. We instantiate this idea in different models that allow a trade-off between aspects of full incrementality and performance, and explore the differences between these models empirically. Our exploration shows that a semi-incremental (two-pass), linear-time parser that employs fixed and limited lookahead exhibits an appealing balance between the efficiency advantages of incrementality and the achieved accuracy. Surprisingly, taking local or global decisions matters very little for the accuracy of this linear-time parser. Such a parser fits seamlessly with the currently dominant finite-state decoders for machine translation.
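The two-pass, semi-incremental scheme can be sketched as two left-to-right loops: a supertagging pass that may look at a fixed number of words to the right, followed by an operator-selection pass over the now-fixed supertags. This is a minimal sketch under stated assumptions: `toy_tagger` and `toy_operator` are hypothetical rule-based stand-ins for trained models, and we assume the lookahead is used by the supertagging pass.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Tagger = Callable[[Optional[str], str, Sequence[str]], str]
OperatorChooser = Callable[[Optional[str], str], str]


def semi_incremental_parse(words: Sequence[str], tagger: Tagger,
                           choose_op: OperatorChooser,
                           lookahead: int) -> List[Tuple[str, str, str]]:
    # Pass 1: one supertag per word, seeing at most `lookahead` words to the right.
    tags: List[str] = []
    for i, word in enumerate(words):
        window = words[i + 1 : i + 1 + lookahead]
        tags.append(tagger(tags[-1] if tags else None, word, window))
    # Pass 2: strictly left to right, one operator per word given the fixed supertags.
    ops: List[str] = []
    for tag in tags:
        ops.append(choose_op(ops[-1] if ops else None, tag))
    return list(zip(words, tags, ops))


def toy_tagger(prev_tag: Optional[str], word: str, window: Sequence[str]) -> str:
    # Hypothetical rule: choose the transitive reading only if an object is visible ahead.
    if word == "likes":
        return "(S\\NP)/NP" if window else "S\\NP"
    return "NP"


def toy_operator(prev_op: Optional[str], tag: str) -> str:
    # Hypothetical rule keyed on the supertag and on whether this is the first word.
    if prev_op is None:
        return "start"
    if tag == "(S\\NP)/NP":
        return "raise+compose"
    if tag == "S\\NP":
        return "backward-apply"
    return "apply"


sentence = ["John", "likes", "Mary"]
print(semi_incremental_parse(sentence, toy_tagger, toy_operator, lookahead=1))
print(semi_incremental_parse(sentence, toy_tagger, toy_operator, lookahead=0))
```

With `lookahead=0` the toy tagger cannot see the object and picks the intransitive reading of "likes"; with one word of lookahead it picks the transitive one. Since each window has a fixed maximum size, both passes remain linear in sentence length.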
Shift-Reduce CCG Parsing with a Dependency Model
This paper presents the first dependency model for a shift-reduce CCG parser. Modelling dependencies is desirable for a number of reasons, including handling the “spurious” ambiguity of CCG, fitting well with the theory of CCG, and optimizing for structures which are evaluated at test time. We develop a novel training technique using a dependency oracle, in which all derivations are hidden. A challenge arises from the fact that the oracle needs to keep track of exponentially many gold-standard derivations, which is solved by integrating a packed parse forest with the beam-search decoder. Standard CCGBank tests show the model achieves labeled F-score improvements of up to 1.05 over three existing, competitive CCG parsing models.
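CCGbank results like these are commonly reported as labeled F-score over predicate-argument dependencies. A minimal sketch of that computation, assuming each dependency is a tuple of head index, head category, argument slot, and argument index, and that a dependency counts as correct only when all four fields match (the standard evaluation scripts include further details that this sketch ignores):

```python
from typing import Set, Tuple

# (head word index, head's lexical category, argument slot, argument word index)
Dep = Tuple[int, str, int, int]


def labeled_prf(gold: Set[Dep], predicted: Set[Dep]) -> Tuple[float, float, float]:
    """Labeled precision, recall and F1: a dependency counts only if every field matches."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# "John(1) likes(2) Mary(3)": likes takes John in slot 1 and Mary in slot 2.
gold = {(2, "(S\\NP)/NP", 1, 1), (2, "(S\\NP)/NP", 2, 3)}
# A parse that gets the subject right but fills the object slot with the wrong word.
predicted = {(2, "(S\\NP)/NP", 1, 1), (2, "(S\\NP)/NP", 2, 1)}
print(labeled_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```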