Latent Tree Language Model
In this paper we introduce the Latent Tree Language Model (LTLM), a novel
approach to language modeling that encodes the syntax and semantics of a given
sentence as a tree of word roles.
The learning phase iteratively updates the trees by moving nodes according to
Gibbs sampling. We introduce two algorithms to infer the tree for a given
sentence. The first is based on Gibbs sampling; it is fast, but is not
guaranteed to find the most probable tree. The second is based on dynamic
programming; it is slower, but guaranteed to find the most probable tree. We
provide a comparison of both algorithms.
We combine LTLM with a 4-gram Modified Kneser-Ney language model via linear
interpolation. Our experiments with English and Czech corpora show significant
perplexity reductions (up to 46% for English and 49% for Czech) compared with
the standalone 4-gram Modified Kneser-Ney language model.
Comment: Accepted to EMNLP 201
Deciding the Borel complexity of regular tree languages
We show that it is decidable whether a given regular tree language belongs
to the class Delta^0_2 of the Borel hierarchy, or equivalently whether
the Wadge degree of a regular tree language is countable.
Comment: 15 pages, 2 figures
Bottom Up Quotients and Residuals for Tree Languages
In this paper, we extend the notion of tree language quotients to bottom-up
quotients. Instead of computing the residual of a tree language from top to
bottom and producing a list of tree languages, we show how to compute a set of
k-ary trees, where k is an arbitrary integer. We define the quotient formula
for different combinations of tree languages: union, symbol products,
compositions, iterated symbol products, and iterated compositions. These
computations lead to the definition of the bottom-up quotient tree automaton,
which turns out to be the minimal deterministic tree automaton associated with
a regular tree language in the case of 0-ary trees.
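For readers unfamiliar with the bottom-up model this abstract builds on, a deterministic bottom-up tree automaton can be evaluated as below. The alphabet, states, and acceptance condition are an illustrative assumption, not taken from the paper.

```python
# A tree is a tuple (symbol, child, child, ...).
# A deterministic bottom-up tree automaton maps (symbol, child states...) to
# a state; a tree is accepted when its root evaluates to a final state.
# Toy example: binary f, constants a and b; accept trees whose leaves are all a.

DELTA = {
    ("a",): "qa",
    ("b",): "qb",
    ("f", "qa", "qa"): "qa",
    ("f", "qa", "qb"): "qb",
    ("f", "qb", "qa"): "qb",
    ("f", "qb", "qb"): "qb",
}
FINAL = {"qa"}

def run(t):
    """Evaluate a tree bottom-up: children first, then the root transition."""
    label, *kids = t
    return DELTA[(label, *map(run, kids))]

def accepts(t):
    return run(t) in FINAL
```

For instance, `accepts(("f", ("a",), ("a",)))` holds, while `accepts(("f", ("a",), ("b",)))` does not.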
Multiple Context-Free Tree Grammars: Lexicalization and Characterization
Multiple (simple) context-free tree grammars are investigated, where "simple"
means "linear and nondeleting". Every multiple context-free tree grammar that
is finitely ambiguous can be lexicalized; i.e., it can be transformed into an
equivalent one (generating the same tree language) in which each rule of the
grammar contains a lexical symbol. Due to this transformation, the rank of the
nonterminals increases at most by 1, and the multiplicity (or fan-out) of the
grammar increases at most by the maximal rank of the lexical symbols; in
particular, the multiplicity does not increase when all lexical symbols have
rank 0. Multiple context-free tree grammars have the same tree generating power
as multi-component tree adjoining grammars (provided the latter can use a
root-marker). Moreover, every multi-component tree adjoining grammar that is
finitely ambiguous can be lexicalized. Multiple context-free tree grammars have
the same string generating power as multiple context-free (string) grammars,
and they admit polynomial-time parsing algorithms. A tree language can be generated by a
multiple context-free tree grammar if and only if it is the image of a regular
tree language under a deterministic finite-copying macro tree transducer.
Multiple context-free tree grammars can be used as a synchronous translation
device.
Comment: 78 pages, 13 figures
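A minimal sketch of a simple (linear and nondeleting) context-free tree grammar and one derivation may help make the lexicalization claim concrete. The toy grammar below is invented for the example; in it every rule carries a lexical symbol (a, f, or g), as the lexicalization result requires.

```python
# Trees are tuples (label, children...). Nonterminal A has rank 1; the marker
# "x1" stands for its single argument, which each right-hand side uses exactly
# once (linear) and never drops (nondeleting). Toy grammar, every rule lexical:
#   S     -> A(a)        (lexical symbol a)
#   A(x1) -> f(A(x1))    (lexical symbol f)
#   A(x1) -> g(x1)       (lexical symbol g)
RULES = {
    "S": [("A", ("a",))],
    "A": [("f", ("A", ("x1",))), ("g", ("x1",))],
}

def substitute(t, env):
    """First-order substitution of the argument tree for the marker x1."""
    if t[0] == "x1":
        return env["x1"]
    return (t[0],) + tuple(substitute(c, env) for c in t[1:])

def derive(t, choices):
    """Expand nonterminals top-down, consuming one rule index per application."""
    label, *kids = t
    if label in RULES:
        rhs = RULES[label][choices.pop(0)]
        env = {"x1": kids[0]} if kids else {}
        return derive(substitute(rhs, env), choices)
    return (label,) + tuple(derive(k, choices) for k in kids)
```

With the choice sequence [0, 0, 1], the derivation S => A(a) => f(A(a)) => f(g(a)) yields the tree ("f", ("g", ("a",))); the grammar generates exactly the trees f^n(g(a)).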
Construction of rational expression from tree automata using a generalization of Arden's Lemma
Arden's Lemma is a classical result in language theory that allows the
computation of a rational expression denoting the language recognized by a
finite string automaton. In this paper we generalize this important lemma to
rational tree languages. Moreover, we also propose a construction of a
rational tree expression that denotes the tree language accepted by a finite
tree automaton.
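The string version of Arden's Lemma that this paper generalizes can be stated and sanity-checked in a few lines; the regular-expression strings below are purely illustrative.

```python
import re

# Arden's Lemma (string version): if the empty word is not in A, then the
# language equation  X = A.X | B  has the unique solution  X = A*.B.
# Here expressions are plain regular-expression strings.

def arden(a, b):
    """Solve X = (a)X | (b) for X, returning the expression (a)*(b)."""
    return f"({a})*({b})"
```

For example, `arden("a", "b")` yields `(a)*(b)`, which matches strings such as `aaab` but not `ba`, i.e. exactly the solutions of X = aX | b.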
The Wadge Hierarchy of Deterministic Tree Languages
We provide a complete description of the Wadge hierarchy for
deterministically recognisable sets of infinite trees. In particular we give an
elementary procedure to decide if one deterministic tree language is
continuously reducible to another. This extends Wagner's results on the
hierarchy of omega-regular languages of words to the case of trees.
Comment: 44 pages, 8 figures; extended abstract presented at ICALP 2006,
Venice, Italy; full version appears in the LMCS special issue