Pushdown automata in statistical machine translation
This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite-state automaton representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.
DNA ANALYSIS USING GRAMMATICAL INFERENCE
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.
Testing shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.
To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and significantly improve the specificity of the model.
Practical experiments with regular approximation of context-free languages
Several methods are discussed that construct a finite automaton given a
context-free grammar, including both methods that lead to subsets and those
that lead to supersets of the original context-free language. Some of these
methods of regular approximation are new, and some others are presented here in
a more refined form with respect to existing literature. Practical experiments
with the different methods of regular approximation are performed for
spoken-language input: hypotheses from a speech recognizer are filtered through
a finite automaton. Comment: 28 pages. To appear in Computational Linguistics 26(1), March 200
Boolean Circuit Complexity of Regular Languages
In this paper we define a new descriptional complexity measure for
Deterministic Finite Automata, BC-complexity, as an alternative to the state
complexity. We prove that for two DFAs with the same number of states
BC-complexity can differ exponentially. In some cases minimization of a DFA can
lead to an exponential increase in BC-complexity; on the other hand, the
BC-complexity of DFAs with a large state space that are obtained by some
standard constructions (determinization of an NFA, language operations) is
reasonably small. Our main result is the analogue of the "Shannon effect"
for finite automata: almost all DFAs with a fixed number of states have
BC-complexity that is close to the maximum. Comment: In Proceedings AFL 2014, arXiv:1405.527
Finite-state Strategies in Delay Games (full version)
What is a finite-state strategy in a delay game? We answer this surprisingly
non-trivial question by presenting a very general framework that allows one to
remove delay: finite-state strategies exist for all winning conditions where
the resulting delay-free game admits a finite-state strategy. The framework is
applicable to games whose winning condition is recognized by an automaton with
an acceptance condition that satisfies a certain aggregation property. Our
framework also yields upper bounds on the complexity of determining the winner
of such delay games and upper bounds on the necessary lookahead to win the
game. In particular, we cover all previous results of that kind as special
cases of our uniform approach.
Automata theory in nominal sets
We study languages over infinite alphabets equipped with some structure that
can be tested by recognizing automata. We develop a framework for studying such
alphabets and the ensuing automata theory, where the key role is played by an
automorphism group of the alphabet. In the process, we generalize nominal sets
due to Gabbay and Pitts.
Exact and Approximate Determinization of Discounted-Sum Automata
A discounted-sum automaton (NDA) is a nondeterministic finite automaton with
edge weights, valuing a run by the discounted sum of visited edge weights. More
precisely, the weight in the i-th position of the run is divided by
$\lambda^i$, where the discount factor $\lambda$ is a fixed rational number
greater than 1. The value of a word is the minimal value of the automaton runs
on it. Discounted summation is a common and useful measuring scheme, especially
for infinite sequences, reflecting the assumption that earlier weights are more
important than later weights. Unfortunately, determinization of NDAs, which is
often essential in formal verification, is, in general, not possible. We
provide positive news, showing that every NDA with an integral discount factor
is determinizable. We complete the picture by proving that the integers
characterize exactly the discount factors that guarantee determinizability: for
every nonintegral rational discount factor $\lambda$, there is a
nondeterminizable $\lambda$-NDA. We also prove that the class of NDAs with
integral discount factors enjoys closure under the algebraic operations min,
max, addition, and subtraction, which is not the case for general NDAs nor for
deterministic NDAs. For general NDAs, we look into approximate determinization,
which is always possible as the influence of a word's suffix decays. We show
that the naive approach, of unfolding the automaton computations up to a
sufficient level, is doubly exponential in the discount factor. We provide an
alternative construction for approximate determinization, which is singly
exponential in the discount factor, in the precision, and in the number of
states. We also prove matching lower bounds, showing that the exponential
dependency on each of these three parameters cannot be avoided. All our results
hold equally for automata over finite words and for automata over infinite
words.
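
The definition of a word's value in this abstract can be illustrated with a minimal Python sketch. It is not the authors' construction: the dictionary encoding of transitions, the name `word_value`, and the toy automaton `EXAMPLE` are hypothetical, positions are assumed to be counted from 0, and every state is assumed to be accepting. The function takes the minimum, over all runs on a finite word, of the sum of edge weights, with the weight at position i divided by the i-th power of the discount factor.

```python
from fractions import Fraction

def word_value(transitions, initial, word, discount):
    """Minimal discounted sum over all runs of a finite word.

    transitions: dict mapping (state, letter) to a list of (weight, next_state).
    Assumptions (not from the abstract): positions start at 0 and
    every state is accepting. Returns None if the word has no run.
    """
    lam = Fraction(discount)
    best = {initial: Fraction(0)}   # cheapest discounted sum reaching each state
    scale = Fraction(1)             # equals 1 / lam**i at position i
    for letter in word:
        nxt = {}
        for state, acc in best.items():
            for weight, target in transitions.get((state, letter), []):
                cand = acc + Fraction(weight) * scale
                if target not in nxt or cand < nxt[target]:
                    nxt[target] = cand
        best = nxt
        scale /= lam
    return min(best.values()) if best else None

# Hypothetical two-state automaton over the alphabet {a}.
EXAMPLE = {
    ("p", "a"): [(2, "p"), (1, "q")],
    ("q", "a"): [(4, "q")],
}

print(word_value(EXAMPLE, "p", "aa", 2))  # 5/2, via the run p -> p -> q
```

Keeping only the cheapest partial sum per state is sound here because the cost still to be paid depends only on the current state and position, not on how the state was reached. Exact rationals from `fractions` avoid floating-point drift when the discount factor is rational.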