A unifying framework for seed sensitivity and its application to subset seeds
We propose a general approach to computing seed sensitivity that can be
applied to different definitions of seeds. It treats separately three
components of the seed sensitivity problem -- a set of target alignments, an
associated probability distribution, and a seed model -- that are specified by
distinct finite automata. The approach is then applied to a new concept of
subset seeds for which we propose an efficient automaton construction.
Experimental results confirm that sensitive subset seeds can be efficiently
designed using our approach, and can then be used in similarity search
producing better results than ordinary spaced seeds.
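As a rough illustration of what "seed sensitivity" measures, the sketch below tests whether an ordinary spaced seed hits a gapless alignment and computes the hit probability under an i.i.d. match model. It uses brute-force enumeration rather than the automaton construction described in the abstract, and it covers ordinary spaced seeds only, not subset seeds. The function names `seed_hits` and `sensitivity`, the 0/1 string encoding, and the Bernoulli model are assumptions made for this example.

```python
from itertools import product


def seed_hits(seed: str, alignment: str) -> bool:
    """Return True if the spaced seed matches somewhere in the alignment.

    seed:      string over {'1', '0'}; '1' = must match, '0' = don't care.
    alignment: string over {'1', '0'}; '1' = match, '0' = mismatch.
    """
    span = len(seed)
    for start in range(len(alignment) - span + 1):
        # The seed hits at `start` if every required position is a match.
        if all(alignment[start + i] == "1"
               for i, s in enumerate(seed) if s == "1"):
            return True
    return False


def sensitivity(seed: str, length: int, p: float) -> float:
    """Hit probability over alignments of the given length, assuming each
    position is independently a match with probability p (brute force)."""
    total = 0.0
    for bits in product("01", repeat=length):
        alignment = "".join(bits)
        if seed_hits(seed, alignment):
            matches = alignment.count("1")
            total += p ** matches * (1 - p) ** (length - matches)
    return total
```

Enumeration costs 2^length evaluations, so this is only practical for very short alignments; avoiding that blow-up is what an automaton-based computation is for.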
XQuery Streaming by Forest Transducers
Streaming of XML transformations is a challenging task and only very few
systems support streaming. Research approaches generally define custom
fragments of XQuery and XPath that are amenable to streaming, and then design
custom algorithms for each fragment. These languages have several shortcomings.
Here we take a more principled approach to the problem of streaming
XQuery-based transformations. We start with an elegant transducer model for
which many static analysis problems are well-understood: the Macro Forest
Transducer (MFT). We show that a large fragment of XQuery can be translated
into MFTs; indeed, this fragment can express important features
that are missing from other XQuery stream engines, such as GCX: our fragment of
XQuery supports XPath predicates and let-statements. We then rely on a
streaming execution engine for MFTs, one which uses a well-founded set of
optimizations from functional programming, such as strictness analysis and
deforestation. Our prototype achieves time and memory efficiency comparable to
the fastest known engine for XQuery streaming, GCX. This is surprising because
our engine relies on the OCaml built-in garbage collector and does not use any
specialized buffer management, while GCX's efficiency is due to clever and
explicit buffer management. Comment: Full version of the paper in the Proceedings of the 30th IEEE
International Conference on Data Engineering (ICDE 2014).
Optimizing expected word error rate via sampling for speech recognition
State-level minimum Bayes risk (sMBR) training has become the de facto
standard for sequence-level training of speech recognition acoustic models. It
has an elegant formulation using the expectation semiring, and gives large
improvements in word error rate (WER) over models trained solely using
cross-entropy (CE) or connectionist temporal classification (CTC). sMBR
training optimizes the expected number of frames at which the reference and
hypothesized acoustic states differ. It may be preferable to optimize the
expected WER, but WER does not interact well with the expectation semiring, and
previous approaches based on computing expected WER exactly involve expanding
the lattices used during training. In this paper we show how to perform
optimization of the expected WER by sampling paths from the lattices used
during conventional sMBR training. The gradient of the expected WER is itself
an expectation, and so may be approximated using Monte Carlo sampling. We show
experimentally that optimizing WER during acoustic model training gives 5%
relative improvement in WER over a well-tuned sMBR baseline on a 2-channel
query recognition task (Google Home).
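The observation that the gradient of an expected loss is itself an expectation can be sketched on a toy problem. Below, a softmax distribution over three explicit candidate paths stands in for the training lattice, and the per-path losses stand in for word error counts. Both are assumptions made for illustration, as are the function names, and this is not the authors' implementation. The sampled estimate uses the score-function identity: the gradient of E[L] equals E[L(path) * grad log p(path)].

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def exact_gradient(logits, losses):
    """Gradient of the expected loss with respect to each logit.

    For a softmax distribution: dE[L]/dtheta_j = p_j * (L_j - E[L]).
    """
    probs = softmax(logits)
    expected = sum(p * l for p, l in zip(probs, losses))
    return [p * (l - expected) for p, l in zip(probs, losses)]


def sampled_gradient(logits, losses, num_samples, rng):
    """Monte Carlo estimate of the same gradient from sampled paths."""
    probs = softmax(logits)
    n = len(logits)
    grad = [0.0] * n
    for _ in range(num_samples):
        k = rng.choices(range(n), weights=probs)[0]  # sample one path
        for j in range(n):
            # d log p_k / d theta_j = 1[j == k] - p_j for a softmax.
            indicator = 1.0 if j == k else 0.0
            grad[j] += losses[k] * (indicator - probs[j])
    return [g / num_samples for g in grad]
```

With enough samples the estimate approaches the exact gradient. Real lattices hold far too many paths to enumerate, which is the reason to sample.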
DFKI finite-state machine toolkit
Finite-state devices such as finite-state automata and finite-state transducers have been known since the emergence of computer science and are now used extensively in many areas of language technology. Their use is mainly motivated by their time and space efficiency. In this paper we present the Finite-State Machine Toolkit for building, combining, and optimizing finite-state machines, developed at the Language Technology Lab of the German Research Center for Artificial Intelligence.
Linear Bounded Composition of Tree-Walking Tree Transducers: Linear Size Increase and Complexity
Compositions of tree-walking tree transducers form a hierarchy with respect
to the number of transducers in the composition. As the main technical result, it is
proved that any such composition can be realized as a linear bounded
composition, which means that the sizes of the intermediate results can be
chosen to be at most linear in the size of the output tree. This has
consequences for the expressiveness and complexity of the translations in the
hierarchy. First, if the computed translation is a function of linear size
increase, i.e., the size of the output tree is at most linear in the size of
the input tree, then it can be realized by just one, deterministic,
tree-walking tree transducer. For compositions of deterministic transducers it
is decidable whether or not the translation is of linear size increase. Second,
every composition of deterministic transducers can be computed in deterministic
linear time on a RAM and in deterministic linear space on a Turing machine,
measured in the sum of the sizes of the input and output tree. Similarly, every
composition of nondeterministic transducers can be computed in simultaneous
polynomial time and linear space on a nondeterministic Turing machine. Their
output tree languages are deterministic context-sensitive, i.e., can be
recognized in deterministic linear space on a Turing machine. The membership
problem for compositions of nondeterministic translations is nondeterministic
polynomial time and deterministic linear space. The membership problem for the
composition of a nondeterministic and a deterministic tree-walking tree
translation (for a nondeterministic IO macro tree translation) is log-space
reducible to a context-free language, whereas the membership problem for the
composition of a deterministic and a nondeterministic tree-walking tree
translation (for a nondeterministic OI macro tree translation) is possibly
NP-complete.
Automata-based adaptive behavior for economic modeling using game theory
In this paper, we deal with some specific domains of application of game
theory, one of the major classes of models in the new approaches to
modelling in the economic domain. For that, we use genetic automata, which allow
us to build adaptive strategies for the players. We explain how the proposed
automata-based formalism (matrix representation of automata with multiplicities)
allows us to define a semi-distance between the strategy behaviors. With these
tools, we are able to generate an automatic process to compute emergent
systems of entities whose behaviors are represented by these genetic automata.