4,674 research outputs found
Computation of distances for regular and context-free probabilistic languages
Several mathematical distances between probabilistic languages have been investigated in the literature, motivated by applications in language modeling, computational biology, syntactic pattern matching and machine learning. In most cases, only pairs of probabilistic regular languages were considered. In this paper we extend the previous results to pairs of languages generated by a probabilistic context-free grammar and a probabilistic finite automaton.PostprintPeer reviewe
Streaming Property Testing of Visibly Pushdown Languages
In the context of language recognition, we demonstrate the superiority of
streaming property testers against streaming algorithms and property testers,
when they are not combined. Initiated by Feigenbaum et al., a streaming
property tester is a streaming algorithm recognizing a language under the
property testing approximation: it must distinguish inputs of the language from
those that are -far from it, while using the smallest possible
memory (rather than limiting its number of input queries).
Our main result is a streaming -property tester for visibly
pushdown languages (VPL) with one-sided error using memory space
.
This constructions relies on a (non-streaming) property tester for weighted
regular languages based on a previous tester by Alon et al. We provide a simple
application of this tester for streaming testing special cases of instances of
VPL that are already hard for both streaming algorithms and property testers.
Our main algorithm is a combination of an original simulation of visibly
pushdown automata using a stack with small height but possible items of linear
size. In a second step, those items are replaced by small sketches. Those
sketches relies on a notion of suffix-sampling we introduce. This sampling is
the key idea connecting our streaming tester algorithm to property testers.Comment: 23 pages. Major modifications in the presentatio
Practical experiments with regular approximation of context-free languages
Several methods are discussed that construct a finite automaton given a
context-free grammar, including both methods that lead to subsets and those
that lead to supersets of the original context-free language. Some of these
methods of regular approximation are new, and some others are presented here in
a more refined form with respect to existing literature. Practical experiments
with the different methods of regular approximation are performed for
spoken-language input: hypotheses from a speech recognizer are filtered through
a finite automaton.Comment: 28 pages. To appear in Computational Linguistics 26(1), March 200
Speech Recognition by Composition of Weighted Finite Automata
We present a general framework based on weighted finite automata and weighted
finite-state transducers for describing and implementing speech recognizers.
The framework allows us to represent uniformly the information sources and data
structures used in recognition, including context-dependent units,
pronunciation dictionaries, language models and lattices. Furthermore, general
but efficient algorithms can used for combining information sources in actual
recognizers and for optimizing their application. In particular, a single
composition algorithm is used both to combine in advance information sources
such as language models and dictionaries, and to combine acoustic observations
and information sources dynamically during recognition.Comment: 24 pages, uses psfig.st
- âŠ