Rational stochastic languages
The goal of the present paper is to provide a systematic and comprehensive
study of rational stochastic languages over a semiring K \in {Q, Q+, R, R+}. A
rational stochastic language is a probability distribution over a free monoid
\Sigma^* which is rational over K, that is which can be generated by a
multiplicity automaton with parameters in K. We study the relations between the
classes of rational stochastic languages S^{rat}_K(\Sigma). We define the notion
of residual of a stochastic language and we use it to investigate properties of
several subclasses of rational stochastic languages. Lastly, we study the
representation of rational stochastic languages by means of multiplicity
automata.
Comment: 35 pages
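To make the model concrete: a multiplicity automaton over K is a linear representation (an initial vector, one transition matrix per letter, and a final vector), and the weight it assigns to a word is the corresponding matrix product. Below is a minimal sketch of this semantics; the one-state example automaton and all names are illustrative, not taken from the paper.

```python
import numpy as np

# A multiplicity automaton as a linear representation: an initial vector
# iota, one transition matrix per letter, and a final vector tau. The
# weight of a word w = w_1...w_n is iota^T M_{w_1} ... M_{w_n} tau.
def word_weight(iota, matrices, tau, word):
    v = iota
    for letter in word:
        v = v @ matrices[letter]  # accumulate the matrix product left to right
    return float(v @ tau)

# Illustrative one-state automaton: P(a^n) = (1 - p) * p^n, a stochastic
# language since these weights sum to 1 over {'a'}^*.
p = 0.5
iota = np.array([1.0])
matrices = {"a": np.array([[p]])}
tau = np.array([1.0 - p])

print(word_weight(iota, matrices, tau, "aaa"))  # (1 - p) * p**3 = 0.0625
```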
Learning rational stochastic languages
Given a finite set of words w_1, ..., w_n drawn independently according to a fixed unknown distribution P called a stochastic language, a usual goal in Grammatical Inference is to infer an estimate of P in some class of probabilistic models, such as Probabilistic Automata (PA). Here, we study the class of rational stochastic languages, which consists of the stochastic languages that can be generated by Multiplicity Automata (MA) and which strictly includes the class of stochastic languages generated by PA. Rational stochastic languages have a minimal normal representation which may be very concise, and whose parameters can be efficiently estimated from stochastic samples. We design an efficient inference algorithm, DEES, which aims at building a minimal normal representation of the target. Despite the fact that no recursively enumerable class of MA computes exactly the set of rational stochastic languages over Q, we show that DEES strongly identifies this set in the limit. We study the intermediate MA output by DEES and show that they compute rational series which converge absolutely to one and which can be used to provide stochastic languages that closely estimate the target.
Comment: 15 pages
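The abstract does not spell out DEES, so the following is only a hedged sketch of the residual-based test underlying this family of learners, assuming empirical residuals estimated from the sample and a least-squares span test; the function names, tolerance, and choice of test suffixes are all illustrative.

```python
import numpy as np

# Hedged sketch, not the paper's pseudocode: learners in this family grow a
# basis of prefixes and, for each candidate prefix, test whether its
# empirical residual lies (approximately) in the span of the basis residuals.
def residual(sample, prefix, suffixes):
    """Empirical residual of `prefix`: P(prefix + s) / P(prefix Sigma^*)."""
    n = sum(1 for w in sample if w.startswith(prefix))
    if n == 0:
        return None
    return np.array([sum(1 for w in sample if w == prefix + s) / n
                     for s in suffixes])

def in_span(vec, basis, tol=0.1):
    """Least-squares test: is `vec` close to a combination of `basis`?"""
    if not basis:
        return False
    B = np.stack(basis, axis=1)
    coeffs, *_ = np.linalg.lstsq(B, vec, rcond=None)
    return np.linalg.norm(B @ coeffs - vec) <= tol
```

At a high level, the basis starts from the empty prefix; a candidate prefix whose empirical residual falls outside the span becomes a new state, and otherwise the least-squares coefficients supply transition weights.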
Relevant Representations for the Inference of Rational Stochastic Tree Languages
Recently, an algorithm, DEES, was proposed for learning rational stochastic tree languages. Given an independently and identically distributed sample of trees drawn according to a rational stochastic language, DEES outputs a linear representation of a rational series which converges to the target. DEES can then be used to identify rational stochastic tree languages in the limit with probability one. However, when DEES deals with finite samples, it often outputs a rational tree series which does not define a stochastic language. Moreover, the linear representation cannot be directly used as a generative model. In this paper, we show that any representation of a rational stochastic tree language can be transformed into a reduced normalised representation that can be used to generate trees from the underlying distribution. We also study some consistency properties of rational stochastic tree languages and discuss their implications for inference. We finally consider the applicability of DEES to trees built over an unranked alphabet.
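To illustrate what a normalisation of this kind can look like, here is a hedged sketch on strings rather than trees (trees add arity bookkeeping that the paper handles): conjugating a linear representation by the diagonal matrix of per-state total weights makes each state's stopping weight plus outgoing letter weights sum to 1, so it can be sampled from directly when all parameters are nonnegative. The assumptions (convergent series, positive per-state totals) and all names are the sketch's, not the paper's.

```python
import numpy as np

# Hedged sketch on strings rather than trees: conjugate a linear
# representation (iota, {M_a}, tau) by the diagonal of per-state total
# weights s = (I - sum_a M_a)^{-1} tau, so that each state's stopping
# weight plus outgoing letter weights sums to 1. Assumes the series
# converges and every entry of s is positive (a reduced representation).
def normalize(iota, matrices, tau):
    M = sum(matrices.values())
    n = len(tau)
    s = np.linalg.solve(np.eye(n) - M, tau)  # total weight emitted from each state
    D, Dinv = np.diag(s), np.diag(1.0 / s)
    iota_n = (iota * s) / (iota @ s)         # sums to 1: an initial distribution
    mats_n = {a: Dinv @ Ma @ D for a, Ma in matrices.items()}
    tau_n = Dinv @ tau                       # per-state stopping probability
    return iota_n, mats_n, tau_n
```

When some parameters are negative, the normalised representation still computes the same series but cannot be read off state by state as a generator, which is where the paper's construction for trees does more work.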
Parametrized Stochastic Grammars for RNA Secondary Structure Prediction
We propose a two-level stochastic context-free grammar (SCFG) architecture
for parametrized stochastic modeling of a family of RNA sequences, including
their secondary structure. A stochastic model of this type can be used for
maximum a posteriori estimation of the secondary structure of any new sequence
in the family. The proposed SCFG architecture models RNA subsequences
comprising paired bases as stochastically weighted Dyck-language words, i.e.,
as weighted balanced-parenthesis expressions. The length of each run of
unpaired bases, forming a loop or a bulge, is taken to have a phase-type
distribution: that of the hitting time in a finite-state Markov chain. Without
loss of generality, each such Markov chain can be taken to have a bounded
complexity. The scheme yields an overall family SCFG with a manageable number
of parameters.
Comment: 5 pages, submitted to the 2007 Information Theory and Applications Workshop (ITA 2007)
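The loop-length model is concrete enough to sample: a phase-type distribution is, by definition, the law of the hitting time of an absorbing state in a finite Markov chain. The sketch below assumes only that definition; the example chain and its parameters are illustrative, not the paper's.

```python
import numpy as np

# Hedged sketch of the loop/bulge length model: a phase-type distribution
# is the hitting time of an absorbing state in a finite Markov chain.
def sample_phase_type(start_probs, P, absorbing, rng):
    state = rng.choice(len(start_probs), p=start_probs)
    steps = 0
    while state != absorbing:
        state = rng.choice(P.shape[1], p=P[state])  # one Markov step
        steps += 1
    return steps  # length of the run of unpaired bases

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.3, 0.2],   # two transient states ...
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])  # ... feeding absorbing state 2
print([sample_phase_type([1.0, 0.0, 0.0], P, 2, rng) for _ in range(5)])
```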
Calibrating Generative Models: The Probabilistic Chomsky-Schützenberger Hierarchy
A probabilistic Chomsky–Schützenberger hierarchy of grammars is introduced and studied, with the aim of understanding the expressive power of generative models. We offer characterizations of the distributions definable at each level of the hierarchy, including probabilistic regular, context-free, (linear) indexed, context-sensitive, and unrestricted grammars, each corresponding to a familiar probabilistic machine class. Special attention is given to distributions on (unary notations for) positive integers. Unlike in the classical case, where the "semi-linear" languages all collapse into the regular languages, we use analytic tools adapted from the classical setting to show that there is no collapse in the probabilistic hierarchy: more distributions become definable at each level. We also address related issues such as closure under probabilistic conditioning.
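As a hedged illustration of the unary case (not an example from the paper): over a one-letter alphabet, a probabilistic regular grammar yields geometric-type length distributions, while already the context-free rule S -> S S | a produces the total-progeny law of a subcritical branching process, whose tail carries an extra polynomial factor on top of the geometric one. The sampler below makes that context-free distribution concrete; the parameter and the budget guard are illustrative.

```python
import random

# Hedged illustration, not from the paper: lengths generated by the
# context-free rule S -> S S (prob p) | a (prob 1 - p) follow the
# total-progeny law of a subcritical branching process when p < 1/2.
def sample_cf_length(p=0.4, budget=10_000, rng=random.Random(0)):
    pending, length = 1, 0          # pending S symbols, emitted a's
    while pending:
        pending -= 1
        if rng.random() < p:
            pending += 2            # S -> S S
        else:
            length += 1             # S -> a
        if length + pending > budget:
            return None             # guard against rare huge derivations
    return length

print([sample_cf_length() for _ in range(10)])
```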
Computable de Finetti measures
We prove a computable version of de Finetti's theorem on exchangeable
sequences of real random variables. As a consequence, exchangeable stochastic
processes expressed in probabilistic functional programming languages can be
automatically rewritten as procedures that do not modify non-local state. Along
the way, we prove that a distribution on the unit interval is computable if and
only if its moments are uniformly computable.
Comment: 32 pages. Final journal version; expanded somewhat, with minor corrections. To appear in Annals of Pure and Applied Logic. Extended abstract appeared in Proceedings of CiE '09, LNCS 5635, pp. 218-23
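A classical example makes the programming-language consequence concrete: a Pólya urn samples an exchangeable binary sequence by mutating shared counts, while its de Finetti form draws a latent bias once and then samples i.i.d.; for an urn starting with one red and one black ball, the de Finetti measure is uniform on [0, 1]. The sketch below is the editor's illustration, not code from the paper.

```python
import random

# A Polya urn samples an exchangeable sequence by mutating shared counts,
# while its de Finetti form draws a latent bias once, then samples i.i.d.
# For an urn starting with one red and one black ball, the de Finetti
# measure is uniform on [0, 1] (a classical fact).
def polya_urn(n, rng):
    red, black = 1, 1                          # mutable, non-local-style state
    draws = []
    for _ in range(n):
        is_red = rng.random() < red / (red + black)
        red, black = red + is_red, black + (not is_red)
        draws.append(int(is_red))
    return draws

def de_finetti_form(n, rng):
    theta = rng.random()                       # latent bias: Uniform[0, 1]
    return [int(rng.random() < theta) for _ in range(n)]  # i.i.d. given theta

rng = random.Random(0)
print(polya_urn(10, rng), de_finetti_form(10, rng))
```

Both procedures induce the same distribution over sequences, but only the second is free of non-local state.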
Criticality in Formal Languages and Statistical Physics
We show that the mutual information between two symbols, as a function of the
number of symbols between the two, decays exponentially in any probabilistic
regular grammar, but can decay like a power law for a context-free grammar.
This result about formal languages is closely related to a well-known result in
classical statistical mechanics that there are no phase transitions in
fewer than two dimensions. It is also related to the emergence of power-law
correlations in turbulence and cosmological inflation through recursive
generative processes. We elucidate these physics connections and comment on
potential applications of our results to machine learning tasks like training
artificial recurrent neural networks. Along the way, we introduce a useful
quantity which we dub the rational mutual information and discuss
generalizations of our claims involving more complicated Bayesian networks.
Comment: Replaced to match final published version. Discussion improved, references added
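The regular-grammar half of the claim can be checked directly in its simplest instance, a stationary two-state Markov chain, where the mutual information between symbols d steps apart decays geometrically, governed by the chain's second eigenvalue. The sketch below is illustrative; the transition matrix is arbitrary.

```python
import numpy as np

# Hedged check of the regular-grammar claim in its simplest case: for a
# stationary two-state Markov chain, the mutual information between symbols
# d steps apart decays geometrically, governed by the second eigenvalue.
def mutual_information(P, d):
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])  # stationary distribution
    pi /= pi.sum()
    joint = pi[:, None] * np.linalg.matrix_power(P, d)  # P(X_0 = i, X_d = j)
    indep = np.outer(pi, pi)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
for d in (1, 2, 4, 8, 16):
    print(d, mutual_information(P, d))  # shrinks geometrically in d
```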