Streaming Property Testing of Visibly Pushdown Languages
In the context of language recognition, we demonstrate the superiority of
streaming property testers over streaming algorithms and property testers
used separately. Introduced by Feigenbaum et al., a streaming
property tester is a streaming algorithm recognizing a language under the
property testing approximation: it must distinguish inputs of the language from
those that are $\varepsilon$-far from it, while using the smallest possible
memory (rather than limiting its number of input queries).
Our main result is a streaming $\varepsilon$-property tester for visibly
pushdown languages (VPL) with one-sided error using memory space
$\mathrm{poly}((\log n)/\varepsilon)$.
This construction relies on a (non-streaming) property tester for weighted
regular languages based on a previous tester by Alon et al. We provide a simple
application of this tester to the streaming testing of special cases of VPL
instances that are already hard for both streaming algorithms and property testers.
Our main algorithm combines an original simulation of visibly pushdown automata,
which uses a stack of small height whose items may, however, have linear size,
with a second step in which those items are replaced by small sketches. These
sketches rely on a notion of suffix-sampling that we introduce. This sampling is
the key idea connecting our streaming tester algorithm to property testers.
Comment: 23 pages. Major modifications in the presentation
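For orientation only: the sketch below is not the authors' tester. It is the exact, memory-hungry baseline that a streaming tester improves on. It assumes a visibly pushdown alphabet split into hypothetical call letters (push), return letters (pop) and internal letters (no stack action), and it checks that a word is well matched. The input letter alone decides the stack action. The stack can grow linearly with the input, and that cost is what the small-height simulation and sketches described above are meant to avoid.

```python
def is_well_matched(word, calls="(", returns=")"):
    """Exact check that a nested word is well matched, in the style of a
    visibly pushdown automaton: the letter alone decides the stack action."""
    # Hypothetical pairing convention: the i-th return letter closes the i-th call letter.
    closes = dict(zip(returns, calls))
    stack = []
    for letter in word:
        if letter in calls:
            stack.append(letter)          # call letter: push
        elif letter in closes:
            if not stack or stack[-1] != closes[letter]:
                return False              # unmatched or mismatched return
            stack.pop()                   # return letter: pop
        # any other letter is internal: the stack is untouched
    return not stack                      # every call must have been closed


print(is_well_matched("(a[b]c)", calls="([", returns=")]"))  # True
print(is_well_matched("([)]", calls="([", returns=")]"))     # False
```

On an input of length $n$ this check may store up to $n$ stack items, which is exactly the linear memory a streaming property tester avoids.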
Uniform Random Sampling of Traces in Very Large Models
This paper presents first results on how to perform uniform random walks
(where every trace has the same probability of occurring) in very large models. The
models considered here are described succinctly as a set of
communicating reactive modules. The method relies on techniques for counting
and drawing uniformly at random words in regular languages. Each module is
considered as an automaton defining such a language. We show how to combine
local uniform drawings of traces to obtain a global uniform random sampling,
without constructing the global model.
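The counting-then-drawing technique the abstract relies on can be sketched for a single complete DFA. The paper's actual contribution is combining local drawings across communicating modules without building the product automaton, and that part is not shown here. The function names and the example automaton are illustrative assumptions. A dynamic-programming table counts the accepted completions of each length from each state. Each letter is then drawn with probability proportional to the number of accepted completions it leaves.

```python
import random


def count_table(delta, accepting, states, alphabet, n):
    """counts[k][q] = number of words of length k leading from state q to acceptance."""
    counts = [{q: int(q in accepting) for q in states}]
    for _ in range(n):
        prev = counts[-1]
        counts.append({q: sum(prev[delta[q][a]] for a in alphabet) for q in states})
    return counts


def uniform_word(delta, start, accepting, states, alphabet, n, rng=random):
    """Draw a word of length n accepted by a complete DFA, uniformly at random."""
    counts = count_table(delta, accepting, states, alphabet, n)
    if counts[n][start] == 0:
        raise ValueError("the DFA accepts no word of this length")
    q, word = start, []
    for k in range(n, 0, -1):
        # Letter a is chosen with probability counts[k-1][delta[q][a]] / counts[k][q].
        r = rng.randrange(counts[k][q])
        for a in alphabet:
            c = counts[k - 1][delta[q][a]]
            if r < c:
                word.append(a)
                q = delta[q][a]
                break
            r -= c
    return "".join(word)


# Example DFA: words over {a, b} with no factor "bb" (state 2 is a sink).
STATES, ALPHABET, ACCEPTING = [0, 1, 2], "ab", {0, 1}
DELTA = {0: {"a": 0, "b": 1}, 1: {"a": 0, "b": 2}, 2: {"a": 2, "b": 2}}
print(count_table(DELTA, ACCEPTING, STATES, ALPHABET, 3)[3][0])  # 5
print(uniform_word(DELTA, 0, ACCEPTING, STATES, ALPHABET, 3, random.Random(0)))
```

The probabilities along any accepted path telescope to 1/counts[n][start], so every accepted word of length n is equally likely. The table costs O(n·|Q|·|Σ|) time to build.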
Flexible RNA design under structure and sequence constraints using formal languages
The problem of RNA secondary structure design (also called inverse folding)
is the following: given a target secondary structure, one aims to create a
sequence that folds into, or is compatible with, that structure. In several
practical applications in biology, additional constraints must be taken into
account, such as the presence/absence of regulatory motifs, either at a
specific location or anywhere in the sequence. In this study, we investigate
the design of RNA sequences from their targeted secondary structure, given
these additional sequence constraints. To this end, we develop a general
framework based on concepts of language theory, namely context-free grammars
and finite automata. We efficiently combine a comprehensive set of constraints
into a unifying context-free grammar of moderate size. From there, we use
generic algorithms to perform a (weighted) random generation, or an
exhaustive enumeration, of candidate sequences. The resulting method, whose
complexity scales linearly with the length of the RNA, was implemented as a
standalone program, which was also embedded in a publicly available, dedicated
web server. We demonstrate the applicability of the method on
a concrete case study dedicated to Exon Splicing Enhancers, in which our
approach was successfully used in the design of \emph{in vitro} experiments.
Comment: ACM BCB 2013 - ACM Conference on Bioinformatics, Computational
Biology and Biomedical Informatics (2013)
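The following is a much simpler sketch than the paper's grammar-based framework. It draws sequences uniformly among those compatible with a dot-bracket target, with six canonical or wobble pairs per base pair and four bases per unpaired position. Forbidden motifs are handled by rejection. That technique is swapped in here: it stays uniform but can be slow when the constraints are tight. The dot-bracket input format, the pair set and the function names are assumptions made for illustration.

```python
import random

PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]  # Watson-Crick and wobble base pairs
BASES = "ACGU"


def partner_map(structure):
    """Map each paired position of a dot-bracket structure to its partner."""
    stack, partner = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def count_compatible(structure):
    """Number of compatible sequences: 6 choices per pair, 4 per unpaired position."""
    n_pairs = len(partner_map(structure)) // 2
    return 6 ** n_pairs * 4 ** (len(structure) - 2 * n_pairs)


def sample_compatible(structure, forbidden=(), rng=random, max_tries=10_000):
    """Uniform compatible sequence avoiding every forbidden motif (rejection sampling)."""
    partner = partner_map(structure)
    for _ in range(max_tries):
        seq = [None] * len(structure)
        for i in range(len(structure)):
            if seq[i] is not None:
                continue                      # closing base already set by its opener
            if i in partner:
                seq[i], seq[partner[i]] = rng.choice(PAIRS)
            else:
                seq[i] = rng.choice(BASES)
        s = "".join(seq)
        if not any(motif in s for motif in forbidden):
            return s
    raise RuntimeError("no valid sequence found; constraints may be too strict")


print(count_compatible("((..))"))  # 576
print(sample_compatible("((..))", forbidden=("GG",), rng=random.Random(0)))
```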
Multi-dimensional Boltzmann Sampling of Languages
This paper addresses the uniform random generation of words from a
context-free language (over an alphabet of size $k$), while constraining every
letter to a targeted frequency of occurrence. Our approach consists of a
multidimensional extension of Boltzmann samplers \cite{Duchon2004}. We show
that, under mostly \emph{strong-connectivity} hypotheses, our samplers return a
word of size in $[(1-\varepsilon)n, (1+\varepsilon)n]$ and exact frequency in
$\mathcal{O}(n^{1+k/2})$ expected time. Moreover, if we accept tolerance
intervals of width in $\Omega(\sqrt{n})$ for the number of occurrences of each
letter, our samplers perform an approximate-size generation of words in
expected $\mathcal{O}(n)$ time. We illustrate these techniques on the
generation of Tetris tessellations with uniform statistics in the different
types of tetrominoes.
Comment: 12 pages
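The Boltzmann idea behind these samplers can be sketched on the simplest possible language: all words over a weighted alphabet, i.e. the sequence construction SEQ(a + b). Here the per-letter weights play the role of the multidimensional parameters, and rejection enforces the size window. For this trivial language, weights proportional to the target frequencies suffice. For a general context-free language, finding those weights is the hard part the paper addresses, and that step is not shown. All names are hypothetical.

```python
import random


def boltzmann_word(weights, x, rng=random):
    """Boltzmann sampler for SEQ(sum of letters) with one weight per letter.

    A word w is returned with probability (1 - p) * x**len(w) * prod(weights[c] for c in w),
    where p = x * sum(weights.values()) must lie in (0, 1).
    """
    letters = list(weights)
    letter_weights = [weights[c] for c in letters]
    p = x * sum(letter_weights)
    if not 0 < p < 1:
        raise ValueError("x is outside the convergence domain")
    word = []
    while rng.random() < p:                    # length is geometric: P(len = k) = (1 - p) * p**k
        word.append(rng.choices(letters, weights=letter_weights)[0])
    return "".join(word)


def sample_near_size(weights, n, eps=0.1, rng=random, max_tries=100_000):
    """Reject until the length falls in [(1 - eps) n, (1 + eps) n]."""
    p = n / (n + 1)                            # makes the expected length p / (1 - p) equal n
    x = p / sum(weights.values())
    for _ in range(max_tries):
        w = boltzmann_word(weights, x, rng)
        if (1 - eps) * n <= len(w) <= (1 + eps) * n:
            return w
    raise RuntimeError("size window never hit")


rng = random.Random(0)
w = sample_near_size({"a": 3, "b": 1}, 100, rng=rng)
print(len(w), w.count("a") / len(w))   # length in [90, 110]; frequency of "a" near 0.75
```

Tuning x so that the expected size equals n is the one-dimensional analogue of the paper's parameter choice. With a geometric size distribution, only a constant fraction of draws lands in the window, so rejection costs a constant factor here.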
Linear Distances between Markov Chains
We introduce a general class of distances (metrics) between Markov chains,
which are based on linear behaviour. This class encompasses distances given
topologically (such as the total variation distance or trace distance) as well
as by temporal logics or automata. We investigate which of the distances can be
approximated by observing the systems, i.e. by black-box testing or simulation,
and we provide both negative and positive results.
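As a concrete instance of one topological distance mentioned above, the sketch computes the total variation distance between the distributions of length-n traces of two labeled Markov chains, by exhaustive enumeration. It assumes full knowledge of both models, unlike the black-box setting studied in the paper. The chain encoding (initial distribution, transition dictionaries, state labels) is also an assumption, and the cost is exponential in n. The length-n trace is a function of the length-(n+1) trace, so this value can only grow with n. It is therefore a lower bound on the distance between the full trace distributions.

```python
import itertools


def trace_prob(chain, word):
    """Probability that the chain's first len(word) observed labels spell `word`."""
    init, trans, label = chain
    # Forward vector: probability mass of paths whose labels match the prefix read so far.
    alpha = {s: p for s, p in init.items() if label[s] == word[0]}
    for letter in word[1:]:
        nxt = {}
        for s, ps in alpha.items():
            for t, pt in trans[s].items():
                if label[t] == letter:
                    nxt[t] = nxt.get(t, 0.0) + ps * pt
        alpha = nxt
    return sum(alpha.values())


def tv_distance_prefix(c1, c2, n):
    """Total variation distance between the two distributions on length-n traces."""
    alphabet = sorted(set(c1[2].values()) | set(c2[2].values()))
    return 0.5 * sum(abs(trace_prob(c1, w) - trace_prob(c2, w))
                     for w in itertools.product(alphabet, repeat=n))


# Fair coin versus a coin that repeats its first outcome forever.
FAIR = ({"H": 0.5, "T": 0.5}, {"H": {"H": 0.5, "T": 0.5}, "T": {"H": 0.5, "T": 0.5}},
        {"H": "h", "T": "t"})
STICKY = ({"H": 0.5, "T": 0.5}, {"H": {"H": 1.0}, "T": {"T": 1.0}}, {"H": "h", "T": "t"})
print(tv_distance_prefix(FAIR, STICKY, 1))  # 0.0
print(tv_distance_prefix(FAIR, STICKY, 2))  # 0.5
```

The example shows why observation length matters. The two chains cannot be told apart from a single step, yet they are already at distance 0.5 after two.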
Sampling Geometric Inhomogeneous Random Graphs in Linear Time
Real-world networks, like social networks or the internet infrastructure,
have structural properties such as large clustering coefficients that can best
be described in terms of an underlying geometry. This is why the focus of the
literature on theoretical models for real-world networks shifted from classic
models without geometry, such as Chung-Lu random graphs, to modern
geometry-based models, such as hyperbolic random graphs.
With this paper we contribute to the theoretical analysis of these modern,
more realistic random graph models. Instead of studying hyperbolic random
graphs directly, we use a generalization that we call geometric inhomogeneous
random graphs (GIRGs). Since we ignore constant factors in the edge
probabilities, GIRGs are technically simpler (specifically, we avoid hyperbolic
cosines), while preserving the qualitative behaviour of hyperbolic random
graphs, and we suggest replacing hyperbolic random graphs by this new model in
future theoretical studies.
We prove the following fundamental structural and algorithmic results on
GIRGs. (1) As our main contribution, we provide a sampling algorithm that
generates a random graph from our model in expected linear time, improving the
best-known sampling algorithm for hyperbolic random graphs by a substantial
factor $O(n^{0.5})$. (2) We establish that GIRGs have clustering coefficients in
$\Omega(1)$. (3) We prove that GIRGs have small separators, i.e., it suffices
to delete a sublinear number of edges to break the giant component into two
large pieces. (4) We show how to compress GIRGs using an expected linear
number of bits.
Comment: 25 pages
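For intuition about the model, the sketch below is a naive quadratic-time sampler that follows the GIRG definition pair by pair. It is not the paper's expected-linear-time algorithm. We assume the usual setup:

- power-law weights with exponent beta in (2, 3);
- uniform positions on the d-dimensional unit torus under the maximum norm;
- edge probability min{1, (w_u w_v / W)^alpha / ||x_u - x_v||^(alpha d)} for alpha > 1, where W is the total weight.

The constant hidden in the model's Θ(·) is set to 1 here, and the parameter defaults are arbitrary.

```python
import random


def torus_dist(x, y):
    """Maximum-norm distance on the unit torus [0, 1)^d."""
    return max(min(abs(a - b), 1 - abs(a - b)) for a, b in zip(x, y))


def sample_girg_naive(n, d=1, beta=2.5, alpha=2.0, w_min=1.0, rng=random):
    """O(n^2) reference sampler: one coin flip per vertex pair."""
    # Pareto weights: P(w > t) = (t / w_min) ** -(beta - 1), a power law with exponent beta.
    weights = [w_min * (1.0 - rng.random()) ** (-1.0 / (beta - 1)) for _ in range(n)]
    positions = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    total = sum(weights)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            denom = torus_dist(positions[u], positions[v]) ** (alpha * d)
            # Edge probability with the model's hidden constant set to 1 (assumption).
            p = 1.0 if denom == 0 else min(1.0, (weights[u] * weights[v] / total) ** alpha / denom)
            if rng.random() < p:
                edges.append((u, v))
    return weights, positions, edges


weights, positions, edges = sample_girg_naive(500, rng=random.Random(1))
print(len(edges) / 500)  # edges per vertex, i.e. half the average degree
```

Every one of the n(n-1)/2 pairs is examined here. Avoiding that pairwise loop is precisely what makes the expected-linear-time algorithm in contribution (1) nontrivial.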