200 research outputs found
Multi-dimensional Boltzmann Sampling of Languages
This paper addresses the uniform random generation of words from a
context-free language (over an alphabet of size ), while constraining every
letter to a targeted frequency of occurrence. Our approach consists of a
multidimensional extension of Boltzmann samplers \cite{Duchon2004}. We show
that, under mostly \emph{strong-connectivity} hypotheses, our samplers return a
word of size in and exact frequency in
expected time. Moreover, if we accept tolerance
intervals of width in for the number of occurrences of each
letter, our samplers perform an approximate-size generation of words in
expected time. We illustrate these techniques on the
generation of Tetris tessellations with uniform statistics in the different
types of tetraminoes.
Comment: 12p
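The abstract above builds on Boltzmann samplers. As a minimal sketch of the underlying idea only, and not of the paper's multidimensional sampler, the following Python code draws Dyck words (well-parenthesized words) from the grammar D -> empty | ( D ) D, so that a word w is returned with probability x^|w| / D(x). The choice of language, the function names and the rejection window are assumptions made for illustration.

```python
import math
import random


def dyck_gf(x):
    """Generating function D(x) of Dyck words, from D = 1 + x^2 * D^2.

    Size is counted in letters; use 0 < x < 1/2 for sampling.
    """
    return (1.0 - math.sqrt(1.0 - 4.0 * x * x)) / (2.0 * x * x)


def boltzmann_dyck(x, rng):
    """Draw a Dyck word w with probability x**len(w) / D(x)."""
    p_empty = 1.0 / dyck_gf(x)  # probability of the rule D -> empty
    out = []
    stack = ["D"]  # explicit stack instead of recursion
    while stack:
        sym = stack.pop()
        if sym == ")":
            out.append(")")
        elif rng.random() >= p_empty:
            # Rule D -> ( D ) D: emit "(", then expand inner D, ")", outer D.
            out.append("(")
            stack.extend(["D", ")", "D"])
        # Otherwise rule D -> empty: nothing to emit.
    return "".join(out)


def sample_in_window(n_min, n_max, x, rng):
    """Approximate-size generation: reject until the size is in the window."""
    while True:
        word = boltzmann_dyck(x, rng)
        if n_min <= len(word) <= n_max:
            return word


def is_dyck(word):
    """Check that a word over '(' and ')' is well parenthesized."""
    depth = 0
    for c in word:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0


if __name__ == "__main__":
    rng = random.Random(2024)
    print(sample_in_window(20, 40, 0.499, rng))
```

Words of equal length are equally likely under this distribution, so rejecting on size keeps the output uniform within each accepted length. Moving x closer to 1/2 shifts the size distribution toward longer words. The multidimensional setting of the abstract would additionally attach one weight per letter to steer letter frequencies, which this sketch does not attempt.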
Non-redundant random generation from weighted context-free languages
We address the non-redundant random generation of k words of length n from a context-free language. Additionally, we want to avoid a predefined set of words. We study the limits of a rejection-based approach, whose time complexity is shown to grow exponentially in k in some cases. We propose an alternative recursive algorithm, whose careful implementation allows for a non-redundant generation of k words of size n in O(kn log n) arithmetic operations after the precomputation of O(n) numbers. The overall complexity is therefore dominated by the generation of k words, and the non-redundancy comes at a negligible cost.
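The rejection-based baseline mentioned in this abstract can be sketched as follows: draw words from any sampler and discard those already produced or forbidden. This is a hedged illustration only. The function name, the `max_draws` guard and the toy sampler over binary words are assumptions, and the paper's recursive algorithm is not reproduced here.

```python
import random


def nonredundant_by_rejection(draw, k, forbidden=frozenset(), max_draws=100_000):
    """Collect k distinct words from draw(), skipping forbidden ones.

    Returns the words and the number of draws spent. The draw count can
    grow quickly when few admissible words remain or when the sampler is
    heavily biased toward a few words.
    """
    seen = set(forbidden)
    words = []
    draws = 0
    while len(words) < k:
        if draws >= max_draws:
            raise RuntimeError("rejection budget exhausted")
        draws += 1
        w = draw()
        if w in seen:
            continue  # redundant or forbidden: reject and redraw
        seen.add(w)
        words.append(w)
    return words, draws


if __name__ == "__main__":
    rng = random.Random(1)
    n = 3

    def draw():
        # Toy uniform sampler over words of length n on the alphabet {a, b}.
        return "".join(rng.choice("ab") for _ in range(n))

    words, draws = nonredundant_by_rejection(draw, k=6, forbidden={"aaa"})
    print(words, draws)
```

Here 6 of the 7 admissible words are requested, so the last few words take many redraws. That wasted work is the cost a non-redundant generator is meant to avoid.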
Estimating seed sensitivity on homogeneous alignments
We address the problem of estimating the sensitivity of seed-based similarity
search algorithms. In contrast to approaches based on Markov models [18, 6, 3,
4, 10], we study the estimation based on homogeneous alignments. We describe an
algorithm for counting and random generation of those alignments and an
algorithm for exact computation of the sensitivity for a broad class of seed
strategies. We provide experimental results demonstrating a bias introduced by
ignoring the homogeneousness condition.
Assessing the robustness of parsimonious predictions for gene neighborhoods from reconciled phylogenies
The availability of a large number of assembled genomes opens the way to
study the evolution of syntenic character within a phylogenetic context. The
DeCo algorithm, recently introduced by Bérard et al., allows the computation
of parsimonious evolutionary scenarios for gene adjacencies, from pairs of
reconciled gene trees. Following the approach pioneered by Sturmfels and
Pachter, we describe how to modify the DeCo dynamic programming algorithm to
identify classes of cost schemes that generate similar parsimonious
evolutionary scenarios for gene adjacencies, and to assess how robust the
presence or absence of specific ancestral gene adjacencies is to changes in the
cost scheme of evolutionary events. We apply our method to six thousand
mammalian gene families, and show that computing the robustness to changes to
cost schemes provides new and interesting insights on the evolution of gene
adjacencies and the DeCo model.
Comment: Accepted, to appear in ISBRA - 11th International Symposium on
Bioinformatics Research and Applications - 2015, Jun 2015, Norfolk, Virginia,
United State
Tree decomposition and parameterized algorithms for RNA structure-sequence alignment including tertiary interactions and pseudoknots
We present a general setting for structure-sequence comparison in a large
class of RNA structures that unifies and generalizes a number of recent works
on specific families of structures. Our approach is based on tree decomposition
of structures and gives rise to a general parameterized algorithm, where the
exponential part of the complexity depends on the family of structures. For
each of the previously studied families, our algorithm has the same complexity
as the specific algorithm that had been given before.
Comment: (2012
Automatic Exploration of the Natural Variability of RNA Non-Canonical Geometric Patterns with a Parameterized Sampling Technique
Motivation. Recurrent substructures in RNA, known as 3D motifs, consist of networks of base pair interactions and are critical to understanding the relationship between structure and function. Their structure is naturally expressed as a graph which has led to many graph-based algorithms to automatically catalog identical motifs found in 3D structures. Yet, due to the complexity of the problem, state-of-the-art methods are often optimized to find exact matches, limiting the search to a subset of potential solutions, or do not allow explicit control over the desired variability.
Results. We developed FuzzTree, a method able to efficiently sample approximate instances of an RNA motif, abstracted as a subgraph within a target RNA structure. It is the first method that allows explicit control over (1) the admissible geometric variability in the interactions; (2) the number of missing edges; and (3) the introduction of discontinuities in the backbone given close distances in the 3D structure. Our tool relies on a multidimensional Boltzmann sampling, having complexity parameterized by the treewidth of the requested motif. We applied our method to the well-known internal loop Kink-Turn motif, which can be divided into 12 subgroups. Given only the graph representing the main Kink-Turn subgroup, FuzzTree retrieved over 3/4 of all kink-turns. We also highlighted two occurrences of new sampled patterns. Our tool is available as free software and can be customized for different parameters and types of graphs.
Combinatorial RNA Design: Designability and Structure-Approximating Algorithm in Watson-Crick and Nussinov-Jacobson Energy Models
We consider the Combinatorial RNA Design problem, a minimal instance of RNA
design where one must produce an RNA sequence that adopts a given secondary
structure as its minimal free-energy structure. We consider two free-energy
models where the contributions of base pairs are additive and independent: the
purely combinatorial Watson-Crick model, which only allows equally-contributing
A -- U and C -- G base pairs, and the real-valued Nussinov-Jacobson model,
which associates arbitrary energies to A -- U, C -- G and G -- U base pairs. We
first provide a complete characterization of designable structures using
restricted alphabets and, in the four-letter alphabet, provide a complete
characterization for designable structures without unpaired bases. When
unpaired bases are allowed, we characterize extensive classes of
(non-)designable structures, and prove the closure of the set of designable
structures under the stutter operation. Membership of a given structure to any
of the classes can be tested in (n) time, including the generation of a
solution sequence for positive instances. Finally, we consider a
structure-approximating relaxation of the design, and provide a (n)
algorithm which, given a structure S that avoids two trivially non-designable
motifs, transforms S into a designable structure constructively by adding at
most one base-pair to each of its stems.
Comment: To appea
Using Structural and Evolutionary Information to Detect and Correct Pyrosequencing Errors in Noncoding RNAs.
Extended version of RECOMB'13. The analysis of the sequence-structure relationship in RNA molecules is not only essential for evolutionary studies but also for concrete applications such as error-correction in next generation sequencing (NGS) technologies. The prohibitive sizes of the mutational and conformational landscapes, combined with the volume of data to process, require efficient algorithms to compute sequence-structure properties. In this article, we address the correction of NGS errors by calculating which mutations most increase the likelihood of a sequence to a given structure and RNA family. We introduce RNApyro, an efficient, linear time and space inside-outside algorithm that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base-pair energies with novel isostericity scores and apply our techniques to correct pointwise errors in 5S and 16S rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.
A linear inside-outside algorithm for correcting sequencing errors in structured RNA sequences
Analysis of the sequence-structure relationship in RNA molecules is essential to evolutionary studies but also to concrete applications such as error-correction methodologies in sequencing technologies. The prohibitive sizes of the mutational and conformational landscapes, combined with the volume of data to process, require efficient algorithms to compute sequence-structure properties. More specifically, here we aim to calculate which mutations most increase the likelihood of a sequence to a given structure and RNA family. In this paper, we introduce RNApyro, an efficient linear-time and space inside-outside algorithm that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base pair energies with novel isostericity scales, and apply our techniques to correct point-wise errors in 5S rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.
- …