A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data
We present a new algorithm for enumerating bubbles with length constraints in
directed graphs. This problem arises in transcriptomics, where the question is
to identify all alternative splicing events present in a sample of mRNAs
sequenced by RNA-seq. This is the first polynomial-delay algorithm for this
problem, and we show that in practice it is faster than previous approaches.
This enables us to deal with larger instances and therefore to discover novel
alternative splicing events, especially long ones, that were previously
overlooked by existing methods.
Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI 2013).
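The following brute-force Python sketch illustrates the object being enumerated. It assumes a reading that the abstract does not spell out: a bubble is a pair of internally vertex-disjoint paths from a source s to a target t, one of total length at most a1 and the other at most a2, in a directed graph with non-negative edge lengths stored as an adjacency dict. The function names are hypothetical. The search is exponential in the worst case, so it is not the polynomial-delay algorithm of the paper.

```python
from itertools import combinations

# Hypothetical graph format: {vertex: [(successor, non_negative_length), ...]}

def simple_paths(graph, s, t, max_len):
    """All simple s-t paths of total length <= max_len, as (path, length) pairs."""
    found = []

    def dfs(v, path, length):
        if v == t:
            found.append((tuple(path), length))
            return
        for w, weight in graph.get(v, []):
            # Pruning on length is safe only because lengths are non-negative.
            if w not in path and length + weight <= max_len:
                path.append(w)
                dfs(w, path, length + weight)
                path.pop()

    dfs(s, [s], 0)
    return found


def bubbles(graph, s, t, a1, a2):
    """Pairs of internally vertex-disjoint s-t paths, one of length <= a1, the other <= a2."""
    paths = simple_paths(graph, s, t, max(a1, a2))
    result = []
    for (p, lp), (q, lq) in combinations(paths, 2):
        if set(p[1:-1]) & set(q[1:-1]):
            continue  # the two paths share an internal vertex
        if (lp <= a1 and lq <= a2) or (lq <= a1 and lp <= a2):
            result.append((p, q))
    return result


g = {"s": [("a", 1), ("b", 1)], "a": [("t", 1)], "b": [("c", 1)], "c": [("t", 1)]}
print(bubbles(g, "s", "t", 2, 3))  # [(('s', 'a', 't'), ('s', 'b', 'c', 't'))]
```

Tightening the second bound to 2 removes the longer branch s, b, c, t, so no bubble is reported. A long alternative event is simply a bubble whose longer path needs a generous bound.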
Efficiently listing bounded length st-paths
The problem of listing the $K$ shortest simple (loopless) $st$-paths in a
graph has been studied since the early 1960s. For a non-negatively weighted
graph with $n$ vertices and $m$ edges, the most efficient solution is an
$O(K(mn + n^2 \log n))$ algorithm for directed graphs by Yen and Lawler
[Management Science, 1971 and 1972], and an $O(K(m + n \log n))$ algorithm for
the undirected version by Katoh et al. [Networks, 1982], both using $O(Kn + m)$
space. In this work, we consider a different parameterization for this problem:
instead of bounding the number of $st$-paths output, we bound their length. For
the bounded length parameterization, we propose new non-trivial algorithms
matching the time complexity of the classic algorithms but using only $O(m + n)$
space. Moreover, we provide a unified framework in which the solutions to both
parameterizations, the classic $K$-shortest and the new length-bounded paths,
can be seen as two different traversals of the same tree: a Dijkstra-like and
a DFS-like traversal, respectively.
Comment: 12 pages, accepted to IWOCA 201
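The Python sketch below shows a DFS-like traversal for the length-bounded variant. It rests on our own assumptions and is not the algorithm from the paper. Distances to t are precomputed once with Dijkstra's algorithm on the reversed graph, and any branch that can no longer reach t within the bound is pruned. These distances ignore the vertices already on the current path. As a result, the pruning never discards a valid path, but it can still explore dead ends, so the sketch does not guarantee the paper's delay or space bounds. Graphs are adjacency dicts with non-negative weights, and all names are hypothetical.

```python
import heapq

INF = float("inf")

def dist_to_target(graph, t):
    """Dijkstra on the reversed graph: shortest distance from each vertex to t."""
    rev = {}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    dist = {t: 0}
    heap = [(0, t)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in rev.get(v, []):
            if d + w < dist.get(u, INF):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist


def bounded_st_paths(graph, s, t, max_len):
    """Yield every simple s-t path of length <= max_len as (path, length)."""
    dist = dist_to_target(graph, t)
    path, on_path = [s], {s}

    def dfs(v, length):
        if v == t:
            yield list(path), length
            return
        for w, weight in graph.get(v, []):
            # Skip w if it is already on the path, or if even a shortest w-t path
            # (computed in the whole graph) would exceed the bound.
            if w in on_path or length + weight + dist.get(w, INF) > max_len:
                continue
            path.append(w)
            on_path.add(w)
            yield from dfs(w, length + weight)
            path.pop()
            on_path.discard(w)

    if dist.get(s, INF) <= max_len:
        yield from dfs(s, 0)


g = {"s": [("a", 1), ("b", 2)], "a": [("t", 1), ("b", 1)], "b": [("t", 1)]}
for p, length in bounded_st_paths(g, "s", "t", 3):
    print(p, length)
```

With a bound of 3, the loop prints the paths s, a, t (length 2), then s, a, b, t and s, b, t (both of length 3). The traversal keeps only the current path and the distance table in memory, which is the intuition behind linear-space listing.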
A multiple layer model to compare RNA secondary structures
We formally introduce a new data structure, called MiGaL for "Multiple Graph Layers", that is composed of various graphs linked together by relations of abstraction/refinement. The new structure is useful for representing information that can be described at different levels of abstraction, each level corresponding to a graph. We then propose an algorithm for comparing two MiGaLs. The algorithm performs a step-by-step comparison starting with the most "abstract" level. The result of the comparison at a given step is communicated to the next step using a special colouring scheme. MiGaLs are a very natural model for comparing RNA secondary structures, which may be seen at different levels of detail. These levels range from the sequence of nucleotides, each either unpaired or paired with another to participate in a helix, to the network of multiple loops that is believed to represent the most conserved part of RNAs having similar function. We therefore show how to use MiGaLs to compare two RNAs of any size very efficiently at different levels of detail.
Geometric medians in reconciliation spaces
In evolutionary biology, it is common to study how various entities evolve
together, for example, how parasites coevolve with their host, or genes with
their species. Coevolution is commonly modelled by considering certain maps or
reconciliations from one evolutionary tree $P$ to another $H$, all of which
induce the same map $\phi$ between the leaf-sets of $P$ and $H$ (corresponding
to present-day associations). Recently, there has been much interest in
studying spaces of reconciliations, which arise by defining some metric $d$ on
the set $R(P,H,\phi)$ of all possible reconciliations between $P$ and $H$.
In this paper, we study the following question: How do we compute a geometric
median for a given subset $\Psi$ of $R(P,H,\phi)$ relative to $d$, i.e. an
element $\psi_{med} \in R(P,H,\phi)$ such that
$\sum_{\psi' \in \Psi} d(\psi_{med}, \psi') \le \sum_{\psi' \in \Psi} d(\psi, \psi')$
holds for all $\psi \in R(P,H,\phi)$? For a model where so-called host-switches or
transfers are not allowed, and for a commonly used metric $d$ called the
edit-distance, we show that although the cardinality of $R(P,H,\phi)$ can be
super-exponential, it is still possible to compute a geometric median for a set
$\Psi$ in $R(P,H,\phi)$ in polynomial time. We expect that this result could
be useful for computing a summary or consensus for a set of reconciliations
(e.g. for a set of suboptimal reconciliations).
Comment: 12 pages, 1 figure
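The definition of a geometric median becomes concrete in a brute-force Python sketch over a small finite space. For illustration only, reconciliations are modelled as tuples that assign each internal node to a host node, and a count of differing positions stands in for the edit-distance. Both choices are assumptions, and the helper names are hypothetical. Scanning the whole space is exactly what the paper avoids, because the space of reconciliations can be super-exponentially large.

```python
from itertools import product

def geometric_median(space, subset, d):
    """Element of `space` minimising the sum of d-distances to the elements of `subset`."""
    return min(space, key=lambda x: sum(d(x, y) for y in subset))


def hamming(a, b):
    """Stand-in metric (an assumption): number of positions where two tuples differ."""
    return sum(u != v for u, v in zip(a, b))


# Toy space: every assignment of three internal nodes to one of two host nodes "x", "y".
space = list(product("xy", repeat=3))
psi = [("x", "x", "y"), ("x", "y", "y"), ("y", "x", "y")]
print(geometric_median(space, psi, hamming))  # ('x', 'x', 'y')
```

Under this stand-in metric, the median is the position-wise majority of the subset. Its total distance to the three elements is 2, while every other element of the toy space scores at least 3.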
On maximal chain subgraphs and covers of bipartite graphs
In this paper, we address three related problems. The first is the enumeration of all the maximal edge-induced chain subgraphs of a bipartite graph, for which we provide a polynomial-delay algorithm. We give bounds on the number of maximal chain subgraphs of a bipartite graph and use them to establish the input-sensitive complexity of the enumeration problem.
The second problem we treat is that of finding the minimum number of chain subgraphs needed to cover all the edges of a bipartite graph. For this, we provide an exact exponential algorithm with a non-trivial complexity. Finally, we approach the problem of enumerating all minimal chain subgraph covers of a bipartite graph and show that it can be solved in quasi-polynomial time.
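The brute-force Python sketch below makes the first problem concrete. It assumes the standard characterization that a bipartite graph is a chain graph exactly when the neighbourhoods of the vertices on one side are totally ordered by inclusion. Because it tests every edge subset, it is exponential and only shows which subgraphs are being enumerated. It is not the polynomial-delay algorithm of the paper, and the function names are hypothetical.

```python
from itertools import combinations

def is_chain(edges):
    """True if the neighbourhoods of the U-side vertices are totally ordered by inclusion."""
    nbrs = {}
    for u, v in edges:  # each edge is (u, v) with u in U and v in V
        nbrs.setdefault(u, set()).add(v)
    # After sorting by size, a chain under inclusion must grow step by step.
    sets = sorted(nbrs.values(), key=len)
    return all(a <= b for a, b in zip(sets, sets[1:]))


def maximal_chain_subgraphs(edges):
    """Brute force: all inclusion-maximal edge subsets that induce a chain graph."""
    edges = list(edges)
    chains = [frozenset(c)
              for r in range(len(edges) + 1)
              for c in combinations(edges, r)
              if is_chain(c)]
    return [c for c in chains if not any(c < other for other in chains)]


# Two disjoint edges (a 2K2) do not form a chain graph, so each edge alone is maximal.
print(maximal_chain_subgraphs([("u1", "v1"), ("u2", "v2")]))
```

The same example shows why covers are interesting: the two disjoint edges need two chain subgraphs to cover all edges, whereas a path u1-v1, u1-v2, u2-v2 is itself a chain graph and is covered by one.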
Incremental complexity of a bi-objective hypergraph transversal problem
The hypergraph transversal problem has been intensively studied, from both a
theoretical and a practical point of view. In particular, its incremental
complexity is known to be quasi-polynomial in general and polynomial for
bounded hypergraphs. Recent applications in computational biology, however,
require solving a generalization of this problem, which we call the bi-objective
transversal problem. In this case, the instance is composed of a pair of
hypergraphs (A, B), and the aim is to find minimal sets which hit all the
hyperedges of A while intersecting a minimal set of hyperedges of B. In this
paper, we formalize this problem, link it to a problem on monotone Boolean
formulae of depth 3, and study its incremental complexity.
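For reference, the classic single-hypergraph problem can be illustrated with a brute-force Python sketch that lists all inclusion-minimal transversals of a small hypergraph in order of increasing size. It is exponential, the names are hypothetical, and it does not implement the bi-objective variant involving the second hypergraph B.

```python
from itertools import combinations

def minimal_transversals(vertices, hyperedges):
    """Brute force: all inclusion-minimal vertex sets that hit every hyperedge."""
    found = []
    for size in range(len(vertices) + 1):
        for combo in combinations(sorted(vertices), size):
            candidate = set(combo)
            hits_all = all(candidate & edge for edge in hyperedges)
            # Sets are tried by increasing size, so a transversal is minimal
            # exactly when it contains no smaller transversal found earlier.
            if hits_all and not any(m <= candidate for m in found):
                found.append(candidate)
    return found


A = [{1, 2}, {2, 3}]
print(minimal_transversals({1, 2, 3}, A))  # [{2}, {1, 3}]
```

Incremental complexity asks how long it takes to produce the next minimal transversal given those already found. This sketch ignores that question and simply scans all subsets.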
The Gapped-Factor Tree
We present a data structure to index a specific kind of factor (that is, of substring) called a gapped-factor. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. The construction of this data structure is done online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
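For comparison, a naive index of gapped-factors can be built in Python with a dictionary. This is not the suffix-tree-based structure described above. It assumes fixed lengths for the parts before and after the gap (left and right, hypothetical parameters) and must be rebuilt for every such choice. The gapped-factor tree, in contrast, indexes all gapped-factors with the given gap size at once.

```python
from collections import defaultdict

def gapped_index(text, left, gap, right):
    """Map each (u, v) to the start positions of u, then `gap` ignored characters, then v."""
    index = defaultdict(list)
    span = left + gap + right
    for i in range(len(text) - span + 1):
        u = text[i:i + left]
        v = text[i + left + gap:i + span]
        index[(u, v)].append(i)
    return index


idx = gapped_index("abxcdabycd", left=2, gap=1, right=2)
print(idx[("ab", "cd")])  # [0, 5]
```

The two occurrences differ only in the ignored character (x versus y), which is why they share one entry. This kind of lookup is what a filtration step would use to find candidate positions.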
Longest Motifs with a Functionally Equivalent Central Block
This paper presents a generalization of the notion of longest repeats with a block of k don't care symbols, introduced by [Crochemore et al., LATIN 2004] (for k fixed), to longest motifs composed of three parts: a first and a last part that parameterize match (that is, match via some symbol renaming, initially unknown), and a functionally equivalent central block. Such three-part motifs are called longest block motifs. Different types of functional equivalence, and thus of matching criteria for the central block, are considered. These include as a subcase the one treated in [Crochemore et al., LATIN 2004] and extend to the case of regular expressions with no Kleene closure or complement operation. We show that a single general algorithmic tool, a non-trivial extension of the ideas introduced in [Crochemore et al., LATIN 2004], can handle all the various kinds of longest block motifs defined in this paper. In all cases, the algorithm runs in O(n log n) time.
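The parameterized match used for the first and last parts can be checked in linear time, as in the Python sketch below. It assumes that every symbol may be renamed and that the renaming must be a bijection between the symbols of the two strings. The function name is hypothetical, and the sketch does not cover the central block or the search for longest motifs.

```python
def p_match(x, y):
    """True if y is obtained from x by a one-to-one renaming of symbols."""
    if len(x) != len(y):
        return False
    forward, backward = {}, {}
    for a, b in zip(x, y):
        # Each symbol must always map to the same partner, in both directions.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(p_match("abca", "xyzx"))  # True  (a->x, b->y, c->z)
print(p_match("abca", "xyzy"))  # False (a would map to both x and y)
print(p_match("ab", "xx"))      # False (the renaming must be one-to-one)
```

Keeping a map in each direction is what enforces the bijection. With only the forward map, "ab" would wrongly match "xx".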
Assessing the Exceptionality of Coloured Motifs in Networks
Various methods have recently been employed to characterise the structure of biological networks. In particular, the concept of network motif and the related concept of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess whether a motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Over the last thirty years, much effort has been put into deriving P-values for the frequencies of topological motifs, that is, fixed subgraphs. These P-values rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdős-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdős-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution better…
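The following Python sketch shows what is being counted. It counts the occurrences of a coloured motif in a small vertex-coloured graph, assuming an occurrence is a vertex set whose colour multiset equals the motif's and whose induced subgraph is connected. It enumerates all vertex subsets of the motif's size, so it is only suitable for toy graphs, and the names are hypothetical. The analytical mean and variance derived in the paper concern the distribution of this count under an Erdős-Rényi model and are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def is_connected(vertices, adj):
    """True if the subgraph induced by `vertices` (non-empty) is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def count_coloured_motif(adj, colour, motif):
    """Number of vertex sets with colour multiset `motif` inducing a connected subgraph."""
    target = Counter(motif)
    return sum(1 for S in combinations(sorted(adj), len(motif))
               if Counter(colour[v] for v in S) == target and is_connected(S, adj))


# Path 1-2-3-4 coloured r, b, r, b.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
colour = {1: "r", 2: "b", 3: "r", 4: "b"}
print(count_coloured_motif(adj, colour, "rb"))   # 3
print(count_coloured_motif(adj, colour, "rbr"))  # 1
```

The topology is left unspecified: any connected arrangement of the coloured vertices counts. The set {1, 4} has the right colours but is not counted, because vertices 1 and 4 are not connected within it.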