Search CORE

407 research outputs found

A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data

Author: Lacroix Vincent
Sacomoto Gustavo
Sagot Marie-France
Publication date: 30/07/2013

We present a new algorithm for enumerating bubbles with length constraints in directed graphs. This problem arises in transcriptomics, where the question is to identify all alternative splicing events present in a sample of mRNAs sequenced by RNA-seq. This is the first polynomial-delay algorithm for this problem, and we show that in practice it is faster than previous approaches. This enables us to deal with larger instances and therefore to discover novel alternative splicing events, especially long ones, that were previously overlooked by existing methods. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1
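To make the object being enumerated concrete, here is a brute-force sketch of what an (s,t)-bubble is: two s-t paths that share no vertex other than s and t. It is not the polynomial-delay algorithm of the paper; it first lists every bounded-length path, so its delay can be exponential. The adjacency-dict representation, the unweighted edge count as "length", and the single bound `max_len` applied to both paths are illustrative assumptions (the paper's length constraints may be stated differently).

```python
from itertools import combinations

def simple_paths(graph, s, t, max_len):
    """Yield every simple s-t path with at most max_len edges (plain DFS)."""
    path, on_path = [s], {s}

    def dfs(u):
        if u == t:
            yield list(path)
            return
        if len(path) - 1 == max_len:  # edge budget exhausted
            return
        for v in graph.get(u, ()):
            if v not in on_path:
                path.append(v)
                on_path.add(v)
                yield from dfs(v)
                path.pop()
                on_path.remove(v)

    yield from dfs(s)

def bubbles(graph, s, t, max_len):
    """Yield (s,t)-bubbles: pairs of s-t paths whose internal vertices are disjoint."""
    paths = list(simple_paths(graph, s, t, max_len))
    for p, q in combinations(paths, 2):
        if set(p[1:-1]).isdisjoint(q[1:-1]):
            yield p, q
```

For example, with `g = {"s": ["a", "b"], "a": ["t"], "b": ["c"], "c": ["t"]}`, `bubbles(g, "s", "t", 3)` yields the single bubble formed by `s-a-t` and `s-b-c-t`, while a bound of 2 excludes the longer branch and yields nothing, which is the kind of long event a tight length bound can miss.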

Efficiently listing bounded length st-paths

Author: Rizzi Romeo
Sacomoto Gustavo
Sagot Marie-France
Publication date: 01/01/2014

The problem of listing the $K$ shortest simple (loopless) $st$-paths in a graph has been studied since the early 1960s. For a non-negatively weighted graph with $n$ vertices and $m$ edges, the most efficient solution is an $O(K(mn + n^2 \log n))$ algorithm for directed graphs by Yen and Lawler [Management Science, 1971 and 1972], and an $O(K(m + n \log n))$ algorithm for the undirected version by Katoh et al. [Networks, 1982], both using $O(Kn + m)$ space. In this work, we consider a different parameterization for this problem: instead of bounding the number of $st$-paths output, we bound their length. For the bounded length parameterization, we propose new non-trivial algorithms matching the time complexity of the classic algorithms but using only $O(m + n)$ space. Moreover, we provide a unified framework such that the solutions to both parameterizations (the classic $K$-shortest and the new length-bounded paths) can be seen as two different traversals of the same tree, a Dijkstra-like and a DFS-like traversal, respectively. Comment: 12 pages, accepted to IWOCA 201

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL Descartes

Catalogo dei prodotti della ricerca
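A DFS-like traversal for the length-bounded parameterization can be sketched as follows. Before branching from the current vertex, it runs Dijkstra from t on the reversed graph while avoiding every vertex already on the path, and it only descends into a neighbour if some simple completion through it fits the budget. Every recursive call therefore produces at least one path, which gives polynomial delay. This is a simple illustration under the assumption of a dict-of-`(neighbour, weight)` graph with non-negative weights; it is not claimed to be the paper's algorithm, and because each recursion level keeps its own distance table it does not achieve the $O(m + n)$ space bound.

```python
import heapq

def dist_to_target(rev, t, forbidden):
    """Dijkstra from t on the reversed graph, never entering `forbidden` vertices."""
    dist = {t: 0}
    heap = [(0, t)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in rev.get(x, ()):
            if y in forbidden:
                continue
            if d + w < dist.get(y, float("inf")):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist

def bounded_st_paths(graph, s, t, max_len):
    """Yield (path, length) for every simple s-t path of total weight <= max_len.

    graph maps u -> list of (v, w) pairs with w >= 0.
    """
    rev = {}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    path, on_path = [s], {s}

    def dfs(u, length):
        if u == t:
            yield list(path), length
            return
        # Shortest distances to t that avoid every vertex already on the path.
        dist = dist_to_target(rev, t, on_path)
        for v, w in graph.get(u, ()):
            if v in on_path:
                continue
            # Descend only if a simple completion through v fits the budget,
            # so no recursive call is a dead end.
            if length + w + dist.get(v, float("inf")) <= max_len:
                path.append(v)
                on_path.add(v)
                yield from dfs(v, length + w)
                path.pop()
                on_path.remove(v)

    yield from dfs(s, 0)
```

With `g = {"s": [("a", 1), ("b", 2)], "a": [("t", 1), ("b", 1)], "b": [("t", 1)]}`, a bound of 3 yields `s-a-t`, `s-a-b-t` and `s-b-t`, while a bound of 2 yields only `s-a-t`; the pruning rejects the other branches before exploring them.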

A multiple layer model to compare RNA secondary structures

Author: Allali Julien
Sagot Marie-France
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 01/01/2008

We formally introduce a new data structure, called MiGaL for "Multiple Graph Layers", that is composed of various graphs linked together by relations of abstraction/refinement. The new structure is useful for representing information that can be described at different levels of abstraction, each level corresponding to a graph. We then propose an algorithm for comparing two MiGaLs. The algorithm performs a step-by-step comparison starting with the most "abstract" level. The result of the comparison at a given step is communicated to the next step using a special colouring scheme. MiGaLs represent a very natural model for comparing RNA secondary structures, which may be seen at different levels of detail, going from the sequence of nucleotides, each single or paired with another to participate in a helix, to the network of multiple loops that is believed to represent the most conserved part of RNAs having similar functions. We therefore show how to use MiGaLs to efficiently compare two RNAs of any size at different levels of detail.

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Geometric medians in reconciliation spaces

Author: Huber Katharina T.
Moulton Vincent
Sagot Marie-France
Sinaimeri Blerina
Publication date: 03/07/2017

In evolutionary biology, it is common to study how various entities evolve together, for example, how parasites coevolve with their host, or genes with their species. Coevolution is commonly modelled by considering certain maps or reconciliations from one evolutionary tree $P$ to another $H$, all of which induce the same map $\phi$ between the leaf-sets of $P$ and $H$ (corresponding to present-day associations). Recently, there has been much interest in studying spaces of reconciliations, which arise by defining some metric $d$ on the set $Rec(P,H,\phi)$ of all possible reconciliations between $P$ and $H$. In this paper, we study the following question: how do we compute a geometric median for a given subset $\Psi \subseteq Rec(P,H,\phi)$ relative to $d$, i.e. an element $\psi_{med} \in Rec(P,H,\phi)$ such that

$$\sum_{\psi' \in \Psi} d(\psi_{med},\psi') \le \sum_{\psi' \in \Psi} d(\psi,\psi')$$

holds for all $\psi \in Rec(P,H,\phi)$? For a model where so-called host-switches or transfers are not allowed, and for a commonly used metric $d$ called the edit-distance, we show that although the cardinality of $Rec(P,H,\phi)$ can be super-exponential, it is still possible to compute a geometric median for a set $\Psi \subseteq Rec(P,H,\phi)$ in polynomial time. We expect that this result could be useful for computing a summary or consensus for a set of reconciliations (e.g. for a set of suboptimal reconciliations). Comment: 12 pages, 1 figure

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

University of East Anglia digital repository

Hal-Diderot
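The definition of a geometric median can be illustrated with a brute-force sketch that scans an explicitly listed candidate space and returns the element minimising the summed distance to $\Psi$. The encoding is a hypothetical one chosen for illustration: a reconciliation is a tuple giving, for each internal node of $P$ in a fixed order, the node of $H$ it maps to, and `mismatch_distance` is a simple Hamming-style stand-in, not the edit-distance used in the paper. Listing the whole space is only feasible for toy instances, since $|Rec(P,H,\phi)|$ can be super-exponential; avoiding that enumeration is the point of the polynomial-time result.

```python
def geometric_median(space, psi, d):
    """Brute force: the element of `space` minimising sum(d(x, y) for y in psi).

    Returns (median, total_distance); ties go to the first candidate in `space`.
    """
    best = min(space, key=lambda cand: sum(d(cand, y) for y in psi))
    return best, sum(d(best, y) for y in psi)

def mismatch_distance(r1, r2):
    """Hypothetical stand-in metric: number of P-nodes mapped to different H-nodes."""
    return sum(a != b for a, b in zip(r1, r2))
```

For instance, with the space `[("h1","h1"), ("h1","h2"), ("h2","h2"), ("h2","h1")]` and $\Psi$ = `[("h1","h1"), ("h1","h2"), ("h2","h2")]`, the median is `("h1","h2")` with total distance 2. Note that the median is drawn from the whole space, not only from $\Psi$, exactly as in the definition above.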

On maximal chain subgraphs and covers of bipartite graphs

Author: Calamoneri Tiziana
Gastaldello Mattia
Mary Arnaud
Sagot Marie France
Sinaimeri Blerina
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

In this paper, we address three related problems. One is the enumeration of all the maximal edge-induced chain subgraphs of a bipartite graph, for which we provide a polynomial-delay algorithm. We give bounds on the number of maximal chain subgraphs of a bipartite graph and use them to establish the input-sensitive complexity of the enumeration problem. The second problem we treat is finding the minimum number of chain subgraphs needed to cover all the edges of a bipartite graph. For this, we provide an exact exponential algorithm with a non-trivial complexity. Finally, we approach the problem of enumerating all minimal chain subgraph covers of a bipartite graph and show that it can be solved in quasi-polynomial time.

Archivio della ricerca- Università di Roma La Sapienza
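A bipartite graph is a chain graph when the neighbourhoods of the vertices on one side are totally ordered by inclusion (equivalently, it has no induced $2K_2$). The sketch below uses that test to enumerate maximal edge-induced chain subgraphs by brute force over all edge subsets. It is exponential and only meant to pin down the definitions, not to reproduce the paper's polynomial-delay algorithm; the `(left, right)` edge-pair representation is an assumption.

```python
from itertools import combinations

def is_chain(edges):
    """True if left-side neighbourhoods are totally ordered by inclusion (no induced 2K2)."""
    nbrs = {}
    for u, w in edges:
        nbrs.setdefault(u, set()).add(w)
    by_size = sorted(nbrs.values(), key=len)
    return all(a <= b for a, b in zip(by_size, by_size[1:]))

def maximal_chain_subgraphs(edges):
    """Brute force over all edge subsets; only usable on very small graphs."""
    edges = list(edges)
    chains = [
        frozenset(sub)
        for r in range(1, len(edges) + 1)
        for sub in combinations(edges, r)
        if is_chain(sub)
    ]
    # Deleting an edge can break the chain property, so maximality is checked
    # against every chain subgraph rather than by single-edge extensions.
    return [c for c in chains if not any(c < other for other in chains)]
```

On a $2K_2$ (`[("u1","a"), ("u2","b")]`) the two single edges are the maximal chain subgraphs, whereas on the path `[("u1","a"), ("u2","a"), ("u2","b")]` the whole graph is already a chain graph. The first case also shows why covers need more than one chain subgraph.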

Incremental complexity of a bi-objective hypergraph transversal problem

Author: Andrade Ricardo
Birmelé Etienne
Mary Arnaud
Picchetti Thomas
Sagot Marie-France
Publication date: 01/01/2015

The hypergraph transversal problem has been intensively studied, from both a theoretical and a practical point of view. In particular, its incremental complexity is known to be quasi-polynomial in general and polynomial for bounded hypergraphs. Recent applications in computational biology, however, require solving a generalization of this problem, which we call the bi-objective transversal problem. The instance is in this case composed of a pair of hypergraphs (A, B), and the aim is to find minimal sets which hit all the hyperedges of A while intersecting a minimal set of hyperedges of B. In this paper, we formalize this problem, link it to a problem on monotone Boolean $\land$-$\lor$ formulae of depth 3, and study its incremental complexity.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes
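The classic object and one possible reading of the bi-objective variant can be sketched by brute force. `minimal_transversals` enumerates inclusion-minimal vertex sets meeting every hyperedge of A, by increasing size. `bi_objective` then keeps those whose set of hit B-hyperedges is inclusion-minimal among the candidates. That filter is an illustrative interpretation of the informal statement above, not the paper's formal definition, and neither function reflects the incremental algorithms studied there.

```python
from itertools import combinations

def hits(x, edge):
    """True if vertex set x intersects the hyperedge."""
    return not x.isdisjoint(edge)

def minimal_transversals(vertices, hyperedges):
    """All inclusion-minimal vertex sets meeting every hyperedge (brute force)."""
    found = []
    for r in range(len(vertices) + 1):  # increasing size, so subsets come first
        for combo in combinations(sorted(vertices), r):
            x = set(combo)
            if all(hits(x, e) for e in hyperedges) and not any(f <= x for f in found):
                found.append(x)
    return found

def bi_objective(vertices, a_edges, b_edges):
    """Hypothetical reading: minimal transversals of A whose hit B-edges are inclusion-minimal."""
    cands = minimal_transversals(vertices, a_edges)
    hit = [frozenset(i for i, b in enumerate(b_edges) if hits(x, b)) for x in cands]
    return [x for x, h in zip(cands, hit) if not any(other < h for other in hit)]
```

With vertices `{1, 2, 3}` and A = `[{1, 2}, {2, 3}]`, the minimal transversals are `{2}` and `{1, 3}`; if B = `[{1, 2}, {2}]`, then `{2}` touches both B-hyperedges while `{1, 3}` touches only the first, so only `{1, 3}` survives the second objective.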

The Gapped-Factor Tree

Author: Allali Julien
Peterlongo Pierre
Sagot Marie-France
Publication venue: HAL CCSD
Publication date: 01/01/2006

We present a data structure to index a specific kind of factors, that is, of substrings, called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during the indexation. The data structure presented is based on the suffix tree and indexes all the gapped-factors of a text with a fixed size of gap, and only those. The construction of this data structure is done online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM
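What "indexing a factor while ignoring its gap" means can be shown with a naive hash-map index: each occurrence is keyed by its left and right parts, and the `gap` characters in between are skipped. The fixed left and right part lengths `k1` and `k2` are assumptions made for this sketch. Unlike the suffix-tree-based structure described above, this index is neither online nor linear in size, since it stores $k_1 + k_2$ characters per position.

```python
from collections import defaultdict

def gapped_factor_index(text, k1, gap, k2):
    """Map (left, right) -> start positions of text[i:i+k1] + <gap ignored> + right part."""
    index = defaultdict(list)
    span = k1 + gap + k2
    for i in range(len(text) - span + 1):
        left = text[i:i + k1]
        right = text[i + k1 + gap:i + span]  # the gap characters are never read
        index[(left, right)].append(i)
    return dict(index)
```

For `gapped_factor_index("abXcdabYcd", 2, 1, 2)`, positions 0 and 5 share the key `("ab", "cd")` even though the gap characters differ (`X` versus `Y`), which is the kind of match a filtration step would exploit.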

Longest Motifs with a Functionally Equivalent Central Block

Author: Crochemore Maxime
Giancarlo Raffaele
Sagot Marie-France
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

This paper presents a generalization of the notion of longest repeats with a block of k don't care symbols introduced by [Crochemore et al., LATIN 2004] (for k fixed) to longest motifs composed of three parts: a first and a last part that parameterize match (that is, match via some symbol renaming, initially unknown), and a functionally equivalent central block. Such three-part motifs are called longest block motifs. Different types of functional equivalence, and thus of matching criteria for the central block, are considered, which include as a subcase the one treated in [Crochemore et al., LATIN 2004] and extend to the case of regular expressions with no Kleene closure or complement operation. We show that a single general algorithmic tool, a non-trivial extension of the ideas introduced in [Crochemore et al., LATIN 2004], can handle all the various kinds of longest block motifs defined in this paper. The algorithm complexity is, in all cases, $O(n \log n)$.

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM
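The parameterized-match condition on the first and last parts ("match via some symbol renaming, initially unknown") amounts to the existence of a one-to-one renaming of symbols. A minimal check that builds the renaming in both directions is sketched below. It only tests two given strings and says nothing about how the longest such motifs are found in $O(n \log n)$.

```python
def p_match(x, y):
    """True if y can be obtained from x by a one-to-one renaming of symbols."""
    if len(x) != len(y):
        return False
    fwd, bwd = {}, {}  # renaming x -> y and its inverse
    for a, b in zip(x, y):
        # setdefault records the first pairing; any later conflict breaks bijectivity
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True
```

For example, `abab` p-matches `cdcd` (rename a to c and b to d), but not `ccdd`, and `abc` does not p-match `xxy` because two distinct symbols would have to be renamed to the same one.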

Assessing the Exceptionality of Coloured Motifs in Networks

Author: Lacroix Vincent
Sagot Marie-France
Schbath Sophie
Publication venue: BioMed Central
Publication date: 26/10/2008

Various methods have recently been employed to characterise the structure of biological networks. In particular, the concept of network motif and the related one of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess if the motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Much effort has been put over the last thirty years into deriving P-values for the frequencies of topological motifs, that is, fixed subgraphs. They rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdős-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdős-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution bette

CiteSeerX

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

Directory of Open Access Journals

PubMed Central

HAL Descartes

UPF Digital Repository
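Because a coloured motif fixes only the colours and not the topology, an occurrence can be read as a vertex set whose colour multiset equals the motif's and whose induced subgraph is connected. The sketch below counts occurrences by brute force and estimates the count's mean and variance by simulating Erdős-Rényi graphs $G(n, p)$ with fixed vertex colours. It is the simulation baseline the analytical formulae are meant to replace, not the paper's method; the function names and the occurrence-as-vertex-set convention are assumptions made for this illustration.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import mean, pvariance

def is_connected(nodes, adj):
    """DFS restricted to `nodes`: is the induced subgraph connected?"""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def count_coloured_motif(colours, adj, motif):
    """Number of vertex sets with the motif's colour multiset inducing a connected subgraph."""
    target = Counter(motif)
    return sum(
        1
        for s in combinations(range(len(colours)), len(motif))
        if Counter(colours[v] for v in s) == target and is_connected(s, adj)
    )

def simulate(colours, p, motif, trials, seed=0):
    """Empirical mean and variance of the motif count over random G(n, p) graphs."""
    rng = random.Random(seed)
    n = len(colours)
    counts = []
    for _ in range(trials):
        adj = [set() for _ in range(n)]
        for u, v in combinations(range(n), 2):
            if rng.random() < p:  # each edge present independently with probability p
                adj[u].add(v)
                adj[v].add(u)
        counts.append(count_coloured_motif(colours, adj, motif))
    return mean(counts), pvariance(counts)
```

On the path 0-1-2-3 coloured `r, g, b, r`, the motif `{r, g}` occurs once (vertices 0 and 1; vertices 1 and 3 are not adjacent), while `{r, g, b}` occurs twice. Enumerating all vertex subsets is exponential in the motif size, which is one reason analytical formulae are attractive.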