A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data

Author: Lacroix Vincent
Sacomoto Gustavo
Sagot Marie-France
Publication date: 30/07/2013

We present a new algorithm for enumerating bubbles with length constraints in directed graphs. This problem arises in transcriptomics, where the question is to identify all alternative splicing events present in a sample of mRNAs sequenced by RNA-seq. This is the first polynomial-delay algorithm for this problem, and we show that in practice it is faster than previous approaches. This enables us to deal with larger instances and therefore to discover novel alternative splicing events, especially long ones, that were previously overlooked by existing methods. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).
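
To make the enumeration problem concrete, here is a brute-force sketch, not the polynomial-delay algorithm of the paper. It assumes the usual definition of a bubble as two internally vertex-disjoint directed paths that share a source and a target, and it applies a single hypothetical bound `max_len` (in edges) to both paths, which is a simplification. The function names and the adjacency-dict representation are illustrative choices.

```python
from itertools import combinations


def simple_paths(graph, source, target, max_len):
    """Yield simple directed paths source -> target with at most max_len edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target and len(path) > 1:
            yield path
            continue
        if len(path) - 1 >= max_len:
            continue  # path already uses max_len edges, do not extend it
        for nxt in graph.get(node, ()):
            if nxt not in path:  # keep the path simple
                stack.append((nxt, path + [nxt]))


def bubbles(graph, max_len):
    """Return (s, t, path1, path2) for every pair of internally
    vertex-disjoint s -> t paths, each with at most max_len edges.

    Exhaustive search: exponential in the worst case, for illustration only.
    """
    vertices = set(graph) | {v for outs in graph.values() for v in outs}
    found = []
    for s in sorted(vertices):
        for t in sorted(vertices):
            if s == t:
                continue
            paths = list(simple_paths(graph, s, t, max_len))
            for p, q in combinations(paths, 2):
                if set(p[1:-1]).isdisjoint(q[1:-1]):
                    found.append((s, t, p, q))
    return found


# Toy graph: a branches to b and c, which merge again at d.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
for s, t, p, q in bubbles(toy, max_len=3):
    print(s, t, p, q)
```

On the toy graph the only bubble is the pair of paths from `a` to `d`; the two paths from `a` to `e` share the internal vertex `d` and are therefore rejected.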

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Navigating in a sea of repeats in RNA-seq without drowning

Author: Lacroix Vincent
Marchet Camille
Miele Vincent
Sacomoto Gustavo
Sagot Marie-France
Sinaimeri Blerina
Publication date: 01/01/2014

The main challenge in de novo assembly of NGS data is certainly to deal with repeats that are longer than the reads. This is particularly true for RNA-seq data, since coverage information cannot be used to flag repeated sequences, of which transposable elements are one of the main examples. Most transcriptome assemblers are based on de Bruijn graphs and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are twofold. First, we introduce a formal model for representing high copy number repeats in RNA-seq data and exploit its properties for inferring a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying in a de Bruijn graph a subgraph with this characteristic is NP-complete. In a second step, we show that in the specific case of a local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs. In particular, we designed and implemented an algorithm to efficiently identify AS events that are not included in repeated regions. Finally, we validate our results using synthetic data. We also give an indication of the usefulness of our method on real data.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs

Publication venue: BioMed Central
Publication date: 27/06/2015

Springer - Publisher Connector

Simplicial and Cellular Trees

Author: Duval Art M.
Klivans Caroline J.
Martin Jeremy L.
Publication date: 22/06/2015

Much information about a graph can be obtained by studying its spanning trees. On the other hand, a graph can be regarded as a 1-dimensional cell complex, raising the question of developing a theory of trees in higher dimension. As observed first by Bolker, Kalai and Adin, and more recently by numerous authors, the fundamental topological properties of a tree, namely acyclicity and connectedness, can be generalized to arbitrary dimension as the vanishing of certain cellular homology groups. This point of view is consistent with the matroid-theoretic approach to graphs, and yields higher-dimensional analogues of classical enumerative results including Cayley's formula and the matrix-tree theorem. A subtlety of the higher-dimensional case is that enumeration must account for the possibility of torsion homology in trees, which is always trivial for graphs. Cellular trees are the starting point for further high-dimensional extensions of concepts from algebraic graph theory including the critical group, cut and flow spaces, and discrete dynamical systems such as the abelian sandpile model. Comment: 39 pages (including 5-page bibliography); 5 figures. Chapter for forthcoming IMA volume "Recent Trends in Combinatorics".
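
For the classical 1-dimensional case mentioned above, the matrix-tree theorem says that the number of spanning trees of a graph equals any cofactor of its Laplacian, and Cayley's formula gives n^(n-2) for the complete graph on n vertices. The sketch below only illustrates these two graph-level results, not the higher-dimensional theory of the chapter; the function names are hypothetical and exact rational arithmetic is used to avoid floating-point error.

```python
from fractions import Fraction


def spanning_tree_count(n, edges):
    """Count spanning trees of an undirected graph on vertices 0..n-1
    using the matrix-tree theorem (determinant of a reduced Laplacian)."""
    # Build the Laplacian L = D - A.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column to obtain the reduced Laplacian.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    # Exact Gaussian elimination; the determinant is the product of pivots.
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular reduced Laplacian: graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


def complete_graph_edges(n):
    """Edges of the complete graph K_n on vertices 0..n-1."""
    return [(u, v) for u in range(n) for v in range(u + 1, n)]


# Compare the determinant count with Cayley's formula n^(n-2).
for n in range(2, 7):
    print(n, spanning_tree_count(n, complete_graph_edges(n)), n ** (n - 2))
```

The two printed counts agree for each n, which is Cayley's formula recovered from the matrix-tree theorem.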

arXiv.org e-Print Archive

KU ScholarWorks

Linear-Time Superbubble Identification Algorithm for Genome Assembly

Author: Brankovic Ljiljana
Iliopoulos Costas S.
Kundu Ritu
Mohamed Manal
Pissis Solon P.
Vayani Fatima
Publication date: 17/09/2015

DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce masses of data in the form of short sequences known as reads. Assembling these reads into a whole genome constitutes a major algorithmic challenge. Most assembly algorithms utilize de Bruijn graphs constructed from reads for this purpose. A critical step of these algorithms is to detect typical motif structures in the graph caused by sequencing errors and genome repeats, and filter them out; one such complex subgraph class is a so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to detect all superbubbles in a directed acyclic graph with n nodes and m (directed) edges, improving the best-known O(m log m)-time algorithm by Sung et al.
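
As a rough illustration of the de Bruijn graphs referred to above, the sketch below builds one from a set of reads under a common convention: nodes are (k-1)-mers and each distinct k-mer contributes an edge from its prefix to its suffix. The function name, the toy reads, and the choice of k are assumptions made for the example, and the superbubble detection algorithm itself is not shown.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph as {node: sorted list of successors}.

    Each k-mer of each read adds an edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. Nodes without outgoing edges appear only
    as successors.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(succ) for node, succ in graph.items()}


# Two toy reads that differ in one base: the graph branches at "AC"
# and the two branches merge again at "GA", forming a simple bubble.
graph = de_bruijn(["TACGGA", "TACTGA"], k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

A single-base difference between reads is one way such branch-and-merge motifs arise; real assemblers also track k-mer multiplicities and reverse complements, which this sketch omits.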

arXiv.org e-Print Archive

University of Newcastle's Digital Repository

Elsevier - Publisher Connector

King's Research Portal

A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs

Author: A Dobin
A Mortazavi
E Wang
G Robertson
GAT Sacomoto
Gustavo Sacomoto
Marie-France Sagot
MG Grabherr
MH Schulz
MR Bussieck
P Flicek
RK Ahuja
TH Cormen
Vincent Lacroix
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Detecting Superbubbles in Assembly Graphs

Author: Onodera Taku
Sadakane Kunihiko
Shibuya Tetsuo
Publication date: 30/07/2013

We introduce a new concept of a subgraph class called a superbubble for analyzing assembly graphs, and propose an efficient algorithm for detecting it. Most assembly algorithms utilize assembly graphs like the de Bruijn graph or the overlap graph constructed from reads. From these graphs, many assembly algorithms first detect simple local graph structures (motifs), such as tips and bubbles, mainly to find sequencing errors. These motifs are easy to detect, but they are sometimes too simple to deal with more complex errors. The superbubble is an extension of the bubble, which is also important for analyzing assembly graphs. Though superbubbles are much more complex than ordinary bubbles, we show that they can be efficiently enumerated. We propose an average-case linear time algorithm (i.e., O(n+m) for a graph with n vertices and m edges) for graphs with a reasonable model, though the worst-case time complexity of our algorithm is quadratic (i.e., O(n(n+m))). Moreover, the algorithm is practically very fast: our experiments show that it runs in reasonable time with a single CPU core even on a very large graph of a whole human genome. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California