5 research outputs found

A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data

Author: Lacroix Vincent
Sacomoto Gustavo
Sagot Marie-France
Publication date: 30/07/2013

We present a new algorithm for enumerating bubbles with length constraints in directed graphs. This problem arises in transcriptomics, where the goal is to identify all alternative splicing events present in a sample of mRNAs sequenced by RNA-seq. This is the first polynomial-delay algorithm for this problem, and we show that in practice it is faster than previous approaches. This enables us to deal with larger instances and therefore to discover novel alternative splicing events, especially long ones, that were previously overlooked by existing methods.
Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)
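To make the object being enumerated concrete, here is a minimal brute-force sketch in Python. It is not the polynomial-delay algorithm of the paper. It assumes that a bubble is a pair of internally vertex-disjoint directed paths sharing a source s and a target t, and that the length constraint is a single upper bound on the number of edges of each path; both are simplifications made for illustration. The names `simple_paths`, `bubbles`, and `max_len` are hypothetical.

```python
from itertools import combinations


def simple_paths(graph, s, t, max_len):
    """Yield every simple directed path from s to t with at most max_len edges."""
    stack = [(s, [s])]
    while stack:
        node, path = stack.pop()
        if node == t:
            yield path
            continue
        # A path with len(path) vertices has len(path) - 1 edges.
        if len(path) - 1 >= max_len:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # keep the path simple
                stack.append((nxt, path + [nxt]))


def bubbles(graph, max_len):
    """Return all pairs of internally vertex-disjoint s-t paths (naive search).

    Assumption: each of the two paths may use at most max_len edges.
    This takes exponential time in the worst case; it only illustrates the definition.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    found = []
    for s in sorted(nodes):
        for t in sorted(nodes):
            if s == t:
                continue
            paths = list(simple_paths(graph, s, t, max_len))
            for p, q in combinations(paths, 2):
                # Interior vertices exclude the shared endpoints s and t.
                if set(p[1:-1]).isdisjoint(q[1:-1]):
                    found.append((p, q))
    return found
```

On a diamond-shaped graph such as `{"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}`, calling `bubbles(graph, 2)` reports the single pair of paths through `a` and through `b`, while a bound of 1 rules both paths out.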

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Efficient Algorithms for Listing k Disjoint st-Paths in Graphs

Author: Grossi Roberto
Marino Andrea
Versari Luca
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Given a connected graph G of m edges and n vertices, we consider the basic problem of listing all the choices of k vertex-disjoint st-paths, for any two input vertices s, t of G and a positive integer k. Our algorithm takes O(m) time per solution, using O(m) space and requiring O(F_k(G)) setup time, where

$F_k(G) = O(m \min\{k,\; n^{2/3} \log n,\; \sqrt{m} \log n\})$

is the cost of running a max-flow algorithm on G to compute a flow of size k. The proposed techniques are simple and apply to other related listing problems discussed in the paper.
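As an illustration of what one "choice of k vertex-disjoint st-paths" looks like, the following Python sketch lists them by exhaustive search. It is not the O(m)-per-solution algorithm described in the abstract and uses no max-flow computation. It assumes a simple undirected graph given as an adjacency dictionary, and it treats paths as disjoint when they share no vertex other than s and t. The function names are hypothetical.

```python
from itertools import combinations


def st_paths(adj, s, t):
    """Yield every simple path from s to t as a tuple of vertices."""
    def extend(path):
        last = path[-1]
        if last == t:
            yield tuple(path)
            return
        for nxt in adj[last]:
            if nxt not in path:  # never revisit a vertex
                yield from extend(path + [nxt])
    yield from extend([s])


def k_disjoint_path_sets(adj, s, t, k):
    """Return every set of k st-paths that share no vertex besides s and t.

    Naive approach: enumerate all simple st-paths, then test each k-subset.
    This is exponential in general and only meant to show the definition.
    """
    paths = list(st_paths(adj, s, t))
    result = []
    for combo in combinations(paths, k):
        seen = set()
        disjoint = True
        for path in combo:
            inner = set(path[1:-1])  # interior vertices only
            if seen & inner:
                disjoint = False
                break
            seen |= inner
        if disjoint:
            result.append(combo)
    return result
```

For the four-vertex graph with edges s-a, s-b, a-b, a-t and b-t there are four simple st-paths, but only one way to pick two that are vertex-disjoint (s-a-t together with s-b-t), and no way to pick three.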

Crossref

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

HAL Descartes

Hal-Diderot

Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads

Author: Lacroix Vincent
Lima Leandro
Lopez-Maestre Helene
Marchet Camille
Miele Vincent
Sacomoto Gustavo
Sagot Marie-France
Sinaimeri Blerina
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Background: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them.
Results: The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99–111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644–652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086–1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods.
Repeats create complicated regions in a DBG, and assemblers that try to traverse these regions can infer erroneous transcripts. Based on this observation, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and neither read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134–1144, 5) on both real and simulated datasets for detecting chimeras, and therefore is able to capture assembly errors missed by these methods.

Crossref

INRIA a CCSD electronic archive server

PubMed Central

Archivio della ricerca - LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Hal-Diderot

HAL-Rennes 1

Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads

Author: B Li
Blerina Sinaimeri
Camille Marchet
EW Myers
F Freyermuth
G Robertson
G Sacomoto
Gustavo Sacomoto
H Lopez-Maestre
H Tilgner
Helene Lopez-Maestre
J Jurka
JT Robinson
Leandro Lima
M Bern
M Schulz
Marie-France Sagot
MG Grabherr
ML Carroll
P Novák
R Smith-Unna
S Djebali
T Griebel
Vincent Lacroix
Vincent Miele
W Bao
WJ Kent
Y Peng
Publication venue: 'Springer Science and Business Media LLC'

Crossref