Search CORE

22 research outputs found

Safe and complete contig assembly via omnitigs

Author: A Bankevich
A Guénoche
AR Rubinov
AS Motahari
C Kingsford
D Haussler
DR Zerbino
E Kapun
ES Lander
G Bresler
G Narzisi
I Lysov
JD Kececioglu
JR Miller
JT Simpson
K Lam
K Sahlin
L Salmela
M Boetzer
N Nagarajan
N Vyahhi
P Medvedev
PA Pevzner
R Chikhi
R Luo
R Uricaru
RM Idury
SL Salzberg
Publication date: 16/08/2016

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g. a de Bruijn graph or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial-time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.

Comment: Full version of the paper in the proceedings of RECOMB 201
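For orientation, the unitig baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's omnitig algorithm: it assumes a node-centric de Bruijn graph whose nodes are (k-1)-mers and whose edges are the distinct k-mers of the reads, and the function name `unitigs` and its parameters are hypothetical. Cycles made up entirely of internal nodes are ignored.

```python
from collections import defaultdict


def unitigs(reads, k):
    """Spell the maximal non-branching paths (unitigs) of a de Bruijn graph.

    Assumption: nodes are (k-1)-mers, edges are the distinct k-mers of `reads`.
    Isolated cycles with no branching node are skipped in this sketch.
    """
    edges = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}

    out_edges = defaultdict(list)  # node -> outgoing k-mers
    in_deg = defaultdict(int)      # node -> number of incoming k-mers
    for kmer in sorted(edges):
        out_edges[kmer[:-1]].append(kmer)
        in_deg[kmer[1:]] += 1

    def is_internal(node):
        # A node can sit inside a unitig only if it has one way in and one way out.
        return in_deg[node] == 1 and len(out_edges[node]) == 1

    nodes = sorted({e[:-1] for e in edges} | {e[1:] for e in edges})
    result = []
    for node in nodes:
        if is_internal(node):
            continue  # unitigs start only at branching or terminal nodes
        for kmer in out_edges[node]:
            path = kmer
            nxt = kmer[1:]
            while is_internal(nxt):
                kmer = out_edges[nxt][0]
                path += kmer[-1]  # extend the spelled string by one character
                nxt = kmer[1:]
            result.append(path)
    return result


if __name__ == "__main__":
    print(unitigs(["ATGCA", "ATGCT"], 3))
```

In the toy input the two reads diverge after `ATGC`, so the graph branches at node `GC` and the walk stops there; omnitigs, as defined in the paper, may extend beyond such stopping points when it is safe to do so.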

arXiv.org e-Print Archive

Crossref

Identification of Variant Compositions in Related Strains Without Reference

Author: D Aguiar
D He
DS Correll
E Berger
F Deng
I Astrovskaya
J Neigenfind
JC Stephens
M Patterson
R Cilibrasi
R Lippert
R Tewhey
R Uricaru
S Bayzid
S Das
S Lin
SY Su
V Kuleshov
Z Chen
Publication venue: Springer International Publishing
Publication date: 01/01/2016

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

4Pipe4-A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information

Author: A Conesa
A Papanicolaou
A Ratan
AE Savage
B Chevreux
B Langmead
Bruno M. Vieira
C Tollenaere
Dora Batista
FM You
Francisco Pina-Martins
H Li
H Nijveen
IS Modesto
J Tang
KD Broders
Octávio S. Paulo
P Rice
R Leinonen
R Uricaru
SC Schuster
SF Altschul
Sofia G. Seabra
Y Shen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

This work was fully supported by projects SOBREIRO/0036/2009 (under the framework of the Cork Oak ESTs Consortium), PTDC/BIA-BEC/098783/2008 and PTDC/AGR-GPL/119943/2010 from Fundação para a Ciência e Tecnologia (FCT) – Portugal. F. Pina-Martins was funded by FCT grant SFRH/BD/51411/2011, under the PhD program “Biology and Ecology of Global Changes”, Univ. Aveiro & Univ. Lisbon, Portugal. D. Batista was funded by FCT grant SFRH/BPD/104629/2014

Crossref

Springer - Publisher Connector

PubMed Central

Universidade de Lisboa: Repositório.UL

Queen Mary Research Online

On Bubble Generators in Directed Graphs

Author: B Bollobás
D Zerbino
E Birmelé
G Sacomoto
H Li
JA Bondy
JT Simpson
L Brankovic
L Lima
M Sammeth
N Deo
PA Pevzner
PM Gleiss
R Uricaru
R Younsi
S MacLane
T Kavitha
T Onodera
WK Sung
Z Iqbal
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Bubbles are pairs of internally vertex-disjoint (s, t)-paths with applications in the processing of DNA and RNA data. For example, enumerating alternative splicing events in a reference-free context can be done by enumerating all bubbles in a de Bruijn graph built from RNA-seq reads [16]. However, listing and analysing all bubbles in a given graph is usually infeasible in practice, due to the exponential number of bubbles present in real data graphs. In this paper, we propose a notion of a bubble generator set, i.e. a polynomial-sized subset of bubbles from which all the others can be obtained through the application of a specific symmetric difference operator. This set provides a compact representation of the bubble space of a graph, which can be useful in practice since some pertinent information about all the bubbles can be more conveniently extracted from this compact set. Furthermore, we provide a polynomial-time algorithm to decompose any bubble of a graph into the bubbles of such a generator in a tree-like fashion.
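The bubble definition in the abstract can be made concrete with a small check. This sketch only tests whether two given paths form a bubble; it does not implement the paper's generator set or its symmetric difference operator. The adjacency-dict graph representation and the names `is_path` and `is_bubble` are assumptions made for illustration.

```python
def is_path(graph, path):
    """True if `path` is a simple directed path in `graph` (dict: node -> set of successors)."""
    return (
        len(path) >= 2
        and len(set(path)) == len(path)  # no repeated vertices
        and all(v in graph.get(u, ()) for u, v in zip(path, path[1:]))
    )


def is_bubble(graph, p, q):
    """True if p and q are two distinct, internally vertex-disjoint (s, t)-paths."""
    if not (is_path(graph, p) and is_path(graph, q)):
        return False
    if p[0] != q[0] or p[-1] != q[-1] or p == q:
        return False  # must share source s and target t, and differ
    # Internal vertices (everything except s and t) must not overlap.
    return not (set(p[1:-1]) & set(q[1:-1]))


if __name__ == "__main__":
    g = {"s": {"a", "b"}, "a": {"t"}, "b": {"a", "t"}, "t": set()}
    print(is_bubble(g, ["s", "a", "t"], ["s", "b", "t"]))       # disjoint interiors
    print(is_bubble(g, ["s", "a", "t"], ["s", "b", "a", "t"]))  # both pass through "a"
```

Checking a single pair is cheap; the difficulty the abstract describes comes from the number of such pairs, which can grow exponentially with graph size.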

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Archivio della Ricerca - Università di Pisa

Repositorio Académico de la Universidad de Chile

ART

A Family of Tree-Based Generators for Bubbles in Directed Graphs

Author: A Dobin
C Benoit-Pilven
E Birmelé
G Kirchhoff
G Sacomoto
J Cheriyan
JR Miller
K Klemm
L Brankovic
L Lima
M Sammeth
PC Kainen
PM Gleiss
R Uricaru
R Younsi
RH Hammack
S MacLane
T Kavitha
T Onodera
TH Cormen
V Acuña
W-K Sung
Z Iqbal
Publication venue: Springer Science and Business Media LLC
Publication date: 08/06/2020

Bubbles are pairs of internally vertex-disjoint (s, t)-paths in a directed graph. In de Bruijn graphs built from reads of RNA and DNA data, bubbles represent interesting biological events, such as alternative splicing (AS) and allelic differences (SNPs and indels). However, the set of all bubbles in a de Bruijn graph built from real data is usually too large to be efficiently enumerated and analysed in practice. In particular, despite significant research in this area, listing bubbles remains the main bottleneck for tools that detect AS events in a reference-free context. Recently, in [1], the concept of a bubble generator was introduced as a way to obtain a compact representation of the bubble space of a graph. Although this generator was quite effective in finding AS events, preliminary experiments showed that it is about 5 times slower than state-of-the-art methods. In this paper we propose a new family of bubble generators which improve substantially on the previous generator: generators in this new family are about two orders of magnitude faster and still achieve similar precision in identifying AS events. To highlight the practical value of our new generators, we also report some experimental results on a real dataset.

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Hal-Diderot

HAL-Rennes 1

Using Minimum Path Cover to Boost Dynamic Programming on DAGs : Co-linear Chaining Extended

Author: A Amir
A Limasset
AI Tomescu
AM Novak
C-P Schnorr
D Belazzougui
D Eppstein
D Haussler
DM Church
DR Fulkerson
E Cohen
G Navarro
HV Jagadish
J Sirén
JE Hopcroft
K Park
M Abouelhoda
M Vyverman
R Patro
R Rizzi
R Uricaru
RK Ahuja
S Felsner
S Heber
S Wandelt
SC Ntafos
T Shibuya
V Mäkinen
VV Vazirani
Publication venue: Springer International Publishing AG
Publication date: 29/01/2018

Peer reviewed

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Helsingin yliopiston digitaalinen arkisto

Hal-Diderot

Redefining the structural motifs that determine RNA

Author: Uricaru R.
Publication venue: Wiley

Crossref

Novel Definition and Algorithm for Chaining Fragments with Proportional Overlaps

Author: Alban Mancheron
Cormen T.H.
Eric Rivals
Knuth D.
Myers G.
Raluca Uricaru
Uricaru R.
Publication venue: Mary Ann Liebert Inc

Crossref

Development of a set of SNP markers for population genetics of the red gorgonian (Paramuricea clavata), an emblematic species of the Mediterranean coralligenous

Author: AM Bolger
E Ballesteros
E. Guichoux
G Santangelo
J Pilczynska
M Grabherr
M Padrón
M. Massot
M. Milhes
M. Padrón
P Meirmans
R Cupido
R Uricaru
S Gabriel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Transcriptome sequencing was used to develop single nucleotide polymorphism (SNP) markers for the red gorgonian (Paramuricea clavata). A total of 20,736 SNPs were identified, of which 1,718 had a coverage of over 100 reads. Of the 480 SNPs tested, 347 were successfully genotyped in 95 samples from the NW Mediterranean using a MassARRAY System. This set of markers will be of great value for population genetics and phylogeography.

Plateforme d'Innovation " Forêt-Bois-Fibre-Biomasse du Futur

Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph

Author: AJ Cox
C Kingsford
C Lemaitre
Claire Lemaitre
DC Jones
Dominique Lavenier
Erwan Drezen
F Hach
G Rizk
Gaëtan Benoit
Guillaume Rizk
H Li
I Witten
J Pell
JK Bonfield
K Salikhov
L Janin
MHY Fritz
R Chikhi
R Cánovas
R Leinonen
R Patro
R Uricaru
R Wan
Raluca Uricaru
S Deorowicz
S Grabowski
Thibault Dayris
YW Yu
Z Iqbal
Publication venue: Springer Science and Business Media LLC

Crossref