18 research outputs found

Safe and Complete Contig Assembly Through Omnitigs

Author: Medvedev Paul
Tomescu Alexandru I.
Publication venue
Publication date: 01/06/2017
Field of study

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Safe and complete contig assembly via omnitigs

Author: A Bankevich
A Guénoche
AR Rubinov
AS Motahari
C Kingsford
D Haussler
DR Zerbino
E Kapun
ES Lander
G Bresler
G Narzisi
I Lysov
JD Kececioglu
JR Miller
JT Simpson
K Lam
K Sahlin
L Salmela
M Boetzer
N Nagarajan
N Vyahhi
P Medvedev
PA Pevzner
R Chikhi
R Luo
R Uricaru
RM Idury
SL Salzberg
Publication venue
Publication date: 16/08/2016
Field of study

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs, a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g. a de Bruijn graph or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial-time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.

Comment: Full version of the paper in the proceedings of RECOMB 201
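To make the unitig baseline mentioned in this abstract concrete, here is a minimal sketch that builds an edge-centric de Bruijn graph from reads and reports its maximal non-branching paths. It is not the omnitig algorithm from the paper. The function names `de_bruijn_kmers` and `unitigs`, the toy reads, and the choice k = 3 are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_kmers(reads, k):
    """Collect the distinct k-mers of the reads.

    Each k-mer is treated as an edge from its (k-1)-prefix to its (k-1)-suffix.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def unitigs(kmers):
    """Return the strings spelled by maximal non-branching paths (hypothetical helper)."""
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    for kmer in sorted(kmers):
        out_edges[kmer[:-1]].append(kmer)
        in_deg[kmer[1:]] += 1

    def is_internal(node):
        # A node with exactly one incoming and one outgoing edge cannot end a unitig.
        return in_deg[node] == 1 and len(out_edges[node]) == 1

    result, used = [], set()
    nodes = set(out_edges) | set(in_deg)
    # Start a unitig at every edge leaving a branching or terminal node.
    for node in sorted(nodes):
        if is_internal(node):
            continue
        for kmer in out_edges[node]:
            spelled, cur = kmer, kmer[1:]
            used.add(kmer)
            while is_internal(cur):
                nxt = out_edges[cur][0]
                used.add(nxt)
                spelled += nxt[-1]
                cur = nxt[1:]
            result.append(spelled)
    # Remaining edges lie on isolated cycles in which every node is internal.
    for kmer in sorted(kmers):
        if kmer in used:
            continue
        spelled, cur = kmer, kmer[1:]
        used.add(kmer)
        while out_edges[cur][0] not in used:
            nxt = out_edges[cur][0]
            used.add(nxt)
            spelled += nxt[-1]
            cur = nxt[1:]
        result.append(spelled)
    return result


toy_reads = ["AACGT", "AACGA"]  # assumed toy input
print(unitigs(de_bruijn_kmers(toy_reads, 3)))  # ['AACG', 'CGA', 'CGT']
```

In this toy graph the node CG has two outgoing edges, so the unitig AACG stops there and the two branches are reported separately.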

arXiv.org e-Print Archive

Crossref

Safe solutions for walks on graphs

Author: Obscura Acosta Nidia
Publication venue: Helsingfors universitet
Publication date: 01/01/2018
Field of study

In this thesis we study the concept of “safe solutions” in different problems whose solutions are walks on graphs. A safe solution to a problem X can be understood as a partial solution common to all solutions to problem X. In problems whose solutions are walks on graphs, safe solutions refer to walks common to all walks which are solutions to the problem. In this thesis, we focus on formulating four main graph traversal problems and finding characterizations for those walks contained in all their solutions. We give formulations for these graph traversal problems, we prove some of their combinatorial and structural properties, and we give safe and complete algorithms for finding their safe solutions based on their characterizations. We use the genome assembly problem and its applications as our main motivating example for finding safe solutions in these graph traversal problems. We begin by motivating and exemplifying the notion of safe solutions through a problem on s-t paths in undirected graphs with at least two non-trivial biconnected components S and T and with s ∈ S, t ∈ T. We continue by reviewing similar and related notions in other fields, especially in combinatorial optimization and previous work on the bioinformatics problem of genome assembly. We then proceed to characterize the safe solutions to the Eulerian cycle problem, where one must find a circular walk in a graph G which traverses each edge exactly once. We suggest a characterization for them by improving on (Nagarajan, Pop, JCB 2009) and a polynomial-time algorithm for finding them. We then study edge-covering circular walks in a graph G. We look at the characterization from (Tomescu, Medvedev, JCB 2017) for their safe solutions and their suggested polynomial-time algorithm and we show an optimal O(mn)-time algorithm that we proposed in (Cairo et al. CPM 2017). Finally, we generalize this to edge-covering collections of circular walks. We characterize safe solutions in an edge-covering setting and provide a polynomial-time algorithm for computing them. We suggested these originally in (Obscura et al. ALMOB 2018).
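The definition of a safe solution as "a partial solution common to all solutions" can be illustrated by brute force on the s-t path example the abstract mentions. The sketch below enumerates every simple s-t path of a small undirected graph and keeps the nodes shared by all of them. It is exponential in general and is not one of the thesis's algorithms. The helper names and the toy graph (two triangles joined by one edge) are assumptions.

```python
def build_adjacency(edge_list):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def simple_paths(adj, s, t):
    """Yield every simple s-t path as a list of nodes (depth-first enumeration)."""
    stack = [(s, [s])]
    while stack:
        node, path = stack.pop()
        if node == t:
            yield path
            continue
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def safe_nodes(adj, s, t):
    """Return the nodes that occur in all simple s-t paths, in path order."""
    paths = list(simple_paths(adj, s, t))
    if not paths:
        return []
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    return [v for v in paths[0] if v in common]


# Assumed toy graph: triangle {s, a, b}, the edge b-c, and triangle {c, d, t}.
toy_edges = [("s", "a"), ("a", "b"), ("s", "b"), ("b", "c"),
             ("c", "d"), ("d", "t"), ("c", "t")]
print(safe_nodes(build_adjacency(toy_edges), "s", "t"))  # ['s', 'b', 'c', 't']
```

Every s-t path in the toy graph must cross the edge b-c, so b and c (together with the endpoints) are reported as safe, while a and d are not.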

Helsingin yliopiston digitaalinen arkisto

Genome Assembly, from Practice to Theory: Safe, Complete and Linear-Time

Author: Cairo Massimo
Rizzi Romeo
Tomescu Alexandru
Zirondelli Elia C.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 08/11/2020
Field of study

Peer reviewed

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Catalogo dei prodotti della ricerca

Helsingin yliopiston digitaalinen arkisto

An Optimal O(nm) Algorithm for Enumerating All Walks Common to All Closed Edge-covering Walks of a Graph

Author: Acosta Nidia Obscura
Cairo Massimo
Medvedev Paul
Rizzi Romeo
Tomescu Alexandru I.
Publication venue
Publication date: 01/01/2019
Field of study

In this article, we consider the following problem. Given a directed graph G, output all walks of G that are sub-walks of all closed edge-covering walks of G. This problem was first considered by Tomescu and Medvedev (RECOMB 2016), who characterized these walks through the notion of omnitig. Omnitigs were shown to be relevant for the genome assembly problem from bioinformatics, where a genome sequence must be assembled from a set of reads from a sequencing experiment. Tomescu and Medvedev (RECOMB 2016) also proposed an algorithm for listing all maximal omnitigs, by launching an exhaustive visit from every edge. In this article, we prove new insights about the structure of omnitigs and solve several open questions about them. We combine these to achieve an O(nm)-time algorithm for outputting all the maximal omnitigs of a graph (with n nodes and m edges). This is also optimal, as we show families of graphs whose total omnitig length is Omega(nm). We implement this algorithm and show that it is 9-12 times faster in practice than the one of Tomescu and Medvedev (RECOMB 2016).

Peer reviewed
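The central object in this abstract, a closed edge-covering walk, can be pinned down with a small checker. The sketch below only verifies the definition for a given node sequence. It does not compute omnitigs and is not the O(nm) algorithm of the article. The function name and the toy graph are assumptions.

```python
def is_closed_edge_covering_walk(edges, walk):
    """Check whether `walk` (a list of nodes) is a closed edge-covering walk.

    `edges` is a collection of directed edges (u, v). The walk must start and
    end at the same node, use only edges of the graph, and use every edge at
    least once.
    """
    if len(walk) < 2 or walk[0] != walk[-1]:
        return False
    edge_set = set(edges)
    steps = list(zip(walk, walk[1:]))
    uses_only_graph_edges = all(step in edge_set for step in steps)
    covers_all_edges = edge_set <= set(steps)
    return uses_only_graph_edges and covers_all_edges


# Assumed toy graph: a -> b -> c -> a plus the extra edge b -> a.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a")]
print(is_closed_edge_covering_walk(toy_edges, ["a", "b", "c", "a", "b", "a"]))  # True
print(is_closed_edge_covering_walk(toy_edges, ["a", "b", "c", "a"]))            # False
```

The second walk is closed but misses the edge b -> a, so it is not edge-covering.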

Aaltodoc Publication Archive

Catalogo dei prodotti della ricerca

Helsingin yliopiston digitaalinen arkisto

A safe and complete algorithm for metagenomic assembly

Author: A Schrijver
Alexandru I. Tomescu
B Haider
C Kingsford
D Eppstein
DR Zerbino
E Boros
E Kapun
EW Myers
FM Pajouh
G Narzisi
GF Italiano
GW Tyson
HN Gabow
IP Lysov
J Butler
J Laserson
J Qin
JC Venter
JD Kececioglu
JR Miller
JT Simpson
K Cechlárová
K-M Chao
M Costa
M Crochemore
M Vingron
MC Costa
N Nagarajan
Nidia Obscura Acosta
P Medvedev
P Veiga
PA Pevzner
PJ Turnbaugh
R Li
R Zenklusen
RM Idury
S Boisvert
S Koren
T Namiki
V Lacko
V Mäkinen
Veli Mäkinen
Y Peng
Z Iqbal
Publication venue: Springer Science and Business Media LLC
Publication date
Field of study

Crossref

Safety in multi-assembly via paths appearing in all path covers of a DAG

Author: Caceres Reyes Manuel Ariel
Cairo Massimo
Husic Edin
Mumey Brendan
Rizzi Romeo
Sahlin Kristoffer
Tomescu Alexandru
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2022
Field of study

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Safety in s-t Paths, Trails and Walks

Author: Cairo Massimo
Khan Shahbaz
Rizzi Romeo
Schmidt Sebastian Stefan
Tomescu Alexandru
Publication venue
Publication date: 17/07/2020
Field of study

Given a directed graph G and a pair of nodes s and t, an s-t bridge of G is an edge whose removal breaks all s-t paths of G (and thus appears in all s-t paths). Computing all s-t bridges of G is a basic graph problem, solvable in linear time. In this paper, we consider a natural generalisation of this problem, with the notion of “safety” from bioinformatics. We say that a walk W is safe with respect to a set W' of s-t walks, if W is a subwalk of all walks in W'. We start by considering the maximal safe walks when W' consists of: all s-t paths, all s-t trails, or all s-t walks of G. We show that the solutions for the first two problems immediately follow from finding all s-t bridges after incorporating simple characterisations. However, solving the third problem requires non-trivial techniques for incorporating its characterisation. In particular, we show that there exists a compact representation computable in linear time, that allows outputting all maximal safe walks in time linear in their length. Our solutions also directly extend to multigraphs, except for the second problem, which requires a more involved approach. We further generalise these problems, by assuming that safety is defined only with respect to a subset of visible edges. Here we prove a dichotomy between the s-t paths and s-t trails cases, and the s-t walks case: the former two are NP-hard, while the latter is solvable with the same complexity as when all edges are visible. We also show that the same complexity results hold for the analogous generalisations of s-t articulation points (nodes appearing in all s-t paths). We thus obtain the best possible results for natural “safety”-generalisations of these two fundamental graph problems. Moreover, our algorithms are simple and do not employ any complex data structures, making them ideal for use in practice.

Peer reviewed
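The s-t bridge definition at the start of this abstract translates directly into a naive check: delete each edge in turn and test whether t is still reachable from s. The sketch below does exactly that with breadth-first search. It runs in O(m(n + m)) time and is therefore not the linear-time method the abstract refers to. The function names and the toy graph are assumptions.

```python
from collections import deque


def reachable(edges, s, t):
    """Return True if t can be reached from s along the directed edges (BFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def st_bridges(edges, s, t):
    """Return the edges whose removal disconnects t from s (naive check)."""
    if not reachable(edges, s, t):
        return []
    return [
        edge
        for i, edge in enumerate(edges)
        if not reachable(edges[:i] + edges[i + 1:], s, t)
    ]


# Assumed toy graph: s -> a, then two parallel routes a -> b -> d and a -> c -> d, then d -> t.
toy_edges = [("s", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "t")]
print(st_bridges(toy_edges, "s", "t"))  # [('s', 'a'), ('d', 't')]
```

Because edges are removed by list position, a pair of parallel edges is handled one copy at a time, so neither copy is reported as a bridge.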

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca

Helsingin yliopiston digitaalinen arkisto