
    Detecting Superbubbles in Assembly Graphs

    We introduce a new class of subgraphs called superbubbles for analyzing assembly graphs, and propose an efficient algorithm for detecting them. Most assembly algorithms utilize assembly graphs, such as the de Bruijn graph or the overlap graph, constructed from reads. From these graphs, many assembly algorithms first detect simple local graph structures (motifs), such as tips and bubbles, mainly to find sequencing errors. These motifs are easy to detect, but they are sometimes too simple to deal with more complex errors. The superbubble is an extension of the bubble and is likewise important for analyzing assembly graphs. Though superbubbles are much more complex than ordinary bubbles, we show that they can be enumerated efficiently. We propose an algorithm that runs in average-case linear time (i.e., O(n+m) for a graph with n vertices and m edges) under a reasonable graph model, though its worst-case time complexity is quadratic (i.e., O(n(n+m))). Moreover, the algorithm is very fast in practice: our experiments show that it runs in reasonable time on a single CPU core even on a very large graph of a whole human genome.
    Comment: peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI 2013).
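    To make the detection concrete, the following is a minimal sketch of a per-entrance superbubble check in the spirit of the paper's high-level description (the function name and the adjacency dictionaries `children` and `parents`, assumed to map every vertex to a list of successors/predecessors, are illustrative assumptions, not the authors' code):

        def superbubble_from(s, children, parents):
            # Try to find a superbubble whose entrance is s. A superbubble is
            # an acyclic single-entrance, single-exit region; grow the region
            # from s, admitting a vertex only once all of its predecessors
            # are already inside.
            seen = {s}        # discovered but not yet fully processed
            visited = set()   # vertices whose predecessors all lie inside
            stack = [s]
            while stack:
                v = stack.pop()
                visited.add(v)
                seen.discard(v)
                if not children[v]:
                    return None                # v is a tip: no exit can close
                for u in children[v]:
                    if u == s:
                        return None            # cycle back through the entrance
                    seen.add(u)
                    if all(p in visited for p in parents[u]):
                        stack.append(u)
                if len(stack) == 1 and seen == {stack[0]}:
                    t = stack[0]               # unique frontier vertex: exit
                    if s not in children[t]:
                        return t               # (s, t) delimits a superbubble
                    return None                # edge (t, s) would form a cycle
            return None

    Running this check once from every vertex as the entrance s reproduces the quadratic worst case noted above.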

    Indexing Graph Search Trees and Applications

    We consider the problem of compactly representing the depth-first search (DFS) tree of a given undirected or directed graph with n vertices and m edges, while supporting various DFS-related queries efficiently in the RAM model with logarithmic word size. We study this problem in two well-known settings: the indexing model and the encoding model. While most of these queries can easily be supported in constant time using O(n lg n) bits of extra space, our goal is to beat this trivial O(n lg n)-bit space bound without compromising too much on query time. In the indexing model, the space bound of our solution involves the quantity m, so we obtain different bounds for sparse and dense graphs. In the encoding model, we first give a space lower bound, followed by an almost optimal data structure with extremely fast query time. Central to our algorithm is a partitioning of the DFS tree into connected subtrees, and a compact way to store these connections. Finally, we also apply these techniques to compactly index the shortest-path structure and biconnectivity structures, among others.
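    For intuition, here is the trivial baseline the abstract alludes to: explicit pre/post-order numbers for the DFS tree take O(n lg n) bits and answer ancestry queries in constant time (an illustrative sketch; `adj` is an assumed adjacency list, not the paper's compact index):

        def dfs_intervals(adj, root=0):
            # Iterative DFS assigning pre- and post-order timestamps.
            n = len(adj)
            pre, post, parent = [-1] * n, [-1] * n, [-1] * n
            timer = 0
            pre[root] = timer; timer += 1
            stack = [(root, iter(adj[root]))]
            while stack:
                v, it = stack[-1]
                u = next(it, None)
                if u is None:                  # all neighbours tried: retreat
                    post[v] = timer; timer += 1
                    stack.pop()
                elif pre[u] == -1:             # tree edge: descend into u
                    parent[u] = v
                    pre[u] = timer; timer += 1
                    stack.append((u, iter(adj[u])))
            return pre, post, parent

        def is_tree_ancestor(pre, post, u, v):
            # u is a DFS-tree ancestor of v iff v's interval nests in u's.
            return pre[u] <= pre[v] and post[v] <= post[u]

    The paper's contribution is to support such queries in far less extra space than these three word-sized arrays.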

    Compression with the tudocomp Framework

    We present a framework facilitating the implementation and comparison of text compression algorithms. We evaluate its features in a case study of two novel compression algorithms, based on the Lempel-Ziv compression schemes, that perform well on highly repetitive texts.
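    As a toy illustration of what such a framework automates, the sketch below registers compressors under a common interface and compares them on the same input (a hypothetical Python mini-harness for exposition only; tudocomp itself is a C++ library and this is not its API):

        import time, zlib, lzma

        # Registered compressors: name -> bytes-to-bytes callable.
        COMPRESSORS = {
            "zlib": lambda data: zlib.compress(data, 9),
            "lzma": lambda data: lzma.compress(data),
        }

        def compare(data):
            # Run every registered compressor and report ratio and time.
            for name, compress in COMPRESSORS.items():
                t0 = time.perf_counter()
                out = compress(data)
                dt = time.perf_counter() - t0
                print(f"{name}: {len(out)}/{len(data)} bytes "
                      f"(ratio {len(out) / len(data):.3f}, {dt * 1000:.1f} ms)")

        # Highly repetitive input: the regime the two LZ-based case studies target.
        compare(b"abracadabra" * 10_000)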

    Storing Set Families More Compactly with Top ZDDs

    Zero-suppressed binary decision diagrams (ZDDs) are data structures that represent set families in compressed form. With ZDDs, many valuable operations on set families can be done in time polynomial in the ZDD size. In some cases, however, ZDDs representing large set families become too big to fit in main memory. This paper proposes the top ZDD, a novel ZDD representation that uses less space than existing ones. The top ZDD extends the top tree, a data structure for compressing trees, to directed acyclic graphs by sharing identical subgraphs. We prove that navigational operations on the ZDD can be done in time poly-logarithmic in the ZDD size, and show that there exist set families for which the top ZDD is exponentially smaller than the ZDD. We also show experimentally that our top ZDDs are smaller than the corresponding ZDDs on real data.
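    A plain ZDD with a unique table already shares identical subgraphs node by node; the top ZDD pushes this further by compressing the ZDD's own DAG structure. A minimal hash-consed ZDD sketch for background (illustrative only, not the paper's top ZDD):

        class ZDDStore:
            # Terminal 0 is the empty family; terminal 1 is the family {{}}.
            EMPTY, BASE = 0, 1

            def __init__(self):
                self.nodes = []    # id - 2 -> (var, lo, hi)
                self.unique = {}   # (var, lo, hi) -> id: subgraph sharing

            def node(self, var, lo, hi):
                if hi == self.EMPTY:           # zero-suppression rule
                    return lo
                key = (var, lo, hi)
                if key not in self.unique:
                    self.unique[key] = len(self.nodes) + 2
                    self.nodes.append(key)
                return self.unique[key]

            def contains(self, root, members):
                # Follow hi-edges for variables in the set, lo-edges otherwise.
                members = set(members)
                v = root
                while v >= 2:
                    var, lo, hi = self.nodes[v - 2]
                    if var in members:
                        members.discard(var)
                        v = hi
                    else:
                        v = lo
                return v == self.BASE and not members

        z = ZDDStore()
        n2 = z.node(2, z.BASE, z.BASE)    # the family {{}, {2}}
        root = z.node(1, z.EMPTY, n2)     # the family {{1}, {1, 2}}
        assert z.contains(root, {1, 2}) and not z.contains(root, {2})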

    MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

    MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a 252 Gbp soil metagenomics dataset in 44.1 hours with a GPU and 99.6 hours without, on a single computing node. MEGAHIT assembles the data as a whole, i.e., it avoids pre-processing such as partitioning and normalization, which might compromise result integrity. MEGAHIT generates an assembly 3 times larger, with longer contig N50 and average contig length, than the previous assembly; 55.8% of the reads align to the assembly, 4 times higher than for the previous one. The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under the GPLv3 license.
    Comment: 2 pages, 2 tables, 1 figure; submitted to Oxford Bioinformatics as an Application Note.
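    For intuition about the underlying structure, a naive pointer-based de Bruijn graph looks like the sketch below; MEGAHIT's succinct de Bruijn graph encodes the same adjacency information in a few bits per edge rather than in hash tables of strings (illustrative code, not MEGAHIT's implementation):

        from collections import defaultdict

        def naive_debruijn(reads, k):
            # Map each k-mer to the set of k-mers that follow it in a read.
            succ = defaultdict(set)
            for read in reads:
                for i in range(len(read) - k):
                    succ[read[i:i + k]].add(read[i + 1:i + k + 1])
            return succ

        graph = naive_debruijn(["ACGTACGT", "CGTACGTT"], k=3)
        # An assembler would then merge unambiguous chains of k-mers into contigs.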

    Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting

    Two new algorithms for improving the speed of LZ77 compression are proposed. One is based on a new hashing algorithm, named two-level hashing, that enables fast longest-match searching in a sliding dictionary; the other uses suffix sorting. The former is suitable for small dictionaries and significantly improves the speed of gzip, which uses a naive hashing algorithm. The latter is suitable for large dictionaries, which improve the compression ratio for large files. We also measure the compression ratio and speed of block-sorting compression, which uses suffix sorting in its compression algorithm. The results show that LZ77 with the two-level hash is suitable for small dictionaries; LZ77 with suffix sorting is good for large dictionaries when fast decompression and efficient use of memory are required; and block sorting is good for large dictionaries.
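    For reference, the gzip-style baseline that the two-level hash competes with looks roughly like this greedy hash-chain matcher (an illustrative single-level sketch, not the paper's two-level hashing; parameters mirror gzip's defaults):

        def lz77_hash_chain(data, window=32768, min_match=3, max_match=258):
            out = []                    # literals (ints) and (offset, length) pairs
            head = {}                   # 3-byte prefix -> most recent position
            prev = [-1] * len(data)     # previous position with the same prefix
            i = 0
            while i < len(data):
                key = data[i:i + min_match]
                best_len, best_pos = 0, -1
                j = head.get(key, -1)
                while j >= 0 and i - j <= window:      # walk the hash chain
                    l = 0
                    while (l < max_match and i + l < len(data)
                           and data[j + l] == data[i + l]):
                        l += 1
                    if l > best_len:
                        best_len, best_pos = l, j
                    j = prev[j]
                prev[i] = head.get(key, -1)            # register position i
                head[key] = i
                if best_len >= min_match:
                    out.append((i - best_pos, best_len))
                    i += best_len      # (positions inside the match are not
                                       #  registered, for brevity)
                else:
                    out.append(data[i])
                    i += 1
            return out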