On-Disk Data Processing: Issues and Future Directions

Author: Mishra Mayank
Somani Arun
Publication date: 01/01/2017

In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP, which is a form of near-data processing, refers to the computing arrangement where the secondary storage drives have the data processing capability. Proposed ODDP schemes vary widely in terms of the data processing capability, target applications, architecture and the kind of storage drive employed. Some ODDP schemes provide only a specific but heavily used operation like sort whereas some provide a full range of operations. Recently, with the advent of Solid State Drives, powerful and extensive ODDP solutions have been proposed. In this paper, we present a thorough review of architectures developed for different on-disk processing approaches along with current and future challenges and also identify the future directions which ODDP can take.

Comment: 24 pages, 17 Figures, 3 Tables

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Simple I/O-efficient flow accumulation on grid terrains

Author: Haverkort Herman
Janssen Jeffrey
Publication date: 01/01/2012

The flow accumulation problem for grid terrains takes as input a matrix of flow directions that specifies, for each cell of the grid, to which of its eight neighbours any incoming water would flow. The problem is to compute, for each cell c, from how many cells of the terrain water would reach c. We show that this problem can be solved in O(scan(N)) I/Os for a terrain of N cells. Taking constant factors in the I/O-efficiency into account, our algorithm may be an order of magnitude faster than the previously known algorithm that is based on time-forward processing and needs O(sort(N)) I/Os.

Comment: This paper is an exact copy of the paper that appeared in the abstract collection of the Workshop on Massive Data Algorithms, Aarhus, 200
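To make the problem statement in this abstract concrete, here is a minimal in-memory sketch in Python. It is not the I/O-efficient algorithm of the paper; it only illustrates what is being computed. The function name `flow_accumulation`, the encoding of a flow direction as a `(row_offset, col_offset)` tuple (or `None` for a cell that drains nowhere), and the choice to count each cell as reaching itself are all assumptions made for illustration. The sketch also assumes the flow directions contain no cycles.

```python
from collections import deque


def flow_accumulation(directions):
    """Count, for each cell, how many cells drain into it (itself included).

    `directions[r][c]` is a hypothetical encoding: a (dr, dc) offset to one
    of the eight neighbours, or None if water leaves the cell nowhere.
    Assumes the flow directions are acyclic.
    """
    rows, cols = len(directions), len(directions[0])

    def downstream(r, c):
        # Return the neighbour that (r, c) drains into, or None if the
        # cell is a sink or drains off the edge of the grid.
        d = directions[r][c]
        if d is None:
            return None
        nr, nc = r + d[0], c + d[1]
        if 0 <= nr < rows and 0 <= nc < cols:
            return nr, nc
        return None

    # Number of neighbours draining directly into each cell.
    indegree = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                indegree[target[0]][target[1]] += 1

    # Every cell starts with a count of 1 (assumption: a cell reaches itself).
    acc = [[1] * cols for _ in range(rows)]

    # Process cells in topological order, starting from cells with no inflow.
    queue = deque(
        (r, c) for r in range(rows) for c in range(cols) if indegree[r][c] == 0
    )
    while queue:
        r, c = queue.popleft()
        target = downstream(r, c)
        if target is None:
            continue
        tr, tc = target
        acc[tr][tc] += acc[r][c]
        indegree[tr][tc] -= 1
        if indegree[tr][tc] == 0:
            queue.append(target)
    return acc
```

For example, on a single row of three cells where each cell drains to its eastern neighbour and the last cell is a sink, `flow_accumulation([[(0, 1), (0, 1), None]])` gives one count per cell that grows along the flow path. The random-access pattern of this sketch is exactly what an external-memory algorithm has to avoid on terrains that do not fit in main memory.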

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Gerbil: A Fast and Memory-Efficient $k$-mer Counter with GPU-Support

Author: Erbert Marius
Müller-Hannemann Matthias
Rechner Steffen
Publication date: 22/07/2016

A basic task in bioinformatics is the counting of $k$-mers in genome strings. The $k$-mer counting problem is to build a histogram of all substrings of length $k$ in a given genome sequence. We present the open source $k$-mer counting software Gerbil that has been designed for the efficient counting of $k$-mers for $k \geq 32$. Given the technology trend towards long reads of next-generation sequencers, support for large $k$ becomes increasingly important. While existing $k$-mer counting tools suffer from excessive memory resource consumption or degrading performance for large $k$, Gerbil is able to efficiently support large $k$ without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored at a working disk. In a second step, the temporary files are read again, split into $k$-mers and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large $k$, we outperform state-of-the-art open source $k$-mer counting tools for large genome data sets.

Comment: A short version of this paper will appear in the proceedings of WABI 201
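The two-step idea in this abstract (distribute first, then count each part with a hash table) can be sketched in a few lines of Python. This is a simplified in-memory analogue, not Gerbil's implementation: the function names `partition_kmers` and `count_kmers`, the use of in-memory lists as stand-ins for temporary files, and the CRC32-based bucket assignment are assumptions chosen for illustration only.

```python
import zlib
from collections import Counter


def partition_kmers(reads, k, num_buckets):
    """Step 1 (sketch): distribute all k-mers of the reads into buckets.

    The buckets are in-memory lists standing in for temporary files.
    Hashing the k-mer guarantees that equal k-mers land in the same bucket.
    """
    buckets = [[] for _ in range(num_buckets)]
    for read in reads:
        # A read of length n contains n - k + 1 substrings of length k.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            index = zlib.crc32(kmer.encode("ascii")) % num_buckets
            buckets[index].append(kmer)
    return buckets


def count_kmers(reads, k, num_buckets=4):
    """Step 2 (sketch): count each bucket independently with a hash table."""
    counts = Counter()
    for bucket in partition_kmers(reads, k, num_buckets):
        # Each bucket holds a disjoint set of distinct k-mers, so the
        # per-bucket tables can simply be merged.
        counts.update(Counter(bucket))
    return counts
```

Calling `count_kmers(["ACGTACGT"], 4)` returns a histogram in which `"ACGT"` occurs twice and each of `"CGTA"`, `"GTAC"` and `"TACG"` occurs once. Because every bucket can be counted on its own, only one bucket's hash table needs to be resident at a time, which is the property that makes a partition-then-count layout attractive for data sets larger than memory.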

arXiv.org e-Print Archive

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

FORM version 4.0

Author: Berlekamp
Blumlein
Char
Gailly
Geddes
Gorishnii
J. Kuipers
J. Vollinga
J.A.M. Vermaseren
Knuth
Kuipers
Lewis
Strubbe
T. Ueda
Tentyukov
Vermaseren
Wang
Zassenhaus
Zippel
Publication venue: Elsevier BV
Publication date: 29/03/2012

We present version 4.0 of the symbolic manipulation system FORM. The most important new features are manipulation of rational polynomials and the factorization of expressions. Many other new functions and commands are also added; some of them are very general, while others are designed for building specific high level packages, such as one for Groebner bases. Also new is the checkpoint facility, which allows for periodic backups during long calculations. Lastly, FORM 4.0 has become available as open source under the GNU General Public License version 3.

Comment: 26 pages. Uses axodraw

arXiv.org e-Print Archive

Crossref

I/O-optimal algorithms on grid graphs

Author: Haverkort Herman
Publication date: 01/01/2012

Given a graph of which the n vertices form a regular two-dimensional grid, and in which each (possibly weighted and/or directed) edge connects a vertex to one of its eight neighbours, the following can be done in O(scan(n)) I/Os, provided M = Omega(B^2): computation of shortest paths with non-negative edge weights from a single source, breadth-first traversal, computation of a minimum spanning tree, topological sorting, time-forward processing (if the input is a plane graph), and an Euler tour (if the input graph is a tree). The minimum-spanning tree algorithm is cache-oblivious. The best previously published algorithms for these problems need Theta(sort(n)) I/Os. Estimates of the actual I/O volume show that the new algorithms may often be very efficient in practice.

Comment: 12 pages' extended abstract plus 12 pages' appendix with details, proofs and calculations. Has not been published in and is currently not under review of any conference or journal

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository