Search CORE

5,082 research outputs found

Run Generation Revisited: What Goes Up May or May Not Come Down

Author: A Aggarwal
BJ Gassner
CL Mallows
DE Knuth
DE Knuth
EH Friend
G Graefe
MA Goetz
V Estivill-Castro
W Frazer
X Martinez-Palau
YC Lin
YC Lin
Publication venue
Publication date: 24/04/2015
Field of study

In this paper, we revisit the classic problem of run generation. Run generation is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M , and output runs (contiguously sorted chunks of elements) that are as long as possible. We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the average run length) when the runs are allowed to be sorted or reverse sorted. We study the problem in the online setting, both with and without resource augmentation, and in the offline setting. (1) We analyze alternating-up-down replacement selection (runs alternate between sorted and reverse sorted), which was studied by Knuth as far back as 1963. We show that this simple policy is asymptotically optimal. Specifically, we show that alternating-up-down replacement selection is 2-competitive and no deterministic online algorithm can perform better. (2) We give online algorithms having smaller competitive ratios with resource augmentation. Specifically, we exhibit a deterministic algorithm that, when given a buffer of size 4M , is able to match or beat any optimal algorithm having a buffer of size M . Furthermore, we present a randomized online algorithm which is 7/4-competitive when given a buffer twice that of the optimal. (3) We demonstrate that performance can also be improved with a small amount of foresight. We give an algorithm, which is 3/2-competitive, with foreknowledge of the next 3M elements of the input stream. For the extreme case where all future elements are known, we design a PTAS for computing the optimal strategy a run generation algorithm must follow. (4) Finally, we present algorithms tailored for nearly sorted inputs which are guaranteed to have optimal solutions with sufficiently long runs

arXiv.org e-Print Archive

Crossref

A Bicriteria Approximation for the Reordering Buffer Problem

Author: Barman Siddharth
Chawla Shuchi
Umboh Seeun
Publication venue
Publication date: 25/04/2012
Field of study

In the reordering buffer problem (RBP), a server is asked to process a sequence of requests lying in a metric space. To process a request the server must move to the corresponding point in the metric. The requests can be processed slightly out of order; in particular, the server has a buffer of capacity k which can store up to k requests as it reads in the sequence. The goal is to reorder the requests in such a manner that the buffer constraint is satisfied and the total travel cost of the server is minimized. The RBP arises in many applications that require scheduling with a limited buffer capacity, such as scheduling a disk arm in storage systems, switching colors in paint shops of a car manufacturing plant, and rendering 3D images in computer graphics. We study the offline version of RBP and develop bicriteria approximations. When the underlying metric is a tree, we obtain a solution of cost no more than 9OPT using a buffer of capacity 4k + 1 where OPT is the cost of an optimal solution with buffer capacity k. Constant factor approximations were known previously only for the uniform metric (Avigdor-Elgrabli et al., 2012). Via randomized tree embeddings, this implies an O(log n) approximation to cost and O(1) approximation to buffer size for general metrics. Previously the best known algorithm for arbitrary metrics by Englert et al. (2007) provided an O(log^2 k log n) approximation without violating the buffer constraint.Comment: 13 page

arXiv.org e-Print Archive

CiteSeerX

Efficient Processing of Spatial Joins Using R-Trees

Author: Becker L. A.
Bentley J.L.
Bernhard Seeger
Bureau of the Census
Gfinther O.
Hans-Peter Kriegel
Harada L.
Kriegel H.-P.
Merret T.
Rotem D.
Thomas Brinkhoff
Publication venue
Publication date: 01/01/1993
Field of study

Abstract: In this paper, we show that spatial joins are very suitable to be processed on a parallel hardware platform. The parallel system is equipped with a so-called shared virtual memory which is well-suited for the design and implementation of parallel spatial join algorithms. We start with an algorithm that consists of three phases: task creation, task assignment and parallel task execu-tion. In order to reduce CPU- and I/O-cost, the three phases are processed in a fashion that pre-serves spatial locality. Dynamic load balancing is achieved by splitting tasks into smaller ones and reassigning some of the smaller tasks to idle processors. In an experimental performance compar-ison, we identify the advantages and disadvantages of several variants of our algorithm. The most efficient one shows an almost optimal speed-up under the assumption that the number of disks is sufficiently large. Topics: spatial database systems, parallel database systems

CiteSeerX

Crossref

Open Access LMU

VLSI implementation of a fairness ATM buffer system

Author: Dittmann Lars
Lassen Peter Stuhr
Madsen Jens Kargaard
Nielsen J.V.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996
Field of study

Crossref

Online Research Database In Technology

Gerbil: A Fast and Memory-Efficient $k$ -mer Counter with GPU-Support

Author: Erbert Marius
Müller-Hannemann Matthias
Rechner Steffen
Publication venue
Publication date: 22/07/2016
Field of study

A basic task in bioinformatics is the counting of

k

-mers in genome strings. The

k

-mer counting problem is to build a histogram of all substrings of length

k

in a given genome sequence. We present the open source

k

-mer counting software Gerbil that has been designed for the efficient counting of

k

-mers for

k\geq32

. Given the technology trend towards long reads of next-generation sequencers, support for large

k

becomes increasingly important. While existing

k

-mer counting tools suffer from excessive memory resource consumption or degrading performance for large

k

, Gerbil is able to efficiently support large

k

without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored at a working disk. In a second step, the temporary files are read again, split into

k

-mers and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large

k

, we outperform state-of-the-art open source

k

-mer counting tools for large genome data sets.Comment: A short version of this paper will appear in the proceedings of WABI 201

arXiv.org e-Print Archive

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central