
    Cache-Oblivious Selection in Sorted X+Y Matrices

    Let X[0..n-1] and Y[0..m-1] be two sorted arrays, and define the m × n matrix A by A[j][i] = X[i] + Y[j]. Frederickson and Johnson gave an efficient algorithm for selecting the k-th smallest element from A. We show how to make this algorithm IO-efficient. Our cache-oblivious algorithm performs O((m+n)/B) IOs, where B is the block size of memory transfers.
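
    The implicit matrix A never needs to be materialised: each entry is just X[i] + Y[j]. As a plain in-memory reference for the selection problem being solved (a heap-based k-way merge over the columns, not the Frederickson-Johnson algorithm or the cache-oblivious variant from the paper), one could write:

        import heapq

        def kth_smallest_x_plus_y(X, Y, k):
            """k-th smallest (1-based) entry of the implicit matrix A[j][i] = X[i] + Y[j],
            for sorted arrays X and Y. Simple O(k log n) sketch, not the paper's algorithm."""
            n, m = len(X), len(Y)
            assert 1 <= k <= n * m
            # Seed the heap with the smallest element of each column that can still matter.
            heap = [(X[i] + Y[0], i, 0) for i in range(min(n, k))]
            heapq.heapify(heap)
            val = None
            for _ in range(k):
                val, i, j = heapq.heappop(heap)
                if j + 1 < m:                      # next element in the same column
                    heapq.heappush(heap, (X[i] + Y[j + 1], i, j + 1))
            return val

        # Sums of X = [1, 3, 5] and Y = [2, 4] in sorted order: 3, 5, 5, 7, 7, 9
        print(kth_smallest_x_plus_y([1, 3, 5], [2, 4], 4))   # -> 7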

    Algorithmic ramifications of prefetching in memory hierarchy

    External memory models, most notably the I/O model [3], capture the effects of the memory hierarchy and aid in algorithm design. More than a decade of architectural advancements has led to new features not captured in the I/O model, most notably the prefetching capability. We propose a relatively simple Prefetch model that incorporates data prefetching into the traditional I/O models and show how to design algorithms that can attain close to peak memory bandwidth. Unlike (the inverse of) memory latency, memory bandwidth is much closer to the processing speed, so intelligent use of prefetching can considerably mitigate the I/O bottleneck. For some fundamental problems, our algorithms attain running times approaching those of the idealized Random Access Machine under reasonable assumptions. Our work also explains the significantly superior performance of I/O-efficient algorithms on systems that support prefetching compared to ones that do not.

    Efficient GPU Implementation of Affine Index Permutations on Arrays

    Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in their memory access patterns. In this paper we propose efficient kernels for permuting the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels whose speed is comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
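
    For reference, a BMMC permutation sends a source index x, viewed as a vector of bits, to the destination index A·x XOR c over GF(2), where A is an invertible bit matrix and c is a complement vector. A small CPU-side sketch of that index mapping (an illustration only, under our own bit-order convention, not the GPU kernels proposed in the paper):

        import numpy as np

        def bmmc_permute(src, A_bits, c_bits):
            """Apply a BMMC permutation to an array of length 2**k.

            Source index x (k bits, least-significant bit first, a convention
            assumed here) is mapped to y = A_bits @ x XOR c_bits over GF(2).
            Plain NumPy illustration only, not the paper's GPU kernels."""
            k = A_bits.shape[0]
            n = 1 << k
            assert len(src) == n
            dst = np.empty_like(src)
            for x in range(n):
                x_bits = np.array([(x >> b) & 1 for b in range(k)])
                y_bits = (A_bits @ x_bits + c_bits) % 2      # bit-matrix multiply + complement
                y = sum(int(bit) << b for b, bit in enumerate(y_bits))
                dst[y] = src[x]
            return dst

        # Example: reversing the bit order of the index is a BMMC permutation.
        A = np.array([[0, 0, 1],
                      [0, 1, 0],
                      [1, 0, 0]])              # swaps bit 0 with bit 2
        c = np.zeros(3, dtype=int)             # no complement
        print(bmmc_permute(np.arange(8), A, c))   # [0 4 2 6 1 5 3 7]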

    Efficient GPU implementation of a class of array permutations

    Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in their memory access patterns. In this paper we propose efficient kernels for permuting the elements of an array, which can be used to improve the access patterns of many algorithms. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels whose speed is comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
    Comment: Submitted to the ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing 202

    WASP-43b: The closest-orbiting hot Jupiter

    We report the discovery of WASP-43b, a hot Jupiter transiting a K7V star every 0.81 d. At 0.6 Msun, the host star has the lowest mass of any star hosting a hot Jupiter. It also shows a 15.6-d rotation period. The planet has a mass of 1.8 Mjup and a radius of 0.9 Rjup, and with a semi-major axis of only 0.014 AU it has the smallest orbital distance of any known hot Jupiter. The discovery of such a planet around a K7V star shows that planets with apparently short remaining lifetimes, owing to tidal decay of the orbit, are also found around stars with deep convection zones.
    Comment: 4 page
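
    As a quick sanity check of the quoted numbers (our addition, not part of the abstract), Kepler's third law in solar units, neglecting the planet's mass, reproduces the quoted semi-major axis from the 0.81-d period and the 0.6-Msun stellar mass:

        a \approx \left[\frac{M_\star}{M_\odot}\left(\frac{P}{1\,\mathrm{yr}}\right)^{2}\right]^{1/3} \mathrm{AU}
          \approx \left[0.6 \times \left(\frac{0.81}{365.25}\right)^{2}\right]^{1/3} \mathrm{AU}
          \approx 0.014\,\mathrm{AU}.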

    Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

    Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size √n, the time complexity drops to O(√n). A more sophisticated approach uses hierarchical buffering and has logarithmic time complexity, O(b log_b n), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization.
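
    To make the simplest scheme concrete, here is a rough sketch of non-hierarchical bin buffering for the SUM case only: the array is cut into bins of size about √n and one buffered sum is kept per bin, so a range query touches O(√n) values. This is an illustration under our own simplifications; the local-moment, hierarchical, and overlapped variants from the paper are not shown.

        import math

        class BinBufferedSums:
            """Non-hierarchical bin buffering for range-SUM queries (sketch only)."""

            def __init__(self, data):
                self.data = list(data)
                n = len(self.data)
                self.bin_size = max(1, math.isqrt(n))          # bins of size ~sqrt(n)
                # One precomputed sum per bin -- the "buffer".
                self.bin_sums = [sum(self.data[i:i + self.bin_size])
                                 for i in range(0, n, self.bin_size)]

            def range_sum(self, lo, hi):
                """Sum of data[lo:hi], using whole-bin sums wherever possible."""
                total, i = 0, lo
                while i < hi:
                    if i % self.bin_size == 0 and i + self.bin_size <= hi:
                        total += self.bin_sums[i // self.bin_size]   # skip a whole bin
                        i += self.bin_size
                    else:
                        total += self.data[i]                        # partial bins at the ends
                        i += 1
                return total

        b = BinBufferedSums(range(100))
        print(b.range_sum(10, 60), sum(range(10, 60)))   # both print 1725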

    Structured Permuting in Place on Parallel Disk Systems

    The ability to perform permutations of large data sets in place reduces the amount of disk storage that must be available. The simplest way to perform a permutation is often to read the records of a data set from a source portion of data storage, permute them in memory, and write them to a separate target portion of the same size. It can be quite expensive, however, to provide disk storage that is twice the size of very large data sets. Permuting in place reduces the expense by using only a small amount of extra disk storage beyond the size of the data set. This paper features in-place algorithms for commonly used structured permutations. We have developed an asymptotically optimal algorithm for performing BMMC (bit-matrix-multiply/complement) permutations in place that requires at most (2N/BD)(2⌈rank(γ)/lg(M/B)⌉ + 7/2) parallel disk accesses, as long as M ≥ 2BD, where N is the number of records in the data set, M is the number of records that can fit in memory, D is the number of disks, B is the number of records in a block, and γ is the lower-left lg(N/B) × lg B submatrix of the characteristic matrix for the permutation. This algorithm uses N + M records of disk storage and requires only a constant factor more parallel disk accesses, and insignificant additional computation, compared to a previously published asymptotically optimal algorithm that uses 2N records of disk storage. We also give algorithms to perform mesh and torus permutations on a d-dimensional mesh. The in-place algorithm for mesh permutations requires at most 3⌈N/BD⌉ parallel I/Os and the in-place algorithm for torus permutations uses at most 4dN/BD parallel I/Os. The algorithms for mesh and torus permutations require no extra disk space as long as the memory size M is at least 3BD. The torus algorithm improves upon the previous best algorithm in terms of both time and space.
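
    As a small illustration of one of the permutation classes involved (not of the in-place parallel-disk algorithms themselves): a torus permutation, as usually defined, cyclically shifts every element of the mesh by a fixed offset vector, which in memory is just a multi-axis roll:

        import numpy as np

        # A torus permutation shifts every element by a fixed offset along each
        # axis of the d-dimensional mesh, wrapping around at the boundaries.
        # In-memory NumPy illustration only, not the in-place parallel-disk algorithm.
        mesh = np.arange(12).reshape(3, 4)                   # a 3 x 4 mesh (d = 2)
        shifted = np.roll(mesh, shift=(1, 2), axis=(0, 1))   # offset vector (1, 2)
        print(shifted)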

    Permuting and Batched Geometric Lower Bounds in the I/O Model

    We study permuting and batched orthogonal geometric reporting problems in the External Memory model (EM), assuming indivisibility of the input records. Our main results are twofold. First, we prove a general simulation result: essentially, any permutation algorithm (resp. duplicate-removal algorithm) that performs α·N/B I/Os (resp. that removes a fraction of the existing duplicates) can be simulated by an algorithm that runs in α phases, each of which reads and writes every element once, but that uses a block size smaller by a factor of α. Second, we prove two lower bounds for batched rectangle stabbing and batched orthogonal range reporting queries. Assuming a short cache, we prove very high lower bounds that are currently not attainable with existing techniques under the tall-cache assumption.