9 research outputs found

An O(1) Solution to the Prefix Sum Problem on a Specialized Memory Architecture

Author: Brodnik Andrej
Karlsson Johan
Munro J. Ian
Nilsson Andreas
Publication date: 01/01/2006

Θ(lg N) time under the comparison-based model of computation. Comment: 12 pages

arXiv.org e-Print Archive

CiteSeerX

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Luleå University of Technology Publications

An O(1) solution to the prefix sum problem on a specialized memory architecture

Author: Brodnik Andrej
Karlsson Johan
Munro J. Ian
Nilsson Andreas
Publication date: 20/11/2012

In this paper we study the Prefix Sum problem introduced by Fredman. We show that it is possible to perform both update and retrieval in O(1) time simultaneously under a memory model in which individual bits may be shared by several words. We also show that two variants (generalizations) of the problem can be solved optimally in Θ(lg N) time under the comparison-based model of computation.

4th IFIP International Conference on Theoretical Computer Science

Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual

Partial Sums on the Ultra-Wide Word RAM

Author: A Brodnik
A Farzan
AC Yao
AM Ben-Amram
BY Ryabko
E Lindholm
G Franceschini
GS Frandsen
H Hampapuram
J Reinders
J Salowe
JI Munro
JWJ Williams
M Pǎtraşcu
ML Fredman
N Stephens
P Bille
PF Dietz
PM Fenwick
R Raman
RE Ladner
T Chen
T Hagerup
T Husfeldt
WA Burkhard
WK Hon
Publication date: 01/01/2020

We consider the classic partial sums problem on the ultra-wide word RAM model of computation. This model extends the classic w-bit word RAM model with special ultrawords of length w^2 bits that support standard arithmetic and Boolean operations, as well as scattered memory access operations that can access w (non-contiguous) locations in memory. The ultra-wide word RAM model captures (and idealizes) modern vector processor architectures. Our main result is a new in-place data structure for the partial sums problem that stores only a constant number of ultrawords in addition to the input and supports operations in doubly logarithmic time. This matches the best known time bounds for the problem (among polynomial space data structures) while improving the space from superlinear to a constant number of ultrawords. Our results are based on a simple and elegant in-place word RAM data structure, known as the Fenwick tree. Our main technical contribution is a new efficient parallel ultra-wide word RAM implementation of the Fenwick tree, which is likely of independent interest. Comment: Extended abstract appeared at TAMC 202

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Practical Trade-Offs for the Prefix-Sum Problem

Author: Pibiri Giulio Ermanno
Venturini Rossano
Publication date: 06/10/2020

Given an integer array A, the prefix-sum problem is to answer sum(i) queries that return the sum of the elements in A[0..i], knowing that the integers in A can be changed. It is a classic problem in data structure design with a wide range of applications in computing, from coding to databases. In this work, we propose and compare several practical solutions to this problem, showing that new trade-offs between the performance of queries and updates can be achieved on modern hardware. Comment: Accepted by "Software: Practice and Experience", 202
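The sum(i) and update operations described in the abstract above can be illustrated with a Fenwick tree (binary indexed tree), a classic textbook structure for this problem. The following is a minimal sketch for orientation only, not one of the solutions proposed in the paper; the class name `FenwickTree` and its method names are illustrative assumptions.

```python
class FenwickTree:
    """Textbook Fenwick tree over a 0-indexed integer array A.

    Illustrative sketch only; names are hypothetical and not taken
    from any of the listed papers.
    """

    def __init__(self, values):
        self.n = len(values)
        # tree[0] is unused; internally the structure is 1-indexed.
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.update(i, v)

    def update(self, i, delta):
        """Add delta to A[i] in O(log n) time."""
        i += 1  # switch to 1-indexed position
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node covering position i

    def sum(self, i):
        """Return A[0] + ... + A[i] in O(log n) time."""
        i += 1  # switch to 1-indexed position
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total


ft = FenwickTree([3, 1, 4, 1, 5])
print(ft.sum(2))   # sum of A[0..2]
ft.update(1, 10)   # A[1] becomes 11
print(ft.sum(2))   # sum of A[0..2] after the update
```

Both operations touch at most about log2(n) entries of the internal array, which is the query/update balance that the trade-offs studied in this line of work start from.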

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Algorithms in the Ultra-Wide Word Model

Author: A Andersson
A Brodnik
D Pisinger
H Hampapuram
J Munro
JA Fisher
M Crochemore
M Thorup
ML Fredman
P Beame
R Baeza-Yates
R Bellman
RA Baeza-Yates
RM Russell
RN Horspool
S Wu
T Hagerup
TH Cormen
WJ Masek
Y Han
Publication date: 07/12/2014

The effective use of parallel computing resources to speed up algorithms in current multi-core parallel architectures remains a difficult challenge, with ease of programming playing a key role in the eventual success of various parallel architectures. In this paper we consider an alternative view of parallelism in the form of an ultra-wide word processor. We introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model that allows for constant time operations on thousands of bits in parallel. Word parallelism as exploited by the word-RAM model does not suffer from the more difficult aspects of parallel programming, namely synchronization and concurrency. For the standard word-RAM algorithms, the speedups obtained are moderate, as they are limited by the word size. We argue that a large class of word-RAM algorithms can be implemented in the Ultra-Wide Word model, obtaining speedups comparable to multi-threaded computations while keeping the simplicity of programming of the sequential RAM model. We show that this is the case by describing implementations of Ultra-Wide Word algorithms for dynamic programming and string searching. In addition, we show that the Ultra-Wide Word model can be used to implement a nonstandard memory architecture, which enables the sidestepping of lower bounds of important data structure problems such as priority queues and dynamic prefix sums. While similar ideas about operating on large words have been mentioned before in the context of multimedia processors [Thorup 2003], it is only recently that an architecture like the one we propose has become feasible and that details can be worked out. Comment: 28 pages, 5 figures; minor changes

arXiv.org e-Print Archive

Crossref

Coordinated Science Laboratory Summary progress report, Sep. - Nov. 1965

Progress related to surface physics, computer programs, control systems, information science, superconductivity, and other research projects

NASA Technical Reports Server

Lower bound techniques for data structures

Author: Pǎtraşcu Mihai
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 135-143).

We describe new techniques for proving lower bounds on data-structure problems, with the following broad consequences:
* the first Ω(lg n) lower bound for any dynamic problem, improving on a bound that had been standing since 1989;
* for static data structures, the first separation between linear and polynomial space. Specifically, for some problems that have constant query time when polynomial space is allowed, we can show Ω(lg n / lg lg n) bounds when the space is O(n · polylog n).

Using these techniques, we analyze a variety of central data-structure problems, and obtain improved lower bounds for the following:
* the partial-sums problem (a fundamental application of augmented binary search trees);
* the predecessor problem (which is equivalent to IP lookup in Internet routers);
* dynamic trees and dynamic connectivity;
* orthogonal range stabbing;
* orthogonal range counting, and orthogonal range reporting;
* the partial match problem (searching with wild-cards);
* (1 + ε)-approximate near neighbor on the hypercube;
* approximate nearest neighbor in the ℓ∞ metric.

Our new techniques lead to surprisingly non-technical proofs. For several problems, we obtain simpler proofs for bounds that were already known.

by Mihai Pǎtraşcu. Ph.D.

CiteSeerX

DSpace@MIT

Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures

Author: Salinger Alejandro
Publication venue: University of Waterloo
Publication date: 01/01/2013

Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a chip being widely available and an increasing number of cores predicted for the future. In addition, the decreasing costs and increasing programmability of Graphic Processing Units (GPUs) have made these an accessible source of parallel processing power in general purpose computing. Among the many research challenges that this scenario has raised are the fundamental problems related to theoretical modeling of computation in these architectures. In this thesis we study several aspects of computation in modern parallel architectures, from modeling of computation in multi-cores and heterogeneous platforms, to multi-core cache management strategies, through the proposal of an architecture that exploits bit-parallelism on thousands of bits. Observing that in practice multi-cores have a small number of cores, we propose a model for low-degree parallelism for these architectures. We argue that assuming a small number of processors (logarithmic in a problem's input size) simplifies the design of parallel algorithms. We show that in this model a large class of divide-and-conquer and dynamic programming algorithms can be parallelized with simple modifications to sequential programs, while achieving optimal parallel speedups. We further explore low-degree-parallelism in computation, providing evidence of fundamental differences in practice and theory between systems with a sublinear and linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case. Efficient strategies to manage shared caches play a crucial role in multi-core performance. We propose a model for paging in multi-core shared caches, which extends classical paging to a setting in which several threads share the cache. 
We show that in this setting traditional cache management policies perform poorly, and that any effective strategy must partition the cache among threads, with a partition that adapts dynamically to the demands of each thread. Inspired by the shared cache setting, we introduce the minimum cache usage problem, an extension to classical sequential paging in which algorithms must account for the amount of cache they use. This cache-aware model seeks algorithms with good performance in terms of faults and the amount of cache used, and has applications in energy efficient caching and in shared cache scenarios. The wide availability of GPUs has added to the parallel power of multi-cores; however, most applications underutilize the available resources. We propose a model for hybrid computation in heterogeneous systems with multi-cores and GPUs, and describe strategies for generic parallelization and efficient scheduling of a large class of divide-and-conquer algorithms. Lastly, we introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model that allows for constant time operations on thousands of bits in parallel. We show that a large class of existing algorithms can be implemented in the Ultra-Wide Word model, achieving speedups comparable to those of multi-threaded computations, while avoiding the more difficult aspects of parallel programming.

University of Waterloo's Institutional Repository