Energy-Efficient Algorithms
We initiate the systematic study of the energy complexity of algorithms (in
addition to time and space complexity) based on Landauer's Principle in
physics, which gives a lower bound on the amount of energy a system must
dissipate if it destroys information. We propose energy-aware variations of
three standard models of computation: circuit RAM, word RAM, and
transdichotomous RAM. On top of these models, we build familiar high-level
primitives such as control logic, memory allocation, and garbage collection
with zero energy complexity and only constant-factor overheads in space and
time complexity, enabling simple expression of energy-efficient algorithms. We
analyze several classic algorithms in our models and develop low-energy
variations: comparison sort, insertion sort, counting sort, breadth-first
search, Bellman-Ford, Floyd-Warshall, matrix all-pairs shortest paths, AVL
trees, binary heaps, and dynamic arrays. We explore the time/space/energy
trade-off and develop several general techniques for analyzing algorithms and
reducing their energy complexity. These results lay a theoretical foundation
for a new field of semi-reversible computing and provide a new framework for
the investigation of algorithms.
Comment: 40 pages, 8 pdf figures, full version of work published in ITCS 201
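To make the physical premise concrete, here is a minimal sketch of Landauer's Principle: destroying one bit of information dissipates at least k_B * T * ln 2 of energy. It does not reproduce the paper's energy-aware RAM models, which the abstract does not spell out. The function names and the 300 K default temperature are assumptions made for illustration.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def landauer_bound_joules(bits_erased: int, temperature_kelvin: float = 300.0) -> float:
    """Lower bound on the energy dissipated when `bits_erased` bits are destroyed.

    The 300 K default (roughly room temperature) is an illustrative assumption.
    """
    return bits_erased * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def irreversible_and(a: int, b: int) -> int:
    """Two input bits become one output bit, so information is destroyed."""
    return a & b


def reversible_toffoli(a: int, b: int, c: int) -> tuple:
    """Three bits in, three bits out. The gate is its own inverse,
    so no information is destroyed and the Landauer bound is zero."""
    return a, b, c ^ (a & b)


if __name__ == "__main__":
    print(landauer_bound_joules(1))                           # energy floor for one erased bit
    print(reversible_toffoli(*reversible_toffoli(1, 1, 0)))   # applying twice restores the input
```

The contrast between the two gates is the point: an algorithm built only from invertible steps erases nothing, which is the sense in which primitives can have zero energy complexity.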
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present parallel algorithms for constructing and traversing sparse octrees
on graphics processing units (GPUs). The algorithms are based on parallel-scan
and sort methods. To test the performance and feasibility, we implemented them
in CUDA in the form of a gravitational tree-code that runs entirely on the
GPU. (The code is publicly available at:
http://castle.strw.leidenuniv.nl/software.html) The tree construction and
traversal algorithms are portable to many-core devices that support the
CUDA or OpenCL programming languages. The gravitational tree-code outperforms
tuned CPU code during the tree-construction and shows a performance improvement
of more than a factor of 20 overall, resulting in a processing rate of more than
2.8 million particles per second.
Comment: Accepted version. Published in Journal of Computational Physics. 35
pages, 12 figures, single column
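One common way to build an octree with a sort is to give every particle a space-filling-curve key, sort by that key, and read the tree cells off the key prefixes. The sketch below illustrates that idea sequentially in Python. The abstract states only that the algorithms rely on parallel-scan and sort methods, so the choice of Morton (Z-order) keys, the integer grid coordinates, and all function names are assumptions, not the authors' implementation.

```python
def spread_bits(v: int, bits: int = 10) -> int:
    """Place bit i of v at position 3*i, leaving two zero bits in between."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of integer grid coordinates into one Z-order key."""
    return (spread_bits(x, bits) << 2) | (spread_bits(y, bits) << 1) | spread_bits(z, bits)


def group_by_cell(points, level: int, bits: int = 10) -> dict:
    """Sort points by key, then bucket them by the octree cell at `level`.

    Each tree level consumes three key bits, so the cell id is a key prefix.
    After sorting, every cell is a contiguous run of particles, which is what
    lets a scan find cell boundaries in a parallel setting.
    """
    shift = 3 * (bits - level)
    cells = {}
    for p in sorted(points, key=lambda q: morton_key(*q, bits)):
        cells.setdefault(morton_key(*p, bits) >> shift, []).append(p)
    return cells


if __name__ == "__main__":
    pts = [(3, 3, 3), (0, 0, 0), (2, 2, 2), (1, 1, 1)]
    print(group_by_cell(pts, level=1, bits=2))
```

Only non-empty cells appear in the result, which is what makes the tree sparse.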
Linear pattern matching on sparse suffix trees
Packing several characters into one computer word is a simple and natural way
to compress the representation of a string and to speed up its processing.
Exploiting this idea, we propose an index for a packed string, based on a
sparse suffix tree [KU-96] with appropriately defined suffix links.
Assuming, under the standard unit-cost RAM model, that a word can store up to
characters ( the alphabet size), our index takes
space, i.e. the same space as the packed string itself.
The resulting pattern matching algorithm runs in time ,
where is the length of the pattern, is the actual number of characters
stored in a word and is the number of pattern occurrences.
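The packing idea itself can be sketched in a few lines: each character is encoded in a fixed number of bits, and as many characters as fit are stored in one machine word. This shows only the packed representation, not the index or the matching algorithm. The 64-bit default word size, the fixed-width encoding, and the function names are assumptions for illustration.

```python
def bits_per_char(alphabet_size: int) -> int:
    """Smallest fixed bit width that can encode every character of the alphabet."""
    return max(1, (alphabet_size - 1).bit_length())


def pack(text: str, alphabet: str, word_bits: int = 64):
    """Pack `text` into integers holding word_bits // bits_per_char characters each."""
    code = {c: i for i, c in enumerate(alphabet)}
    b = bits_per_char(len(alphabet))
    per_word = word_bits // b
    words = []
    for start in range(0, len(text), per_word):
        w = 0
        for c in text[start:start + per_word]:
            w = (w << b) | code[c]  # append the next character's code
        words.append(w)
    return words, per_word


def unpack(words, length: int, alphabet: str, word_bits: int = 64) -> str:
    """Inverse of pack; `length` is the original number of characters."""
    b = bits_per_char(len(alphabet))
    per_word = word_bits // b
    mask = (1 << b) - 1
    out = []
    for i, w in enumerate(words):
        count = min(per_word, length - i * per_word)  # last word may be partial
        for j in range(count - 1, -1, -1):
            out.append(alphabet[(w >> (j * b)) & mask])
    return "".join(out)


if __name__ == "__main__":
    words, per_word = pack("ACGTACGTAC", "ACGT", word_bits=8)
    print(words, per_word)
    print(unpack(words, 10, "ACGT", word_bits=8))
```

With a four-letter alphabet each character takes two bits, so a 64-bit word holds 32 characters and a single word comparison compares 32 characters at once.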
Parallel Weighted Random Sampling
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with or without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups both for construction and queries.
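For readers unfamiliar with alias tables, the sketch below shows the classic sequential construction (in the style usually attributed to Vose) and the constant-time query it enables. It is the textbook baseline, not the improved sequential or parallel algorithms the abstract announces. The function names are assumptions for illustration.

```python
import random


def build_alias_table(weights):
    """Build (prob, alias) arrays so that sampling one item takes O(1) time.

    Bucket i returns item i with probability prob[i], otherwise alias[i].
    """
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # average bucket load becomes 1
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l                      # the large item fills the rest of bucket s
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias                    # leftover buckets keep prob 1.0


def sample(prob, alias, rng=random):
    """Draw one item index: pick a bucket uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    prob, alias = build_alias_table([1, 1, 2])
    rng = random.Random(0)
    draws = [sample(prob, alias, rng) for _ in range(10000)]
    print([draws.count(i) / len(draws) for i in range(3)])
```

Construction takes linear time, and each query needs one uniform bucket choice and one coin flip regardless of how skewed the weights are.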
Realizable paths and the NL vs L problem
A celebrated theorem of Savitch [Savitch'70] states that NSPACE(S) is contained in DSPACE(S²). In particular, Savitch gave a deterministic algorithm to solve ST-Connectivity (an NL-complete problem) using O(log² n) space, implying NL (non-deterministic logspace) is contained in DSPACE(log² n). While Savitch's theorem itself has not been improved in the last four decades, several graph connectivity problems have been shown to lie between L and NL, providing new insights into the space-bounded complexity classes. All the connectivity problems considered in the literature so far are essentially special cases of ST-Connectivity.
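The deterministic algorithm behind Savitch's theorem can be sketched as a midpoint recursion: a path of length at most k exists from u to v if some vertex w splits it into two paths of about half that length. The recursion has depth O(log n) and each frame stores only a few vertex names and a counter, which gives the O(log² n) space bound. The graph representation (a dict mapping every vertex to its successor set) and the function names are assumptions for illustration, and the running time is superpolynomial, so this is a demonstration rather than a practical reachability routine.

```python
def reachable(adj, u, v, steps: int) -> bool:
    """True if there is a path from u to v using at most `steps` edges."""
    if steps == 0:
        return u == v
    if steps == 1:
        return u == v or v in adj[u]
    half = (steps + 1) // 2
    # Try every vertex as the midpoint; only the current (u, v, w, steps)
    # needs to be remembered per recursion level.
    return any(
        reachable(adj, u, w, half) and reachable(adj, w, v, steps - half)
        for w in adj
    )


def st_connected(adj, s, t) -> bool:
    """ST-Connectivity: any simple path uses fewer than len(adj) edges."""
    return reachable(adj, s, t, len(adj))


if __name__ == "__main__":
    graph = {0: {1}, 1: {2}, 2: {3}, 3: set(), 4: set()}
    print(st_connected(graph, 0, 3))
    print(st_connected(graph, 3, 0))
```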
In this dissertation, we initiate the study of auxiliary PDAs as graph connectivity problems, define sixteen different "graph realizability problems", and study their relationships. The complexity of these connectivity problems lies between L (logspace) and P (polynomial time). ST-Realizability, the most general graph realizability problem, is P-complete. 1DSTREAL(poly), the most specific graph realizability problem, is L-complete. As special cases of our graph realizability problems we define two natural problems, Balanced ST-Connectivity and Positive Balanced ST-Connectivity, that lie between L and NL.
We study the space complexity of SGSLOGCFL, a graph realizability problem lying between L and LOGCFL. We define generalizations of graph squaring and transitive closure, present efficient parallel algorithms for SGSLOGCFL and use the techniques of Trifonov to show that SGSLOGCFL is contained in DSPACE(log n log log n). This implies that Balanced ST-Connectivity is contained in DSPACE(log n log log n). We conclude with several interesting new research directions.
PhD
Committee Chair: Richard Lipton; Committee Member: Anna Gal; Committee Member: Maria-Florina Balcan; Committee Member: Merrick Furst; Committee Member: William Coo