1,596 research outputs found

    Energy-Efficient Algorithms

    Full text link
    We initiate the systematic study of the energy complexity of algorithms (in addition to time and space complexity) based on Landauer's Principle in physics, which gives a lower bound on the amount of energy a system must dissipate if it destroys information. We propose energy-aware variations of three standard models of computation: circuit RAM, word RAM, and transdichotomous RAM. On top of these models, we build familiar high-level primitives such as control logic, memory allocation, and garbage collection with zero energy complexity and only constant-factor overheads in space and time complexity, enabling simple expression of energy-efficient algorithms. We analyze several classic algorithms in our models and develop low-energy variations: comparison sort, insertion sort, counting sort, breadth-first search, Bellman-Ford, Floyd-Warshall, matrix all-pairs shortest paths, AVL trees, binary heaps, and dynamic arrays. We explore the time/space/energy trade-off and develop several general techniques for analyzing algorithms and reducing their energy complexity. These results lay a theoretical foundation for a new field of semi-reversible computing and provide a new framework for the investigation of algorithms.Comment: 40 pages, 8 pdf figures, full version of work published in ITCS 201

    A sparse octree gravitational N-body code that runs entirely on the GPU processor

    Get PDF
    We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which completely runs on the GPU.(The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single colum

    Linear pattern matching on sparse suffix trees

    Get PDF
    Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a {\em sparse suffix tree} \cite{KU-96} with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to logσn\log_{\sigma}n characters (σ\sigma the alphabet size), our index takes O(n/logσn)O(n/\log_{\sigma}n) space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time O(m+r2+rocc)O(m+r^2+r\cdot occ), where mm is the length of the pattern, rr is the actual number of characters stored in a word and occocc is the number of pattern occurrences

    Parallel Weighted Random Sampling

    Get PDF
    Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries

    Realizable paths and the NL vs L problem

    Get PDF
    A celebrated theorem of Savitch [Savitch'70] states that NSPACE(S) is contained in DSPACE(S²). In particular, Savitch gave a deterministic algorithm to solve ST-Connectivity (an NL-complete problem) using O({log}²{n}) space, implying NL (non-deterministic logspace) is contained in DSPACE({log}²{n}). While Savitch's theorem itself has not been improved in the last four decades, several graph connectivity problems are shown to lie between L and NL, providing new insights into the space-bounded complexity classes. All the connectivity problems considered in the literature so far are essentially special cases of ST-Connectivity. In this dissertation, we initiate the study of auxiliary PDAs as graph connectivity problems and define sixteen different "graph realizability problems" and study their relationships. The complexity of these connectivity problems lie between L (logspace) and P (polynomial time). ST-Realizability, the most general graph realizability problem is P-complete. 1DSTREAL(poly), the most specific graph realizability problem is L-complete. As special cases of our graph realizability problems we define two natural problems, Balanced ST-Connectivity and Positive Balanced ST-Connectivity, that lie between L and NL. We study the space complexity of SGSLOGCFL, a graph realizability problem lying between L and LOGCFL. We define generalizations of graph squaring and transitive closure, present efficient parallel algorithms for SGSLOGCFL and use the techniques of Trifonov to show that SGSLOGCFL is contained in DSPACE(lognloglogn). This implies that Balanced ST-Connectivity is contained in DSPACE(lognloglogn). We conclude with several interesting new research directions.PhDCommittee Chair: Richard Lipton; Committee Member: Anna Gal; Committee Member: Maria-Florina Balcan; Committee Member: Merrick Furst; Committee Member: William Coo
    corecore