7 research outputs found

    Efficient Algorithms with Asymmetric Read and Write Costs

    Get PDF
    In several emerging technologies for computer memory (main memory), the cost of reading is significantly cheaper than the cost of writing. Such asymmetry in memory costs poses a fundamentally different model from the RAM for algorithm design. In this paper we study lower and upper bounds for various problems under such asymmetric read and write costs. We consider both the case in which all but O(1) memory has asymmetric cost, and the case of a small cache of symmetric memory. We model both cases using the (M,omega)-ARAM, in which there is a small (symmetric) memory of size M and a large unbounded (asymmetric) memory, both random access, and where reading from the large memory has unit cost, but writing has cost omega >> 1. For FFT and sorting networks we show a lower bound cost of Omega(omega*n*log_{omega*M}(n)), which indicates that it is not possible to achieve asymptotic improvements with cheaper reads when omega is bounded by a polynomial in M. Moreover, there is an asymptotic gap (of min(omega,log(n)/log(omega*M)) between the cost of sorting networks and comparison sorting in the model. This contrasts with the RAM, and most other models, in which the asymptotic costs are the same. We also show a lower bound for computations on an n*n diamond DAG of Omega(omega*n^2/M) cost, which indicates no asymptotic improvement is achievable with fast reads. However, we show that for the minimum edit distance problem (and related problems), which would seem to be a diamond DAG, we can beat this lower bound with an algorithm with only O(omega*n^2/(M*min(omega^{1/3},M^{1/2}))) cost. To achieve this we make use of a "path sketch" technique that is forbidden in a strict DAG computation. Finally, we show several interesting upper bounds for shortest path problems, minimum spanning trees, and other problems. A common theme in many of the upper bounds is that they require redundant computation and a tradeoff between reads and writes

    Implicit Decomposition for Write-Efficient Connectivity Algorithms

    Full text link
    The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy. Motivated by this trend, we propose sequential and parallel algorithms to solve graph connectivity problems using significantly fewer writes than conventional algorithms. Our primary algorithmic tool is the construction of an o(n)o(n)-sized "implicit decomposition" of a bounded-degree graph GG on nn nodes, which combined with read-only access to GG enables fast answers to connectivity and biconnectivity queries on GG. The construction breaks the linear-write "barrier", resulting in costs that are asymptotically lower than conventional algorithms while adding only a modest cost to querying time. For general non-sparse graphs on mm edges, we also provide the first o(m)o(m) writes and O(m)O(m) operations parallel algorithms for connectivity and biconnectivity. These algorithms provide insight into how applications can efficiently process computations on large graphs in systems with read-write asymmetry

    A Lower Bound Technique for Communication in BSP

    Get PDF
    Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM

    The Parallel Persistent Memory Model

    Full text link
    We consider a parallel computational model that consists of PP processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded probability, and possibly restart. On faulting all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solve these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly within the context of failures. The scheduler guarantees a time bound of O(W/PA+D(P/PA)log1/fW)O(W/P_A + D(P/P_A) \lceil\log_{1/f} W\rceil) in expectation, where WW and DD are the work and depth of the computation (in the absence of failures), PAP_A is the average number of processors available during the computation, and f1/2f \le 1/2 is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives.Comment: This paper is the full version of a paper at SPAA 2018 with the same nam

    Efficient online weighted multi-level paging

    Get PDF
    We study the writeback-aware caching problem, a variant of classic paging where paging requests that modify data and requests that leave data intact are treated differently. We give an O(log^2 k) competitive randomized algorithm, answering an open question of Beckmann et al. [8] and Even et
    corecore