Search CORE

1,847 research outputs found

The Parallel Persistent Memory Model

Author: Berryhill R.
Blelloch G. E.
Buettner M.
Chauhan H.
Herlihy M.
JaJa J.
Lee S. K.
Meena J. S.
Nawab F.
Pelley S.
Woude J. Van Der
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/06/2018
Field of study

We consider a parallel computational model that consists of

P

processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded probability, and possibly restart. On faulting all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solve these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly within the context of failures. The scheduler guarantees a time bound of

O(W/P_A + D(P/P_A) \lceil\log_{1/f} W\rceil)

in expectation, where

W

and

D

are the work and depth of the computation (in the absence of failures),

P_A

is the average number of processors available during the computation, and

f \le 1/2

is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives.Comment: This paper is the full version of a paper at SPAA 2018 with the same nam

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Shining Light On Shadow Stacks

Author: Burow Nathan
Payer Mathias
Zhang Xinping
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2019
Field of study

Control-Flow Hijacking attacks are the dominant attack vector against C/C++ programs. Control-Flow Integrity (CFI) solutions mitigate these attacks on the forward edge,i.e., indirect calls through function pointers and virtual calls. Protecting the backward edge is left to stack canaries, which are easily bypassed through information leaks. Shadow Stacks are a fully precise mechanism for protecting backwards edges, and should be deployed with CFI mitigations. We present a comprehensive analysis of all possible shadow stack mechanisms along three axes: performance, compatibility, and security. For performance comparisons we use SPEC CPU2006, while security and compatibility are qualitatively analyzed. Based on our study, we renew calls for a shadow stack design that leverages a dedicated register, resulting in low performance overhead, and minimal memory overhead, but sacrifices compatibility. We present case studies of our implementation of such a design, Shadesmar, on Phoronix and Apache to demonstrate the feasibility of dedicating a general purpose register to a security monitor on modern architectures, and the deployability of Shadesmar. Our comprehensive analysis, including detailed case studies for our novel design, allows compiler designers and practitioners to select the correct shadow stack design for different usage scenarios.Comment: To Appear in IEEE Security and Privacy 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Automated Verification of Practical Garbage Collectors

Author: Hawblitzel Chris
Petrank Erez
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2009
Field of study

Garbage collectors are notoriously hard to verify, due to their low-level interaction with the underlying system and the general difficulty in reasoning about reachability in graphs. Several papers have presented verified collectors, but either the proofs were hand-written or the collectors were too simplistic to use on practical applications. In this work, we present two mechanically verified garbage collectors, both practical enough to use for real-world C# benchmarks. The collectors and their associated allocators consist of x86 assembly language instructions and macro instructions, annotated with preconditions, postconditions, invariants, and assertions. We used the Boogie verification generator and the Z3 automated theorem prover to verify this assembly language code mechanically. We provide measurements comparing the performance of the verified collector with that of the standard Bartok collectors on off-the-shelf C# benchmarks, demonstrating their competitiveness

arXiv.org e-Print Archive

CiteSeerX

Gunrock: A High-Performance Graph Processing Library on the GPU

Author: Cederman D.
Goel A.
Gonzalez J. E.
Gregor D.
Jia Y.
Low Y.
Pande P. R.
Siek J. G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/01/2016
Field of study

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5

arXiv.org e-Print Archive

Crossref

eScholarship - University of California