1,847 research outputs found
The Parallel Persistent Memory Model
We consider a parallel computational model that consists of processors,
each with a fast local ephemeral memory of limited size, and sharing a large
persistent memory. The model allows for each processor to fault with bounded
probability, and possibly restart. On faulting all processor state and local
ephemeral memory are lost, but the persistent memory remains. This model is
motivated by upcoming non-volatile memories that are as fast as existing random
access memory, are accessible at the granularity of cache lines, and have the
capability of surviving power outages. It is further motivated by the
observation that in large parallel systems, failure of processors and their
caches is not unusual.
Within the model we develop a framework for developing locality efficient
parallel algorithms that are resilient to failures. There are several
challenges, including the need to recover from failures, the desire to do this
in an asynchronous setting (i.e., not blocking other processors when one
fails), and the need for synchronization primitives that are robust to
failures. We describe approaches to solve these challenges based on breaking
computations into what we call capsules, which have certain properties, and
developing a work-stealing scheduler that functions properly within the context
of failures. The scheduler guarantees a time bound of in expectation, where and are the work and
depth of the computation (in the absence of failures), is the average
number of processors available during the computation, and is the
probability that a capsule fails. Within the model and using the proposed
methods, we develop efficient algorithms for parallel sorting and other
primitives.Comment: This paper is the full version of a paper at SPAA 2018 with the same
nam
Shining Light On Shadow Stacks
Control-Flow Hijacking attacks are the dominant attack vector against C/C++
programs. Control-Flow Integrity (CFI) solutions mitigate these attacks on the
forward edge,i.e., indirect calls through function pointers and virtual calls.
Protecting the backward edge is left to stack canaries, which are easily
bypassed through information leaks. Shadow Stacks are a fully precise mechanism
for protecting backwards edges, and should be deployed with CFI mitigations. We
present a comprehensive analysis of all possible shadow stack mechanisms along
three axes: performance, compatibility, and security. For performance
comparisons we use SPEC CPU2006, while security and compatibility are
qualitatively analyzed. Based on our study, we renew calls for a shadow stack
design that leverages a dedicated register, resulting in low performance
overhead, and minimal memory overhead, but sacrifices compatibility. We present
case studies of our implementation of such a design, Shadesmar, on Phoronix and
Apache to demonstrate the feasibility of dedicating a general purpose register
to a security monitor on modern architectures, and the deployability of
Shadesmar. Our comprehensive analysis, including detailed case studies for our
novel design, allows compiler designers and practitioners to select the correct
shadow stack design for different usage scenarios.Comment: To Appear in IEEE Security and Privacy 201
Automated Verification of Practical Garbage Collectors
Garbage collectors are notoriously hard to verify, due to their low-level
interaction with the underlying system and the general difficulty in reasoning
about reachability in graphs. Several papers have presented verified
collectors, but either the proofs were hand-written or the collectors were too
simplistic to use on practical applications. In this work, we present two
mechanically verified garbage collectors, both practical enough to use for
real-world C# benchmarks. The collectors and their associated allocators
consist of x86 assembly language instructions and macro instructions, annotated
with preconditions, postconditions, invariants, and assertions. We used the
Boogie verification generator and the Z3 automated theorem prover to verify
this assembly language code mechanically. We provide measurements comparing the
performance of the verified collector with that of the standard Bartok
collectors on off-the-shelf C# benchmarks, demonstrating their competitiveness
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the
previous version v5
- …