350 research outputs found
InversOS: Efficient Control-Flow Protection for AArch64 Applications with Privilege Inversion
With the increasing popularity of AArch64 processors in general-purpose
computing, securing software running on AArch64 systems against control-flow
hijacking attacks has become a critical part toward secure computation. Shadow
stacks keep shadow copies of function return addresses and, when protected from
illegal modifications and coupled with forward-edge control-flow integrity,
form an effective and proven defense against such attacks. However, AArch64
lacks native support for write-protected shadow stacks, while software
alternatives either incur prohibitive performance overhead or provide weak
security guarantees.
We present InversOS, the first hardware-assisted write-protected shadow
stacks for AArch64 user-space applications, utilizing commonly available
features of AArch64 to achieve efficient intra-address space isolation (called
Privilege Inversion) required to protect shadow stacks. Privilege Inversion
adopts unconventional design choices that run protected applications in the
kernel mode and mark operating system (OS) kernel memory as user-accessible;
InversOS therefore uses a novel combination of OS kernel modifications,
compiler transformations, and another AArch64 feature to ensure the safety of
doing so and to support legacy applications. We show that InversOS is secure by
design, effective against various control-flow hijacking attacks, and
performant on selected benchmarks and applications (incurring overhead of 7.0%
on LMBench, 7.1% on SPEC CPU 2017, and 3.0% on Nginx web server).Comment: 18 pages, 9 figures, 4 table
Aikido: Accelerating shared data dynamic analyses
Despite a burgeoning demand for parallel programs, the tools available to developers working on shared-memory multicore processors have lagged behind. One reason for this is the lack of hardware support for inspecting the complex behavior of these parallel programs. Inter-thread communication, which must be instrumented for many types of analyses, may occur with any memory operation. To detect such thread communication in software, many existing tools require the instrumentation of all memory operations, which leads to significant performance overheads. To reduce this overhead, some existing tools resort to random sampling of memory operations, which introduces false negatives. Unfortunately, neither of these approaches provide the speed and accuracy programmers have traditionally expected from their tools. In this work, we present Aikido, a new system and framework that enables the development of efficient and transparent analyses that operate on shared data. Aikido uses a hybrid of existing hardware features and dynamic binary rewriting to detect thread communication with low overhead. Aikido runs a custom hypervisor below the operating system, which exposes per-thread hardware protection mechanisms not available in any widely used operating system. This hybrid approach allows us to benefit from the low cost of detecting memory accesses with hardware, while maintaining the word-level accuracy of a software-only approach. To evaluate our framework, we have implemented an Aikido-enabled vector clock race detector. Our results show that the Aikido enabled race-detector outperforms existing techniques that provide similar accuracy by up to 6.0x, and 76% on average, on the PARSEC benchmark suite.National Science Foundation (U.S.) (NSF grant CCF-0832997)National Science Foundation (U.S.) (DOE SC0005288)United States. Defense Advanced Research Projects Agency (DARPA HR0011-10- 9-0009
PARSNIP: Performant Architecture for Race Safety with No Impact on Precision
Data race detection is a useful dynamic analysis for multithreaded programs that is a key building block in record-and-replay, enforcing strong consistency models, and detecting concurrency bugs. Existing software race detectors are precise but slow, and hardware support for precise data race detection relies on assumptions like type safety that many programs violate in practice.
We propose PARSNIP, a fully precise hardware-supported data race detector. PARSNIP exploits new insights into the redundancy of race detection metadata to reduce storage overheads. PARSNIP also adopts new race detection metadata encodings that accelerate the common case while preserving soundness and completeness. When bounded hardware resources are exhausted, PARSNIP falls back to a software race detector to preserve correctness. PARSNIP does not assume that target programs are type safe, and is thus suitable for race detection on arbitrary code.
Our evaluation of PARSNIP on several PARSEC benchmarks shows that performance overheads range from negligible to 2.6x, with an average overhead of just 1.5x. Moreover, Parsnip outperforms the state-of-the-art Radish hardware race detector by 4.6x
A Survey on Thread-Level Speculation Techniques
Producción CientíficaThread-Level Speculation (TLS) is a promising technique that allows the parallel execution of sequential code without relying on a prior, compile-time-dependence analysis. In this work, we introduce the technique, present a taxonomy of TLS solutions, and summarize and put into perspective the most relevant advances in this field.MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS)
No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
Serverless platforms essentially face a tradeoff between container startup
time and provisioned concurrency (i.e., cached instances), which is further
exaggerated by the frequent need for remote container initialization. This
paper presents MITOSIS, an operating system primitive that provides fast remote
fork, which exploits a deep codesign of the OS kernel with RDMA. By leveraging
the fast remote read capability of RDMA and partial state transfer across
serverless containers, MITOSIS bridges the performance gap between local and
remote container initialization. MITOSIS is the first to fork over 10,000 new
containers from one instance across multiple machines within a second, while
allowing the new containers to efficiently transfer the pre-materialized states
of the forked one. We have implemented MITOSIS on Linux and integrated it with
FN, a popular serverless platform. Under load spikes in real-world serverless
workloads, MITOSIS reduces the function tail latency by 89% with orders of
magnitude lower memory usage. For serverless workflow that requires state
transfer, MITOSIS improves its execution time by 86%.Comment: To appear in OSDI'2
Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures
Many-core architectures of the future are likely to have distributed memory
organizations and need fine grained concurrency management to be used
effectively. The Self-adaptive Virtual Processor (SVP) is an abstract
concurrent programming model which can provide this, but the model and its
current implementations assume a single address space shared memory. We
investigate and extend SVP to handle distributed environments, and discuss a
prototype SVP implementation which transparently supports execution on
heterogeneous distributed memory clusters over TCP/IP connections, while
retaining the original SVP programming model
- …