2,207 research outputs found

    HardBound: Architectural Support for Spatial Safety of the C Programming Language

    Get PDF
    The C programming language is at least as well known for its absence of spatial memory safety guarantees (i.e., lack of bounds checking) as it is for its high performance. C\u27s unchecked pointer arithmetic and array indexing allow simple programming mistakes to lead to erroneous executions, silent data corruption, and security vulnerabilities. Many prior proposals have tackled enforcing spatial safety in C programs by checking pointer and array accesses. However, existing software-only proposals have significant drawbacks that may prevent wide adoption, including: unacceptably high runtime overheads, lack of completeness, incompatible pointer representations, or need for non-trivial changes to existing C source code and compiler infrastructure. Inspired by the promise of these software-only approaches, this paper proposes a hardware bounded pointer architectural primitive that supports cooperative hardware/software enforcement of spatial memory safety for C programs. This bounded pointer is a new hardware primitive datatype for pointers that leaves the standard C pointer representation intact, but augments it with bounds information maintained separately and invisibly by the hardware. The bounds are initialized by the software, and they are then propagated and enforced transparently by the hardware, which automatically checks a pointer\u27s bounds before it is dereferenced. One mode of use requires instrumenting only malloc, which enables enforcement of per-allocation spatial safety for heap-allocated objects for existing binaries. When combined with simple intra-procedural compiler instrumentation, hardware bounded pointers enable a low-overhead approach for enforcing complete spatial memory safety in unmodified C programs

    Accelerating array constraints in symbolic execution

    Get PDF
    Despite significant recent advances, the effectiveness of symbolic execution is limited when used to test complex, real-world software. One of the main scalability challenges is related to constraint solv- ing: large applications and long exploration paths lead to complex constraints, often involving big arrays indexed by symbolic expres- sions. In this paper, we propose a set of semantics-preserving trans- formations for array operations that take advantage of contextual information collected during symbolic execution. Our transforma- tions lead to simpler encodings and hence better performance in constraint solving. The results we obtain are encouraging: we show, through an extensive experimental analysis, that our transforma- tions help to significantly improve the performance of symbolic execution in the presence of arrays. We also show that our transfor- mations enable the analysis of new code, which would be otherwise out of reach for symbolic execution

    Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency

    Full text link
    Persistent memory provides high-performance data persistence at main memory. Memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately, adhering to such a strict order significantly degrades system performance and persistent memory endurance. This paper introduces a new mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering requirements at significantly lower performance and endurance loss. LOC consists of two key techniques. First, Eager Commit eliminates the need to perform a persistent commit record write within a transaction. We do so by ensuring that we can determine the status of all committed transactions during recovery by storing necessary metadata information statically with blocks of data written to memory. Second, Speculative Persistence relaxes the write ordering between transactions by allowing writes to be speculatively written to persistent memory. A speculative write is made visible to software only after its associated transaction commits. To enable this, our mechanism supports the tracking of committed transaction ID and multi-versioning in the CPU cache. Our evaluations show that LOC reduces the average performance overhead of memory persistence from 66.9% to 34.9% and the memory write traffic overhead from 17.1% to 3.4% on a variety of workloads.Comment: This paper has been accepted by IEEE Transactions on Parallel and Distributed System

    Everything You Want to Know About Pointer-Based Checking

    Get PDF
    Lack of memory safety in C/C++ has resulted in numerous security vulnerabilities and serious bugs in large software systems. This paper highlights the challenges in enforcing memory safety for C/C++ programs and progress made as part of the SoftBoundCETS project. We have been exploring memory safety enforcement at various levels - in hardware, in the compiler, and as a hardware-compiler hybrid - in this project. Our research has identified that maintaining metadata with pointers in a disjoint metadata space and performing bounds and use-after-free checking can provide comprehensive memory safety. We describe the rationale behind the design decisions and its ramifications on various dimensions, our experience with the various variants that we explored in this project, and the lessons learned in the process. We also describe and analyze the forthcoming Intel Memory Protection Extensions (MPX) that provides hardware acceleration for disjoint metadata and pointer checking in mainstream hardware, which is expected to be available later this year

    FAUSTA: Scaling Dynamic Analysis with Traffic Generation at WhatsApp

    Get PDF
    We introduce Fausta, an algorithmic traffic gener-ation platform that enables analysis and testing at scale. Fausta has been deployed at Meta to analyze and test the WhatsApp plat-form infrastructure since September 2020, enabling WhatsApp developers to deploy reliable code changes to a code base of millions of lines of code, supporting over 2 billion users who rely on WhatsApp for their daily communications. Fausta covers expected and unexpected program behaviors in a privacy-safe controlled environment to support multiple use cases such as reliability testing, privacy analysis and performance regression detection. It currently supports three different algorithmic input generation strategies, each of which construct realistic backend server traffic that closely simulates production data, without replaying any real user data. Fausta has been deployed and closely integrated into the WhatsApp continuous integration process, catching bugs in development before they hit production. We report on the development and deployment of Fausta's reliability use case between September 2020 and August 2021. During this period it has found 1,876 unique reliability issues, with a fix rate of 74%, indicating a high degree of true positive fault revelation. We also report on the distribution of fault types revealed by Fausta, and the correlation between coverage and faults found. Overall, we do find evidence that higher coverage is correlated with fault revelation

    Aikido: Accelerating shared data dynamic analyses

    Get PDF
    Despite a burgeoning demand for parallel programs, the tools available to developers working on shared-memory multicore processors have lagged behind. One reason for this is the lack of hardware support for inspecting the complex behavior of these parallel programs. Inter-thread communication, which must be instrumented for many types of analyses, may occur with any memory operation. To detect such thread communication in software, many existing tools require the instrumentation of all memory operations, which leads to significant performance overheads. To reduce this overhead, some existing tools resort to random sampling of memory operations, which introduces false negatives. Unfortunately, neither of these approaches provide the speed and accuracy programmers have traditionally expected from their tools. In this work, we present Aikido, a new system and framework that enables the development of efficient and transparent analyses that operate on shared data. Aikido uses a hybrid of existing hardware features and dynamic binary rewriting to detect thread communication with low overhead. Aikido runs a custom hypervisor below the operating system, which exposes per-thread hardware protection mechanisms not available in any widely used operating system. This hybrid approach allows us to benefit from the low cost of detecting memory accesses with hardware, while maintaining the word-level accuracy of a software-only approach. To evaluate our framework, we have implemented an Aikido-enabled vector clock race detector. Our results show that the Aikido enabled race-detector outperforms existing techniques that provide similar accuracy by up to 6.0x, and 76% on average, on the PARSEC benchmark suite.National Science Foundation (U.S.) (NSF grant CCF-0832997)National Science Foundation (U.S.) (DOE SC0005288)United States. Defense Advanced Research Projects Agency (DARPA HR0011-10- 9-0009

    ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

    Get PDF
    Instruction-grain lifeguards monitor the events of a running application at the level of individual instructions in order to identify and help mitigate application bugs and security exploits. Because such lifeguards impose a 10-100X slowdown on existing platforms, previous studies have proposed hardware designs to accelerate lifeguard processing. However, these accelerators are either tailored to a specific class of lifeguards or suitable only for monitoring singlethreaded programs. We present ParaLog, the first design of a system enabling fast online parallel monitoring of multithreaded parallel applications. ParaLog supports a broad class of software-defined lifeguards. We show how three existing accelerators can be enhanced to support online multithreaded monitoring, dramatically reducing lifeguard overheads. We identify and solve several challenges in monitoring parallel applications and/or parallelizing these accelerators, including (i) enforcing inter-thread data dependences, (ii) dealing with inter-thread effects that are not reflected in coherence traffic, (iii) dealing with unmonitored operating system activity, and (iv) ensuring lifeguards can access shared metadata with negligible synchronization overheads. We present our system design for both Sequentially Consistent and Total Store Ordering processors. We implement and evaluate our design on a 16 core simulated CMP, using benchmarks from SPLASH-2 and PARSEC and two lifeguards: a data-flow tracking lifeguard and a memory-access checker lifeguard. Our results show that (i) our parallel accelerators improve performance by 2-9X and 1.13-3.4X for our two lifeguards, respectively, (ii) we are 5-126X faster than the time-slicing approach required by existing techniques, and (iii) our average overheads for applications with eight threads are 51% and 28% for the two lifeguards, respectively
    • …
    corecore