5 research outputs found

    CGuard: Efficient Spatial Safety for C

    Full text link
    Spatial safety violations are the root cause of many security attacks and unexpected behavior of applications. Existing techniques to enforce spatial safety work broadly at either object or pointer granularity. Object-based approaches tend to incur high CPU overheads, whereas pointer-based approaches incur both high CPU and memory overheads. SGXBounds, an object-based approach, is so far the most efficient technique that provides complete out-of-bounds protection for objects. However, a major drawback of this approach is that it can't support address space larger than 32-bit. In this paper, we present CGuard, a tool that provides object-bounds protection for C applications with comparable overheads to SGXBounds without restricting the application address space. CGuard stores the bounds information just before the base address of an object and encodes the relative offset of the base address in the spare bits of the virtual address available in x86_64 architecture. For an object that can't fit in the spare bits, CGuard uses a custom memory layout that enables it to find the base address of the object in just one memory access. Our study revealed spatial safety violations in the gcc and x264 benchmarks from the SPEC CPU2017 benchmark suite and the string_match benchmark from the Phoenix benchmark suite. The execution time overheads for the SPEC CPU2017 and Phoenix benchmark suites were 42% and 26% respectively, whereas the reduction in the throughput for the Apache webserver when the CPUs were fully saturated was 30%. These results indicate that CGuard can be highly effective while maintaining a reasonable degree of efficiency

    The Road Not Taken: Exploring Alias Analysis Based Optimizations Missed by the Compiler

    Get PDF
    Context-sensitive inter-procedural alias analyses are more precise than intra-procedural alias analyses. However, context-sensitive inter-procedural alias analyses are not scalable. As a consequence, most of the production compilers sacrifice precision for scalability and implement intra-procedural alias analysis. The alias analysis is used by many compiler optimizations, including loop transformations. Due to the imprecision of alias analysis, the program’s performance may suffer, especially in the presence of loops. Previous work proposed a general approach based on code-versioning with dynamic checks to disambiguate pointers at runtime. However, the overhead of dynamic checks in this approach is ( ), which is substantially high to enable interesting optimizations. Other suggested approaches, e.g., polyhedral and symbolic range analysis, have (1) overheads, but they only work for loops with certain constraints. The production compilers, such as LLVM and GCC, use scalar evolution analysis to compute an (1) range check for loops to resolve memory dependencies at runtime. However, this approach also can only be applied to loops with certain constraints. In this work, we present our tool, Scout, that can disambiguate two pointers at runtime using single memory access. Scout is based on the key idea to constrain the allocation size and alignment during memory allocations. Scout can also disambiguate array accesses within a loop for which the existing (1) range checks technique cannot be applied. In addition, Scout uses feedback from static optimizations to reduce the number of dynamic checks needed for optimizations. Our technique enabled new opportunities for loop-invariant code motion, dead store elimination, loopvectorization, and load elimination in an already optimized code. Our performance improvements are up to 51.11% for Polybench and up to 0.89% for CPU SPEC 2017 suites. The geometric means for our allocator’s CPU and memory overheads for CPU SPEC 2017 benchmarks are 1.05%, and 7.47%, respectively. For Polybench benchmarks, the geometric mean of CPU and memory overheads are 0.21% and 0.13%, respectively

    Rapid: Region-Based Pointer Disambiguation

    Get PDF
    Interprocedural alias analyses often sacrifice precision for scalability. Thus, modern compilers such as GCC and LLVM implement more scalable but less precise intraprocedural alias analyses. This compromise makes the compilers miss out on potential optimization opportunities, affecting the performance of the application. Modern compilers implement loop-versioning with dynamic checks for pointer disambiguation to enable the missed optimizations. Polyhedral access range analysis and symbolic range analysis enable O(1) range checks for non-overlapping of memory accesses inside loops. However, these approaches work only for the loops in which the loop bounds are loop invariants. To address this limitation, researchers proposed a technique that requires O(log n) memory accesses for pointer disambiguation. Others improved the performance of dynamic checks to single memory access by constraining the object size and alignment. However, the former approach incurs noticeable overhead due to its dynamic checks, whereas the latter has a noticeable allocator overhead. Thus, scalability remains a challenge. In this work, we present a tool, Rapid, that further reduces the overheads of the allocator and dynamic checks proposed in the existing approaches. The key idea is to identify objects that need disambiguation checks using a profiler and allocate them in different regions, which are disjoint memory areas. The disambiguation checks simply compare the regions corresponding to the objects. The regions are aligned such that the top 32 bits in the addresses of any two objects allocated in different regions are always different. As a consequence, the dynamic checks do not require any memory access to ensure that the objects belong to different regions, making them efficient. Rapid achieved a maximum performance benefit of around 52.94% for Polybench and 1.88% for CPU SPEC 2017 benchmarks. The maximum CPU overhead of our allocator is 0.57% with a geometric mean of -0.2% for CPU SPEC 2017 benchmarks. Due to the low overhead of the allocator and dynamic checks, Rapid could improve the performance of 12 out of 16 CPU SPEC 2017 benchmarks. In contrast, a state-of-the-art approach used in the comparison could improve only five CPU SPEC 2017 benchmarks
    corecore