Search CORE

1,049 research outputs found

LINE ASSOCIATIVE REGISTERS

Author: Melarkode Krishna
Publication venue: UKnowledge
Publication date: 01/01/2004
Field of study

As technological advances have improved processor speed, main memory speed has lagged behind. Even with advanced RAM technologies, it has not been possible to close the gap in speeds. Ideally, a CPU can deliver good performance when the right data is made available to it at the right time. Caches and Registers solved the problem to an extent. This thesis takes the approach of trying to create a new memory access model that is more efficient and simple instead of using various add on mechanisms to mask high memory latency. The Line Associative Registers have the functionality of a cache, scalar registers and vector registers built into them. This new model qualitatively changes how the processor accesses memory

University of Kentucky

Loopapalooza: Investigating Limits of Loop-Level Parallelism with a Compiler-Driven Approach

Author: Gabrielli Giacomo
Iordanou Konstantinos
Luján Mikel
Zaidi Ali
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/04/2021
Field of study

The University of Manchester - Institutional Repository

The Meaning of Memory Safety

Author: A Ahmed
AA Amorim de
AM Pitts
Aslan Askarov
C Schlesinger
D Stefan
David Chisnall
GC Necula
Hongseok Yang
T Balabonski
Xavier Leroy
Publication venue
Publication date: 06/04/2018
Field of study

We give a rigorous characterization of what it means for a programming language to be memory safe, capturing the intuition that memory safety supports local reasoning about state. We formalize this principle in two ways. First, we show how a small memory-safe language validates a noninterference property: a program can neither affect nor be affected by unreachable parts of the state. Second, we extend separation logic, a proof system for heap-manipulating programs, with a memory-safe variant of its frame rule. The new rule is stronger because it applies even when parts of the program are buggy or malicious, but also weaker because it demands a stricter form of separation between parts of the program state. We also consider a number of pragmatically motivated variations on memory safety and the reasoning principles they support. As an application of our characterization, we evaluate the security of a previously proposed dynamic monitor for memory safety of heap-allocated data.Comment: POST'18 final versio

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Recommended from our members

CHERIvoke: Characterising pointer revocation using CHERI capabilities for temporal memory safety

Author: Ainsworth S
Filardo NW
Jones TM
Moore SW
Neumann PG
Richardson A
Roe M
Rugg P
Watson RNM
Woodruff J
Xia H
Publication venue: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Publication date: 01/01/2019
Field of study

A lack of temporal safety in low-level languages has led to an epidemic of use-after-free exploits. These have surpassed in number and severity even the infamous buffer-overflow exploits violating spatial safety. Capability addressing can directly enforce spatial safety for the C language by enforcing bounds on pointers and by rendering pointers unforgeable. Nevertheless, an efficient solution for strong temporal memory safety remains elusive. CHERI is an architectural extension to provide hardware capability addressing that is seeing significant commercial and open- source interest. We show that CHERI capabilities can be used as a foundation to enable low-cost heap temporal safety by facilitating out-of-date pointer revocation, as capabilities enable precise and efficient identification and invalidation of pointers, even when using unsafe languages such as C. We develop CHERIvoke, a technique for deterministic and fast sweeping revocation to enforce temporal safety on CHERI systems. CHERIvoke quarantines freed data before periodically using a small shadow map to revoke all dangling pointers in a single sweep of memory, and provides a tunable trade-off between performance and heap growth. We evaluate the performance of such a system using high-performance x86 processors, and further analytically examine its primary overheads. When configured with a heap-size overhead of 25%, we find that CHERIvoke achieves an average execution-time overhead of under 5%, far below the overheads associated with traditional garbage collection, revocation, or page-table systems.EP/K026399/1, EP/P020011/1, EP/K008528/

Apollo (Cambridge)

A compiler level intermediate representation based binary analysis system and its applications

Author: Anand Kapil
Publication venue
Publication date: 01/01/2013
Field of study

Analyzing and optimizing programs from their executables has received a lot of attention recently in the research community. There has been a tremendous amount of activity in executable-level research targeting varied applications such as security vulnerability analysis, untrusted code analysis, malware analysis, program testing, and binary optimizations. The vision of this dissertation is to advance the field of static analysis of executables and bridge the gap between source-level analysis and executable analysis. The main thesis of this work is scalable static binary rewriting and analysis using compiler-level intermediate representation without relying on the presence of metadata information such as debug or symbolic information. In spite of a significant overlap in the overall goals of several source-code methods and executables-level techniques, several sophisticated transformations that are well-understood and implemented in source-level infrastructures have yet to become available in executable frameworks. It is a well known fact that a standalone executable without any meta data is less amenable to analysis than the source code. Nonetheless, we believe that one of the prime reasons behind the limitations of existing executable frameworks is that current executable frameworks define their own intermediate representations (IR) which are significantly more constrained than an IR used in a compiler. Intermediate representations used in existing binary frameworks lack high level features like abstract stack, variables, and symbols and are even machine dependent in some cases. This severely limits the application of well-understood compiler transformations to executables and necessitates new research to make them applicable. In the first part of this dissertation, we present techniques to convert the binaries to the same high-level intermediate representation that compilers use. We propose methods to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable to an abstract stack. The proposed methods are practical since they do not employ symbolic, relocation, or debug information which are usually absent in deployed executables. We have integrated our techniques with a prototype x86 binary framework called \emph{SecondWrite} that uses LLVM as the IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, including several real world programs. In the next part of this work, we demonstrate that several well-known source-level analysis frameworks such as symbolic analysis have limited effectiveness in the executable domain since executables typically lack higher-level semantics such as program variables. The IR should have a precise memory abstraction for an analysis to effectively reason about memory operations. Our first work of recovering a compiler-level representation addresses this limitation by recovering several higher-level semantics information from executables. In the next part of this work, we propose methods to handle the scenarios when such semantics cannot be recovered. First, we propose a hybrid static-dynamic mechanism for recovering a precise and correct memory model in executables in presence of executable-specific artifacts such as indirect control transfers. Next, the enhanced memory model is employed to define a novel symbolic analysis framework for executables that can perform the same types of program analysis as source-level tools. Frameworks hitherto fail to simultaneously maintain the properties of correct representation and precise memory model and ignore memory-allocated variables while defining symbolic analysis mechanisms. We exemplify that our framework is robust, efficient and it significantly improves the performance of various traditional analyses like global value numbering, alias analysis and dependence analysis for executables. Finally, the underlying representation and analysis framework is employed for two separate applications. First, the framework is extended to define a novel static analysis framework, \emph{DemandFlow}, for identifying information flow security violations in program executables. Unlike existing static vulnerability detection methods for executables, DemandFlow analyzes memory locations in addition to symbols, thus improving the precision of the analysis. DemandFlow proposes a novel demand-driven mechanism to identify and precisely analyze only those program locations and memory accesses which are relevant to a vulnerability, thus enhancing scalability. DemandFlow uncovers six previously undiscovered format string and directory traversal vulnerabilities in popular ftp and internet relay chat clients. Next, the framework is extended to implement a platform-specific optimization for embedded processors. Several embedded systems provide the facility of locking one or more lines in the cache. We devise the first method in literature that employs instruction cache locking as a mechanism for improving the average-case run-time of general embedded applications. We demonstrate that the optimal solution for instruction cache locking can be obtained in polynomial time. Since our scheme is implemented inside a binary framework, it successfully addresses the portability concern by enabling the implementation of cache locking at the time of deployment when all the details of the memory hierarchy are available

CiteSeerX

Digital Repository at the University of Maryland

Recommended from our members

CheriOS: Designing an untrusted single-address-space capability operating system utilising capability hardware and a minimal hypervisor

Author: Esswood Lawrence
Publication venue: University of Cambridge
Publication date: 01/07/2020
Field of study

This thesis presents the design, implementation, and evaluation of a novel capability operating system: CheriOS. The guiding motivation behind CheriOS is to provide strong security guarantees to programmers, even allowing them to continue to program in fast, but typically unsafe, languages such as C. Furthermore, it does this in the presence of an extremely strong adversarial model: in CheriOS, every compartment -- and even the operating system itself -- is considered actively malicious. Building on top of the architecturally enforced capabilities offered by the CHERI microprocessor, I show that only a few more capability types and enforcement checks are required to provide a strong compartmentalisation model that can facilitate mutual distrust. I implement these new primitives in software, in a new abstraction layer I dub the nanokernel. Among the new OS primitives I introduce are one for integrity and confidentiality called a Reservation (which allows allocating private memory without trusting the allocator), as well as another that can provide attestation about the state of the system, a Foundation (which provides a key to sign and protect capabilities based on a signature of the starting state of a program). I show that, using these new facilities, it is possible to design an operating system without having to trust the implementation is correct. CheriOS is fundamentally fail-safe; there are no assumptions about the behaviour of the system, apart from the CHERI processor and the nanokernel, to be broken. Using CHERI and the new nanokernel primitives, programmers can expect full isolation at scopes ranging from a whole program to a single function, and not just with respect to other programs but the system itself. Programs compiled for and run on CheriOS offer full memory safety, both spatial and temporal, enforced control flow integrity between compartments and protection against common vulnerabilities such as buffer overflows, code injection and Return-Oriented-Programming attacks. I achieve this by designing a new CHERI-based ABI (Application Binary Interface) which includes a novel stack structure that offers temporal safety. I evaluate how practical the new designs are by prototyping them and offering a detailed performance evaluation. I also contrast with existing offerings from both industry and academia. CHERI capabilities can be used to restrict access to system resources, such as memory, with the required dynamic checks being performed by hardware in parallel with normal operation. Using the accelerating features of CHERI, I show that many of the security guarantees that CheriOS offers can come at little to no cost. I present a novel and secure IO/IPC layer that allows secure marshalling of multiple data streams through mutually distrusting compartments, with fine-grained authenticated access control for end-points, and without either copying or encryption. For example, CheriOS can restrict its TCP stack from having access to packet contents, or restrict an open socket to ensure data sent on it to arrives at an endpoint signed as a TLS implementation. Even with added security requirements, CheriOS can perform well on real workloads. I showcase this by running a state-of-the-art webserver, NGINX, atop both CheriOS and FreeBSD and show improvements in performance ranging from 3x to 6x when running on a small-scale low-power FPGA implementation of CHERI-MIPS

Apollo (Cambridge)

Optimizing SIMD execution in HW/SW co-designed processors

Author: Kumar Rakesh
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2014
Field of study

SIMD accelerators are ubiquitous in microprocessors from different computing domains. Their high compute power and hardware simplicity improve overall performance in an energy efficient manner. Moreover, their replicated functional units and simple control mechanism make them amenable to scaling to higher vector lengths. However, code generation for these accelerators has been a challenge from the days of their inception. Compilers generate vector code conservatively to ensure correctness. As a result they lose significant vectorization opportunities and fail to extract maximum benefits out of SIMD accelerators. This thesis proposes to vectorize the program binary at runtime in a speculative manner, in addition to the compile time static vectorization. There are different environments that support runtime profiling and optimization support required for dynamic vectorization, one of most prominent ones being: 1) Dynamic Binary Translators and Optimizers (DBTO) and 2) Hardware/Software (HW/SW) Co-designed Processors. HW/SW co-designed environment provides several advantages over DBTOs like transparent incorporations of new hardware features, binary compatibility, etc. Therefore, we use HW/SW co-designed environment to assess the potential of speculative dynamic vectorization. Furthermore, we analyze vector code generation for wider vector units and find out that even though SIMD accelerators are amenable to scaling from the hardware point of view, vector code generation at higher vector length is even more challenging. The two major factors impeding vectorization for wider SIMD units are: 1) Reduced dynamic instruction stream coverage for vectorization and 2) Large number of permutation instructions. To solve the first problem we propose Variable Length Vectorization that iteratively vectorizes for multiple vector lengths to improve dynamic instruction stream coverage. Secondly, to reduce the number of permutation instructions we propose Selective Writing that selectively writes to different parts of a vector register and avoids permutations. Finally, we tackle the problem of leakage energy in SIMD accelerators. Since SIMD accelerators consume significant amount of real estate on the chip, they become the principle source of leakage if not utilized judiciously. Power gating is one of the most widely used techniques to reduce leakage energy of functional units. However, power gating has its own energy and performance overhead associated with it. We propose to selectively devectorize the vector code when higher SIMD lanes are used intermittently. This selective devectorization keeps the higher SIMD lanes idle and power gated for maximum duration. Therefore, resulting in overall leakage energy reduction.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis

Author: Hempel Gerald
Publication venue
Publication date: 11/11/2019
Field of study

Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip became increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, it obliges system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons, that hinders the widespread use of hybrid architectures. Thus, an automated process that aids programmers with the hardware/software partitioning and the generation of application specific accelerators is an important issue. The method presented in this thesis neither requires restrictions of the used high-level-language nor special source code annotations. Usually, this is an entry barrier for programmers without deeper understanding of the underlying hardware platform. This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of a host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements

Technische Universität Dresden: Qucosa