10,604 research outputs found

    Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis

    Get PDF
    Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip became increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, it obliges system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons, that hinders the widespread use of hybrid architectures. Thus, an automated process that aids programmers with the hardware/software partitioning and the generation of application specific accelerators is an important issue. The method presented in this thesis neither requires restrictions of the used high-level-language nor special source code annotations. Usually, this is an entry barrier for programmers without deeper understanding of the underlying hardware platform. This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of a host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements

    An integrated soft- and hard-programmable multithreaded architecture

    Get PDF

    Dynamic Binary Translation for Embedded Systems with Scratchpad Memory

    Get PDF
    Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as a platform where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (i.e., performance, memory and energy consumption and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well. In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead. This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-Flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has similar access latency as a hardware cache, but consumes less power and chip area. StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code among the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs. Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements

    Energy-Efficient Algorithms

    Full text link
    We initiate the systematic study of the energy complexity of algorithms (in addition to time and space complexity) based on Landauer's Principle in physics, which gives a lower bound on the amount of energy a system must dissipate if it destroys information. We propose energy-aware variations of three standard models of computation: circuit RAM, word RAM, and transdichotomous RAM. On top of these models, we build familiar high-level primitives such as control logic, memory allocation, and garbage collection with zero energy complexity and only constant-factor overheads in space and time complexity, enabling simple expression of energy-efficient algorithms. We analyze several classic algorithms in our models and develop low-energy variations: comparison sort, insertion sort, counting sort, breadth-first search, Bellman-Ford, Floyd-Warshall, matrix all-pairs shortest paths, AVL trees, binary heaps, and dynamic arrays. We explore the time/space/energy trade-off and develop several general techniques for analyzing algorithms and reducing their energy complexity. These results lay a theoretical foundation for a new field of semi-reversible computing and provide a new framework for the investigation of algorithms.Comment: 40 pages, 8 pdf figures, full version of work published in ITCS 201

    Unbounded Superoptimization

    Get PDF
    Our aim is to enable software to take full advantage of the capabilities of emerging microprocessor designs without modifying the compiler. Towards this end, we propose a new approach to code generation and optimization. Our approach uses an SMT solver in a novel way to generate efficient code for modern architectures and guarantee that the generated code correctly implements the source code. The distinguishing characteristic of our approach is that the size of the constraints does not depend on the candidate sequence of instructions. To study the feasibility of our approach, we implemented a preliminary prototype, which takes as input LLVM IR code and uses Z3 SMT solver to generate ARMv7-A assembly. The prototype handles arbitrary loop-free code (not only basic blocks) as input and output. We applied it to small but tricky examples used as standard benchmarks for other superoptimization and synthesis tools. We are encouraged to see that Z3 successfully solved complex constraints that arise from our approach. This work paves the way to employing recent advances in SMT solvers and has a potential to advance SMT solvers further by providing a new category of challenging benchmarks that come from an industrial application domain

    메모리 보호를 위한 보안 정책을 시행하기 위한 코드 변환 기술

    Get PDF
    학위논문(박사)--서울대학교 대학원 :공과대학 전기·컴퓨터공학부,2020. 2. 백윤흥.Computer memory is a critical component in computer systems that needs to be protected to ensure the security of computer systems. It contains security sensitive data that should not be disclosed to adversaries. Also, it contains the important data for operating the system that should not be manipulated by the attackers. Thus, many security solutions focus on protecting memory so that sensitive data cannot be leaked out of the computer system or on preventing illegal access to computer data. In this thesis, I will present various code transformation techniques for enforcing security policies for memory protection. First, I will present a code transformation technique to track implicit data flows so that security sensitive data cannot leak through implicit data flow channels (i.e., conditional branches). Then I will present a compiler technique to instrument C/C++ program to mitigate use-after-free errors, which is a type of vulnerability that allow illegal access to stale memory location. Finally, I will present a code transformation technique for low-end embedded devices to enable execute-only memory, which is a strong security policy to protect secrets and harden the computing device against code reuse attacks.컴퓨터 메모리는 컴퓨터 시스템의 보안을 위해 보호되어야 하는 중요한 컴포넌트이다. 컴퓨터 메모리는 보안상 중요한 데이터를 담고 있을 뿐만 아니라, 시스템의 올바른 동작을 위해 공격자에 의해 조작되어서는 안되는 중요한 데이터 값들을 저장한다. 따라서 많은 보안 솔루션은 메모리를 보호하여 컴퓨터 시스템에서 중요한 데이터가 유출되거나 컴퓨터 데이터에 대한 불법적인 접근을 방지하는 데 중점을 둔다. 본 논문에서는 메모리 보호를 위한 보안 정책을 시행하기 위한 다양한 코드 변환 기술을 제시한다. 먼저, 프로그램에서 분기문을 통해 보안에 민감한 데이터가 유출되지 않도록 암시적 데이터 흐름을 추적하는 코드 변환 기술을 제시한다. 그 다음으로 C / C ++ 프로그램을 변환하여 use-after-free 오류를 완화하는 컴파일러 기술을 제시한다. 마지막으로, 중요 데이터를 보호하고 코드 재사용 공격으로부터 디바이스를 강화할 수 있는 강력한 보안 정책인 실행 전용 메모리(execute-only memory)를 저사양 임베디드 디바이스에 구현하기 위한 코드 변환 기술을 제시한다.1 Introduction 1 2 Background 4 3 A Hardware-based Technique for Efficient Implicit Information Flow Tracking 8 3.1 Introduction 8 3.2 Related Work 10 3.3 Our Approach for Implicit Flow Tracking 12 3.3.1 Implicit Flow Tracking Scheme with Program Counter Tag 12 3.3.2 tP C Management Technique 15 3.3.3 Compensation for the Untaken Path 20 3.4 Architecture Design of IFTU 22 3.4.1 Overall System 22 3.4.2 Tag Computing Core 24 3.5 Performance and Area Analysis 26 3.6 Security Analysis 28 3.7 Summary 30 4 CRCount: Pointer Invalidation with Reference Counting to Mitigate Useafter-free in Legacy C/C++ 31 4.1 Introduction 31 4.2 Related Work 36 4.3 Threat Model 40 4.4 Implicit Pointer Invalidation 40 4.4.1 Invalidation with Reference Counting 40 4.4.2 Reference Counting in C/C++ 42 4.5 Design 44 4.5.1 Overview 45 4.5.2 Pointer Footprinting 46 4.5.3 Delayed Object Free 50 4.6 Implementation 53 4.7 Evaluation 56 4.7.1 Statistics 56 4.7.2 Performance Overhead 58 4.7.3 Memory Overhead 62 4.8 Security Analysis 67 4.8.1 Attack Prevention 68 4.8.2 Security considerations 69 4.9 Limitations 69 4.10 Summary 71 5 uXOM: Efficient eXecute-Only Memory on ARM Cortex-M 73 5.1 Introduction 73 5.2 Background 78 5.2.1 ARMv7-M Address Map and the Private Peripheral Bus (PPB) 78 5.2.2 Memory Protection Unit (MPU) 79 5.2.3 Unprivileged Loads/Stores 80 5.2.4 Exception Entry and Return 80 5.3 Threat Model and Assumptions 81 5.4 Approach and Challenges 82 5.5 uXOM 85 5.5.1 Basic Design 85 5.5.2 Solving the Challenges 89 5.5.3 Optimizations 98 5.5.4 Security Analysis 99 5.6 Evaluation 100 5.6.1 Runtime Overhead 103 5.6.2 Code Size Overhead 106 5.6.3 Energy Overhead 107 5.6.4 Security and Usability 107 5.6.5 Use Cases 108 5.7 Discussion 110 5.8 Related Work 111 5.9 Summary 113 6 Conclusion and Future Work 114 6.1 Future Work 115 Abstract (In Korean) 132 Acknowlegement 133Docto

    Optimizing energy-efficiency for multi-core packet processing systems in a compiler framework

    Get PDF
    Network applications become increasingly computation-intensive and the amount of traffic soars unprecedentedly nowadays. Multi-core and multi-threaded techniques are thus widely employed in packet processing system to meet the changing requirement. However, the processing power cannot be fully utilized without a suitable programming environment. The compilation procedure is decisive for the quality of the code. It can largely determine the overall system performance in terms of packet throughput, individual packet latency, core utilization and energy efficiency. The thesis investigated compilation issues in networking domain first, particularly on energy consumption. And as a cornerstone for any compiler optimizations, a code analysis module for collecting program dependency is presented and incorporated into a compiler framework. With that dependency information, a strategy based on graph bi-partitioning and mapping is proposed to search for an optimal configuration in a parallel-pipeline fashion. The energy-aware extension is specifically effective in enhancing the energy-efficiency of the whole system. Finally, a generic evaluation framework for simulating the performance and energy consumption of a packet processing system is given. It accepts flexible architectural configuration and is capable of performingarbitrary code mapping. The simulation time is extremely short compared to full-fledged simulators. A set of our optimization results is gathered using the framework
    corecore