149 research outputs found

    Application-specific Design and Optimization for Ultra-Low-Power Embedded Systems

    Get PDF
    University of Minnesota Ph.D. dissertation. August 2019. Major: Electrical/Computer Engineering. Advisor: John Sartori. 1 computer file (PDF); xii, 101 pages.The last few decades have seen a tremendous amount of innovation in computer system design to the point where electronic devices have become very inexpensive. This has brought us on the verge of a new paradigm in computing where there will be hundreds of devices in a person’s environment, ranging from mobile phones to smart home devices to wearables to implantables, all interconnected. This paradigm, called the Internet of Things (IoT), brings new challenges in terms of power, cost, and security. For example, power and energy have become critical design constraints that not only affect the lifetime of an ultra-low-power (ULP) system, but also its size and weight. While many conventional techniques exist that are aimed at energy reduction or that improve energy efficiency, they do so at the cost of performance. As such, their impact is limited in circumstances where energy is very constrained or where significant degradation of performance or functionality is unacceptable. Focusing on the opposing demands to increase both energy efficiency and performance simultaneously in a world where Moore’s law scaling is decelerating, one of the underlying themes of this work has been to identify novel insights that enable new pathways to energy efficiency in computing systems while avoiding the conventional tradeoff that simply sacrifices performance and functionality for energy efficiency. To this end, this work proposes a method to analyze the behavior of an application on the gate-level netlist of a processor for all possible inputs using a novel symbolic hardware-software co-analysis methdology. Using this methodology several techniques have been proposed to optimize a given processor-application pair for power, area and security

    Dependable design for low-cost ultra-low-power processors

    Get PDF
    Emerging applications in the Internet of Things (IoT) domain, such as wearables, implantables, smart tags, and wireless sensor networks put severe power, cost, reliability, and security constraints on hardware system design. This dissertation focuses on the architecture and design of dependable ultra-low power computing systems. Specifically, it proposes architecture and design techniques that exploit the unique application and usage characteristics of future computing systems to deliver low power, while meeting the reliability and security constraints of these systems. First, this dissertation considers the challenge of achieving both low power and high reliability in SRAM memories. It proposes both an architectural technique to reduce the overheads of error correction and a technique that uses the nature of error correcting codes to allow lower voltage operation without sacrificing reliability. Next, this dissertation considers low power and low cost. By leveraging the fact that many IoT systems are embedded in nature and will run the same application for their entire lifetime, fine-grained usage characteristics of the hardware-software system can be determined at design time. This dissertation presents a novel hardware-software co-analysis based on symbolic simulation that can determine the possible states of the processor throughout any execution of a specific application. This enables power-gating where more gates are turned off for longer, bespoke processors customized to specific applications, and stricter determination of peak power bounds. Finally, this dissertation considers achieving secure IoT systems at low cost and power overhead. By leveraging the hardware-software co-analysis, this dissertation shows that gate-level information flow security guarantees can be provided without hardware overheads

    Hierarchical Agent-based Adaptation for Self-Aware Embedded Computing Systems

    Get PDF
    Siirretty Doriast

    A survey of emerging architectural techniques for improving cache energy consumption

    Get PDF
    The search goes on for another ground breaking phenomenon to reduce the ever-increasing disparity between the CPU performance and storage. There are encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip designs but not as much luck has been struck with regards to the computer storage resulting in material negative system performance. A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy saving techniques which are grouped on whether they save the dynamic energy, leakage energy or both. Needless to mention, the aim of this work is to compile a quick reference guide of energy saving techniques from 2013 to 2016 for engineers, researchers and students

    Adaptive memory-side last-level GPU caching

    Get PDF
    Emerging GPU applications exhibit increasingly high computation demands which has led GPU manufacturers to build GPUs with an increasingly large number of streaming multiprocessors (SMs). Providing data to the SMs at high bandwidth puts significant pressure on the memory hierarchy and the Network-on-Chip (NoC). Current GPUs typically partition the memory-side last-level cache (LLC) in equally-sized slices that are shared by all SMs. Although a shared LLC typically results in a lower miss rate, we find that for workloads with high degrees of data sharing across SMs, a private LLC leads to a significant performance advantage because of increased bandwidth to replicated cache lines across different LLC slices. In this paper, we propose adaptive memory-side last-level GPU caching to boost performance for sharing-intensive workloads that need high bandwidth to read-only shared data. Adaptive caching leverages a lightweight performance model that balances increased LLC bandwidth against increased miss rate under private caching. In addition to improving performance for sharing-intensive workloads, adaptive caching also saves energy in a (co-designed) hierarchical two-stage crossbar NoC by power-gating and bypassing the second stage if the LLC is configured as a private cache. Our experimental results using 17 GPU workloads show that adaptive caching improves performance by 28.1% on average (up to 38.1%) compared to a shared LLC for sharing-intensive workloads. In addition, adaptive caching reduces NoC energy by 26.6% on average (up to 29.7%) and total system energy by 6.1% on average (up to 27.2%) when configured as a private cache. Finally, we demonstrate through a GPU NoC design space exploration that a hierarchical two-stage crossbar is both more power- and area-efficient than full and concentrated crossbars with the same bisection bandwidth, thus providing a low-cost cooperative solution to exploit workload sharing behavior in memory-side last-level caches

    Scalability of broadcast performance in wireless network-on-chip

    Get PDF
    Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.Peer ReviewedPostprint (published version

    Effective data parallel computing on multicore processors

    Get PDF
    The rise of chip multiprocessing or the integration of multiple general purpose processing cores on a single chip (multicores), has impacted all computing platforms including high performance, servers, desktops, mobile, and embedded processors. Programmers can no longer expect continued increases in software performance without developing parallel, memory hierarchy friendly software that can effectively exploit the chip level multiprocessing paradigm of multicores. The goal of this dissertation is to demonstrate a design process for data parallel problems that starts with a sequential algorithm and ends with a high performance implementation on a multicore platform. Our design process combines theoretical algorithm analysis with practical optimization techniques. Our target multicores are quad-core processors from Intel and the eight-SPE IBM Cell B.E. Target applications include Matrix Multiplications (MM), Finite Difference Time Domain (FDTD), LU Decomposition (LUD), and Power Flow Solver based on Gauss-Seidel (PFS-GS) algorithms. These applications are popular computation methods in science and engineering problems and are characterized by unit-stride (MM, LUD, and PFS-GS) or 2-point stencil (FDTD) memory access pattern. The main contributions of this dissertation include a cache- and space-efficient algorithm model, integrated data pre-fetching and caching strategies, and in-core optimization techniques. Our multicore efficient implementations of the above described applications outperform nai¨ve parallel implementations by at least 2x and scales well with problem size and with the number of processing cores

    Energy Efficient Hardware Design for Securing the Internet-of-Things

    Full text link
    The Internet of Things (IoT) is a rapidly growing field that holds potential to transform our everyday lives by placing tiny devices and sensors everywhere. The ubiquity and scale of IoT devices require them to be extremely energy efficient. Given the physical exposure to malicious agents, security is a critical challenge within the constrained resources. This dissertation presents energy-efficient hardware designs for IoT security. First, this dissertation presents a lightweight Advanced Encryption Standard (AES) accelerator design. By analyzing the algorithm, a novel method to manipulate two internal steps to eliminate storage registers and replace flip-flops with latches to save area is discovered. The proposed AES accelerator achieves state-of-art area and energy efficiency. Second, the inflexibility and high Non-Recurring Engineering (NRE) costs of Application-Specific-Integrated-Circuits (ASICs) motivate a more flexible solution. This dissertation presents a reconfigurable cryptographic processor, called Recryptor, which achieves performance and energy improvements for a wide range of security algorithms across public key/secret key cryptography and hash functions. The proposed design employs circuit techniques in-memory and near-memory computing and is more resilient to power analysis attack. In addition, a simulator for in-memory computation is proposed. It is of high cost to design and evaluate new-architecture like in-memory computing in Register-transfer level (RTL). A C-based simulator is designed to enable fast design space exploration and large workload simulations. Elliptic curve arithmetic and Galois counter mode are evaluated in this work. Lastly, an error resilient register circuit, called iRazor, is designed to tolerate unpredictable variations in manufacturing process operating temperature and voltage of VLSI systems. When integrated into an ARM processor, this adaptive approach outperforms competing industrial techniques such as frequency binning and canary circuits in performance and energy.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147546/1/zhyiqun_1.pd

    Emerging Security Threats in Modern Digital Computing Systems: A Power Management Perspective

    Get PDF
    Design of computing systems — from pocket-sized smart phones to massive cloud based data-centers — have one common daunting challenge : minimizing the power consumption. In this effort, power management sector is undergoing a rapid and profound transformation to promote clean and energy proportional computing. At the hardware end of system design, there is proliferation of specialized, feature rich and complex power management hardware components. Similarly, in the software design layer complex power management suites are growing rapidly. Concurrent to this development, there has been an upsurge in the integration of third-party components to counter the pressures of shorter time-to-market. These trends collectively raise serious concerns about trust and security of power management solutions. In recent times, problems such as overheating, performance degradation and poor battery life, have dogged the mobile devices market, including the infamous recall of Samsung Note 7. Power outage in the data-center of a major airline left innumerable passengers stranded, with thousands of canceled flights costing over 100 million dollars. This research examines whether such events of unintentional reliability failure, can be replicated using targeted attacks by exploiting the security loopholes in the complex power management infrastructure of a computing system. At its core, this research answers an imminent research question: How can system designers ensure secure and reliable operation of third-party power management units? Specifically, this work investigates possible attack vectors, and novel non-invasive detection and defense mechanisms to safeguard system against malicious power attacks. By a joint exploration of the threat model and techniques to seamlessly detect and protect against power attacks, this project can have a lasting impact, by enabling the design of secure and cost-effective next generation hardware platforms
    • …
    corecore