3,362 research outputs found

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

    X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories

    Get PDF
    Silicon-based Static Random Access Memories (SRAM) and digital Boolean logic have been the workhorse of the state-of-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying \textit{von-Neumann} computing architecture has remained unchanged. The limited throughput and energy-efficiency of the state-of-art computing systems, to a large extent, results from the well-known \textit{von-Neumann bottleneck}. The energy and throughput inefficiency of the von-Neumann machines have been accentuated in recent times due to the present emphasis on data-intensive applications like artificial intelligence, machine learning \textit{etc}. A possible approach towards mitigating the overhead associated with the von-Neumann bottleneck is to enable \textit{in-memory} Boolean computations. In this manuscript, we present an augmented version of the conventional SRAM bit-cells, called \textit{the X-SRAM}, with the ability to perform in-memory, vector Boolean computations, in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations including NAND, NOR, IMP (implication), XOR logic gates with respect to different bit-cell topologies −- the 8T cell and the 8+^+T Differential cell. In addition, we also present a novel \textit{`read-compute-store'} scheme, wherein the computed Boolean function can be directly stored in the memory without the need of latching the data and carrying out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and Monte-Carlo variation analysis.Comment: This article has been accepted in a future issue of IEEE Transactions on Circuits and Systems-I: Regular Paper

    Wireless Medical Sensor Networks: Design Requirements and Enabling Technologies

    Get PDF
    This article analyzes wireless communication protocols that could be used in healthcare environments (e.g., hospitals and small clinics) to transfer real-time medical information obtained from noninvasive sensors. For this purpose the features of the three currently most widely used protocols—namely, Bluetooth® (IEEE 802.15.1), ZigBee (IEEE 802.15.4), and Wi-Fi (IEEE 802.11)—are evaluated and compared. The important features under consideration include data bandwidth, frequency band, maximum transmission distance, encryption and authentication methods, power consumption, and current applications. In addition, an overview of network requirements with respect to medical sensor features, patient safety and patient data privacy, quality of service, and interoperability between other sensors is briefly presented. Sensor power consumption is also discussed because it is considered one of the main obstacles for wider adoption of wireless networks in medical applications. The outcome of this assessment will be a useful tool in the hands of biomedical engineering researchers. It will provide parameters to select the most effective combination of protocols to implement a specific wireless network of noninvasive medical sensors to monitor patients remotely in the hospital or at home

    EFFICIENT HARDWARE PRIMITIVES FOR SECURING LIGHTWEIGHT SYSTEMS

    Get PDF
    In the era of IoT and ubiquitous computing, the collection and communication of sensitive data is increasingly being handled by lightweight Integrated Circuits. Efficient hardware implementations of crytographic primitives for resource constrained applications have become critical, especially block ciphers which perform fundamental operations such as encryption, decryption, and even hashing. We study the efficiency of block ciphers under different implementation styles. For low latency applications that use unrolled block cipher implementations, we design a glitch filter to reduce energy consumption. For lightweight applications, we design a novel architecture for the widely used AES cipher. The design eliminates inefficiencies in data movement and clock activity, thereby significantly improving energy efficiency over state-of-the-art architectures. Apart from efficiency, vulnerability to implementation attacks are a concern, which we mitigate by our randomization capable lightweight AES architecture. We fabricate our designs in a commercial 16nm FinFET technology and present measured testchip data on energy consumption and side channel resistance. Finally, we address the problem of supply chain security by using image processing techniques to extract fingerprints from surface texture of plastic IC packages for IC authentication and counterfeit prevention. Collectively these works present efficient and cost effective solutions to secure lightweight systems
    • …
    corecore