24 research outputs found

    Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

    Get PDF
    In the modern-day era of technology, a paradigm shift has been witnessed in the areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, robotics, etc. In the context of developed digital technologies and the availability of authentic data and data handling infrastructure, DNNs have been a credible choice for solving more complex real-life problems. The performance and accuracy of a DNN is a way better than human intelligence in certain situations. However, it is noteworthy that the DNN is computationally too cumbersome in terms of the resources and time to handle these computations. Furthermore, general-purpose architectures like CPUs have issues in handling such computationally intensive algorithms. Therefore, a lot of interest and efforts have been invested by the research fraternity in specialized hardware architectures such as Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) in the context of effective implementation of computationally intensive algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review discusses the detailed description of the specialized hardware-based accelerators used in the training and/or inference of DNN. A comparative study based on factors like power, area, and throughput, is also made on the various accelerators discussed. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide for hardware architectures for accelerating and improving the effectiveness of deep learning research.publishedVersio

    Lightweight Protocols and Applications for Memory-Based Intrinsic Physically Unclonable Functions on Commercial Off-The-Shelve Devices

    Get PDF
    We are currently living in the era in which through the ever-increasing dissemination of inter-connected embedded devices, the Internet-of-Things manifests. Although such end-point devices are commonly labeled as ``smart gadgets'' and hence they suggest to implement some sort of intelligence, from a cyber-security point of view, more then often the opposite holds. The market force in the branch of commercial embedded devices leads to minimizing production costs and time-to-market. This widespread trend has a direct, disastrous impact on the security properties of such devices. The majority of currently used devices or those that will be produced in the future do not implement any or insufficient security mechanisms. Foremost the lack of secure hardware components often mitigates the application of secure protocols and applications. This work is dedicated to a fundamental solution statement, which allows to retroactively secure commercial off-the-shelf devices, which otherwise are exposed to various attacks due to the lack of secure hardware components. In particular, we leverage the concept of Physically Unclonable Functions (PUFs), to create hardware-based security anchors in standard hardware components. For this purpose, we exploit manufacturing variations in Static Random-Access Memory (SRAM) and Dynamic Random-Access Memory modules to extract intrinsic memory-based PUF instances and building on that, to develop secure and lightweight protocols and applications. For this purpose, we empirically evaluate selected and representative device types towards their PUF characteristics. In a further step, we use those device types, which qualify due to the existence of desired PUF instances for subsequent development of security applications and protocols. Subsequently, we present various software-based security solutions which are specially tailored towards to the characteristic properties of embedded devices. More precisely, the proposed solutions comprise a secure boot architecture as well as an approach to protect the integrity of the firmware by binding it to the underlying hardware. Furthermore, we present a lightweight authentication protocol which leverages a novel DRAM-based PUF type. Finally, we propose a protocol, which allows to securely verify the software state of remote embedded devices

    Performance Optimization of Memory Intensive Applications on FPGA Accelerator

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Raspberry Pi Technology

    Get PDF

    Embedded System Design

    Get PDF
    A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues

    Embedded System Design

    Get PDF
    A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues

    Predictable Shared Memory Resources for Multi-Core Real-Time Systems

    Get PDF
    A major challenge in multi-core real-time systems is the interference problem on the shared hardware components amongst cores. Examples of these shared components include buses, on-chip caches, and off-chip dynamic random access memories (DRAMs). The problem arises because different cores in the system interfere with each other, while competing to access the shared hardware components. It is a challenging problem for real-time systems because operations of one core affect the temporal behaviour of other cores, which complicates the timing analysis of the system. We address this problem by making the following contributions. 1) For shared buses, we propose CArb, a predictable and criticality-aware arbiter, which provides guaranteed and differential service to tasks based on their requirements. In addition, we utilize CArb to mitigate overheads resulting from system switching among different modes. 2) For the cache hierarchy, we address the problem of maintaining cache coherence in multi-core real-time systems by modifying current coherence protocols such that data sharing is viable for real-time systems in a manner amenable for timing analysis. The proposed solution provides performance improvements, does not impose any scheduling restrictions, and does not require any source-code modifications. 3) At the shared DRAM level, we propose PMC, a programmable memory controller that provides latency guarantees for running tasks upon accessing the off-chip DRAM, while assigning differential memory services to tasks based on their bandwidth and latency requirements. In addition to PMC, we conduct a latency-based analysis on DRAM memory controllers (MCs). Our analysis provides both best-case and worst-case bounds on the latency that any request suffers upon accessing the DRAM. The analysis comprehensively covers all possible interactions of successive requests considering all possible DRAM states. Finally, we formally model request interrelations and DRAM command interactions. We use these models to develop an automated validation framework along with benchmark suites to validate and evaluate PMC and any other MC, which we release as an open-source tool

    Image Processing Using FPGAs

    Get PDF
    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs
    corecore