Search CORE

1,224 research outputs found

GPU devices for safety-critical systems: a survey

Author: Abella Ferrer Jaume
Calderón Torres Alejandro Josué
Cazorla Almeida Francisco Javier
Flores Barroso José Luis
Kosmidis Leonidas
Pérez Cerrolaza Jon
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/07/2023
Field of study

Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices’ random hardware failures, systematic failures, and independence of execution.This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020- 045931-I).Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Timing Analysis of General Purpose Graphics Processing Units for Real-Time Systems: Models and Analyses

Author: Kostiantyn Berezovskyi
Publication venue
Publication date: 20/04/2016
Field of study

Repositório Aberto da Universidade do Porto

Efficient hardware implementations of bio-inspired networks

Author: Vasanthakumaribabu Anakha
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2020
Field of study

The human brain, with its massive computational capability and power efficiency in small form factor, continues to inspire the ultimate goal of building machines that can perform tasks without being explicitly programmed. In an effort to mimic the natural information processing paradigms observed in the brain, several neural network generations have been proposed over the years. Among the neural networks inspired by biology, second-generation Artificial or Deep Neural Networks (ANNs/DNNs) use memoryless neuron models and have shown unprecedented success surpassing humans in a wide variety of tasks. Unlike ANNs, third-generation Spiking Neural Networks (SNNs) closely mimic biological neurons by operating on discrete and sparse events in time called spikes, which are obtained by the time integration of previous inputs. Implementation of data-intensive neural network models on computers based on the von Neumann architecture is mainly limited by the continuous data transfer between the physically separated memory and processing units. Hence, non-von Neumann architectural solutions are essential for processing these memory-intensive bio-inspired neural networks in an energy-efficient manner. Among the non-von Neumann architectures, implementations employing non-volatile memory (NVM) devices are most promising due to their compact size and low operating power. However, it is non-trivial to integrate these nanoscale devices on conventional computational substrates due to their non-idealities, such as limited dynamic range, finite bit resolution, programming variability, etc. This dissertation demonstrates the architectural and algorithmic optimizations of implementing bio-inspired neural networks using emerging nanoscale devices. The first half of the dissertation focuses on the hardware acceleration of DNN implementations. A 4-layer stochastic DNN in a crossbar architecture with memristive devices at the cross point is analyzed for accelerating DNN training. This network is then used as a baseline to explore the impact of experimental memristive device behavior on network performance. Programming variability is found to have a critical role in determining network performance compared to other non-ideal characteristics of the devices. In addition, noise-resilient inference engines are demonstrated using stochastic memristive DNNs with 100 bits for stochastic encoding during inference and 10 bits for the expensive training. The second half of the dissertation focuses on a novel probabilistic framework for SNNs using the Generalized Linear Model (GLM) neurons for capturing neuronal behavior. This work demonstrates that probabilistic SNNs have comparable perform-ance against equivalent ANNs on two popular benchmarks - handwritten-digit classification and human activity recognition. Considering the potential of SNNs in energy-efficient implementations, a hardware accelerator for inference is proposed, termed as Spintronic Accelerator for Probabilistic SNNs (SpinAPS). The learning algorithm is optimized for a hardware friendly implementation and uses first-to-spike decoding scheme for low latency inference. With binary spintronic synapses and digital CMOS logic neurons for computations, SpinAPS achieves a performance improvement of 4x in terms of GSOPS/W/mm

^2

when compared to a conventional SRAM-based design. Collectively, this work demonstrates the potential of emerging memory technologies in building energy-efficient hardware architectures for deep and spiking neural networks. The design strategies adopted in this work can be extended to other spike and non-spike based systems for building embedded solutions having power/energy constraints

Digital Commons @ New Jersey Institute of Technology (NJIT)

Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

Author: Armeniakos Giorgos
Hanif Muhammad Abdullah
Jiao Xun
Leon Vasileios
Pekmestzi Kiamal
Shafique Muhammad
Soudris Dimitrios
Publication venue
Publication date: 20/07/2023
Field of study

The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there is a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing, and it reviews its motivation, terminology and principles, as well it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques.Comment: Under Review at ACM Computing Survey

arXiv.org e-Print Archive

Correct and efficient accelerator programming

Author: Cohen Albert
Donaldson Alistair F.
Huisman Marieke
Katoen Joost-Pieter
Publication venue: Dagstuhl Publishing
Publication date: 01/01/2013
Field of study

This report documents the program and the outcomes of Dagstuhl Seminar 13142 “Correct and Efficient Accelerator Programming”. The aim of this Dagstuhl seminar was to bring together researchers from various sub-disciplines of computer science to brainstorm and discuss the theoretical foundations, design and implementation of techniques and tools for correct and efficient accelerator programming

University of Twente Research Information

Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

Author: Lammie Corey
Publication venue
Publication date: 01/01/2022
Field of study

Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems

ResearchOnline at James Cook University

Recommended from our members

Design and Optimization of Mobile Cloud Computing Systems with Networked Virtual Platforms

Author: Jung Young Hoon
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2016
Field of study

A Mobile Cloud Computing (MCC) system is a cloud-based system that is accessed by the users through their own mobile devices. MCC systems are emerging as the product of two technology trends: 1) the migration of personal computing from desktop to mobile devices and 2) the growing integration of large-scale computing environments into cloud systems. Designers are developing a variety of new mobile cloud computing systems. Each of these systems is developed with different goals and under the influence of different design constraints, such as high network latency or limited energy supply. The current MCC systems rely heavily on Computation Offloading, which however incurs new problems such as scalability of the cloud, privacy concerns due to storing personal information on the cloud, and high energy consumption on the cloud data centers. In this dissertation, I address these problems by exploring different options in the distribution of computation across different computing nodes in MCC systems. My thesis is that "the use of design and simulation tools optimized for design space exploration of the MCC systems is the key to optimize the distribution of computation in MCC." For a quantitative analysis of mobile cloud computing systems through design space exploration, I have developed netShip, the first generation of an innovative design and simulation tool, that offers large scalability and heterogeneity support. With this tool system designers and software programmers can efficiently develop, optimize, and validate large-scale, heterogeneous MCC systems. I have enhanced netShip to support the development of ever-evolving MCC applications with a variety of emerging needs including the fast simulation of new devices, e.g., Internet-of-Things devices, and accelerators, e.g., mobile GPUs. Leveraging netShip, I developed three new MCC systems where I applied three variations of a new computation distributing technique, called Reverse Offloading. By more actively leveraging the computational power on mobile devices, the MCC systems can reduce the total execution times, the burden of concentrated computations on the cloud, and the privacy concerns about storing personal information available in the cloud. This approach also creates opportunities for new services by utilizing the information available on the mobile device instead of accessing the cloud. Throughout my research I have enabled the design optimization of mobile applications and cloud-computing platforms. In particular, my design tool for MCC systems becomes a vehicle to optimize not only the performance but also the energy dissipation, an aspect of critical importance for any computing system

Columbia University Academic Commons

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

Author: Armeniakos Giorgos
Hanif Muhammad Abdullah
Jiao Xun
Leon Vasileios
Pekmestzi Kiamal
Shafique Muhammad
Soudris Dimitrios
Publication venue
Publication date: 20/07/2023
Field of study

The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.Comment: Under Review at ACM Computing Survey

arXiv.org e-Print Archive