
Algorithm Hardware Codesign for High Performance Neuromorphic Computing

Author: Fang Haowen
Publication venue: SURFACE at Syracuse University
Publication date: 22/12/2021

Driven by the widespread adoption of the Internet of Things (IoT), embedded systems, Cyber-Physical Systems (CPS), and similar platforms, there is an increasing demand to apply machine intelligence in these power-limited scenarios. Although deep learning has achieved impressive performance on various real-world tasks such as anomaly detection, pattern recognition, and machine vision, the ever-increasing computational complexity and model size of Deep Neural Networks (DNNs) make it challenging to deploy them in the aforementioned scenarios, where computation, memory, and energy resources are all limited. Early studies show that the energy efficiency of biological systems can be orders of magnitude higher than that of digital systems. Taking inspiration from biological systems, neuromorphic computing and Spiking Neural Networks (SNNs) have therefore drawn attention as alternative solutions for energy-efficient machine intelligence. Though believed to be promising, neuromorphic computing is rarely used in real-world applications. A major problem is that the performance of SNNs is limited compared with DNNs due to the lack of efficient training algorithms. In an SNN, a neuron's output is a spike, which is represented mathematically by a Dirac delta function. Because spikes are non-differentiable, gradient descent cannot be used directly to train SNNs. Hence, algorithm-level innovation is desirable. In addition, because neuromorphic computing is an emerging computing paradigm, hardware- and architecture-level innovation is also required to support new algorithms and to explore its potential. In this work, we present a comprehensive algorithm-hardware codesign for neuromorphic computing. On the algorithm side, we address the training difficulty. We first derive a flexible SNN model that retains critical neural dynamics, and then develop an algorithm to train SNNs to learn temporal patterns. Next, we apply the proposed algorithm to multivariate time series classification tasks to demonstrate its advantages.
On the hardware level, we develop a systematic FPGA-based solution that is optimized for the proposed SNN model to enable high-performance inference. In addition, we explore emerging devices and propose a memristor-based neuromorphic design. We implement neuron and synapse circuits that can replicate important neural dynamics such as the filter effect and adaptive threshold.

Syracuse University Research Facility and Collaborative Environment
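
The abstract's two technical points, that spikes are non-differentiable and that the circuits reproduce a filter effect and an adaptive threshold, can be illustrated with a minimal discrete-time leaky integrate-and-fire (LIF) neuron. This is a generic sketch, not the thesis's SNN model or circuit: the time constants, threshold increment, reset-to-zero rule, and the helper names `simulate_lif` and `finite_diff` are illustrative assumptions.

```python
import math
from typing import Callable, List


def heaviside(x: float) -> float:
    """Spike nonlinearity: emit 1.0 when the potential reaches threshold, else 0.0."""
    return 1.0 if x >= 0.0 else 0.0


def simulate_lif(inputs: List[float], tau_syn: float = 5.0, tau_mem: float = 10.0,
                 theta0: float = 1.0, theta_jump: float = 0.5,
                 tau_theta: float = 20.0, dt: float = 1.0) -> List[float]:
    """Discrete-time LIF neuron with an exponential synaptic filter and an
    adaptive threshold. All parameter values are illustrative assumptions."""
    a_syn = math.exp(-dt / tau_syn)      # decay factor of the synaptic filter
    a_mem = math.exp(-dt / tau_mem)      # membrane leak factor
    a_theta = math.exp(-dt / tau_theta)  # decay factor of the threshold increment
    i_syn, v, theta_adapt = 0.0, 0.0, 0.0
    spikes = []
    for x in inputs:
        i_syn = a_syn * i_syn + x        # filter effect: low-pass filtered input current
        v = a_mem * v + i_syn            # leaky integration of the filtered current
        theta_adapt *= a_theta           # threshold relaxes back toward theta0
        s = heaviside(v - (theta0 + theta_adapt))
        if s:
            v = 0.0                      # reset the membrane after a spike
            theta_adapt += theta_jump    # adaptive threshold: harder to fire again soon
        spikes.append(s)
    return spikes


def finite_diff(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    print(simulate_lif([0.3] * 20))
    print(finite_diff(heaviside, 0.3))   # zero gradient away from the threshold
    print(finite_diff(heaviside, 0.0))   # estimate blows up at the threshold
```

The finite-difference gradient of the spike function is 0 everywhere except at the threshold, where it grows as 1/(2h) when `h` shrinks; this is the discrete counterpart of the Dirac delta and the reason plain gradient descent has no useful signal to follow. Workarounds such as surrogate gradients replace this derivative with a smooth function during training; the abstract does not say which approach the thesis uses.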

PalQuant: Accelerating High-precision Networks on Low-precision Accelerators

Authors: Cheng Jian, Hu Qinghao, Li Gang, Wu Qiman
Publication date: 03/08/2022

Recently, low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet the low-precision quantized models on these DLAs suffer severe accuracy degradation. One way to achieve both high accuracy and efficient inference is to deploy high-precision neural networks on low-precision DLAs, an approach that has rarely been studied. In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method, which approximates high-precision computations by learning parallel low-precision representations from scratch. In addition, we present a novel cyclic shuffle module to boost cross-group information communication between the parallel low-precision groups. Extensive experiments demonstrate that PalQuant outperforms state-of-the-art quantization methods in both accuracy and inference speed; for example, for ResNet-18 network quantization, PalQuant obtains 0.52% higher accuracy and a 1.78× speedup simultaneously over its 4-bit counterpart on a state-of-the-art 2-bit accelerator. Code is available at https://github.com/huqinghao/PalQuant.
Comment: accepted by ECCV202

arXiv.org e-Print Archive
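
For intuition about running higher-precision arithmetic on a low-precision datapath, the simplest exact approach is fixed bit-slicing, sketched below for unsigned 4-bit integers on a 2-bit primitive. This baseline is not PalQuant, which according to the abstract learns parallel low-precision representations from scratch; all function names, including the `dot_2bit` stand-in for an accelerator instruction, are hypothetical.

```python
from typing import List, Tuple


def split_4bit(values: List[int]) -> Tuple[List[int], List[int]]:
    """Split unsigned 4-bit integers into 2-bit digits so that v == 4 * hi + lo."""
    for v in values:
        if not 0 <= v <= 15:
            raise ValueError(f"{v} is not an unsigned 4-bit value")
    return [v >> 2 for v in values], [v & 0b11 for v in values]


def dot_2bit(a: List[int], b: List[int]) -> int:
    """Hypothetical 2-bit accelerator primitive: both operands must lie in [0, 3]."""
    if any(not 0 <= v <= 3 for v in a + b):
        raise ValueError("operands must be unsigned 2-bit values")
    return sum(x * y for x, y in zip(a, b))


def dot_4bit_via_2bit(w: List[int], x: List[int]) -> int:
    """Exact 4-bit dot product from four 2-bit dot products:

    (4*w_hi + w_lo) * (4*x_hi + x_lo)
        = 16*w_hi*x_hi + 4*w_hi*x_lo + 4*w_lo*x_hi + w_lo*x_lo
    """
    w_hi, w_lo = split_4bit(w)
    x_hi, x_lo = split_4bit(x)
    return ((dot_2bit(w_hi, x_hi) << 4)
            + (dot_2bit(w_hi, x_lo) << 2)
            + (dot_2bit(w_lo, x_hi) << 2)
            + dot_2bit(w_lo, x_lo))
```

For example, `dot_4bit_via_2bit([15, 7, 3, 0], [1, 2, 3, 15])` returns 38, the same as the direct dot product. Exact bit-slicing needs four 2-bit passes per 4-bit product, which shows why a cheaper approximation of high-precision computation is attractive on such hardware.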

Can my chip behave like my brain?

Author: George Suma
Publication venue: Georgia Institute of Technology
Publication date: 27/05/2016

Many decades ago, Carver Mead established the foundations of neuromorphic systems. Neuromorphic systems are analog circuits that emulate biology. These circuits use the subthreshold dynamics of CMOS transistors to mimic the behavior of neurons. The objective is not only to simulate the human brain but also to build useful applications on these bio-inspired circuits, such as ultra-low-power speech processing, image processing, and robotics. This can be achieved with reconfigurable hardware such as field-programmable analog arrays (FPAAs), which allow different applications to be configured on a cross-platform system. As digital systems saturate in terms of power efficiency, this alternative approach has the potential to improve computational efficiency by approximately eight orders of magnitude. These systems, which combine analog, digital, and neuromorphic elements, form a very powerful reconfigurable processing machine.
Ph.D.
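
The subthreshold operation mentioned above can be made concrete with the standard simplified model of a saturated subthreshold nFET, I_D = I_0 · exp(κ · V_GS / U_T), where U_T = kT/q is the thermal voltage. The sketch evaluates this exponential law and the gate-voltage step needed for a tenfold change in current; the values of I_0 and κ (the gate coupling coefficient) are assumed typical magnitudes, not figures from the thesis, and the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact SI value)


def thermal_voltage(temp_k: float = 300.0) -> float:
    """U_T = kT/q in volts."""
    return BOLTZMANN * temp_k / ELEMENTARY_CHARGE


def subthreshold_current(v_gs: float, i0: float = 1e-15, kappa: float = 0.7,
                         temp_k: float = 300.0) -> float:
    """Simplified saturated subthreshold nFET drain current:
    I_D = I_0 * exp(kappa * V_GS / U_T). i0 and kappa are assumed values."""
    return i0 * math.exp(kappa * v_gs / thermal_voltage(temp_k))


def volts_per_decade(kappa: float = 0.7, temp_k: float = 300.0) -> float:
    """Gate-voltage increase that multiplies the drain current by 10."""
    return thermal_voltage(temp_k) * math.log(10) / kappa


if __name__ == "__main__":
    print(f"{volts_per_decade() * 1e3:.1f} mV per decade of current")
    for v in (0.1, 0.2, 0.3):
        print(f"V_GS = {v:.1f} V -> I_D = {subthreshold_current(v):.3e} A")
```

With these assumed values the current changes by one decade for roughly every 85 mV of gate voltage. This exponential, rather than square-law, dependence at very small currents is the regime such subthreshold analog circuits operate in.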

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

Authors: Armeniakos Giorgos, Hanif Muhammad Abdullah, Jiao Xun, Leon Vasileios, Pekmestzi Kiamal, Shafique Muhammad, Soudris Dimitrios
Publication date: 20/07/2023

The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the computing systems community to explore new design approaches. Approximate Computing appears as an emerging solution, allowing the quality of results to be tuned in the design of a system in order to improve energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from the system level down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, both of which target the design of resource-efficient processors/accelerators and systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.
Comment: Under Review at ACM Computing Survey

arXiv.org e-Print Archive
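
As a generic illustration of trading quality of results for efficiency, the sketch below models the lower-part OR adder (LOA), a commonly cited approximate adder in which the k low-order bits are computed with a bitwise OR instead of a carry chain. It is shown only to make the trade-off concrete; the abstract does not say which specific techniques or classifications the survey uses, and the helper `error_stats` is hypothetical.

```python
from typing import Tuple


def loa_add(a: int, b: int, k: int) -> int:
    """Lower-part OR adder (LOA) for unsigned integers.

    The k low-order bits are approximated by a bitwise OR (no carry chain);
    the remaining high-order bits are added exactly, with a carry-in equal to
    a[k-1] AND b[k-1].
    """
    if a < 0 or b < 0 or k < 0:
        raise ValueError("expected unsigned operands and k >= 0")
    if k == 0:
        return a + b                      # no approximated bits: exact addition
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)         # approximate lower part
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry    # exact upper part
    return (high << k) | low


def error_stats(bits: int, k: int) -> Tuple[int, float]:
    """Worst-case and mean absolute error of loa_add over all bits-wide operand pairs."""
    errors = [abs((a + b) - loa_add(a, b, k))
              for a in range(1 << bits) for b in range(1 << bits)]
    return max(errors), sum(errors) / len(errors)


if __name__ == "__main__":
    for k in range(5):
        worst, mean = error_stats(8, k)
        print(f"k={k}: worst-case error {worst}, mean error {mean:.3f}")
```

For 8-bit operands the worst-case absolute error is 2^(k-1) (8 for k = 4), while the carry chain through the low k bits disappears from the hardware; increasing k accepts more error in exchange for a shorter critical path.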

A scalable, portable, FPGA-based implementation of the Unscented Kalman Filter

Author: Soh Jeremy
Publication venue: Faculty of Engineering and Information Technologies, School of Aerospace, Mechanical and Mechatronic Engineering
Publication date: 28/02/2017

Sustained technological progress has reached a point where robotic/autonomous systems may soon become ubiquitous. For these systems to be useful, greater autonomous capability is needed in aerospace and other applications. Greater aerospace autonomous capability in turn requires high-performance state estimation. However, the desire to reduce costs through simplified development processes and compact form factors can limit performance. A hardware-based approach, such as using a Field Programmable Gate Array (FPGA), is common when high performance is required, but hardware approaches tend to have a more complicated development process than traditional software approaches, and greater development complexity in turn results in higher costs. Leveraging the advantages of both hardware-based and software-based approaches, this work presents an FPGA-based hardware/software (HW/SW) codesign of the Unscented Kalman Filter (UKF). The UKF is split into an application-specific part, implemented in software to retain portability, and a non-application-specific part, implemented in hardware as a parameterisable IP core to increase performance. The codesign comes in three versions (Serial, Parallel and Pipeline) that provide flexibility in balancing resources against performance, allowing system designers to simplify the development process. Simulation results are presented for two possible implementations of the design: a nanosatellite application and a Simultaneous Localisation and Mapping (SLAM) application. These results validate the performance of the HW/SW UKF and demonstrate its portability, particularly in small aerospace systems. Implementation (synthesis, timing, power) details for a variety of situations are presented and analysed to demonstrate how the HW/SW codesign can be scaled for any application.
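
The application-specific versus non-application-specific split described above can be sketched with the unscented transform at the core of the UKF: the model function `f` plays the role of the application-specific part (kept in software), while sigma-point generation, weighting, and moment recovery are generic, corresponding to the part the thesis places in a hardware IP core. This is a plain-Python illustration using assumed names and the common scaled sigma-point parameterisation (alpha, beta, kappa); it is not the thesis's IP core, and the UKF measurement update is omitted.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sigma_points(mean: Vector, cov: Matrix, alpha: float = 1.0, beta: float = 2.0,
                 kappa: float = 1.0) -> Tuple[List[Vector], Vector, Vector]:
    """Generic part: 2n+1 scaled sigma points with mean and covariance weights."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    root = cholesky([[(n + lam) * c for c in row] for row in cov])
    points = [list(mean)]
    for j in range(n):
        col = [root[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
    w_mean = [lam / (n + lam)] + [1.0 / (2.0 * (n + lam))] * (2 * n)
    w_cov = [w_mean[0] + (1.0 - alpha ** 2 + beta)] + w_mean[1:]
    return points, w_mean, w_cov


def unscented_transform(f: Callable[[Vector], Vector], mean: Vector, cov: Matrix,
                        **params: float) -> Tuple[Vector, Matrix]:
    """Propagate (mean, cov) through f; only f is application-specific."""
    points, w_mean, w_cov = sigma_points(mean, cov, **params)
    ys = [f(p) for p in points]          # application-specific model evaluations
    m = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(w_mean, ys)) for i in range(m)]
    y_cov = [[sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j]) for w, y in zip(w_cov, ys))
              for j in range(m)] for i in range(m)]
    return y_mean, y_cov


def linear_model(x: Vector) -> Vector:
    """Hypothetical application-specific model: y = A x with A = [[2, 1], [1, -1]]."""
    return [2.0 * x[0] + x[1], x[0] - x[1]]


if __name__ == "__main__":
    print(unscented_transform(linear_model, [1.0, 2.0], [[1.0, 0.5], [0.5, 2.0]]))
```

For a linear model the unscented transform is exact, so the call above yields mean [4, -1] and covariance [[8, -0.5], [-0.5, 2]] up to rounding, matching A P A^T. For a nonlinear model only `f` changes while the generic routines stay the same, which is the property that makes a fixed core reusable across applications.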