Search CORE

2,089 research outputs found

FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

Author: Hu Xing
Ji Yu
Li Shuangchen
Wang Peiqi
Xie Xinfeng
Xie Yuan
Zhang Youhui
Zhang Youyang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/01/2019
Field of study

Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the \textit{memory wall} challenge, due to the unique capability of \textit{processing-in-memory} within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and its software system including neural synthesizer, temporal-to-spatial mapper, and placement & routing. We highly leverage the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt neural synthesizer to enable the high density computation-resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NN. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup.Comment: Accepted by ASPLOS 201

arXiv.org e-Print Archive

Crossref

A committee machine gas identification system based on dynamically reconfigurable FPGA

Author: Amira A
Bermak A
Brahim-Belhouari S
Chandrasekaran S
Shi M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

This paper proposes a gas identification system based on the committee machine (CM) classifier, which combines various gas identification algorithms, to obtain a unified decision with improved accuracy. The CM combines five different classifiers: K nearest neighbors (KNNs), multilayer perceptron (MLP), radial basis function (RBF), Gaussian mixture model (GMM), and probabilistic principal component analysis (PPCA). Experiments on real sensors' data proved the effectiveness of our system with an improved accuracy over individual classifiers. Due to the computationally intensive nature of CM, its implementation requires significant hardware resources. In order to overcome this problem, we propose a novel time multiplexing hardware implementation using a dynamically reconfigurable field programmable gate array (FPGA) platform. The processing is divided into three stages: sampling and preprocessing, pattern recognition, and decision stage. Dynamically reconfigurable FPGA technique is used to implement the system in a sequential manner, thus using limited hardware resources of the FPGA chip. The system is successfully tested for combustible gas identification application using our in-house tin-oxide gas sensors

Crossref

Hong Kong University of Science and Technology Institutional Repository

Brunel University Research Archive

An area-efficient 2-D convolution implementation on FPGA for space applications

Author: Di Carlo Stefano
Gambardella Giulio
Indaco Marco
Prinetto Paolo Ernesto
Rolfo Daniele
Tiotto Gabriele
Publication venue: IEEE Computer Society
Publication date: 01/01/2011
Field of study

The 2-D Convolution is an algorithm widely used in image and video processing. Although its computation is simple, its implementation requires a high computational power and an intensive use of memory. Field Programmable Gate Arrays (FPGA) architectures were proposed to accelerate calculations of 2-D Convolution and the use of buffers implemented on FPGAs are used to avoid direct memory access. In this paper we present an implementation of the 2-D Convolution algorithm on a FPGA architecture designed to support this operation in space applications. This proposed solution dramatically decreases the area needed keeping good performance, making it appropriate for embedded systems in critical space application

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Accelerated hardware video object segmentation: From foreground detection to connected components labelling

Author: Andrew Hunter
Batlle
Boykov
Cucchiara
Haralick
Hongying Meng
Kofi Appiah
Patrick Dickinson
Rosenfeld
Stauffer
Zhou
Zhou
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010
Field of study

This is the preprint version of the Article - Copyright @ 2010 ElsevierThis paper demonstrates the use of a single-chip FPGA for the segmentation of moving objects in a video sequence. The system maintains highly accurate background models, and integrates the detection of foreground pixels with the labelling of objects using a connected components algorithm. The background models are based on 24-bit RGB values and 8-bit gray scale intensity values. A multimodal background differencing algorithm is presented, using a single FPGA chip and four blocks of RAM. The real-time connected component labelling algorithm, also designed for FPGA implementation, run-length encodes the output of the background subtraction, and performs connected component analysis on this representation. The run-length encoding, together with other parts of the algorithm, is performed in parallel; sequential operations are minimized as the number of run-lengths are typically less than the number of pixels. The two algorithms are pipelined together for maximum efficiency

University of Lincoln Institutional Repository

Crossref

Nottingham Trent Institutional Repository (IRep)

Sheffield Hallam University Research Archive

Brunel University Research Archive

FPGA implementation of real-time human motion recognition on a reconfigurable video processing architecture

Author: Bailey Chris
Freeman Micheal
Meng Hongying
Pears Nick
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2008
Field of study

In recent years, automatic human motion recognition has been widely researched within the computer vision and image processing communities. Here we propose a real-time embedded vision solution for human motion recognition implemented on a ubiquitous device. There are three main contributions in this paper. Firstly, we have developed a fast human motion recognition system with simple motion features and a linear Support Vector Machine(SVM) classifier. The method has been tested on a large, public human action dataset and achieved competitive performance for the temporal template (eg. ``motion history image") class of approaches. Secondly, we have developed a reconfigurable, FPGA based video processing architecture. One advantage of this architecture is that the system processing performance can be reconfigured for a particular application, with the addition of new or replicated processing cores. Finally, we have successfully implemented a human motion recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system is performing reliably, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man-machine communications and intelligent environments

University of Lincoln Institutional Repository

Real-time human action recognition on an embedded, reconfigurable video processing architecture

Author: A. Aizerman
A.F. Bobick
B. Farnell
Chris Bailey
G.R. Bradski
Hongying Meng
J.K. Aggarwal
Michael Freeman
Nick Pears
T. Moeslund
T. Ogata
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

Copyright @ 2008 Springer-Verlag.In recent years, automatic human motion recognition has been widely researched within the computer vision and image processing communities. Here we propose a real-time embedded vision solution for human motion recognition implemented on a ubiquitous device. There are three main contributions in this paper. Firstly, we have developed a fast human motion recognition system with simple motion features and a linear Support Vector Machine (SVM) classifier. The method has been tested on a large, public human action dataset and achieved competitive performance for the temporal template (eg. “motion history image”) class of approaches. Secondly, we have developed a reconfigurable, FPGA based video processing architecture. One advantage of this architecture is that the system processing performance can be reconfiured for a particular application, with the addition of new or replicated processing cores. Finally, we have successfully implemented a human motion recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system is performing reliably, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man-machine communications and intelligent environments.DTI and Broadcom Ltd

Crossref

UCL Discovery

Brunel University Research Archive

Optimal load shedding for microgrids with unlimited DGs

Author: Mustapha Nik Mohd. Zaitul Akmal
Publication venue
Publication date: 01/12/2013
Field of study

Recent years, increasing trends on electrical supply demand, make us to search for the new alternative in supplying the electrical power. A study in micro grid system with embedded Distribution Generations (DGs) to the system is rapidly increasing. Micro grid system basically is design either operate in islanding mode or interconnect with the main grid system. In any condition, the system must have reliable power supply and operating at low transmission power loss. During the emergency state such as outages of power due to electrical or mechanical faults in the system, it is important for the system to shed any load in order to maintain the system stability and security. In order to reduce the transmission loss, it is very important to calculate best size of the DGs as well as to find the best positions in locating the DG itself.. Analytical Hierarchy Process (AHP) has been applied to find and calculate the load shedding priorities based on decision alternatives which have been made. The main objective of this project is to optimize the load shedding in the micro grid system with unlimited DG’s by applied optimization technique Gravitational Search Algorithm (GSA). The technique is used to optimize the placement and sizing of DGs, as well as to optimal the load shedding. Several load shedding schemes have been proposed and studied in this project such as load shedding with fixed priority index, without priority index and with dynamic priority index. The proposed technique was tested on the IEEE 69 Test Bus Distribution system

UTHM Institutional Repository