197 research outputs found
Comprehensive Evaluation of Supply Voltage Underscaling in FPGA on-Chip Memories
In this work, we evaluate aggressive undervolting, i.e., voltage scaling below the nominal level to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs). Usually, voltage guardbands are added by chip vendors to ensure the worst-case process and environmental scenarios. Through experimenting on several FPGA architectures, we measure this voltage guardband to be on average 39% of the nominal level, which in turn, delivers more than an order of magnitude power savings. However, further undervolting below the voltage guardband may cause reliability issues as the result of the circuit delay increase, i.e., start to appear faults. We extensively characterize the behavior of these faults in terms of the rate, location, type, as well as sensitivity to environmental temperature, with a concentration of on-chip memories, or Block RAMs (BRAMs). Finally, we evaluate a typical FPGA-based Neural Network (NN) accelerator under low-voltage BRAM operations. In consequence, the substantial NN energy savings come with the cost of NN accuracy loss. To attain power savings without NN accuracy loss, we propose a novel technique that relies on the deterministic behavior of undervolting faults and can limit the accuracy loss to 0.1% without any timing-slack overhead.Peer ReviewedPostprint (author's final draft
Design for novel enhanced weightless neural network and multi-classifier.
Weightless neural systems have often struggles in terms of speed, performances, and memory issues. There is also lack of sufficient interfacing of weightless neural systems to others systems. Addressing these issues motivates and forms the aims and objectives of this thesis. In addressing these issues, algorithms are formulated, classifiers, and multi-classifiers are designed, and hardware design of classifier are also reported. Specifically, the purpose of this thesis is to report on the algorithms and designs of weightless neural systems.
A background material for the research is a weightless neural network known as Probabilistic Convergent Network (PCN). By introducing two new and different interfacing method, the word "Enhanced" is added to PCN thereby giving it the name Enhanced Probabilistic Convergent Network (EPCN). To solve the problem of speed and performances when large-class databases are employed in data analysis, multi-classifiers are designed whose composition vary depending on problem complexity. It also leads to the introduction of a novel gating function with application of EPCN as an intelligent combiner. For databases which are not very large, single classifiers suffices. Speed and ease of application in adverse condition were considered as improvement which has led to the design of EPCN in hardware. A novel hashing function is implemented and tested on hardware-based EPCN.
Results obtained have indicated the utility of employing weightless neural systems. The results obtained also indicate significant new possible areas of application of weightless neural systems
Survey of FPGA applications in the period 2000 – 2015 (Technical Report)
Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report).; 2017.Since their introduction, FPGAs can be seen in more and more different fields of applications. The key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field introduces special requirements to the used computational architecture. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units like e.g. CPUs
Nn-X - a hardware accelerator for convolutional neural networks
Convolutional neural networks (ConvNets) are hierarchical models of the mammalian visual cortex. These models have been increasingly used in computer vision to perform object recognition and full scene understanding. ConvNets consist of multiple layers that contain groups of artificial neurons, which are mathematical approximations of biological neurons. A ConvNet can consist of millions of neurons and require billions of computations to produce one output. ^ Currently, giant server farms are used to process information in real time. These supercomputers require a large amount of power and a constant link to the end-user. Low powered embedded systems are not able to run convolutional neural networks in real time. Thus, using these systems on mobile platforms or on platforms where a connection to an off-site server is not guaranteed, is unfeasible. ^ In this work we present nn-X — a scalable hardware architecture capable of processing ConvNets in real time. We evaluate the performance and power consumption of the aforementioned architecture and compare it with systems typically used to process convolutional neural networks. Our system is prototyped on the Xilinx Zynq XC7Z045 device. On this device, we are able to achieve a peak performance of 227 GOPs/s, a measured performance of up to 200 GOPs/s while consuming less than 3 W of power. This translates to a performance per power improvement of up to 10 times that of conventional embedded systems and up to 25 times that of performance systems like desktops and GPUs
Field programmable gate array based sigmoid function implementation using differential lookup table and second order nonlinear function
Artificial neural network (ANN) is an established artificial intelligence technique that is widely used for solving numerous problems such as classification and clustering in various fields. However, the major problem with ANN is a factor of time. ANN takes a longer time to execute a huge number of neurons. In order to overcome this, ANN is implemented into hardware namely field-programmable-gate-array (FPGA). However, implementing the ANN into a field-programmable gate array (FPGA) has led to a new problem related to the sigmoid function implementation. Often used as the activation function for ANN, a sigmoid function cannot be directly implemented in FPGA. Owing to its accuracy, the lookup table (LUT) has always been used to implement the sigmoid function in FPGA. In this case, obtaining the high accuracy of LUT is expensive particularly in terms of its memory requirements in FPGA. Second-order nonlinear function (SONF) is an appealing replacement for LUT due to its small memory requirement. Although there is a trade-off between accuracy and memory size. Taking the advantage of the aforementioned approaches, this thesis proposed a combination of SONF and a modified LUT namely differential lookup table (dLUT). The deviation values between SONF and sigmoid function are used to create the dLUT. SONF is used as the first step to approximate the sigmoid function. Then it is followed by adding or deducting with the value that has been stored in the dLUT as a second step as demonstrated via simulation. This combination has successfully reduced the deviation value. The reduction value is significant as compared to previous implementations such as SONF, and LUT itself. Further simulation has been carried out to evaluate the accuracy of the ANN in detecting the object in an indoor environment by using the proposed method as a sigmoid function. The result has proven that the proposed method has produced the output almost as accurately as software implementation in detecting the target in indoor positioning problems. Therefore, the proposed method can be applied in any field that demands higher processing and high accuracy in sigmoid function outpu
Unsupervised Heart-rate Estimation in Wearables With Liquid States and A Probabilistic Readout
Heart-rate estimation is a fundamental feature of modern wearable devices. In
this paper we propose a machine intelligent approach for heart-rate estimation
from electrocardiogram (ECG) data collected using wearable devices. The novelty
of our approach lies in (1) encoding spatio-temporal properties of ECG signals
directly into spike train and using this to excite recurrently connected
spiking neurons in a Liquid State Machine computation model; (2) a novel
learning algorithm; and (3) an intelligently designed unsupervised readout
based on Fuzzy c-Means clustering of spike responses from a subset of neurons
(Liquid states), selected using particle swarm optimization. Our approach
differs from existing works by learning directly from ECG signals (allowing
personalization), without requiring costly data annotations. Additionally, our
approach can be easily implemented on state-of-the-art spiking-based
neuromorphic systems, offering high accuracy, yet significantly low energy
footprint, leading to an extended battery life of wearable devices. We
validated our approach with CARLsim, a GPU accelerated spiking neural network
simulator modeling Izhikevich spiking neurons with Spike Timing Dependent
Plasticity (STDP) and homeostatic scaling. A range of subjects are considered
from in-house clinical trials and public ECG databases. Results show high
accuracy and low energy footprint in heart-rate estimation across subjects with
and without cardiac irregularities, signifying the strong potential of this
approach to be integrated in future wearable devices.Comment: 51 pages, 12 figures, 6 tables, 95 references. Under submission at
Elsevier Neural Network
Energy Efficient Computing with Time-Based Digital Circuits
University of Minnesota Ph.D. dissertation. May 2019. Major: Electrical Engineering. Advisor: Chris Kim. 1 computer file (PDF); xv, 150 pages.Advancements in semiconductor technology have given the world economical, abundant, and reliable computing resources which have enabled countless breakthroughs in science, medicine, and agriculture which have improved the lives of many. Due to physics, the rate of these advancements is slowing, while the demand for the increasing computing horsepower ever grows. Novel computer architectures that leverage the foundation of conventional systems must become mainstream to continue providing the improved hardware required by engineers, scientists, and governments to innovate. This thesis provides a path forward by introducing multiple time-based computing architectures for a diverse range of applications. Simply put, time-based computing encodes the output of the computation in the time it takes to generate the result. Conventional systems encode this information in voltages across multiple signals; the performance of these systems is tightly coupled to improvements in semiconductor technology. Time-based computing elegantly uses the simplest of components from conventional systems to efficiently compute complex results. Two time-based neuromorphic computing platforms, based on a ring oscillator and a digital delay line, are described. An analog-to-digital converter is designed in the time domain using a beat frequency circuit which is used to record brain activity. A novel path planning architecture, with designs for 2D and 3D routes, is implemented in the time domain. Finally, a machine learning application using time domain inputs enables improved performance of heart rate prediction, biometric identification, and introduces a new method for using machine learning to predict temporal signal sequences. As these innovative architectures are presented, it will become clear the way forward will be increasingly enabled with time-based designs
- …