457 research outputs found
Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL
Recent technological advances have increased the available computing
power, memory, and speed of modern Central Processing Units (CPUs), Graphics
Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs).
Consequently, the performance and complexity of Artificial Neural Networks
(ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs)
currently offer state-of-the-art performance, they consume large amounts of
power. Training such networks on CPUs is inefficient, as data throughput and
parallel computation are limited. FPGAs are considered suitable candidates for
performance-critical, low-power systems, e.g. Internet of Things (IoT) edge
devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development
environment, networks described using the high-level OpenCL framework can be
accelerated on heterogeneous platforms. Moreover, the resource utilization and
power consumption of DNNs can be further reduced by utilizing regularization
techniques that binarize network weights. In this paper, we introduce, to the
best of our knowledge, the first FPGA-accelerated stochastically binarized DNN
implementations, and compare them to implementations accelerated using both
GPUs and FPGAs. Our developed networks are trained and benchmarked using the
popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art
performance, while offering a >16-fold improvement in power consumption,
compared to conventional GPU-accelerated networks. Both our FPGA-accelerated
deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10
by >9.89x and >9.91x, respectively. Comment: 4 pages, 3 figures, 1 table
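To make the distinction between deterministic and stochastic binarization concrete, here is a minimal sketch in plain Python. It assumes the formulation commonly used in the BNN literature (sign function for the deterministic case, sampling with a hard-sigmoid probability for the stochastic case); the abstract does not state the exact functions the authors use, and the helper names are hypothetical.

```python
import random


def hard_sigmoid(x: float) -> float:
    """Clip (x + 1) / 2 to [0, 1]; interpreted as P(weight = +1)."""
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def binarize_deterministic(w: float) -> int:
    """Sign binarization: +1 for w >= 0, otherwise -1."""
    return 1 if w >= 0 else -1


def binarize_stochastic(w: float, rng: random.Random) -> int:
    """Sample +1 with probability hard_sigmoid(w), otherwise -1."""
    return 1 if rng.random() < hard_sigmoid(w) else -1


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = [-1.5, -0.3, 0.0, 0.4, 2.0]  # illustrative real-valued weights
det = [binarize_deterministic(w) for w in weights]
sto = [binarize_stochastic(w, rng) for w in weights]
print(det)
print(sto)
```

Weights at or beyond the clipping range (here -1.5 and 2.0) binarize the same way under both schemes; only weights inside (-1, 1) are actually randomized in the stochastic case.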
Efficient Design of Triplet Based Spike-Timing Dependent Plasticity
Spike-Timing Dependent Plasticity (STDP) is believed to play an important
role in learning and the formation of computational function in the brain. The
classical model of STDP which considers the timing between pairs of
pre-synaptic and post-synaptic spikes (p-STDP) is incapable of reproducing
synaptic weight changes similar to those seen in biological experiments which
investigate the effect of either higher order spike trains (e.g. triplet and
quadruplet of spikes), or, simultaneous effect of the rate and timing of spike
pairs on synaptic plasticity. In this paper, we firstly investigate synaptic
weight changes using a p-STDP circuit and show how it fails to reproduce the
mentioned complex biological experiments. We then present a new STDP VLSI
circuit which acts based on the timing among triplets of spikes (t-STDP) that
is able to reproduce all the mentioned experimental results. We believe that
our new STDP VLSI circuit improves upon previous circuits: its learning
capacity exceeds that of current designs because it mimics the outcomes of
biological experiments more closely, and it can thus play a significant role
in future VLSI implementations of neuromorphic systems.
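The difference between pair-based and triplet-based STDP can be sketched in software. The example below assumes the trace-based triplet formulation of Pfister and Gerstner, which the abstract does not name explicitly; the amplitudes and time constants are illustrative placeholders, not values from the paper, and the function name is hypothetical. Setting both triplet amplitudes to zero reduces the rule to pair-based STDP.

```python
import math


def stdp_weight_change(pre_spikes, post_spikes,
                       a2_plus=5e-3, a2_minus=7e-3,
                       a3_plus=6e-3, a3_minus=2e-4,
                       tau_plus=16.8, tau_minus=33.7,
                       tau_x=101.0, tau_y=125.0):
    """Total weight change for spike times in ms (illustrative parameters)."""
    events = sorted([(t, "pre") for t in pre_spikes] +
                    [(t, "post") for t in post_spikes])
    r1 = r2 = o1 = o2 = 0.0  # pre-synaptic traces r1, r2; post-synaptic o1, o2
    last_t = events[0][0] if events else 0.0
    dw = 0.0
    for t, kind in events:
        dt = t - last_t
        # Every trace decays exponentially between events.
        r1 *= math.exp(-dt / tau_plus)
        r2 *= math.exp(-dt / tau_x)
        o1 *= math.exp(-dt / tau_minus)
        o2 *= math.exp(-dt / tau_y)
        if kind == "pre":
            # Depression: pair term plus a term gated by earlier pre spikes.
            dw -= o1 * (a2_minus + a3_minus * r2)
            r1 += 1.0
            r2 += 1.0
        else:
            # Potentiation: pair term plus a term gated by earlier post spikes.
            dw += r1 * (a2_plus + a3_plus * o2)
            o1 += 1.0
            o2 += 1.0
        last_t = t
    return dw


# A post-pre-post triplet: the triplet term adds potentiation that a
# purely pair-based rule (a3_plus = a3_minus = 0) does not produce.
triplet = stdp_weight_change([10.0], [0.0, 20.0])
pair_only = stdp_weight_change([10.0], [0.0, 20.0], a3_plus=0.0, a3_minus=0.0)
print(triplet, pair_only)
```

With these placeholder parameters the post-pre-post pattern yields net depression under the pair-based rule but net potentiation under the triplet rule, which is the kind of higher-order effect the abstract refers to.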
Design and Implementation of BCM Rule Based on Spike-Timing Dependent Plasticity
The Bienenstock-Cooper-Munro (BCM) and Spike Timing-Dependent Plasticity
(STDP) rules are two experimentally verified forms of synaptic plasticity where
the alteration of synaptic weight depends upon the rate and the timing of pre-
and post-synaptic firing of action potentials, respectively. Previous studies
have reported that under specific conditions, i.e. when random trains of
Poisson-distributed spikes are used as inputs and weight changes occur
according to STDP, the BCM rule is an emergent property.
Here, the applied STDP rule can be either the classical pair-based STDP rule or
the more powerful triplet-based STDP rule. In this paper, we demonstrate the
use of two distinct VLSI circuit implementations of STDP to examine whether BCM
learning is an emergent property of STDP. These circuits are stimulated with
random Poissonian spike trains. The first circuit implements the classical
pair-based STDP, while the second circuit realizes a previously described
triplet-based STDP rule. These two circuits are simulated using a standard
0.35 µm CMOS model in the HSpice simulator. Simulation results demonstrate that
the proposed triplet-based STDP circuit clearly produces the threshold-based
behaviour of the BCM rule. The results also show similar behaviour for the
pair-based STDP VLSI circuit in generating BCM-like learning.
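The "threshold-based behaviour" of BCM can be illustrated with the textbook rate-based form of the rule, in which the sign of the weight change flips when the post-synaptic rate crosses a threshold. This is a generic sketch, not the circuit model from the paper; the function name, learning rate, and rates are assumptions chosen for illustration.

```python
def bcm_weight_change(pre_rate: float, post_rate: float,
                      theta: float, eta: float = 1e-3) -> float:
    """Textbook BCM update: eta * pre * post * (post - theta)."""
    return eta * pre_rate * post_rate * (post_rate - theta)


theta = 10.0  # modification threshold in Hz (illustrative)
for post_rate in (5.0, 10.0, 20.0):
    dw = bcm_weight_change(pre_rate=10.0, post_rate=post_rate, theta=theta)
    # Depression below the threshold, no change at it, potentiation above it.
    print(post_rate, dw)
```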
Automated machine learning for healthcare and clinical notes analysis
Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has already been applied to easier settings with structured data such as tabular lab data. However, there is still a need for applying AutoML to the interpretation of medical text, which is being generated at a tremendous rate. For this to happen, a promising method is AutoML for clinical notes analysis, which is an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study towards AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the literature of AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which, if realized, can revolutionize patient outcomes.
Neuromorphic engineering: neuromimetic computation for understanding the brain
Neuromorphic engineering attempts to understand the computational properties of neural processing systems by building electronic circuits and systems that emulate the principles of computation in neural systems. The electronic systems that are developed in this process can serve both engineering and life sciences in various ways, ranging from low-power brain-like computing embedded systems to neural-based control, brain-machine interfaces, and neuroprosthesis. To realize such systems, various approaches and strategies, each with its own advantages and limitations, may be adopted. Here, we provide a summary of our recent article published in the Proceedings of the IEEE [1], where we have discussed and reviewed the various approaches to the design and implementation of neuromorphic learning systems, and pointed out challenges and opportunities in these systems.
Design and analysis of efficient QCA reversible adders
Quantum-dot cellular automata (QCA) as an emerging nanotechnology are envisioned to overcome the scaling and heat dissipation issues of current CMOS technology. In a QCA structure, information destruction plays an essential role in the overall heat dissipation, and in turn in the power consumption of the system. Therefore, reversible logic, which significantly controls the information flow of the system, is deemed suitable to achieve ultra-low-power structures. In order to benefit from the opportunities QCA and reversible logic provide, in this paper, we first review and implement prior reversible full-adder designs in QCA. We then propose a novel reversible design based on three- and five-input majority gates, and a robust one-layer crossover scheme. The new full-adder significantly improves on previous designs in terms of the optimization metrics, namely cell count, area, and delay. The proposed efficient full-adder is then used to design reversible ripple-carry adders (RCAs) with different sizes (i.e., 4, 8, and 16 bits). It is demonstrated that the new RCAs lead to 33% fewer garbage outputs, which can be essential in terms of lowering power consumption. This, along with the achieved improvements in area, complexity, and delay, introduces an ultra-efficient reversible QCA adder that can be beneficial in developing future computer arithmetic circuits and architecture.
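As background for the majority-gate construction mentioned above, the sketch below shows the standard way a full adder can be expressed with one three-input and one five-input majority function, then chained into a ripple-carry adder. It models only the Boolean behaviour; it is not the authors' reversible design (it has no garbage outputs or reversibility constraints), and the function names are hypothetical.

```python
from itertools import product


def maj(*bits: int) -> int:
    """Majority of an odd number of bits."""
    return int(sum(bits) > len(bits) // 2)


def full_adder(a: int, b: int, cin: int):
    """Return (sum, carry) using MAJ3 for the carry and MAJ5 for the sum."""
    carry = maj(a, b, cin)
    not_carry = 1 - carry
    total = maj(a, b, cin, not_carry, not_carry)
    return total, carry


def ripple_carry_add(x: int, y: int, width: int):
    """Add two width-bit integers bit by bit; return (result, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Exhaustively check the full adder against integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, c = full_adder(a, b, cin)
    assert 2 * c + s == a + b + cin

print(ripple_carry_add(5, 6, 4))
print(ripple_carry_add(9, 7, 4))
```

Feeding the inverted carry twice into the five-input majority gate is what turns it into the sum function, which is why designs built on three- and five-input majority gates map naturally onto QCA.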
Training Progressively Binarizing Deep Networks Using FPGAs
While hardware implementations of inference routines for Binarized Neural
Networks (BNNs) are plentiful, current realizations of efficient BNN hardware
training accelerators, suitable for Internet of Things (IoT) edge devices,
leave much to be desired. Conventional BNN hardware training accelerators
perform forward and backward propagations with parameters adopting binary
representations, and optimization using parameters adopting floating or
fixed-point real-valued representations--requiring two distinct sets of network
parameters. In this paper, we propose a hardware-friendly training method that,
contrary to conventional methods, progressively binarizes a single set of
fixed-point network parameters, yielding notable reductions in power and
resource utilization. We use the Intel FPGA SDK for OpenCL development
environment to train our progressively binarizing DNNs on an OpenVINO FPGA. We
benchmark our training approach on both GPUs and FPGAs using CIFAR-10 and
compare it to conventional BNNs. Comment: Accepted at 2020 IEEE International Symposium on Circuits and Systems
(ISCAS)
Semi-supervised and weakly-supervised deep neural networks and dataset for fish detection in turbid underwater videos
Fish are key members of marine ecosystems, and they have a significant share in the healthy human diet. Besides, fish abundance is an excellent indicator of water quality, as they have adapted to various levels of oxygen, turbidity, nutrients, and pH. To detect various fish in underwater videos, Deep Neural Networks (DNNs) can be of great assistance. However, training DNNs is highly dependent on large, labeled datasets, while labeling fish in turbid underwater video frames is a laborious and time-consuming task, hindering the development of accurate and efficient models for fish detection. To address this problem, we first collected a dataset called FishInTurbidWater, which consists of video footage gathered from turbid waters, and quickly and weakly (i.e., giving higher priority to speed over accuracy) labeled it in 4-times fast-forwarding software. Next, we designed and implemented a semi-supervised contrastive learning fish detection model that is self-supervised using unlabeled data, and then fine-tuned with a small fraction (20%) of our weakly labeled FishInTurbidWater data. In the next step, we trained, using our weakly labeled data, a novel weakly-supervised ensemble DNN with transfer learning from ImageNet. The results show that our semi-supervised contrastive model leads to a more than 20 times faster turnaround time between dataset collection and result generation, with reasonably high accuracy (89%). At the same time, the proposed weakly-supervised ensemble model can detect fish in turbid waters with high (94%) accuracy, while still cutting the development time by a factor of four, compared to fully-supervised models trained on carefully labeled datasets. Our dataset and code are publicly available at the hyperlink FishInTurbidWater.
Variation-aware binarized memristive networks
The quantization of weights to binary states in Deep Neural Networks (DNNs) can replace resource-hungry multiply-accumulate operations with simple accumulations. Such Binarized Neural Networks (BNNs) exhibit greatly reduced resource and power requirements. In addition, memristors have been shown to be promising synaptic weight elements in DNNs. In this paper, we propose and simulate novel Binarized Memristive Convolutional Neural Network (BMCNN) architectures employing hybrid weight and parameter representations. We train the proposed architectures offline and then map the trained parameters to our binarized memristive devices for inference. To take into account the variations in memristive devices, and to study their effect on the performance, we introduce variations in R_ON and R_OFF. Moreover, we introduce means to mitigate the adverse effect of memristive variations in our proposed networks. Finally, we benchmark our BMCNNs and variation-aware BMCNNs using the MNIST dataset.
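One simple way to picture device variation in R_ON and R_OFF is sketched below. It assumes a hypothetical mapping in which a +1 weight is stored as a device in the low-resistance state (R_ON) and a -1 weight as a device in the high-resistance state (R_OFF), with Gaussian variation around each nominal resistance. The mapping, the variation model, and all names and values are assumptions for illustration, not the scheme used in the paper.

```python
import random


def sample_resistance(nominal: float, sigma_rel: float,
                      rng: random.Random) -> float:
    """Draw a device resistance with relative Gaussian variation, kept positive."""
    return max(1e-9, rng.gauss(nominal, sigma_rel * nominal))


def column_current(voltages, weights, r_on, r_off, sigma_rel, rng):
    """Sum the currents V / R through one crossbar column of binary devices."""
    total = 0.0
    for v, w in zip(voltages, weights):
        nominal = r_on if w > 0 else r_off  # assumed +1 -> R_ON, -1 -> R_OFF
        total += v / sample_resistance(nominal, sigma_rel, rng)
    return total


rng = random.Random(42)  # fixed seed so the sketch is repeatable
voltages = [1.0, 1.0, 1.0, 1.0]  # illustrative input voltages
weights = [1, -1, 1, -1]         # illustrative binarized weights
ideal = column_current(voltages, weights, 1e3, 1e6, 0.0, rng)
noisy = column_current(voltages, weights, 1e3, 1e6, 0.1, rng)
print(ideal, noisy)
```

Comparing the ideal and noisy column currents over many random draws gives a rough sense of how much resistance variation a binarized layer would need to tolerate.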