A temporally and spatially local spike-based backpropagation algorithm to enable training in hardware
Spiking Neural Networks (SNNs) have emerged as a hardware-efficient
architecture for classification tasks. The challenge of spike-based encoding
has been the lack of a universal training mechanism performed entirely using
spikes. There have been several attempts to adopt the powerful backpropagation
(BP) technique used in non-spiking artificial neural networks (ANNs): (1) SNNs
can be trained with externally computed numerical gradients; (2) a major
advance towards native spike-based learning has been the use of approximate
backpropagation based on spike-timing-dependent plasticity (STDP) with phased
forward/backward passes. However, the transfer of information between such
phases for gradient and weight update calculation necessitates external memory
and computational access. This is a challenge for standard neuromorphic
hardware implementations. In this paper, we propose a stochastic SNN-based
Back-Prop (SSNN-BP) algorithm that utilizes a composite neuron to
simultaneously compute the forward pass activations and backward pass gradients
explicitly with spikes. Although signed gradient values are a challenge for
spike-based representation, we tackle this by splitting the gradient signal
into positive and negative streams. We show that our method approaches the BP
ANN baseline with sufficiently long spike trains. Finally, we show that the
well-performing softmax cross-entropy loss function can be implemented through
inhibitory lateral connections enforcing a Winner-Take-All (WTA) rule. Our
two-layer SNN shows excellent generalization, with performance comparable to
ANNs of equivalent architecture and regularization parameters on static image
datasets such as MNIST, Fashion-MNIST, and Extended MNIST, as well as on
temporally encoded image datasets such as Neuromorphic MNIST. Thus, SSNN-BP
enables backpropagation that is fully compatible with purely spike-based
neuromorphic hardware.
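As a rough illustration of the signed-gradient encoding mentioned in the abstract (not the authors' circuit; the helper names and the assumption that gradients are normalized to [-1, 1] are ours), a signed value can be carried by two unsigned spike streams whose firing-rate difference recovers the sign and magnitude:

    import numpy as np

    def encode_signed_rate(grad, T=200, seed=0):
        """Encode a signed gradient in [-1, 1] as two unsigned spike trains.

        The positive stream fires with rate max(grad, 0) and the negative
        stream with rate max(-grad, 0); their rate difference recovers grad.
        """
        rng = np.random.default_rng(seed)
        pos_rate = np.clip(grad, 0.0, 1.0)
        neg_rate = np.clip(-grad, 0.0, 1.0)
        pos_spikes = rng.random(T) < pos_rate   # Bernoulli spike per time step
        neg_spikes = rng.random(T) < neg_rate
        return pos_spikes, neg_spikes

    def decode_signed_rate(pos_spikes, neg_spikes):
        """Estimate the signed value from the difference of firing rates."""
        return pos_spikes.mean() - neg_spikes.mean()

    # Longer spike trains give better estimates, mirroring the claim that
    # accuracy approaches the ANN baseline as the spike-train length grows.
    pos, neg = encode_signed_rate(-0.3, T=1000)
    print(decode_signed_rate(pos, neg))   # roughly -0.3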
ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference
Edge training of Deep Neural Networks (DNNs) is a desirable goal for
continuous learning; however, it is hindered by the enormous computational
power required by training. Hardware approximate multipliers have shown their
effectiveness for gaining resource-efficiency in DNN inference accelerators;
however, training with approximate multipliers is largely unexplored. To build
resource efficient accelerators with approximate multipliers supporting DNN
training, a thorough evaluation of training convergence and accuracy for
different DNN architectures and different approximate multipliers is needed.
This paper presents ApproxTrain, an open-source framework that allows fast
evaluation of DNN training and inference using simulated approximate
multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires
only a high-level description of a DNN architecture along with C/C++ functional
models of the approximate multiplier. We improve the speed of the simulation at
the multiplier level by using a novel LUT-based approximate floating-point (FP)
multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently
integrates AMSim into the TensorFlow library in order to overcome the absence
of native hardware approximate multipliers in commercial GPUs. We use
ApproxTrain to evaluate the convergence and accuracy of DNN training with
approximate multipliers for small and large datasets (including ImageNet) using
LeNet and ResNet architectures. The evaluations demonstrate similar
convergence behavior and negligible change in test accuracy compared to FP32
and bfloat16 multipliers. Compared to CPU-based approximate multiplier
simulations in training and inference, the GPU-accelerated ApproxTrain is more
than 2500x faster. Even though the original TensorFlow builds on highly
optimized, closed-source cuDNN/cuBLAS libraries with native hardware
multipliers, it is only 8x faster than ApproxTrain.
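AMSim itself runs on the GPU inside TensorFlow; as a simplified CPU-side sketch of the lookup-table idea (not the paper's implementation, and with an illustrative mantissa width K chosen by us), an approximate floating-point multiply can keep sign and exponent handling exact while taking the mantissa product from a small precomputed table:

    import numpy as np

    K = 4  # keep only the top K mantissa bits (illustrative choice)

    # Precompute products of all pairs of truncated mantissas in 1.m form.
    _vals = 1.0 + np.arange(2 ** K) / 2 ** K
    LUT = np.outer(_vals, _vals)                 # 2^K x 2^K product table

    def approx_mul(a, b):
        """Approximate multiply: exact sign/exponent handling, mantissa
        product read from the lookup table (zero inputs not handled)."""
        sign = np.sign(a) * np.sign(b)
        ma, ea = np.frexp(np.abs(a))             # |a| = ma * 2**ea, ma in [0.5, 1)
        mb, eb = np.frexp(np.abs(b))
        ia = ((2.0 * ma - 1.0) * 2 ** K).astype(int)   # top K mantissa bits
        ib = ((2.0 * mb - 1.0) * 2 ** K).astype(int)
        return sign * LUT[ia, ib] * 2.0 ** (ea + eb - 2)

    x = np.array([1.5, -2.75, 0.3])
    y = np.array([2.0, 0.5, -4.1])
    print(approx_mul(x, y))   # close to x * y, but not bit-exact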
On quantum backpropagation, information reuse, and cheating measurement collapse
The success of modern deep learning hinges on the ability to train neural
networks at scale. Through clever reuse of intermediate information,
backpropagation facilitates training through gradient computation at a total
cost roughly proportional to running the function, rather than incurring an
additional factor proportional to the number of parameters - which can now be
in the trillions. Naively, one expects that quantum measurement collapse
entirely rules out the reuse of quantum information as in backpropagation. But
recent developments in shadow tomography, which assumes access to multiple
copies of a quantum state, have challenged that notion. Here, we investigate
whether parameterized quantum models can train as efficiently as classical
neural networks. We show that achieving backpropagation scaling is impossible
without access to multiple copies of a state. With this added ability, we
introduce an algorithm with foundations in shadow tomography that matches
backpropagation scaling in quantum resources while reducing classical auxiliary
computational costs to open problems in shadow tomography. These results
highlight the nuance of reusing quantum information for practical purposes and
clarify the unique difficulties in training large quantum models, which could
alter the course of quantum machine learning.
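The scaling gap the abstract refers to can be made concrete with a toy cost model (purely illustrative, not the paper's algorithm): classical backpropagation reuses intermediate activations, so its cost is roughly two passes regardless of parameter count, whereas a naive parameter-shift gradient estimator re-runs the quantum circuit twice per parameter:

    # Toy cost model; units are "function evaluations" of comparable cost.
    def backprop_cost(num_params, forward_cost=1.0):
        # one forward pass plus one backward pass of comparable cost,
        # independent of the number of trainable parameters
        return 2 * forward_cost

    def parameter_shift_cost(num_params, forward_cost=1.0):
        # two circuit executions for every trainable parameter
        return 2 * num_params * forward_cost

    for n in (10, 1_000, 1_000_000):
        print(n, backprop_cost(n), parameter_shift_cost(n))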
Backpropagation of Unrolled Solvers with Folded Optimization
The integration of constrained optimization models as components in deep
networks has led to promising advances on many specialized learning tasks. A
central challenge in this setting is backpropagation through the solution of an
optimization problem, which typically lacks a closed form. One common strategy
is algorithm unrolling, which relies on automatic differentiation through the
operations of an iterative solver. While flexible and general, unrolling can
encounter accuracy and efficiency issues in practice. These issues can be
avoided by analytical differentiation of the optimization, but current
frameworks impose rigid requirements on the optimization problem's form. This
paper provides theoretical insights into the backward pass of unrolled
optimization, leading to a system for generating efficiently solvable
analytical models of backpropagation. Additionally, it proposes a unifying view
of unrolling and analytical differentiation through optimization mappings.
Experiments over various model-based learning tasks demonstrate the advantages
of the approach both computationally and in terms of enhanced expressiveness.
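As a minimal illustration of the unrolling strategy discussed above (a toy scalar problem of our own choosing, not the paper's folded-optimization system), one can propagate derivatives through each iterate of a gradient-descent solver and compare the result against analytical differentiation of the closed-form solution:

    # Toy problem: y*(c) = argmin_y (y - c)**2 + lam * y**2, with closed
    # form y* = c / (1 + lam) and hence dy*/dc = 1 / (1 + lam).
    lam, eta, T = 0.5, 0.1, 200

    def unrolled_solution_and_grad(c):
        """Run T gradient-descent steps and carry dy/dc through every
        step by the chain rule, i.e., differentiate the unrolled solver."""
        y, dy_dc = 0.0, 0.0
        for _ in range(T):
            grad_y = 2.0 * (y - c) + 2.0 * lam * y          # d objective / dy
            y = y - eta * grad_y
            dy_dc = dy_dc - eta * (2.0 * (dy_dc - 1.0) + 2.0 * lam * dy_dc)
        return y, dy_dc

    y_T, dy_dc = unrolled_solution_and_grad(c=3.0)
    print(y_T, dy_dc)                        # ~2.0 and ~0.6667 (unrolled)
    print(3.0 / (1 + lam), 1 / (1 + lam))    # 2.0 and 0.6667 (analytical)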
Features and neural net recognition strategies for hand printed digits
The goal of this thesis is to develop a computer system for hand-printed digit recognition based on an investigation into various feature extractors and neural network strategies. Features such as subwindow pixel summation, moments, and orientation vectors will be among those investigated. Morphological thinning of characters prior to feature extraction will be used to assess the impact on network training and testing. Different strategies for implementing a multilayer perceptron neural network will be investigated. A high-level language called MATLAB will be used for neural network algorithm development and quick prototyping. The feature extractors will be developed to operate on small (less than or equal to 256 bits) binary hand-printed digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).
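As one plausible reading of the subwindow pixel-summation feature mentioned above (the thesis targets MATLAB; this sketch, its function name, and the 4x4 grid are our own illustrative choices), each binary digit image is split into a grid of subwindows and the count of 'on' pixels in each subwindow becomes one feature:

    import numpy as np

    def subwindow_sums(digit, grid=(4, 4)):
        """Split a binary digit image into a grid of subwindows and count
        the 'on' pixels in each, giving one feature per subwindow."""
        h, w = digit.shape
        gh, gw = grid
        feats = [
            digit[r * h // gh:(r + 1) * h // gh,
                  c * w // gw:(c + 1) * w // gw].sum()
            for r in range(gh) for c in range(gw)
        ]
        return np.asarray(feats, dtype=float)

    # A 16x16 binary image with a 4x4 grid yields a 16-dimensional feature
    # vector suitable as input to a multilayer perceptron.
    img = (np.random.default_rng(0).random((16, 16)) > 0.5).astype(np.uint8)
    print(subwindow_sums(img).shape)   # (16,)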
Classification algorithms on the cell processor
The rapid advancement in the capacity and reliability of data storage technology has allowed for the retention of virtually limitless quantity and detail of digital information. Massive information databases are becoming more and more widespread among governmental, educational, scientific, and commercial organizations. By segregating this data into carefully defined input (e.g., images) and output (e.g., classification labels) sets, a classification algorithm can be used to develop an internal expert model of the data by employing a specialized training algorithm. A properly trained classifier is capable of predicting the output for future input data from the same input domain that it was trained on. Two popular classifiers are Neural Networks and Support Vector Machines. Both, as with most accurate classifiers, require massive computational resources to carry out the training step and can take months to complete when dealing with extremely large data sets. In most cases, utilizing larger training sets improves the final accuracy of the trained classifier. However, access to the kinds of computational resources required to do so is expensive and out of reach of private or underfunded institutions. The Cell Broadband Engine (CBE), introduced by Sony, Toshiba, and IBM, has recently entered the market. Its currently most inexpensive iteration is available in the Sony PlayStation 3® computer entertainment system. The CBE is a novel multi-core architecture which features many hardware enhancements designed to accelerate the processing of massive amounts of data. These characteristics and the cheap and widespread availability of this technology make the Cell a prime candidate for the task of training classifiers. In this work, the feasibility of using the Cell processor to train Neural Networks and Support Vector Machines was explored. In the Neural Network family of classifiers, the fully connected Multilayer Perceptron and the Convolutional Network were implemented. In the Support Vector Machine family, a working-set technique known as the Gradient Projection-based Decomposition Technique, as well as the Cascade SVM, were implemented.
New methods for deep dictionary learning and for image completion
Digital imaging plays an essential role in many aspects of our daily life. However, due to the hardware limitations of imaging devices, image measurements are usually impaired and require further processing to enhance the quality of the raw images in order to enable applications on the user side.
Image enhancement aims to improve the information content within image measurements by exploiting the properties of the target image and the forward model of the imaging device.
In this thesis, we aim to tackle two specific image enhancement problems, that is, single image super-resolution and image completion.
First, we present a new Deep Analysis Dictionary Model (DeepAM) which consists of multiple layers of analysis dictionaries with associated soft-thresholding operators and a single layer of synthesis dictionary for single image super-resolution. To achieve an effective deep model, each analysis dictionary has been designed to be composed of an Information Preserving Analysis Dictionary (IPAD) which passes essential information from the input signal to output and a Clustering Analysis Dictionary (CAD) which generates discriminative feature representation. The parameters of the deep analysis dictionary model are optimized using a layer-wise learning strategy. We demonstrate that both the proposed deep dictionary design and the learning algorithm are effective. Simulation results show that the proposed method achieves comparable performance with Deep Neural Networks and other existing methods.
We then generalize DeepAM to a Deep Convolutional Analysis Dictionary Model (DeepCAM) by learning convolutional dictionaries instead of unstructured dictionaries.
The convolutional dictionary is more suitable for processing high-dimensional signals like images and has only a small number of free parameters. By exploiting the properties of a convolutional dictionary, we present an efficient convolutional analysis dictionary learning algorithm. The IPAD and the CAD parts are learned using variations of the proposed convolutional analysis dictionary learning algorithm.
We demonstrate that DeepCAM is an effective multi-layer convolutional model and achieves better performance than DeepAM while using a smaller number of parameters.
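A compact sketch of the forward pass implied by the DeepAM structure described above (dimensions, dictionaries, and thresholds below are arbitrary placeholders, not learned IPAD/CAD components) stacks analysis dictionaries with soft-thresholding and finishes with a single synthesis dictionary:

    import numpy as np

    def soft_threshold(z, tau):
        """Elementwise soft-thresholding applied after each analysis layer."""
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def deep_analysis_forward(x, analysis_dicts, thresholds, synthesis_dict):
        """Stacked analysis dictionaries with soft-thresholding, followed
        by one synthesis dictionary mapping back to signal space."""
        z = x
        for D, tau in zip(analysis_dicts, thresholds):
            z = soft_threshold(D @ z, tau)
        return synthesis_dict @ z

    rng = np.random.default_rng(0)
    dims = [64, 128, 256]                      # placeholder layer widths
    A = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(2)]
    taus = [0.1, 0.1]
    S = rng.standard_normal((64, dims[-1]))    # synthesis back to signal space
    y = deep_analysis_forward(rng.standard_normal(64), A, taus, S)
    print(y.shape)   # (64,)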
Finally, we present an image completion algorithm based on dense correspondence between the input image and an exemplar image retrieved from the Internet that was taken at a similar position. The dense correspondence, which is estimated using a hierarchical PatchMatch algorithm, is usually noisy and contains a large occlusion area corresponding to the region to be completed. By modelling the dense correspondence as a smooth field, an Expectation-Maximization (EM) based method is presented to interpolate a smooth field over the occlusion area, which is then used to transfer image content from the exemplar image to the input image. Color correction is further applied to diminish the possible color differences between the input image and the exemplar image. Numerical results demonstrate that the proposed image completion algorithm is able to achieve photo-realistic image completion results.
Efficient Fuel Consumption Minimization for Green Vehicle Routing Problems using a Hybrid Neural Network-Optimization Algorithm
Efficient routing optimization yields benefits that extend beyond mere financial
gains. In this thesis, we present a methodology that utilizes a graph
convolutional neural network to facilitate the development of energy-efficient
waste collection routes. Our approach focuses on Remiks, a waste company in
Tromsø, and uses real-life datasets, ensuring practicability and ease of
implementation. In particular, we extend the DPDP algorithm introduced by Kool
et al. (2021) [1] to minimize fuel consumption and devise routes that account
for the impact of elevation and the real road distance traveled. Our findings
shed light on the potential advantages and enhancements these optimized routes
can offer Remiks, including improved effectiveness and cost savings.
Additionally, we identify key areas for future research and development.
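The fuel-aware objective can be pictured with a simple per-edge cost model (entirely illustrative; the coefficients and functional form are assumptions, not Remiks data or the thesis's model), where each road segment charges a distance term plus a term for lifting the payload against elevation gain:

    def fuel_cost(distance_m, elevation_gain_m, payload_kg,
                  base_l_per_km=0.35, lift_l_per_tonne_100m=0.07):
        """Hypothetical fuel use for one road segment, in litres."""
        distance_term = base_l_per_km * distance_m / 1000.0
        climb_term = (lift_l_per_tonne_100m * (payload_kg / 1000.0)
                      * max(elevation_gain_m, 0.0) / 100.0)
        return distance_term + climb_term

    # A route's objective becomes the sum of per-edge fuel costs rather
    # than plain distance, which is what a fuel-aware solver would minimize.
    print(fuel_cost(distance_m=2500, elevation_gain_m=40, payload_kg=3000))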