A temporally and spatially local spike-based backpropagation algorithm to enable training in hardware
Spiking Neural Networks (SNNs) have emerged as a hardware-efficient
architecture for classification tasks. The challenge of spike-based encoding
has been the lack of a universal training mechanism performed entirely using
spikes. There have been several attempts to adopt the powerful backpropagation
(BP) technique used in non-spiking artificial neural networks (ANNs): (1) SNNs
can be trained with externally computed numerical gradients; (2) a major
advance towards native spike-based learning has been the use of approximate
backpropagation based on spike-timing-dependent plasticity (STDP) with phased
forward/backward passes. However, the transfer of information between such
phases for gradient and weight update calculation necessitates external memory
and computational access. This is a challenge for standard neuromorphic
hardware implementations. In this paper, we propose a stochastic SNN-based
Back-Prop (SSNN-BP) algorithm that utilizes a composite neuron to
simultaneously compute the forward pass activations and backward pass gradients
explicitly with spikes. Although signed gradient values are a challenge for
spike-based representation, we tackle this by splitting the gradient signal
into positive and negative streams. We show that our method approaches the BP
ANN baseline with sufficiently long spike trains. Finally, we show that the
well-performing softmax cross-entropy loss function can be implemented through
inhibitory lateral connections enforcing a Winner-Take-All (WTA) rule. Our
two-layer SNN shows excellent generalization, with performance comparable to
ANNs of equivalent architecture and regularization parameters on static image
datasets such as MNIST, Fashion-MNIST, and Extended MNIST, as well as on
temporally encoded image datasets such as Neuromorphic MNIST. Thus, SSNN-BP
enables backpropagation that is fully compatible with purely spike-based
neuromorphic hardware.
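As a rough illustration of the signed-gradient encoding mentioned in the abstract (not the authors' circuit; the helper names and the assumption that gradients are normalized to [-1, 1] are ours), a signed value can be carried by two unsigned spike streams whose firing-rate difference recovers the sign and magnitude:

    import numpy as np

    def encode_signed_rate(grad, T=200, seed=0):
        """Encode a signed gradient in [-1, 1] as two unsigned spike trains.

        The positive stream fires with rate max(grad, 0) and the negative
        stream with rate max(-grad, 0); their rate difference recovers grad.
        """
        rng = np.random.default_rng(seed)
        pos_rate = np.clip(grad, 0.0, 1.0)
        neg_rate = np.clip(-grad, 0.0, 1.0)
        pos_spikes = rng.random(T) < pos_rate   # Bernoulli spike per time step
        neg_spikes = rng.random(T) < neg_rate
        return pos_spikes, neg_spikes

    def decode_signed_rate(pos_spikes, neg_spikes):
        """Estimate the signed value from the difference of firing rates."""
        return pos_spikes.mean() - neg_spikes.mean()

    # Longer spike trains give better estimates, mirroring the claim that
    # accuracy approaches the ANN baseline as the spike-train length grows.
    pos, neg = encode_signed_rate(-0.3, T=1000)
    print(decode_signed_rate(pos, neg))   # roughly -0.3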
ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference
Edge training of Deep Neural Networks (DNNs) is a desirable goal for
continuous learning; however, it is hindered by the enormous computational
power required by training. Hardware approximate multipliers have shown their
effectiveness for gaining resource-efficiency in DNN inference accelerators;
however, training with approximate multipliers is largely unexplored. To build
resource efficient accelerators with approximate multipliers supporting DNN
training, a thorough evaluation of training convergence and accuracy for
different DNN architectures and different approximate multipliers is needed.
This paper presents ApproxTrain, an open-source framework that allows fast
evaluation of DNN training and inference using simulated approximate
multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires
only a high-level description of a DNN architecture along with C/C++ functional
models of the approximate multiplier. We improve the speed of the simulation at
the multiplier level by using a novel LUT-based approximate floating-point (FP)
multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently
integrates AMSim into the TensorFlow library in order to overcome the absence
of native hardware approximate multipliers in commercial GPUs. We use
ApproxTrain to evaluate the convergence and accuracy of DNN training with
approximate multipliers for small and large datasets (including ImageNet) using
LeNet and ResNet architectures. The evaluations demonstrate similar
convergence behavior and negligible change in test accuracy compared to FP32
and bfloat16 multipliers. Compared to CPU-based approximate multiplier
simulations in training and inference, the GPU-accelerated ApproxTrain is more
than 2500x faster. Even though the original TensorFlow builds on highly
optimized, closed-source cuDNN/cuBLAS libraries with native hardware
multipliers, it is only 8x faster than ApproxTrain.
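AMSim itself runs on the GPU inside TensorFlow; as a simplified CPU-side sketch of the lookup-table idea (not the paper's implementation, and with an illustrative mantissa width K chosen by us), an approximate floating-point multiply can keep sign and exponent handling exact while taking the mantissa product from a small precomputed table:

    import numpy as np

    K = 4  # keep only the top K mantissa bits (illustrative choice)

    # Precompute products of all pairs of truncated mantissas in 1.m form.
    _vals = 1.0 + np.arange(2 ** K) / 2 ** K
    LUT = np.outer(_vals, _vals)                 # 2^K x 2^K product table

    def approx_mul(a, b):
        """Approximate multiply: exact sign/exponent handling, mantissa
        product read from the lookup table (zero inputs not handled)."""
        sign = np.sign(a) * np.sign(b)
        ma, ea = np.frexp(np.abs(a))             # |a| = ma * 2**ea, ma in [0.5, 1)
        mb, eb = np.frexp(np.abs(b))
        ia = ((2.0 * ma - 1.0) * 2 ** K).astype(int)   # top K mantissa bits
        ib = ((2.0 * mb - 1.0) * 2 ** K).astype(int)
        return sign * LUT[ia, ib] * 2.0 ** (ea + eb - 2)

    x = np.array([1.5, -2.75, 0.3])
    y = np.array([2.0, 0.5, -4.1])
    print(approx_mul(x, y))   # close to x * y, but not bit-exact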
On quantum backpropagation, information reuse, and cheating measurement collapse
The success of modern deep learning hinges on the ability to train neural
networks at scale. Through clever reuse of intermediate information,
backpropagation facilitates training through gradient computation at a total
cost roughly proportional to running the function, rather than incurring an
additional factor proportional to the number of parameters - which can now be
in the trillions. Naively, one expects that quantum measurement collapse
entirely rules out the reuse of quantum information as in backpropagation. But
recent developments in shadow tomography, which assumes access to multiple
copies of a quantum state, have challenged that notion. Here, we investigate
whether parameterized quantum models can train as efficiently as classical
neural networks. We show that achieving backpropagation scaling is impossible
without access to multiple copies of a state. With this added ability, we
introduce an algorithm with foundations in shadow tomography that matches
backpropagation scaling in quantum resources while reducing classical auxiliary
computational costs to open problems in shadow tomography. These results
highlight the nuance of reusing quantum information for practical purposes and
clarify the unique difficulties in training large quantum models, which could
alter the course of quantum machine learning.
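The scaling gap the abstract refers to can be made concrete with a toy cost model (purely illustrative, not the paper's algorithm): classical backpropagation reuses intermediate activations, so its cost is roughly two passes regardless of parameter count, whereas a naive parameter-shift gradient estimator re-runs the quantum circuit twice per parameter:

    # Toy cost model; units are "function evaluations" of comparable cost.
    def backprop_cost(num_params, forward_cost=1.0):
        # one forward pass plus one backward pass of comparable cost,
        # independent of the number of trainable parameters
        return 2 * forward_cost

    def parameter_shift_cost(num_params, forward_cost=1.0):
        # two circuit executions for every trainable parameter
        return 2 * num_params * forward_cost

    for n in (10, 1_000, 1_000_000):
        print(n, backprop_cost(n), parameter_shift_cost(n))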
Backpropagation of Unrolled Solvers with Folded Optimization
The integration of constrained optimization models as components in deep
networks has led to promising advances on many specialized learning tasks. A
central challenge in this setting is backpropagation through the solution of an
optimization problem, which typically lacks a closed form. One common strategy
is algorithm unrolling, which relies on automatic differentiation through the
operations of an iterative solver. While flexible and general, unrolling can
encounter accuracy and efficiency issues in practice. These issues can be
avoided by analytical differentiation of the optimization, but current
frameworks impose rigid requirements on the optimization problem's form. This
paper provides theoretical insights into the backward pass of unrolled
optimization, leading to a system for generating efficiently solvable
analytical models of backpropagation. Additionally, it proposes a unifying view
of unrolling and analytical differentiation through optimization mappings.
Experiments over various model-based learning tasks demonstrate the advantages
of the approach both computationally and in terms of enhanced expressiveness.
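As a minimal illustration of the unrolling strategy discussed above (a toy scalar problem of our own choosing, not the paper's folded-optimization system), one can propagate derivatives through each iterate of a gradient-descent solver and compare the result against analytical differentiation of the closed-form solution:

    # Toy problem: y*(c) = argmin_y (y - c)**2 + lam * y**2, with closed
    # form y* = c / (1 + lam) and hence dy*/dc = 1 / (1 + lam).
    lam, eta, T = 0.5, 0.1, 200

    def unrolled_solution_and_grad(c):
        """Run T gradient-descent steps and carry dy/dc through every
        step by the chain rule, i.e., differentiate the unrolled solver."""
        y, dy_dc = 0.0, 0.0
        for _ in range(T):
            grad_y = 2.0 * (y - c) + 2.0 * lam * y          # d objective / dy
            y = y - eta * grad_y
            dy_dc = dy_dc - eta * (2.0 * (dy_dc - 1.0) + 2.0 * lam * dy_dc)
        return y, dy_dc

    y_T, dy_dc = unrolled_solution_and_grad(c=3.0)
    print(y_T, dy_dc)                        # ~2.0 and ~0.6667 (unrolled)
    print(3.0 / (1 + lam), 1 / (1 + lam))    # 2.0 and 0.6667 (analytical)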
Features and neural net recognition strategies for hand printed digits
The goal of this thesis is to develop a computer system for hand-printed digit recognition based on an investigation into various feature extractors and neural network strategies. Features such as subwindow pixel summation, moments, and orientation vectors will be among those investigated. Morphological thinning of characters prior to feature extraction will be used to assess the impact on network training and testing. Different strategies for implementing a multilayer perceptron neural network will be investigated. A high-level language called MATLAB will be used for neural network algorithm development and quick prototyping. The feature extractors will be developed to operate on small (less than or equal to 256 bits) binary hand-printed digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).
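As one plausible reading of the subwindow pixel-summation feature mentioned above (the thesis targets MATLAB; this sketch, its function name, and the 4x4 grid are our own illustrative choices), each binary digit image is split into a grid of subwindows and the count of 'on' pixels in each subwindow becomes one feature:

    import numpy as np

    def subwindow_sums(digit, grid=(4, 4)):
        """Split a binary digit image into a grid of subwindows and count
        the 'on' pixels in each, giving one feature per subwindow."""
        h, w = digit.shape
        gh, gw = grid
        feats = [
            digit[r * h // gh:(r + 1) * h // gh,
                  c * w // gw:(c + 1) * w // gw].sum()
            for r in range(gh) for c in range(gw)
        ]
        return np.asarray(feats, dtype=float)

    # A 16x16 binary image with a 4x4 grid yields a 16-dimensional feature
    # vector suitable as input to a multilayer perceptron.
    img = (np.random.default_rng(0).random((16, 16)) > 0.5).astype(np.uint8)
    print(subwindow_sums(img).shape)   # (16,)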
Classification algorithms on the cell processor
The rapid advancement in the capacity and reliability of data storage technology has allowed for the retention of virtually limitless quantity and detail of digital information. Massive information databases are becoming more and more widespread among governmental, educational, scientific, and commercial organizations. By segregating this data into carefully defined input (e.g., images) and output (e.g., classification labels) sets, a classification algorithm can be used to develop an internal expert model of the data by employing a specialized training algorithm. A properly trained classifier is capable of predicting the output for future input data from the same input domain that it was trained on. Two popular classifiers are Neural Networks and Support Vector Machines. Both, as with most accurate classifiers, require massive computational resources to carry out the training step and can take months to complete when dealing with extremely large data sets. In most cases, utilizing larger training sets improves the final accuracy of the trained classifier. However, access to the kinds of computational resources required to do so is expensive and out of reach of private or underfunded institutions. The Cell Broadband Engine (CBE), introduced by Sony, Toshiba, and IBM, has recently entered the market. Its currently most inexpensive iteration is available in the Sony PlayStation 3® computer entertainment system. The CBE is a novel multi-core architecture which features many hardware enhancements designed to accelerate the processing of massive amounts of data. These characteristics and the cheap and widespread availability of this technology make the Cell a prime candidate for the task of training classifiers. In this work, the feasibility of using the Cell processor to train Neural Networks and Support Vector Machines was explored. In the Neural Network family of classifiers, the fully connected Multilayer Perceptron and the Convolutional Network were implemented. In the Support Vector Machine family, a working-set technique known as the Gradient Projection-based Decomposition Technique, as well as the Cascade SVM, were implemented.
New methods for deep dictionary learning and for image completion
Digital imaging plays an essential role in many aspects of our daily life. However, due to the hardware limitations of imaging devices, image measurements are usually impaired and require further processing to enhance the quality of the raw images in order to enable applications on the user side.
Image enhancement aims to improve the information content within image measurements by exploiting the properties of the target image and the forward model of the imaging device.
In this thesis, we aim to tackle two specific image enhancement problems, that is, single image super-resolution and image completion.
First, we present a new Deep Analysis Dictionary Model (DeepAM) which consists of multiple layers of analysis dictionaries with associated soft-thresholding operators and a single layer of synthesis dictionary for single image super-resolution. To achieve an effective deep model, each analysis dictionary has been designed to be composed of an Information Preserving Analysis Dictionary (IPAD) which passes essential information from the input signal to output and a Clustering Analysis Dictionary (CAD) which generates discriminative feature representation. The parameters of the deep analysis dictionary model are optimized using a layer-wise learning strategy. We demonstrate that both the proposed deep dictionary design and the learning algorithm are effective. Simulation results show that the proposed method achieves comparable performance with Deep Neural Networks and other existing methods.
We then generalize DeepAM to a Deep Convolutional Analysis Dictionary Model (DeepCAM) by learning convolutional dictionaries instead of unstructured dictionaries.
The convolutional dictionary is more suitable for processing high-dimensional signals like images and has only a small number of free parameters. By exploiting the properties of a convolutional dictionary, we present an efficient convolutional analysis dictionary learning algorithm. The IPAD and the CAD parts are learned using variations of the proposed convolutional analysis dictionary learning algorithm.
We demonstrate that DeepCAM is an effective multi-layer convolutional model and achieves better performance than DeepAM while using a smaller number of parameters.
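A compact sketch of the forward pass implied by the DeepAM structure described above (dimensions, dictionaries, and thresholds below are arbitrary placeholders, not learned IPAD/CAD components) stacks analysis dictionaries with soft-thresholding and finishes with a single synthesis dictionary:

    import numpy as np

    def soft_threshold(z, tau):
        """Elementwise soft-thresholding applied after each analysis layer."""
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def deep_analysis_forward(x, analysis_dicts, thresholds, synthesis_dict):
        """Stacked analysis dictionaries with soft-thresholding, followed
        by one synthesis dictionary mapping back to signal space."""
        z = x
        for D, tau in zip(analysis_dicts, thresholds):
            z = soft_threshold(D @ z, tau)
        return synthesis_dict @ z

    rng = np.random.default_rng(0)
    dims = [64, 128, 256]                      # placeholder layer widths
    A = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(2)]
    taus = [0.1, 0.1]
    S = rng.standard_normal((64, dims[-1]))    # synthesis back to signal space
    y = deep_analysis_forward(rng.standard_normal(64), A, taus, S)
    print(y.shape)   # (64,)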
Finally, we present an image completion algorithm based on dense correspondence between the input image and an exemplar image retrieved from the Internet that was taken at a similar position. The dense correspondence, which is estimated using a hierarchical PatchMatch algorithm, is usually noisy and contains a large occlusion area corresponding to the region to be completed. By modelling the dense correspondence as a smooth field, an Expectation-Maximization (EM) based method is presented to interpolate a smooth field over the occlusion area, which is then used to transfer image content from the exemplar image to the input image. Color correction is further applied to diminish the possible color differences between the input image and the exemplar image. Numerical results demonstrate that the proposed image completion algorithm is able to achieve photo-realistic image completion results.
Efficient Fuel Consumption Minimization for Green Vehicle Routing Problems using a Hybrid Neural Network-Optimization Algorithm
Efficient routing optimization yields benefits that extend beyond mere financial
gains. In this thesis, we present a methodology that utilizes a graph
convolutional neural network to facilitate the development of energy-efficient
waste collection routes. Our approach focuses on Remiks, a waste company in
Tromsø, and uses real-life datasets, ensuring practicability and ease of
implementation. In particular, we extend the DPDP algorithm introduced by Kool
et al. (2021) [1] to minimize fuel consumption and devise routes that account
for the impact of elevation and the real road distance traveled. Our findings
shed light on the potential advantages and enhancements these optimized routes
can offer Remiks, including improved effectiveness and cost savings.
Additionally, we identify key areas for future research and development.
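The fuel-aware objective can be pictured with a simple per-edge cost model (entirely illustrative; the coefficients and functional form are assumptions, not Remiks data or the thesis's model), where each road segment charges a distance term plus a term for lifting the payload against elevation gain:

    def fuel_cost(distance_m, elevation_gain_m, payload_kg,
                  base_l_per_km=0.35, lift_l_per_tonne_100m=0.07):
        """Hypothetical fuel use for one road segment, in litres."""
        distance_term = base_l_per_km * distance_m / 1000.0
        climb_term = (lift_l_per_tonne_100m * (payload_kg / 1000.0)
                      * max(elevation_gain_m, 0.0) / 100.0)
        return distance_term + climb_term

    # A route's objective becomes the sum of per-edge fuel costs rather
    # than plain distance, which is what a fuel-aware solver would minimize.
    print(fuel_cost(distance_m=2500, elevation_gain_m=40, payload_kg=3000))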