Paoding: Supervised Robustness-preserving Data-free Neural Network Pruning
When deploying pre-trained neural network models in real-world applications, model consumers often target resource-constrained platforms such as mobile and smart devices. They typically use pruning techniques to reduce the size and complexity of the model, generating a lighter one with lower resource consumption. Nonetheless, most existing pruning methods are proposed on the premise that the pruned model can be fine-tuned or even retrained on the original training data. This may be unrealistic in practice, as data controllers are often reluctant to share the original data with model consumers. In this work, we study neural network pruning in the \emph{data-free} context, aiming to yield lightweight models that are not only accurate in prediction but also robust against undesired inputs in open-world deployments. Given the absence of the fine-tuning and retraining that could fix mis-pruned units, we replace the traditional aggressive one-shot strategy with a conservative one that treats pruning as a progressive process. We propose a pruning method based on stochastic optimization that uses robustness-related metrics to guide the pruning process. Our method is implemented as a Python package named \textsc{Paoding} and evaluated with a series of experiments on diverse neural network models. The experimental results show that it significantly outperforms existing one-shot data-free pruning approaches in terms of robustness preservation and accuracy.
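The progressive, data-free strategy can be sketched in a few lines: repeatedly prune the lowest-saliency hidden unit and accept a step only if a robustness proxy, evaluated on unlabeled probe inputs, does not degrade beyond a tolerance. The L1 saliency, the noise-based proxy, and the 10% tolerance below are illustrative assumptions, not the actual \textsc{Paoding} algorithm:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy pre-trained two-layer network (stand-in for a real model).
    W1, b1 = rng.normal(size=(64, 32)), np.zeros(32)
    W2, b2 = rng.normal(size=(32, 10)), np.zeros(10)

    def forward(x, w1, w2):
        h = np.maximum(x @ w1 + b1, 0.0)          # ReLU hidden layer
        return h @ w2 + b2

    def robustness_proxy(w1, w2, probes, eps=0.05):
        # Mean output drift under small input noise: a stand-in robustness metric.
        noisy = probes + eps * rng.normal(size=probes.shape)
        return np.mean(np.abs(forward(probes, w1, w2) - forward(noisy, w1, w2)))

    probes = rng.normal(size=(256, 64))           # unlabeled probe inputs only
    baseline = robustness_proxy(W1, W2, probes)

    # Progressive pruning: drop one hidden unit per step, and keep the step
    # only if the robustness proxy stays within 10% of the baseline.
    alive = np.ones(32, dtype=bool)
    for _ in range(16):
        saliency = np.abs(W1).sum(axis=0)         # L1 saliency per hidden unit
        saliency[~alive] = np.inf
        candidate = int(np.argmin(saliency))
        trial = alive.copy()
        trial[candidate] = False
        w1t, w2t = W1 * trial, W2 * trial[:, None]
        if robustness_proxy(w1t, w2t, probes) <= 1.1 * baseline:
            alive = trial                         # conservative acceptance
    print(f"kept {int(alive.sum())}/32 hidden units")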
Quantized Neural Networks and Neuromorphic Computing for Embedded Systems
Deep learning techniques have achieved great success in areas such as computer vision, speech recognition and natural language processing, and these breakthroughs are changing every aspect of our lives. However, deep learning techniques have not realized their full potential in embedded systems such as mobiles and vehicles, because their high performance comes at the cost of high computational resource and energy consumption. It is therefore very challenging to deploy deep learning models in embedded systems, which have very limited computation resources and tight power constraints. Extensive research on deploying deep learning techniques in embedded systems has been conducted and considerable progress has been made. In this book chapter, we introduce two approaches. The first is model compression, one of the most popular approaches proposed in recent years. The second is neuromorphic computing, a novel computing paradigm that mimics the human brain.
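As a minimal sketch of the model-compression approach, the snippet below applies symmetric per-tensor post-training quantization, mapping 32-bit floating-point weights to 8-bit integers plus a single scale factor; this is one common compression scheme, not necessarily the specific method covered in the chapter:

    import numpy as np

    def quantize_int8(w):
        # Symmetric per-tensor quantization: float32 -> int8 plus one scale.
        scale = float(np.abs(w).max()) / 127.0
        scale = scale if scale > 0 else 1.0       # guard against all-zero tensors
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)
    q, s = quantize_int8(w)
    err = float(np.abs(w - dequantize(q, s)).mean())
    print(f"storage: {w.nbytes} B -> {q.nbytes} B, mean absolute error {err:.5f}")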
An investigation into adaptive power reduction techniques for neural hardware
In light of the growing applicability of Artificial Neural Networks (ANNs) in the signal processing field [1] and the present thrust of the semiconductor industry towards low-power SoCs for mobile devices [2], the power consumption of ANN hardware has become a very important implementation issue. Adaptability is a powerful and useful feature of neural networks, yet all current approaches to low-power ANN hardware are ‘non-adaptive’ with respect to the power consumption of the network (i.e. power reduction is not an objective of the adaptation/learning process). In the research work presented in this thesis, possible adaptive power reduction techniques are investigated which attempt to exploit the adaptability of neural networks in order to reduce power consumption. Three separate approaches for such adaptive power reduction are proposed: adaptation of size, adaptation of network weights, and adaptation of calculation precision. Initial case studies exhibit promising results with significant power reduction.
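Of the three proposed adaptation axes, adaptation of calculation precision is the simplest to illustrate in software: lower the weight bit-width step by step while a validation metric stays within a tolerance. The greedy loop and the 3% accuracy tolerance below are illustrative assumptions rather than the thesis's hardware technique:

    import numpy as np

    rng = np.random.default_rng(2)

    def quantize_bits(w, bits):
        # Uniform symmetric quantization of weights to the given bit-width.
        levels = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / levels
        return np.round(w / scale) * scale

    # Stand-ins: a trained weight matrix and a validation set.
    W = rng.normal(size=(32, 16))
    X, y = rng.normal(size=(200, 32)), rng.integers(0, 16, size=200)

    def accuracy(w):
        return float((np.argmax(X @ w, axis=1) == y).mean())

    base = accuracy(W)
    bits = 16
    while bits > 2 and accuracy(quantize_bits(W, bits - 1)) >= base - 0.03:
        bits -= 1             # keep lowering precision while accuracy holds up
    print(f"weight precision reduced to {bits} bits (baseline accuracy {base:.2f})")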
Nature of the learning algorithms for feedforward neural networks
The neural network (NN) model, comprised of relatively simple computing elements operating in parallel, offers an attractive and versatile framework for exploring a variety of learning structures and processes for intelligent systems. Due to the amount of research developed in the area, many types of networks have been defined. The one of interest here is the multi-layer perceptron, as it is one of the simplest and is considered a powerful representation tool whose complete potential has not been adequately exploited and whose limitations have yet to be specified in a formal and coherent framework. This dissertation addresses the theory of generalisation performance and architecture selection for the multi-layer perceptron; a subsidiary aim is to compare and integrate this model with existing data analysis techniques and exploit its potential by combining it with certain constructs from computational geometry, creating a reliable, coherent network design process which conforms to the characteristics of a generative learning algorithm, i.e. one including mechanisms for manipulating the connections and/or units that comprise the architecture in addition to the procedure for updating the weights of the connections. This means that it is unnecessary to provide an initial network as input to the complete training process.
After discussing in general terms the motivation for this study, the multi-layer perceptron model is introduced and reviewed, along with the relevant supervised training algorithm, i.e. backpropagation. More particularly, it is argued that a network developed employing this model can in general be trained and designed in a much better way by extracting more information about the domains of interest through the application of certain geometric constructs in a preprocessing stage, specifically by generating the Voronoi Diagram and Delaunay Triangulation [Okabe et al. 92] of the set of points comprising the training set; once a final architecture which performs appropriately on it has been obtained, Principal Component Analysis [Jolliffe 86] is applied to the outputs produced by the units in the network's hidden layer to eliminate the redundant dimensions of this space.
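The final step of that design process, applying PCA to the hidden-layer outputs to expose redundant dimensions, can be sketched directly; the stand-in activations and the 99.9% variance threshold below are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(3)

    # Stand-in hidden-layer activations: 8 of the 12 hidden outputs are linear
    # combinations of the other 4, so the activation space has redundant
    # dimensions by construction.
    base = rng.normal(size=(500, 4))
    H = np.concatenate([base, base @ rng.normal(size=(4, 8))], axis=1)

    # PCA via the eigendecomposition of the covariance of centred activations.
    Hc = H - H.mean(axis=0)
    eigvals = np.clip(np.linalg.eigvalsh(np.cov(Hc, rowvar=False))[::-1], 0.0, None)
    explained = eigvals / eigvals.sum()

    k = int(np.searchsorted(np.cumsum(explained), 0.999) + 1)
    print(f"{k} of {H.shape[1]} components explain 99.9% of the variance; "
          f"the remaining {H.shape[1] - k} dimensions look redundant")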
Online Tool Condition Monitoring Based on Parsimonious Ensemble+
Accurate diagnosis of tool wear in the metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches, which cannot cope with the fast sampling rate of the metal cutting process. Furthermore, they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+ (pENsemble+). The unique feature of pENsemble+ lies in its highly flexible principle, where both the ensemble structure and the base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, an online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents an advancement of the newly developed ensemble learning algorithm pENsemble+, where an online active learning scenario is incorporated to reduce operator labelling effort. An ensemble merging scenario is proposed, which allows a reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well-known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort.
Comment: this paper has been published in IEEE Transactions on Cybernetics.
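The grow-and-shrink principle can be illustrated with a toy online ensemble: a new base classifier is added when the recent error rate spikes (suggesting drift), and the weakest member is dropped once a size budget is exceeded. The perceptron base learner, drift test, and budget below are stand-ins, not the published pENsemble+ algorithm:

    import numpy as np

    rng = np.random.default_rng(4)

    class Perceptron:
        # Minimal online linear classifier used as an ensemble member.
        def __init__(self, dim):
            self.w = np.zeros(dim)
        def predict(self, x):
            return 1 if x @ self.w > 0 else -1
        def update(self, x, y):
            if self.predict(x) != y:
                self.w += y * x

    dim = 5
    ensemble, weight = [Perceptron(dim)], [1.0]
    window = []

    # Simulated drifting stream: the true concept flips halfway through.
    true_w = rng.normal(size=dim)
    for t in range(2000):
        if t == 1000:
            true_w = -true_w                      # abrupt concept drift
        x = rng.normal(size=dim)
        y = 1 if x @ true_w > 0 else -1
        votes = sum(a * m.predict(x) for m, a in zip(ensemble, weight))
        window.append((1 if votes > 0 else -1) != y)
        window = window[-50:]
        for i, m in enumerate(ensemble):
            weight[i] = 0.99 * weight[i] + 0.01 * (m.predict(x) == y)
            m.update(x, y)
        # Grow on a suspected drift; shrink the weakest member past the budget.
        if len(window) == 50 and np.mean(window) > 0.4:
            ensemble.append(Perceptron(dim))
            weight.append(1.0)
            window = []
        if len(ensemble) > 5:
            worst = int(np.argmin(weight))
            ensemble.pop(worst)
            weight.pop(worst)

    print(f"final ensemble size: {len(ensemble)}")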
Automated Circuit Approximation Method Driven by Data Distribution
We propose an application-tailored, data-driven, fully automated method for
functional approximation of combinational circuits. We demonstrate how an
application-level error metric such as the classification accuracy can be
translated to a component-level error metric needed for an efficient and fast
search in the space of approximate low-level components that are used in the
application. This is possible by employing a weighted mean error distance
(WMED) metric for steering the circuit approximation process which is conducted
by means of genetic programming. WMED introduces a set of weights (calculated
from the data distribution measured on a selected signal in a given
application) determining the importance of each input vector for the
approximation process. The method is evaluated using synthetic benchmarks and
application-specific approximate MAC (multiply-and-accumulate) units that are
designed to provide the best trade-offs between the classification accuracy and
power consumption of two image classifiers based on neural networks.
Comment: Accepted for publication at Design, Automation and Test in Europe (DATE 2019), Florence, Italy.
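Taking the abstract's description literally, WMED averages a per-input error distance weighted by the measured frequency of each input vector; one plausible form, assuming the error distance is the absolute difference between exact and approximate outputs, is:

    import numpy as np

    def wmed(exact, approx, weights):
        # Weighted mean error distance: per-input absolute output error,
        # weighted by the measured occurrence probability of each input vector.
        exact, approx, weights = map(np.asarray, (exact, approx, weights))
        return float(np.sum(weights * np.abs(exact - approx)) / np.sum(weights))

    # Example: an exact 8-bit function vs. an approximation that drops the two
    # least significant bits, weighted by a hypothetical input distribution.
    inputs = np.arange(256)
    exact = inputs + 3
    approx = (inputs & ~0x03) + 3
    weights = np.exp(-inputs / 64.0)              # stand-in measured distribution
    print(f"WMED = {wmed(exact, approx, weights):.4f}")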
Deep Learning fast inference on FPGA for CMS Muon Level-1 Trigger studies
With the advent of the High-Luminosity phase of the LHC (HL-LHC), the instantaneous luminosity of the Large Hadron Collider at CERN is expected to increase up to $\approx 7.5 \cdot 10^{34}\,\mathrm{cm}^{-2}\mathrm{s}^{-1}$. New strategies for data acquisition and processing will therefore be necessary, in preparation for the higher number of signals produced inside the detectors. In the context of the upgrade of the trigger system of the Compact Muon Solenoid (CMS), new reconstruction algorithms aiming for improved performance are being developed. Regarding the online tracking of muons, one of the figures of merit being improved is the accuracy of the transverse momentum ($p_T$) measurement. Machine Learning techniques have already been considered a promising solution to this problem, as they make it possible, using more of the information collected by the detector, to build models able to predict the $p_T$ with improved precision. This work aims to implement such models on an FPGA, which promises smaller latency with respect to traditional inference algorithms running on a CPU, an important aspect for a trigger system. The analysis carried out in this work uses data obtained through Monte Carlo simulations of muons crossing the barrel region of the CMS muon chambers, and compares the results with the $p_T$ assigned by the current CMS Level-1 Barrel Muon Track Finder (BMTF) trigger system.
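A common route from a trained model to low-latency FPGA inference is to re-express the network in fixed-point arithmetic so that each layer needs only integer multiplies, adds, and shifts; the snippet below simulates that conversion for a tiny regression MLP in software. The layer sizes and the Q6.10 format are illustrative assumptions, not the actual trigger implementation:

    import numpy as np

    rng = np.random.default_rng(5)
    FRAC = 10                                     # Q6.10: 10 fractional bits

    def to_fixed(x):
        return np.round(x * (1 << FRAC)).astype(np.int32)

    # Tiny stand-in MLP (4 inputs -> 8 hidden -> 1 output) for pT regression.
    W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
    W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)
    Q = [to_fixed(a) for a in (W1, b1, W2, b2)]

    def infer_float(x):
        return np.maximum(x @ W1 + b1, 0) @ W2 + b2

    def infer_fixed(x):
        # Integer-only inference as an FPGA datapath would compute it.
        xq = to_fixed(x)
        h = xq @ Q[0] // (1 << FRAC) + Q[1]       # rescale after each multiply
        h = np.maximum(h, 0)
        return (h @ Q[2] // (1 << FRAC) + Q[3]) / (1 << FRAC)

    x = rng.normal(size=4)
    print(f"float: {infer_float(x).item():.4f}  fixed: {infer_fixed(x).item():.4f}")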