1,596 research outputs found

A spectral multi-resolution image encoding network

Authors: Brause Rüdiger W.; Glitsch Jürgen
Publication date: 08/09/2010

After a short introduction to traditional image transform coding, multirate systems, and multiscale signal coding, the paper focuses on image encoding by a neural network. Taking noise into account as well, a network model is proposed which not only learns the optimal localized basis functions for the transform but also learns to implement a whitening filter by multi-resolution encoding. A simulation showing the multi-resolution capabilities concludes the contribution.

SparseNN: An Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity

Authors: Chen Xizi; Jiang Jingbo; Tsui Chi-Ying; Zhu Jingyang
Publication date: 03/11/2017

Contemporary deep neural networks (DNNs) contain millions of synaptic connections across tens to hundreds of layers. The large computation and memory requirements pose a challenge for hardware design. In this work, we leverage the intrinsic activation sparsity of DNNs to substantially reduce execution cycles and energy consumption. An end-to-end training algorithm is proposed to develop a lightweight run-time predictor that estimates the output activation sparsity on the fly. In our experiments, the computation overhead of the prediction phase is reduced to less than 5% of the original feedforward phase with negligible accuracy loss. Furthermore, an energy-efficient hardware architecture, SparseNN, is proposed to exploit both the input and output sparsity. SparseNN is a scalable architecture with distributed memories and processing elements connected through a dedicated on-chip network. Compared with state-of-the-art accelerators that exploit only the input sparsity, SparseNN achieves a 10%-70% improvement in throughput and a power reduction of around 50%.
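
To illustrate how input and output sparsity can both cut work in a fully connected ReLU layer, here is a minimal software sketch. It is not the SparseNN hardware or its trained predictor: the function name `sparse_layer` and the `predicted_active` mask are hypothetical, and the mask is simply passed in rather than produced by a learned predictor.

```python
def sparse_layer(x, weights, bias, predicted_active):
    """ReLU layer that skips zero inputs and outputs predicted to be inactive.

    weights[j][i] connects input i to output j.
    predicted_active[j] is an assumed, externally supplied guess of whether
    output j will be non-zero after ReLU.
    Returns the output vector and the number of multiply-accumulates performed.
    """
    # Input sparsity: only non-zero activations contribute to any output.
    nonzero_inputs = [i for i, xi in enumerate(x) if xi != 0.0]
    out = [0.0] * len(weights)
    macs = 0
    for j, row in enumerate(weights):
        # Output sparsity: skip rows whose output is predicted to be zero.
        if not predicted_active[j]:
            continue
        acc = bias[j]
        for i in nonzero_inputs:
            acc += row[i] * x[i]
            macs += 1
        out[j] = max(0.0, acc)  # ReLU
    return out, macs
```

With half of the inputs zero and one of three outputs predicted inactive, this toy layer performs 4 multiply-accumulates instead of the 12 a dense evaluation would need. A wrong prediction would zero an output that should be active, which is why the predictor's accuracy matters.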

arXiv.org e-Print Archive

Model compression as constrained optimization, with application to neural nets. Part II: quantization

Authors: Carreira-Perpiñán Miguel Á.; Idelbayev Yerlan
Publication date: 13/07/2017

We consider the problem of deep neural net compression by quantization: given a large reference net, we want to quantize its real-valued weights using a codebook with K entries so that the training loss of the quantized net is minimal. The codebook can be optimally learned jointly with the net, or fixed, as for binarization or ternarization approaches. Previous work has quantized the weights of the reference net, or incorporated rounding operations in the backpropagation algorithm, but this has no guarantee of converging to a loss-optimal quantized net. We describe a new approach based on the recently proposed framework of model compression as constrained optimization (Carreira-Perpiñán, 2017). This results in a simple iterative "learning-compression" algorithm, which alternates a step that learns a net of continuous weights with a step that quantizes (or binarizes/ternarizes) the weights, and is guaranteed to converge to a local optimum of the loss for quantized nets. We develop algorithms for an adaptive codebook or a (partially) fixed codebook. The latter includes binarization, ternarization, powers-of-two and other important particular cases. We show experimentally that we can achieve much higher compression rates than previous quantization work (even using just 1 bit per weight) with negligible loss degradation.
Comment: 33 pages, 15 figures, 3 tables
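
As a rough illustration of quantizing weights with a learned codebook of K entries, the sketch below runs scalar k-means (Lloyd's algorithm) over a flat list of weights. The abstract does not spell out the quantization step, so k-means is an assumption here, and the function name `quantize_weights` and the evenly spaced initialisation are hypothetical choices.

```python
def quantize_weights(weights, k, iters=20):
    """Scalar k-means on a flat list of weights.

    Returns the codebook (k values), the per-weight codebook index,
    and the quantized weights.
    """
    lo, hi = min(weights), max(weights)
    # Initialise the codebook with k evenly spaced values over the weight range.
    if k == 1:
        codebook = [sum(weights) / len(weights)]
    else:
        codebook = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def assign_all():
        # Assignment step: each weight goes to its nearest codebook entry.
        return [min(range(k), key=lambda j: (w - codebook[j]) ** 2) for w in weights]

    for _ in range(iters):
        assign = assign_all()
        # Update step: each entry becomes the mean of the weights assigned to it.
        for j in range(k):
            members = [w for w, a in zip(weights, assign) if a == j]
            if members:
                codebook[j] = sum(members) / len(members)

    assign = assign_all()
    quantized = [codebook[a] for a in assign]
    return codebook, assign, quantized
```

Storing only the index per weight plus the codebook takes about log2(K) bits per weight, which is where the compression comes from. With K = 2 this reduces to a learned form of binarization.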

arXiv.org e-Print Archive

Automated Pruning for Deep Neural Network Compression

Authors: Bianco Simone; Manessi Franco; Napoletano Paolo; Rozza Alessandro; Schettini Raimondo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/01/2019

In this work we present a method to improve the pruning step of the current state-of-the-art methodology for compressing neural networks. The novelty of the proposed pruning technique lies in its differentiability, which allows pruning to be performed during the backpropagation phase of network training. This enables end-to-end learning and strongly reduces the training time. The technique is based on a family of differentiable pruning functions and a new regularizer specifically designed to enforce pruning. The experimental results show that jointly optimizing the thresholds and the network weights makes it possible to reach a higher compression rate, reducing the number of weights of the pruned network by a further 14% to 33% compared to the current state of the art. Furthermore, we believe that this is the first study to analyze the generalization capabilities, in transfer learning tasks, of the features extracted by a pruned network. To this end, we show that the representations learned using the proposed pruning methodology maintain the same effectiveness and generality as those learned by the corresponding non-compressed network on a set of different recognition tasks.
Comment: 8 pages, 5 figures. Published as a conference paper at ICPR 201
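
The abstract does not give the form of its differentiable pruning functions, so the following is only one plausible example of the idea: a sigmoid-gated surrogate for hard thresholding whose output is differentiable with respect to the threshold. The names `soft_prune` and `soft_prune_grad_t` and the sharpness parameter `beta` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_prune(w, t, beta=50.0):
    """Smooth surrogate for hard thresholding.

    Returns roughly 0 when |w| is well below the threshold t and roughly w
    when |w| is well above it. beta controls how sharp the transition is.
    """
    return w * sigmoid(beta * (abs(w) - t))


def soft_prune_grad_t(w, t, beta=50.0):
    """Analytic derivative of soft_prune with respect to the threshold t."""
    s = sigmoid(beta * (abs(w) - t))
    return -beta * w * s * (1.0 - s)
```

Because the gate is smooth in `t`, a gradient-based optimizer can in principle update thresholds together with weights during backpropagation, which is the property the abstract highlights.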

arXiv.org e-Print Archive

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

Authors: Choi Sungwook; Hwang Kyuyeon; Lee Minjae; Park Jinhwan; Shin Sungho; Sung Wonyong
Publication date: 30/09/2016

In this paper, a neural network based real-time speech recognition (SR) system is developed using an FPGA for very low-power operation. The implemented system employs two recurrent neural networks (RNNs); one is a speech-to-character RNN for acoustic modeling (AM) and the other is for character-level language modeling (LM). The system also employs a statistical word-level LM to improve the recognition accuracy. The results of the AM, the character-level LM, and the word-level LM are combined using a fairly simple N-best search algorithm instead of the hidden Markov model (HMM) based network. The RNNs are implemented using massively parallel processing elements (PEs) for low latency and high throughput. The weights are quantized to 6 bits to store all of them in the on-chip memory of an FPGA. The proposed algorithm is implemented on a Xilinx XC7Z045, and the system can operate much faster than real-time.
Comment: Accepted to SiPS 201
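
The abstract states that weights are quantized to 6 bits but not how. As a hedged illustration, the sketch below uses symmetric uniform quantization, which is one common choice and is assumed here. The function name `quantize_uniform` is hypothetical.

```python
def quantize_uniform(weights, bits=6):
    """Symmetric uniform quantization to signed integer codes.

    For bits=6 the codes lie in [-31, 31]. Returns the integer codes,
    the scale factor, and the dequantized (approximate) weights.
    """
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    dequantized = [c * scale for c in codes]
    return codes, scale, dequantized
```

Compared with 32-bit floats, 6-bit codes shrink weight storage by roughly a factor of five, which is the kind of saving that makes fully on-chip storage feasible.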

arXiv.org e-Print Archive

Bitwise Neural Networks

Authors: Kim Minje; Smaragdis Paris
Publication date: 22/01/2016

Based on the assumption that there exists a neural network that efficiently represents a set of Boolean functions between all binary inputs and outputs, we propose a process for developing and deploying neural networks whose weight parameters, bias terms, inputs, and intermediate hidden layer output signals are all binary-valued, and which require only basic bit logic for the feedforward pass. The proposed Bitwise Neural Network (BNN) is especially suitable for resource-constrained environments, since it replaces floating-point or fixed-point arithmetic with significantly more efficient bitwise operations. Hence, the BNN requires less spatial complexity, less memory bandwidth, and less power consumption in hardware. In order to design such networks, we propose a few additional training schemes, such as weight compression and noisy backpropagation, which result in a bitwise network that performs almost as well as its corresponding real-valued network. We test the proposed network on the MNIST dataset, represented using binary features, and show that BNNs achieve competitive performance while offering dramatic computational savings.
Comment: This paper was presented at the International Conference on Machine Learning (ICML) Workshop on Resource-Efficient Machine Learning, Lille, France, Jul. 6-11, 201
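
To make "only basic bit logic for the feedforward pass" concrete, here is a small sketch of a binary neuron that computes a ±1 dot product with XNOR and a bit count. The abstract does not specify this encoding, so it is an assumption (bit 1 stands for +1, bit 0 for -1), and the helper names are hypothetical.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(x_bits, w_bits, n):
    """Dot product of two length-n +1/-1 vectors stored as packed bits."""
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")    # population count
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n


def binary_neuron(x_bits, w_bits, n, bias=0):
    """Sign activation over the bitwise dot product; returns +1 or -1."""
    return 1 if binary_dot(x_bits, w_bits, n) + bias >= 0 else -1
```

For example, `binary_dot(pack_bits([1, -1, 1, 1]), pack_bits([1, 1, -1, 1]), 4)` matches the ordinary dot product of the two ±1 vectors, without any multiplications.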

arXiv.org e-Print Archive

Winner-Relaxing Self-Organizing Maps

Author: Claussen Jens Christian
Publication venue: MIT Press - Journals
Publication date: 02/11/2004

A new family of self-organizing maps, the Winner-Relaxing Kohonen Algorithm, is introduced as a generalization of a variant given by Kohonen in 1991. The magnification behaviour is calculated analytically. For the original variant a magnification exponent of 4/7 is derived; the generalized version allows the magnification to be steered over the wide range from exponent 1/2 to 1 in the one-dimensional case, thus providing optimal mapping in the sense of information theory. The Winner Relaxing Algorithm requires minimal extra computation per learning step and is easy to implement.
Comment: 14 pages (6 figs included). To appear in Neural Computation

arXiv.org e-Print Archive

Rough Neural Networks Architecture For Improving Generalization In Pattern Recognition

Author: Ali Adlan Hanan Hassan
Publication date: 01/01/2004

Neural networks are attractive trainable machines for pattern recognition. Their capability to accommodate a wide variety and variability of conditions, and their ability to imitate brain functions, make them a popular research area. This research focuses on developing hybrid rough neural networks. These novel approaches are expected to provide superior performance in detection and automatic target recognition. In this thesis, hybrid architectures combining rough set theory and neural networks have been investigated, developed, and implemented. The first hybrid approach provides a novel neural network referred to as the Rough Shared weight Neural Network (RSNN). It applies the concept of approximation based on rough neurons to feature extraction and employs weight sharing. The network has two stages: a feature extraction network and a classification network. The extraction network is composed of rough neurons that account for the upper and lower approximations and embed a membership function in place of ordinary activation functions. The network learns the rough set’s upper and lower approximations as feature extractors simultaneously with classification. The RSNN implements a novel approximation transform. The basic design of the network is provided together with the learning rules. The architecture provides a novel method for pattern recognition and is expected to be robust to any pattern recognition problem. The second hybrid approach consists of two stand-alone subsystems, referred to as Rough Neural Networks (RNN). The extraction network extracts detectors that represent the pattern classes to be supplied to the classification network. It works as a filter for the original distilled features based on equivalence relations and rough set reduction, while the second subsystem is responsible for classifying the outputs of the first. The two approaches were applied to image pattern recognition problems.
The RSNN was applied to an automatic target recognition problem. The data are Synthetic Aperture Radar (SAR) image scenes of tanks and background. The RSNN provides a novel methodology for designing nonlinear filters without prior knowledge of the problem domain. The RNN was used to detect patterns present in satellite images. A novel feature extraction algorithm was developed to extract the feature vectors. The algorithm enhances the recognition ability of the system compared to manual extraction and labeling of pattern classes. The performance of the rough backpropagation network is improved compared to a backpropagation network of the same architecture. The network has been designed to produce a detection plane for the desired pattern. The hybrid approaches developed in this thesis provide novel techniques for recognizing static and dynamic representations of patterns. In both domains, rough set theory improved the generalization of the neural network paradigms. The methodologies are theoretically robust to any pattern recognition problem and have been demonstrated in practice for image environments.

Universiti Putra Malaysia Institutional Repository

A network of spiking neurons for computing sparse representations in an energy efficient way

Authors: Chklovskii Dmitri B.; Genkin Alexander; Hu Tao
Publication date: 04/10/2012

Computing sparse redundant representations is an important problem both in applied mathematics and neuroscience. In many applications, this problem must be solved in an energy-efficient way. Here, we propose a hybrid distributed algorithm (HDA), which solves this problem on a network of simple nodes communicating via low-bandwidth channels. HDA nodes perform both gradient-descent-like steps on analog internal variables and coordinate-descent-like steps via quantized external variables communicated to each other. Interestingly, such operation is equivalent to a network of integrate-and-fire neurons, suggesting that HDA may serve as a model of neural computation. We show that the numerical performance of HDA is on par with existing algorithms. In the asymptotic regime the representation error of HDA decays with time, t, as 1/t. HDA is stable against time-varying noise; specifically, the representation error decays as 1/sqrt(t) for Gaussian white noise.
Comment: 5 figures. Early Access: http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_0035

arXiv.org e-Print Archive

A flexible, extensible software framework for model compression based on the LC algorithm

Authors: Carreira-Perpiñán Miguel Á.; Idelbayev Yerlan
Publication date: 15/05/2020

We propose a software framework based on the ideas of the Learning-Compression (LC) algorithm that allows a user to compress a neural network or other machine learning model using different compression schemes with minimal effort. Currently, the supported compressions include pruning, quantization, low-rank methods (including automatically learning the layer ranks), and combinations of those, and the user can choose different compression types for different parts of a neural network. The LC algorithm alternates two types of steps until convergence: a learning (L) step, which trains a model on a dataset (using an algorithm such as SGD); and a compression (C) step, which compresses the model parameters (using a compression scheme such as low-rank or quantization). This decoupling of the "machine learning" aspect from the "signal compression" aspect means that changing the model or the compression type amounts to calling the corresponding subroutine in the L or C step, respectively. The library fully supports this by design, which makes it flexible and extensible. This does not come at the expense of performance: the runtime needed to compress a model is comparable to that of training the model in the first place, and the compressed model is competitive in terms of prediction accuracy and compression ratio with other algorithms (which are often specialized for specific models or compression schemes). The library is written in Python and PyTorch and is available on GitHub.
Comment: 15 pages, 4 figures, 2 tables
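
The alternation of L and C steps described above can be sketched on a toy problem. This is not the library's API: the function names are hypothetical, the "model" is a plain weight vector with the toy loss 0.5 * ||w - target||^2, the C step is magnitude pruning, and coupling the two steps through a quadratic penalty with an increasing weight `mu` is an assumption about how they are tied together.

```python
def c_step_prune(w, keep):
    """C step: keep the `keep` largest-magnitude weights and zero the rest."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    kept = set(order[:keep])
    return [w[i] if i in kept else 0.0 for i in range(len(w))]


def l_step(w, theta, target, mu, lr=0.05, steps=200):
    """L step: gradient descent on toy_loss(w) + (mu / 2) * ||w - theta||^2."""
    w = list(w)
    for _ in range(steps):
        grad = [(wi - ti) + mu * (wi - th) for wi, ti, th in zip(w, target, theta)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def lc_compress(target, keep, mus=(0.01, 0.1, 1.0, 10.0)):
    """Alternate L and C steps while the penalty weight mu grows."""
    w = list(target)                 # reference model: minimiser of the toy loss
    theta = c_step_prune(w, keep)    # initial compressed model
    for mu in mus:
        w = l_step(w, theta, target, mu)   # learn, pulled toward theta
        theta = c_step_prune(w, keep)      # compress the learned weights
    return theta
```

Swapping `c_step_prune` for a quantization or low-rank routine would change the compression type without touching `l_step`, which mirrors the decoupling the abstract emphasises.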

arXiv.org e-Print Archive