
    Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes

    We propose a novel method for neural network quantization that casts the neural architecture search problem as one of hyperparameter search to find non-uniform bit distributions throughout the layers of a CNN. We perform the search assuming a Multi-Task Gaussian Processes prior, which splits the problem into multiple tasks, each corresponding to a different number of training epochs, and explore the space by sampling the configurations that yield maximum information. We then show that with significantly lower precision in the last layers we achieve a minimal loss of accuracy with appreciable memory savings. We test our findings on the CIFAR10 and ImageNet datasets using the VGG, ResNet and GoogLeNet architectures. Comment: Accepted for publication at ECCV 2020. Code available at https://code.active.vision . Updated for typos.
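    The core idea above, searching over per-layer bit assignments under a memory budget, can be illustrated with a toy sketch. This is not the paper's Multi-Task Gaussian Process method; it replaces the GP surrogate with exhaustive search and an invented per-layer sensitivity penalty (all names, layer sizes, and weights below are hypothetical), purely to show the shape of a non-uniform bit-width search that ends up assigning low precision to the last layers.

    ```python
    import itertools

    # Hypothetical per-layer parameter counts; earlier layers are assumed
    # more sensitive to low precision (consistent with the paper's finding
    # that the last layers tolerate very low bit-widths).
    LAYER_PARAMS = [1_000, 5_000, 20_000, 50_000]

    def memory_bits(bits):
        # Total model size in bits for a given per-layer bit assignment.
        return sum(p * b for p, b in zip(LAYER_PARAMS, bits))

    def accuracy_penalty(bits):
        # Toy proxy for accuracy loss: sensitive (early) layers pay more
        # for each bit dropped below 8.
        sensitivity = [4.0, 2.0, 1.0, 0.5]
        return sum(s * max(0, 8 - b) for s, b in zip(sensitivity, bits))

    def search(candidate_bits=(2, 4, 8), budget_bits=400_000):
        # Exhaustive stand-in for the GP-guided search: pick the feasible
        # assignment with the smallest accuracy penalty.
        best, best_pen = None, float("inf")
        for bits in itertools.product(candidate_bits, repeat=len(LAYER_PARAMS)):
            pen = accuracy_penalty(bits)
            if memory_bits(bits) <= budget_bits and pen < best_pen:
                best, best_pen = bits, pen
        return best
    ```

    Under these toy numbers the search keeps 8 bits in the early, sensitive layers and drops the large final layer to 2 bits, mirroring the non-uniform distributions the paper reports.
    
    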

    Constrained image generation using binarized neural networks with decision procedures

    We consider the problem of binary image generation with given properties. This problem arises in a number of practical applications, including generation of artificial porous media for electrodes of lithium-ion batteries, for composite materials, etc. A generated image represents a porous medium and, as such, it is subject to two sets of constraints: topological constraints on the structure and process constraints on the physical process over this structure. To perform image generation we need to define a mapping from a porous medium to its physical process parameters. For a given geometry of a porous medium, this mapping can be computed by solving a partial differential equation (PDE). However, embedding a PDE solver into the search procedure is computationally expensive. We use a binarized neural network to approximate a PDE solver. This allows us to encode the entire problem as a logical formula. Our main contribution is that, for the first time, we show that this problem can be tackled using decision procedures. Our experiments show that our model is able to produce random constrained images that satisfy both topological and process constraints.
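    The key observation, that a binarized network plus constraints reduces to a decidable formula, can be sketched in miniature. The following is a toy stand-in, not the paper's encoding: exhaustive enumeration replaces the SAT/SMT decision procedure, a single binarized neuron replaces the PDE-approximating network, and both constraints (adjacency and network output) are invented for illustration.

    ```python
    from itertools import product

    def bnn_output(pixels, weights=(+1, -1, +1, +1)):
        # Binarized neuron: inputs and weights live in {-1, +1};
        # activation is 1 when the weighted sum is non-negative.
        s = sum(w * (1 if p else -1) for w, p in zip(weights, pixels))
        return 1 if s >= 0 else 0

    def topology_ok(pixels):
        # Toy topological constraint: at least two adjacent 'on' pixels.
        return any(pixels[i] and pixels[i + 1] for i in range(len(pixels) - 1))

    def generate():
        # Stand-in for a decision procedure: enumerate all 4-pixel binary
        # images and keep those satisfying both constraint sets.
        return [p for p in product([0, 1], repeat=4)
                if topology_ok(p) and bnn_output(p) == 1]
    ```

    In the real setting the enumeration is replaced by a solver query, so satisfying images are found without exhaustively scanning the (exponentially large) image space.
    
    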

    Dynamic bit-width reconfiguration for energy-efficient deep learning hardware

    Deep learning models have reached state-of-the-art performance in many machine learning tasks. Benefits in terms of energy, bandwidth, latency, etc., can be obtained by evaluating these models directly within Internet of Things end nodes, rather than in the cloud. This calls for implementations of deep learning tasks that can run in resource-limited environments with low energy footprints. Research and industry have recently investigated these aspects, coming up with specialized hardware accelerators for low-power deep learning. One effective technique adopted in these devices consists in reducing the bit-width of calculations, exploiting the error resilience of deep learning. However, bit-widths are typically set statically for a given model, regardless of input data. Unless models are retrained, this solution invariably sacrifices accuracy for energy efficiency. In this paper, we propose a new approach for implementing input-dependent dynamic bit-width reconfiguration in deep learning accelerators. Our method is based on a fully automatic characterization phase, and can be applied to popular models without retraining. Using the energy data from a real deep learning accelerator chip, we show that 50% energy reduction can be achieved with respect to a static bit-width selection, with less than 1% accuracy loss.
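    The characterization idea, finding per-input the smallest bit-width whose output stays close to full precision, can be sketched as follows. This is a minimal toy, not the paper's method: the quantizer, the tolerance, and the scalar "model" passed in as `full_eval` are all hypothetical, and real characterization would operate on accelerator measurements rather than a Python closure.

    ```python
    def quantize(x, bits):
        # Symmetric uniform quantization of a value in [-1, 1] to `bits` bits.
        levels = 2 ** (bits - 1) - 1
        return round(max(-1.0, min(1.0, x)) * levels) / levels

    def characterize(inputs, full_eval, tolerance=0.05, widths=(4, 8, 16)):
        # Offline phase: for each input, record the smallest bit-width whose
        # result stays within `tolerance` of the full-precision result.
        # At inference time the table drives per-input reconfiguration.
        table = {}
        for x in inputs:
            ref = full_eval(x)
            for b in widths:
                if abs(full_eval(quantize(x, b)) - ref) <= tolerance:
                    table[x] = b
                    break
            else:
                table[x] = widths[-1]
        return table
    ```

    Inputs that survive coarse quantization get a low bit-width (and hence low energy), while sensitive inputs fall back to wider arithmetic, which is the trade-off the dynamic scheme exploits.
    
    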

    Dynamic Beam Width Tuning for Energy-Efficient Recurrent Neural Networks

    Recurrent Neural Networks (RNNs) are state-of-the-art models for many machine learning tasks, such as language modeling and machine translation. Executing the inference phase of an RNN directly in edge nodes, rather than in the cloud, would provide benefits in terms of energy consumption, latency and network bandwidth, provided that models can be made efficient enough to run on energy-constrained embedded devices. To this end, we propose an algorithmic optimization for improving the energy efficiency of encoder-decoder RNNs. Our method operates on the Beam Width (BW), i.e. one of the parameters that most influences inference complexity, modulating it depending on the currently processed input based on a metric of the network's "confidence". Results on two different machine translation models show that our method is able to reduce the average BW by up to 33%, thus significantly reducing the inference execution time and energy consumption, while maintaining the same translation performance.
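    One plausible form of the confidence-driven modulation described above can be sketched in a few lines. The specific metric here (the gap between the top two log-probabilities at a decoding step) and the thresholds are assumptions for illustration, not the paper's actual confidence measure.

    ```python
    def dynamic_beam_width(log_probs, max_bw=8, min_bw=2, margin=1.0):
        # Confidence metric: gap between the best and second-best candidate
        # log-probability at the current decoding step. A large gap means the
        # model is confident, so a narrow beam should suffice; a small gap
        # keeps the beam wide to avoid search errors.
        top = sorted(log_probs, reverse=True)
        gap = top[0] - top[1]
        return min_bw if gap >= margin else max_bw
    ```

    Averaged over a translation, confident steps run with the narrow beam, which is where the reported reduction in average BW, and hence time and energy, comes from.
    
    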

    On the area scalability of valence-change memristors for neuromorphic computing

    The ability to vary the conductance of a valence-change memristor in a continuous manner makes it a prime choice as an artificial synapse in neuromorphic systems. Because synapses are the most numerous components in the brain, exceeding the neurons by several orders of magnitude, the scalability of artificial synapses is crucial to the development of large-scale neuromorphic systems but is an issue which is seldom investigated. Leveraging the conductive atomic force microscopy method, we found that the conductance switching of nanoscale memristors (∼25 nm2) is abrupt in a majority of the cases examined. This behavior is contrary to the analog-like conductance modulation or plasticity typically observed in larger-area memristors. The result therefore implies that plasticity may be lost when the device dimension is scaled down. The contributing factor behind the plasticity behavior of a large-area memristor was investigated by current mapping, and may be ascribed to the disruption of the plurality of conductive filaments happening at different voltages, thus yielding an apparent continuous change in conductance with voltage. The loss of plasticity in scaled memristors may pose a serious constraint to the development of large-scale neuromorphic systems. Published version
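    The proposed explanation, many filaments switching at different voltages producing apparently analog behavior, while a single-filament nanodevice switches abruptly, can be illustrated with a deliberately simplistic model. All numbers below are invented for illustration and carry no physical units or fidelity.

    ```python
    def conductance(voltage, filament_thresholds, g_per_filament=1.0):
        # Toy model: each conductive filament switches on abruptly at its own
        # threshold voltage; total conductance is the sum over switched
        # filaments (identical conductance per filament, for simplicity).
        return g_per_filament * sum(1 for t in filament_thresholds if voltage >= t)

    # Large-area device: many filaments with spread-out thresholds, so total
    # conductance rises in many small steps that look analog-like.
    large = [0.4, 0.6, 0.8, 1.0, 1.2]
    # Nanoscale device: a single filament, hence one abrupt jump.
    small = [0.8]
    ```

    Sweeping the voltage over `large` yields a staircase with five small steps (plasticity-like), while `small` produces a single on/off transition, matching the scaling concern raised in the abstract.
    
    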

    Fast secure comparison for medium-sized integers and its application in binarized neural networks

    In 1994, Feige, Kilian, and Naor proposed a simple protocol for secure 3-way comparison of integers a and b from the range [0, 2]. Their observation is that for (Formula Presented), the Legendre symbol (Formula Presented) coincides with the sign of x for (Formula Presented), thus reducing secure comparison to secure evaluation of the Legendre symbol. More recently, in 2011, Yu generalized this idea to handle secure comparisons for integers from substantially larger ranges [0, d], essentially by searching for primes for which the Legendre symbol coincides with the sign function on (Formula Presented). In this paper, we present new comparison protocols based on the Legendre symbol that additionally employ some form of error correction. We relax the prime search by requiring that the Legendre symbol encodes the sign function in a noisy fashion only. Practically, we use the majority vote over a window of (Formula Presented) adjacent Legendre symbols, for small positive integers k. Our technique significantly increases the comparison range: e.g., for a modulus of 60 bits, d increases by a factor of 2.8 (for (Formula Presented)) and 3.8 (for (Formula Presented)) respectively. We give a practical method to find primes with suitable noisy encodings. We demonstrate the practical relevance of our comparison protocol by applying it in a secure neural network classifier for the MNIST dataset. Concretely, we discuss a secure multiparty computation based on the binarized multi-layer perceptron of Hubara et al., using our comparison for the second and third layers.
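    The plaintext mechanics behind the protocol, a Legendre symbol computed by Euler's criterion and a majority vote over a window of adjacent symbols, can be sketched directly. This shows only the encoding, not the secure multiparty evaluation, and it does not perform the paper's search for primes with good noisy sign encodings; the window form here is an assumption.

    ```python
    def legendre(x, p):
        # Legendre symbol (x/p) for odd prime p via Euler's criterion:
        # x^((p-1)/2) mod p is 1 (residue), p-1 (non-residue), or 0.
        s = pow(x % p, (p - 1) // 2, p)
        return s - p if s > 1 else s

    def noisy_sign(x, p, k=1):
        # Majority vote over a window of 2k+1 adjacent Legendre symbols,
        # as a noise-tolerant encoding of sign(x): individual symbols may
        # "vote wrong", but the majority is relied on to be correct for
        # suitably chosen primes.
        votes = sum(legendre(x + i, p) for i in range(-k, k + 1))
        return 1 if votes > 0 else -1
    ```

    In the secure protocol each symbol is evaluated under multiparty computation, and the majority vote is what buys the larger comparison range d for a fixed modulus size.
    
    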

    U-net fixed-point quantization for medical image segmentation

    Model quantization is leveraged to reduce the memory consumption and the computation time of deep neural networks. This is achieved by representing weights and activations with a lower bit resolution when compared to their high-precision floating-point counterparts. The suitable level of quantization is directly related to the model performance. Lowering the quantization precision (e.g. 2 bits) reduces the amount of memory required to store model parameters and the amount of logic required to implement computational blocks, which contributes to reducing the power consumption of the entire system. These benefits typically come at the cost of reduced accuracy. The main challenge is to quantize a network as much as possible, while maintaining the performance accuracy. In this work, we present a quantization method for the U-Net architecture, a popular model in medical image segmentation. We then apply our quantization algorithm to three datasets: (1) the Spinal Cord Gray Matter Segmentation (GM), (2) the ISBI challenge for segmentation of neuronal structures in Electron Microscopy (EM), and (3) the public National Institute of Health (NIH) dataset for pancreas segmentation in abdominal CT scans. The reported results demonstrate that with only 4 bits for weights and 6 bits for activations, we obtain an 8-fold reduction in memory requirements while losing only 2.21%, 0.57% and 2.09% dice overlap score for the EM, GM and NIH datasets respectively. Our fixed-point quantization provides a flexible trade-off between accuracy and memory requirement which is not provided by previous quantization methods for U-Net such as TernaryNet. Comment: Accepted to MICCAI 2019's Hardware Aware Learning for Medical Imaging and Computer Assisted Intervention
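    The basic fixed-point quantization operation underlying the scheme above (e.g. 4-bit weights, 6-bit activations) can be sketched generically. This is a standard signed fixed-point quantizer with saturation, written as an illustration; the split between integer and fractional bits is an assumption, and the paper's exact quantizer may differ.

    ```python
    def fixed_point_quantize(x, bits, frac_bits):
        # Quantize x to a signed fixed-point format with `bits` total bits,
        # of which `frac_bits` are fractional, saturating at the ends of the
        # representable range [-2^(bits-1), 2^(bits-1) - 1] / 2^frac_bits.
        scale = 1 << frac_bits
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        q = max(lo, min(hi, round(x * scale)))
        return q / scale
    ```

    With 4 total bits the representable grid is coarse (hence the memory savings), and the saturation clamp is what keeps large activations from wrapping around, at the cost of the small dice-score losses the abstract reports.
    
    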