
    Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

    Although the latest high-end smartphones have powerful CPUs and GPUs, running deeper convolutional neural networks (CNNs) for complex tasks such as ImageNet classification on mobile devices is challenging. To deploy deep CNNs on mobile devices, we present a simple and effective scheme to compress the entire CNN, which we call one-shot whole network compression. The proposed scheme consists of three steps: (1) rank selection with variational Bayesian matrix factorization, (2) Tucker decomposition on the kernel tensor, and (3) fine-tuning to recover the accumulated loss of accuracy; each step can be easily implemented using publicly available tools. We demonstrate the effectiveness of the proposed scheme by testing the performance of various compressed CNNs (AlexNet, VGG-S, GoogLeNet, and VGG-16) on a smartphone. Significant reductions in model size, runtime, and energy consumption are obtained at the cost of a small loss in accuracy. In addition, we address the important implementation-level issue of 1×1 convolution, which is a key operation of the inception module of GoogLeNet as well as of CNNs compressed by our proposed scheme.
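
    Step (2) of the scheme factors each convolutional kernel with a Tucker decomposition. Below is a minimal NumPy sketch of one way to realize that step (a Tucker-2 decomposition over the two channel modes via per-mode truncated SVDs); the ranks, kernel sizes, and helper names are illustrative assumptions, and the paper's rank selection via variational Bayesian matrix factorization and the fine-tuning stage are not shown.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Multiply tensor T along `mode` by matrix M of shape (new_dim, T.shape[mode])."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0)), 0, mode)

def tucker2_conv_kernel(K, rank_out, rank_in):
    """Tucker-2 decomposition of a conv kernel K of shape (C_out, C_in, kh, kw):
    only the two channel modes are factored; the small spatial modes are kept."""
    U_out = np.linalg.svd(unfold(K, 0), full_matrices=False)[0][:, :rank_out]
    U_in = np.linalg.svd(unfold(K, 1), full_matrices=False)[0][:, :rank_in]
    core = mode_dot(mode_dot(K, U_out.T, 0), U_in.T, 1)   # (rank_out, rank_in, kh, kw)
    return core, U_out, U_in

# Toy example: a 64x32x3x3 kernel compressed to channel ranks (16, 8).
K = np.random.randn(64, 32, 3, 3)
core, U_out, U_in = tucker2_conv_kernel(K, 16, 8)
K_approx = mode_dot(mode_dot(core, U_in, 1), U_out, 0)
print(np.linalg.norm(K - K_approx) / np.linalg.norm(K))   # relative reconstruction error
```

    In a compressed network such a factorization is typically realized as a 1×1 convolution, a smaller spatial convolution on the core, and another 1×1 convolution, which is why the abstract's note on efficient 1×1 convolution matters.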

    TTHRESH: Tensor Compression for Multidimensional Visual Data

    Memory and network bandwidth are decisive bottlenecks when handling high-resolution multidimensional data sets in visualization applications, and they increasingly demand suitable data compression strategies. We introduce a novel lossy compression algorithm for multidimensional data over regular grids. It leverages the higher-order singular value decomposition (HOSVD), a generalization of the SVD to three dimensions and higher, together with bit-plane, run-length, and arithmetic coding to compress the HOSVD transform coefficients. Our scheme degrades the data particularly smoothly and achieves lower mean squared error than other state-of-the-art algorithms at low-to-medium bit rates, as required in data archiving and management for visualization purposes. Further advantages of the proposed algorithm include very fine bit-rate selection granularity and the ability to manipulate data at very small cost in the compression domain, for example to reconstruct filtered and/or subsampled versions of all (or selected parts) of the data set.
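
    The transform stage of such a scheme is a truncated HOSVD of the volume; the bit-plane, run-length, and arithmetic coding of the resulting coefficients, which is where TTHRESH itself contributes, is omitted here. A minimal NumPy sketch of the HOSVD stage, with illustrative sizes and ranks:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Multiply tensor T along `mode` by matrix M."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0)), 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: one SVD per mode gives the factor matrices; the core
    is the tensor projected onto those bases."""
    factors, core = [], T
    for mode, r in enumerate(ranks):
        U = np.linalg.svd(unfold(T, mode), full_matrices=False)[0][:, :r]
        factors.append(U)
        core = mode_dot(core, U.T, mode)
    return core, factors

# Toy 64^3 volume reduced to a 16^3 core plus three 64x16 factor matrices.
vol = np.random.rand(64, 64, 64)
core, factors = hosvd(vol, (16, 16, 16))
approx = core
for mode, U in enumerate(factors):
    approx = mode_dot(approx, U, mode)
print(np.linalg.norm(vol - approx) / np.linalg.norm(vol))   # relative error
```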

    Efficient Decomposition of High-Rank Tensors

    Tensors are a natural way to express correlations among many physical variables, but storing tensors in a computer naively requires memory which scales exponentially in the rank of the tensor. This is not optimal, as the required memory is actually set not by the rank but by the mutual information amongst the variables in question. Representations such as the tensor tree perform near-optimally when the tree decomposition is chosen to reflect the correlation structure in question, but making such a choice is non-trivial and good heuristics remain highly context-specific. In this work I present two new algorithms for choosing efficient tree decompositions, independent of the physical context of the tensor. The first is a brute-force algorithm which produces optimal decompositions up to truncation error but is generally impractical for high-rank tensors, as the number of possible choices grows exponentially in rank. The second is a greedy algorithm, and while it is not optimal it performs extremely well in numerical experiments, with a runtime that makes it practical even for tensors of very high rank. Comment: 10 pages, 15 figures. Updated to match the published version in J. Comp. Phys.
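
    As a hedged illustration of the underlying idea (scoring a candidate split by the rank the cut must carry), the sketch below enumerates bipartitions of a small tensor's modes and reports the cheapest one. This is not the paper's algorithm, only a toy version of one greedy step under assumed tolerances and toy sizes.

```python
import numpy as np
from itertools import combinations

def bond_rank(T, group, tol=1e-10):
    """Numerical rank across the cut separating the modes in `group` from the rest."""
    rest = [i for i in range(T.ndim) if i not in group]
    rows = int(np.prod([T.shape[i] for i in group]))
    s = np.linalg.svd(np.transpose(T, list(group) + rest).reshape(rows, -1),
                      compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def cheapest_split(T, tol=1e-10):
    """One greedy step: among all bipartitions of the modes, return the one whose
    cut carries the smallest rank (a proxy for the mutual information across it)."""
    best = None
    for k in range(1, T.ndim // 2 + 1):
        for group in combinations(range(T.ndim), k):
            r = bond_rank(T, group, tol)
            if best is None or r < best[1]:
                best = (group, r)
    return best

# Toy example: modes (0, 1) couple to modes (2, 3) only through a bond of dimension 3,
# so the cheapest cut should be (0, 1) | (2, 3) with rank <= 3.
A = np.random.randn(4, 4, 3)
B = np.random.randn(3, 4, 4)
T = np.einsum('abk,kcd->abcd', A, B)
print(cheapest_split(T))
```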

    Compression of animated 3D models using HO-SVD

    This work presents an analysis of Higher Order Singular Value Decomposition (HO-SVD) applied to lossy compression of 3D mesh animations. We describe strategies for choosing the number of preserved spatial and temporal components after tensor decomposition. Compression error is measured using three metrics (MSE, Hausdorff, MSDM). Results are compared with a method based on Principal Component Analysis (PCA) and presented on a set of animations with typical mesh deformations. Comment: 15 pages, 11 figures.
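
    One natural strategy for choosing the number of preserved temporal and spatial components is an energy threshold on the mode-wise singular values. The sketch below illustrates that idea on synthetic low-rank "animation" data; the threshold, the (frames x vertices x coordinates) layout, and the data are assumptions for illustration, not necessarily the paper's strategy.

```python
import numpy as np

def components_for_energy(M, energy=0.999):
    """Smallest number of leading singular vectors whose squared singular values
    capture the requested fraction of the total energy of the unfolding M."""
    s = np.linalg.svd(M, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

# Synthetic low-rank "animation": three temporal modes driving three spatial shapes,
# arranged as a (frames x vertices x coordinates) tensor.
t = np.linspace(0.0, 2.0 * np.pi, 120)
temporal = np.stack([np.sin(t), np.cos(2 * t), np.sin(3 * t)], axis=1)   # 120 x 3
spatial = np.random.randn(3, 500 * 3)                                    # 3 shapes
anim = (temporal @ spatial).reshape(120, 500, 3)

r_temporal = components_for_energy(anim.reshape(120, -1))                     # frame mode
r_spatial = components_for_energy(np.moveaxis(anim, 1, 0).reshape(500, -1))   # vertex mode
print(r_temporal, r_spatial)
```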

    Memory footprint reduction for the FFT-based volume integral equation method via tensor decompositions

    We present a method of memory footprint reduction for FFT-based, electromagnetic (EM) volume integral equation (VIE) formulations. The arising Green's function tensors have low multilinear rank, which allows Tucker decomposition to be employed for their compression, thereby greatly reducing the required memory storage for numerical simulations. Consequently, the compressed components are able to fit inside a graphical processing unit (GPU), on which highly parallelized computations can vastly accelerate the iterative solution of the arising linear system. In addition, the element-wise products throughout the iterative solver's process require additional flops; thus, we provide a variety of novel and efficient methods that maintain the linear complexity of the classic element-wise product with an additional small multiplicative constant. We demonstrate the utility of our approach via its application to VIE simulations for the Magnetic Resonance Imaging (MRI) of a human head. For these simulations we report an order of magnitude acceleration over standard techniques. Comment: 11 pages, 10 figures, 5 tables, 2 algorithms, journal.
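
    The memory argument is easy to see with a back-of-the-envelope count: a dense n x n x n Green's function tensor needs n^3 values, while a Tucker representation with multilinear rank r needs only r^3 + 3nr. The sizes and ranks below are illustrative, not the paper's.

```python
def dense_mb(n):
    """Dense n x n x n tensor of doubles, in megabytes."""
    return n**3 * 8 / 1e6

def tucker_mb(n, r):
    """Tucker format: an r x r x r core plus three n x r factor matrices."""
    return (r**3 + 3 * n * r) * 8 / 1e6

for n, r in [(128, 20), (256, 25), (512, 30)]:
    print(f"n={n:4d}  dense={dense_mb(n):9.1f} MB  tucker(r={r})={tucker_mb(n, r):6.2f} MB")
```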

    VLSI Friendly Framework for Scalable Video Coding based on Compressed Sensing

    This paper presents a new VLSI-friendly framework for scalable video coding based on Compressed Sensing (CS). It achieves scalability through the 3-Dimensional Discrete Wavelet Transform (3-D DWT) and a better compression ratio by exploiting the inherent sparsity of the high-frequency wavelet sub-bands through CS. By using the 3-D DWT and a proposed adaptive measurement scheme (AMS) at the encoder, we improve the compression ratio and reduce the complexity of the decoder. The proposed video codec uses only 7% of the total number of multipliers needed in a conventional CS-based video coding system. A codebook of Bernoulli matrices with different sizes corresponding to predefined sparsity levels is maintained at both the encoder and the decoder. Based on the calculated l0-norm of the input vector, one of the sixteen possible Bernoulli matrices is selected for taking the CS measurements, and its index is transmitted along with the measurements. Based on this index, the corresponding Bernoulli matrix is used in the CS reconstruction algorithm to recover the high-frequency wavelet sub-bands at the decoder. At the decoder, a new Enhanced Approximate Message Passing (EAMP) algorithm is proposed to reconstruct the wavelet coefficients and apply the inverse wavelet transform to restore the video frames. Simulation results establish the superiority of the proposed framework over existing schemes and demonstrate its suitability for VLSI implementation. Moreover, the coded video is found to be scalable with an increasing number of levels of wavelet decomposition.
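
    A hedged sketch of the encoder side of such an adaptive measurement scheme: a shared codebook of ±1 Bernoulli matrices, one per predefined sparsity level, with the matrix chosen from the l0-norm of the input vector and its index sent alongside the measurements. All sizes, the measurement budget per level, and the variable names are illustrative assumptions, and the EAMP reconstruction at the decoder is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                                        # length of a sub-band vector
sparsity_levels = np.linspace(N // 16, N // 2, 16, dtype=int)  # 16 predefined levels

# Codebook shared by encoder and decoder: one +/-1 Bernoulli matrix per sparsity level,
# with an (illustrative) number of measurements that grows with the level it targets.
codebook = []
for k in sparsity_levels:
    m = min(4 * int(k), N)
    codebook.append(rng.choice([-1.0, 1.0], size=(m, N)) / np.sqrt(m))

def encode(x):
    """Pick the first codebook entry whose sparsity level covers ||x||_0 and return
    its index together with the compressive measurements y = Phi x."""
    idx = int(np.searchsorted(sparsity_levels, np.count_nonzero(x)))
    idx = min(idx, len(codebook) - 1)
    return idx, codebook[idx] @ x

# Toy sub-band vector with 6 nonzero wavelet coefficients.
x = np.zeros(N)
x[rng.choice(N, 6, replace=False)] = rng.standard_normal(6)
idx, y = encode(x)
print(idx, y.shape)        # the decoder looks up codebook[idx] for CS reconstruction
```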

    Image Compression with Iterated Function Systems, Finite Automata and Zerotrees: Grand Unification

    Fractal image compression, Culik's image compression, and zerotree prediction coding of wavelet image decomposition coefficients succeed only because typical images being compressed possess a significant degree of self-similarity. Beyond this common concept, these methods turn out to be even more tightly related, to the point of algorithmic reducibility of one technique to another. The goal of the present paper is to demonstrate these relations. The paper offers a plain-term interpretation of Culik's image compression, in regular image-processing terms, without resorting to finite state machines and similar lofty language. The interpretation is shown to be algorithmically related to an IFS fractal image compression method: an IFS can be exactly transformed into Culik's image code. Using this transformation, we will prove that in a self-similar (part of an) image any zero wavelet coefficient is the root of a zerotree, or its branch. The paper discusses the zerotree coding of (wavelet/projection) coefficients as a common predictor/corrector, applied vertically through different layers of a multiresolutional decomposition, rather than within the same view. This interpretation leads to an insight into the evolution of image compression techniques: from a causal single-layer prediction, to non-causal same-view predictions (wavelet decomposition among others), and to a causal cross-layer prediction (zero-trees, Culik's method). Comment: This is a full paper submitted to Data Compression Conference '96; 10 pages. The abstract of this paper was published in Proc. DCC'96: Data Compression Conference, March 31 - April 3, 1996, Snowbird, Utah, IEEE Computer Society Press, Los Alamitos, California, 1996, p. 44.
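
    For readers unfamiliar with zerotrees: a coefficient is a zerotree root when it and all of its descendants at finer scales are (near) zero, so the whole subtree can be signalled with a single symbol. The sketch below checks that property on a simplified single detail pyramid (real coders track parent-child links per wavelet sub-band); it is only meant to make the definition concrete, not to reproduce any coder from the paper.

```python
import numpy as np

def is_zerotree_root(levels, li, i, j, thresh=1e-12):
    """True if the coefficient at (level li, position (i, j)) and all of its
    descendants at finer levels are (near) zero.  `levels` is a list of detail
    coefficient arrays ordered coarse to fine, each twice the size of the previous."""
    if abs(levels[li][i, j]) > thresh:
        return False
    if li + 1 == len(levels):
        return True
    # The four children of (i, j) sit at rows 2i..2i+1 and columns 2j..2j+1.
    return all(is_zerotree_root(levels, li + 1, ci, cj, thresh)
               for ci in (2 * i, 2 * i + 1)
               for cj in (2 * j, 2 * j + 1))

# Toy three-level detail pyramid: zero everywhere except one fine coefficient.
levels = [np.zeros((4, 4)), np.zeros((8, 8)), np.zeros((16, 16))]
levels[2][5, 5] = 1.0
print(is_zerotree_root(levels, 0, 0, 0))   # True: its whole subtree is zero
print(is_zerotree_root(levels, 0, 1, 1))   # False: (5, 5) at the finest level is a descendant
```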

    Spectral Stiffness Microplane Model for Quasibrittle Textile Composites

    The present contribution proposes a general constitutive model to simulate the orthotropic stiffness, pre-peak nonlinearity, failure envelopes, and the post-peak softening and fracture of textile composites. Following the microplane model framework, the constitutive laws are formulated in terms of stress and strain vectors acting on planes of several orientations within the material meso-structure. The model exploits the spectral decomposition of the orthotropic stiffness tensor to define orthogonal strain modes at the microplane level. These are associated with the various constituents at the mesoscale and with the material response to different types of deformation. Strain-dependent constitutive equations are used to relate the microplane eigenstresses and eigenstrains, while a variational principle is applied to relate the microplane stresses at the mesoscale to the continuum tensor at the macroscale. Thanks to these features, the resulting spectral stiffness microplane formulation can easily capture various physical inelastic phenomena typical of fiber and textile composites such as matrix microcracking, micro-delamination, crack bridging, pullout, and debonding. The application of the model to a twill 2×2 composite shows that it can realistically predict its uniaxial as well as multi-axial behavior. Furthermore, the model shows excellent agreement with experiments on the axial crushing of composite tubes, making it a valuable design tool for crashworthiness applications. The formulation is computationally efficient, easy to calibrate, and adaptable to other kinds of composite architectures of great current interest such as 2D and 3D braids or 3D woven textiles.
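
    The starting point of such a formulation is the spectral (eigen) decomposition of the orthotropic stiffness tensor, written as a 6x6 matrix in Voigt notation. The sketch below shows just that step with made-up engineering constants; the microplane projection, constitutive laws, and calibration are not represented.

```python
import numpy as np

def orthotropic_stiffness(E1, E2, E3, nu12, nu13, nu23, G12, G13, G23):
    """6x6 orthotropic stiffness matrix in Voigt notation, built by inverting the
    compliance matrix assembled from the engineering constants."""
    S = np.zeros((6, 6))
    S[0, 0], S[1, 1], S[2, 2] = 1 / E1, 1 / E2, 1 / E3
    S[0, 1] = S[1, 0] = -nu12 / E1
    S[0, 2] = S[2, 0] = -nu13 / E1
    S[1, 2] = S[2, 1] = -nu23 / E2
    S[3, 3], S[4, 4], S[5, 5] = 1 / G23, 1 / G13, 1 / G12
    return np.linalg.inv(S)

# Hypothetical engineering constants (GPa) for a textile lamina; not the paper's values.
C = orthotropic_stiffness(60.0, 60.0, 10.0, 0.05, 0.30, 0.30, 4.0, 3.0, 3.0)

# Spectral decomposition C = sum_i lambda_i v_i v_i^T: the eigenvectors define
# orthogonal strain modes and the eigenvalues are the associated eigen-stiffnesses.
eigvals, eigvecs = np.linalg.eigh(C)
print(np.round(eigvals, 2))
```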

    Online adaptive basis refinement and compression for reduced-order models via vector-space sieving

    In many applications, projection-based reduced-order models (ROMs) have demonstrated the ability to provide rapid approximate solutions to high-fidelity full-order models (FOMs). However, there is no a priori assurance that these approximate solutions are accurate; their accuracy depends on the ability of the low-dimensional trial basis to represent the FOM solution. As a result, ROMs can generate inaccurate approximate solutions, e.g., when the FOM solution at the online prediction point is not well represented by the training data used to construct the trial basis. To address this fundamental deficiency of standard model-reduction approaches, this work proposes a novel online-adaptive mechanism for efficiently enriching the trial basis in a manner that ensures convergence of the ROM to the FOM, yet does not incur any FOM solves. The mechanism is based on the previously proposed adaptive h-refinement method for ROMs [12], but improves upon this work in two crucial ways. First, the proposed method enables basis refinement with respect to any orthogonal basis (not just the Kronecker basis), thereby generalizing the refinement mechanism and enabling it to be tailored to the physics characterizing the problem at hand. Second, the proposed method provides a fast online algorithm for periodically compressing the enriched basis via an efficient proper orthogonal decomposition (POD) method, which does not incur any operations that scale with the FOM dimension. These two features allow the proposed method to serve as (1) a failsafe mechanism for ROMs, as the method enables the ROM to satisfy any prescribed error tolerance online (even in the case of inadequate training), and (2) an efficient online basis-adaptation mechanism, as the combination of basis enrichment and compression enables the basis to adapt online while controlling its dimension.
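
    A hedged sketch of the two ingredients in generic form: enriching an orthonormal trial basis with new directions, and periodically compressing it by a POD computed from the small matrix of online reduced coordinates, so that no SVD of an object with the FOM dimension is needed. Variable names, sizes, and the choice of enrichment directions are illustrative, not the paper's algorithm.

```python
import numpy as np

def enrich(V, w, tol=1e-12):
    """Append a new direction w to the orthonormal trial basis V (one Gram-Schmidt step)."""
    w = w - V @ (V.T @ w)
    nrm = np.linalg.norm(w)
    return V if nrm < tol else np.hstack([V, (w / nrm)[:, None]])

def compress_pod(V, Q, r):
    """Compress the orthonormal basis V (n x k) using the generalized coordinates
    Q (k x m) gathered online: only the small k x m matrix is decomposed, so nothing
    but the final product scales with the full-order dimension n."""
    U = np.linalg.svd(Q, full_matrices=False)[0]
    return V @ U[:, :r]

rng = np.random.default_rng(1)
n = 1000                                            # full-order (FOM) dimension
V = np.linalg.qr(rng.standard_normal((n, 5)))[0]    # initial trial basis
for _ in range(10):                                 # online enrichment (stand-in directions)
    V = enrich(V, rng.standard_normal(n))
Q = rng.standard_normal((V.shape[1], 50))           # reduced coordinates visited online
V = compress_pod(V, Q, r=8)
print(V.shape)                                      # compressed back to an n x 8 basis
```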

    Tensorizing Neural Networks

    Deep neural networks currently demonstrate state-of-the-art performance in several domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, a large amount of memory is required by the commonly used fully-connected layers, making it hard to use the models on low-end devices and stopping further growth of model size. In this paper we convert the dense weight matrices of the fully-connected layers to the Tensor Train format such that the number of parameters is reduced by a huge factor while the expressive power of the layer is preserved. In particular, for the Very Deep VGG networks we report a compression factor of the dense weight matrix of a fully-connected layer of up to 200,000, leading to a compression factor of up to 7 for the whole network.
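
    A minimal sketch of turning a dense weight matrix into Tensor Train cores with the standard TT-SVD: the matrix is reshaped into a higher-order tensor and swept with truncated SVDs. The paper itself uses the TT-matrix format and trains the cores end-to-end; the reshaping, ranks, and random data below are illustrative only, and the parameter count is what demonstrates the storage saving.

```python
import numpy as np

def tt_svd(T, max_rank):
    """Standard TT-SVD: sweep over the modes, splitting off one TT core per mode
    with a truncated SVD of the remaining unfolding."""
    dims = T.shape
    cores, r = [], 1
    rest = T.reshape(1, -1)
    for n in dims[:-1]:
        U, s, Vt = np.linalg.svd(rest.reshape(r * n, -1), full_matrices=False)
        r_new = min(max_rank, len(s))
        cores.append(U[:, :r_new].reshape(r, n, r_new))
        rest = s[:r_new, None] * Vt[:r_new]
        r = r_new
    cores.append(rest.reshape(r, dims[-1], 1))
    return cores

# A 256 x 1024 dense layer reshaped into a 9th-order tensor of mode size 4 (4^9 = 262144).
W = np.random.randn(256, 1024)
cores = tt_svd(W.reshape((4,) * 9), max_rank=16)
print(sum(c.size for c in cores), W.size)   # TT parameters vs. dense parameters
# Random weights do not admit an accurate low-rank TT; the paper's point is that trained
# fully-connected layers do, so the cores can be stored and learned directly in TT form.
```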