Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
Although the latest high-end smartphones have powerful CPUs and GPUs, running
deeper convolutional neural networks (CNNs) for complex tasks such as ImageNet
classification on mobile devices is challenging. To deploy deep CNNs on mobile
devices, we present a simple and effective scheme to compress the entire CNN,
which we call one-shot whole network compression. The proposed scheme consists
of three steps: (1) rank selection with variational Bayesian matrix
factorization, (2) Tucker decomposition on kernel tensor, and (3) fine-tuning
to recover the accumulated loss of accuracy. Each step can be easily
implemented using publicly available tools. We demonstrate the effectiveness of
the proposed scheme by testing the performance of various compressed CNNs
(AlexNet, VGGS, GoogLeNet, and VGG-16) on the smartphone. Significant
reductions in model size, runtime, and energy consumption are obtained, at the
cost of a small loss in accuracy. In addition, we address an important
implementation-level issue of 1×1 convolution, which is a key operation in
the inception module of GoogLeNet as well as in CNNs compressed by our
proposed scheme.
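
To make the Tucker step concrete, the following is a minimal numpy sketch of a Tucker-2 factorization of a convolution kernel along its channel modes. The ranks are fixed by hand here, whereas the paper selects them via variational Bayesian matrix factorization, and the helper names are illustrative rather than the authors' code.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along `mode` (mode-n unfolding)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker2_kernel(K, rank_in, rank_out):
    """Truncated Tucker-2 decomposition of a conv kernel K with
    shape (out_ch, in_ch, kh, kw): factor only the channel modes,
    as is common when compressing CNN layers."""
    # Leading left singular vectors of the channel-mode unfoldings
    U_out, _, _ = np.linalg.svd(unfold(K, 0), full_matrices=False)
    U_in,  _, _ = np.linalg.svd(unfold(K, 1), full_matrices=False)
    U_out, U_in = U_out[:, :rank_out], U_in[:, :rank_in]
    # Core tensor: project K onto the truncated factor subspaces
    core = np.einsum('oihw,or,is->rshw', K, U_out, U_in)
    return core, U_out, U_in

K = np.random.randn(64, 32, 3, 3)            # toy kernel
core, U_out, U_in = tucker2_kernel(K, rank_in=8, rank_out=16)
K_hat = np.einsum('rshw,or,is->oihw', core, U_out, U_in)
print("relative error:", np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```

In the paper's pipeline this truncation is followed by fine-tuning, which recovers most of the accuracy the decomposition gives up.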
TTHRESH: Tensor Compression for Multidimensional Visual Data
Memory and network bandwidth are decisive bottlenecks when handling
high-resolution multidimensional data sets in visualization applications, and
they increasingly demand suitable data compression strategies. We introduce a
novel lossy compression algorithm for multidimensional data over regular grids.
It leverages the higher-order singular value decomposition (HOSVD), a
generalization of the SVD to three dimensions and higher, together with
bit-plane, run-length and arithmetic coding to compress the HOSVD transform
coefficients. Our scheme degrades the data particularly smoothly and achieves
lower mean squared error than other state-of-the-art algorithms at
low-to-medium bit rates, as required in data archiving and management for
visualization purposes. Further advantages of the proposed algorithm include
very fine bit rate selection granularity and the ability to manipulate data at
very small cost in the compression domain, for example to reconstruct filtered
and/or subsampled versions of all (or selected parts) of the data set.
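
As a rough illustration of the transform stage only (the bit-plane, run-length and arithmetic coding of the coefficients is what TTHRESH adds on top), here is a hedged numpy sketch of truncated HOSVD compression of a 3-D volume; the random volume is a stand-in, so the printed ratio and error are not representative of real data.

```python
import numpy as np

def hosvd_truncate(V, ranks):
    """Truncated HOSVD: keep ranks[m] leading singular vectors
    per mode, return the core tensor plus factor matrices."""
    factors = []
    for m, r in enumerate(ranks):
        unf = np.moveaxis(V, m, 0).reshape(V.shape[m], -1)
        U, _, _ = np.linalg.svd(unf, full_matrices=False)
        factors.append(U[:, :r])
    core = np.einsum('ijk,ia,jb,kc->abc', V, *factors)
    return core, factors

V = np.random.rand(64, 64, 64)               # stand-in volume
core, Us = hosvd_truncate(V, ranks=(16, 16, 16))
V_hat = np.einsum('abc,ia,jb,kc->ijk', core, *Us)
stored = core.size + sum(U.size for U in Us)
print("compression ratio: %.1fx" % (V.size / stored))
print("MSE: %.3e" % np.mean((V - V_hat) ** 2))
```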
Efficient Decomposition of High-Rank Tensors
Tensors are a natural way to express correlations among many physical
variables, but storing tensors in a computer naively requires memory which
scales exponentially in the rank of the tensor. This is not optimal, as the
required memory is actually set not by the rank but by the mutual information
amongst the variables in question. Representations such as the tensor tree
perform near-optimally when the tree decomposition is chosen to reflect the
correlation structure in question, but making such a choice is non-trivial and
good heuristics remain highly context-specific. In this work I present two new
algorithms for choosing efficient tree decompositions, independent of the
physical context of the tensor. The first is a brute-force algorithm which
produces optimal decompositions up to truncation error but is generally
impractical for high-rank tensors, as the number of possible choices grows
exponentially in rank. The second is a greedy algorithm; while not optimal, it
performs extremely well in numerical experiments, with a runtime that makes it
practical even for tensors of very high rank.
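
The following toy sketch illustrates the quantity such tree-selection heuristics act on: the entropy of the singular-value spectrum across a bipartition of the tensor's indices, a proxy for the mutual information between the two sides. It is not either of the paper's two algorithms, just the scoring primitive a greedy builder might call.

```python
import numpy as np

def bipartition_entropy(T, left_modes):
    """Mutual-information proxy for a bipartition: reshape the
    tensor into a matrix (left modes vs. the rest), then take the
    entropy of the normalized squared singular values."""
    right = [m for m in range(T.ndim) if m not in left_modes]
    M = np.transpose(T, left_modes + right).reshape(
        np.prod([T.shape[m] for m in left_modes]), -1)
    s = np.linalg.svd(M, compute_uv=False)
    p = s**2 / np.sum(s**2)
    p = p[p > 1e-15]
    return -np.sum(p * np.log(p))

T = np.random.randn(2, 2, 2, 2, 2)
# Score candidate splits; a greedy tree builder would cut along
# the split with the lowest entropy (weakest correlation) first.
for split in ([0, 1], [0, 2], [0, 3]):
    print(split, bipartition_entropy(T, split))
```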
Compression of animated 3D models using HO-SVD
This work presents an analysis of Higher Order Singular Value Decomposition
(HO-SVD) applied to lossy compression of 3D mesh animations. We describe
strategies for choosing the number of preserved spatial and temporal components
after tensor decomposition. Compression error is measured using three metrics
(MSE, Hausdorff, MSDM). Results are compared with a method based on Principal
Component Analysis (PCA) and presented on a set of animations with typical mesh
deformations.
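
One simple strategy of the kind discussed, sketched here under assumptions (an energy threshold on mode-wise singular values; the paper's own criteria may differ): pick the number of preserved temporal and spatial components so that the retained components carry a fixed fraction of the total energy.

```python
import numpy as np

def rank_for_energy(T, mode, keep=0.999):
    """Pick how many mode-`mode` components to preserve so the
    retained singular values carry `keep` of the total energy."""
    unf = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    s = np.linalg.svd(unf, compute_uv=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(energy, keep) + 1)

# Toy animation tensor: 120 frames x 500 vertices x 3 coordinates,
# cumulatively summed along time to mimic smooth mesh motion
A = np.random.randn(120, 500, 3).cumsum(axis=0)
r_time  = rank_for_energy(A, mode=0)
r_space = rank_for_energy(A, mode=1)
print("preserved temporal/spatial components:", r_time, r_space)
```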
Memory footprint reduction for the FFT-based volume integral equation method via tensor decompositions
We present a method of memory footprint reduction for FFT-based,
electromagnetic (EM) volume integral equation (VIE) formulations. The arising
Green's function tensors have low multilinear rank, which allows Tucker
decomposition to be employed for their compression, thereby greatly reducing
the required memory storage for numerical simulations. Consequently, the
compressed components are able to fit inside a graphics processing unit (GPU)
on which highly parallelized computations can vastly accelerate the iterative
solution of the arising linear system. In addition, the element-wise products
throughout the iterative solver's process require additional flops; we
therefore provide a variety of novel and efficient methods that maintain the
linear complexity of the classic element-wise product up to a small
multiplicative constant. We demonstrate the utility of our approach via
its application to VIE simulations for the Magnetic Resonance Imaging (MRI) of
a human head. For these simulations we report an order of magnitude
acceleration over standard techniques.
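
A toy numpy illustration of the storage saving that low multilinear rank buys (the actual Green's function tensors, and the compressed-domain element-wise products, are beyond this sketch): build a tensor with exact Tucker structure and compare the footprint of the full array against core plus factors.

```python
import numpy as np

# Build a tensor with exact low multilinear rank, standing in for
# the smooth Green's-function tensors on a regular grid.
rng = np.random.default_rng(0)
r, n = 8, 128
core = rng.standard_normal((r, r, r))
Us = [rng.standard_normal((n, r)) for _ in range(3)]
G = np.einsum('abc,ia,jb,kc->ijk', core, *Us)

full_bytes   = G.nbytes
tucker_bytes = core.nbytes + sum(U.nbytes for U in Us)
print(f"full: {full_bytes/1e6:.1f} MB, "
      f"Tucker: {tucker_bytes/1e3:.1f} kB, "
      f"ratio: {full_bytes/tucker_bytes:.0f}x")
```

At this size the compressed components occupy tens of kilobytes instead of tens of megabytes, which is what lets the solver state fit in GPU memory.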
VLSI Friendly Framework for Scalable Video Coding based on Compressed Sensing
This paper presents a new VLSI friendly framework for scalable video coding
based on Compressed Sensing (CS). It achieves scalability through 3-Dimensional
Discrete Wavelet Transform (3-D DWT) and better compression ratio by exploiting
the inherent sparsity of the high-frequency wavelet sub-bands through CS. By
using the 3-D DWT and a proposed adaptive measurement scheme (AMS) at the
encoder, the framework improves the compression ratio and reduces the
complexity of the decoder. The proposed video codec uses only 7% of the total
number of multipliers needed in a conventional CS-based video coding system. A
codebook of Bernoulli matrices with different sizes corresponding to the
predefined sparsity levels is maintained at both the encoder and the decoder.
Based on the calculated l0-norm of the input vector, one of the sixteen
possible Bernoulli matrices will be selected for taking the CS measurements and
its index will be transmitted along with the measurements. Based on this index,
the corresponding Bernoulli matrix will be used in the CS reconstruction
algorithm to recover the high-frequency wavelet sub-bands at the decoder. At the
decoder, a new Enhanced Approximate Message Passing (EAMP) algorithm is
proposed to reconstruct the wavelet coefficients and apply the inverse wavelet
transform to restore the video frames. Simulation results establish the
superiority of the proposed framework over existing schemes and its
suitability for VLSI implementation. Moreover, the coded video is scalable
with the number of levels of wavelet decomposition.
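
The codebook-selection idea can be sketched as follows. This is an assumption-laden toy: the paper maintains sixteen Bernoulli matrices, while the four sparsity bins and matrix sizes here are stand-ins, and orthogonal matching pursuit (OMP) substitutes for the paper's EAMP decoder as a simpler recovery routine.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256
# Hypothetical codebook: one Bernoulli +/-1 matrix per sparsity bin
levels = [4, 8, 16, 32]                       # assumed sparsity bins
codebook = {k: rng.choice([-1.0, 1.0], size=(4 * k, N)) for k in levels}

def encode(x):
    """Pick the smallest codebook entry whose sparsity bin covers
    the l0-norm of x, then take CS measurements y = Phi @ x."""
    k = int(np.count_nonzero(x))
    idx = next(l for l in levels if l >= k)   # index is transmitted too
    return idx, codebook[idx] @ x

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedily pick k atoms,
    re-solving a least-squares fit on the support each step."""
    support, resid = [], y.copy()
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ resid))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        resid = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

x = np.zeros(N)
x[rng.choice(N, 6, replace=False)] = rng.standard_normal(6)
idx, y = encode(x)
x_hat = omp(codebook[idx], y, k=idx)
print("reconstruction error:", np.linalg.norm(x - x_hat))
```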
Image Compression with Iterated Function Systems, Finite Automata and Zerotrees: Grand Unification
Fractal image compression, Culik's image compression and zerotree prediction
coding of wavelet image decomposition coefficients succeed only because typical
images being compressed possess a significant degree of self-similarity.
Besides the common concept, these methods turn out to be even more tightly
related, to the point of algorithmic reducibility of one technique to
another. The goal of the present paper is to demonstrate these relations.
The paper offers a plain-term interpretation of Culik's image compression, in
regular image processing terms, without resorting to finite state machines and
similar lofty language. The interpretation is shown to be algorithmically
related to an IFS fractal image compression method: an IFS can be exactly
transformed into Culik's image code. Using this transformation, we will prove
that in a self-similar (part of an) image any zero wavelet coefficient is the
root of a zerotree, or its branch.
The paper discusses the zerotree coding of (wavelet/projection) coefficients
as a common predictor/corrector, applied vertically through different layers of
a multiresolutional decomposition, rather than within the same view. This
interpretation leads to an insight into the evolution of image compression
techniques: from a causal single-layer prediction, to non-causal same-view
predictions (wavelet decomposition among others) and to a causal cross-layer
prediction (zero-trees, Culik's method).
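
The zerotree relation used in the proof can be made concrete with a small sketch: in a multiresolution pyramid, a coefficient is a zerotree root when it and all of its descendants at finer levels are zero. The pyramid layout below (each level doubling resolution, children at (2i+di, 2j+dj)) is the standard quadtree convention, assumed rather than taken from the paper.

```python
import numpy as np

def is_zerotree_root(levels, l, i, j, thresh=1e-12):
    """Check whether coefficient (i, j) at decomposition level l is
    the root of a zerotree: it and every descendant at finer levels
    are (near) zero. `levels` is a list of 2-D arrays, coarsest
    first, each twice the resolution of the previous."""
    if abs(levels[l][i, j]) > thresh:
        return False
    if l + 1 == len(levels):
        return True
    return all(is_zerotree_root(levels, l + 1, 2*i + di, 2*j + dj, thresh)
               for di in (0, 1) for dj in (0, 1))

# Toy 3-level pyramid with an all-zero subtree rooted at (0, 0)
levels = [np.zeros((2, 2)), np.zeros((4, 4)), np.zeros((8, 8))]
levels[0][1, 1] = 5.0                      # a non-zero coefficient elsewhere
print(is_zerotree_root(levels, 0, 0, 0))   # True
print(is_zerotree_root(levels, 0, 1, 1))   # False
```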
Spectral Stiffness Microplane Model for Quasibrittle Textile Composites
The present contribution proposes a general constitutive model to simulate
the orthotropic stiffness, pre-peak nonlinearity, failure envelopes, and the
post-peak softening and fracture of textile composites. Following the
microplane model framework, the constitutive laws are formulated in terms of
stress and strain vectors acting on planes of several orientations within the
material meso-structure. The model exploits the spectral decomposition of the
orthotropic stiffness tensor to define orthogonal strain modes at the
microplane level. These are associated with the various constituents at the
mesoscale and to the material response to different types of deformation.
Strain-dependent constitutive equations are used to relate the microplane
eigenstresses and eigenstrains while a variational principle is applied to
relate the microplane stresses at the mesoscale to the continuum tensor at the
macroscale. Thanks to these features, the resulting spectral stiffness
microplane formulation can easily capture various physical inelastic phenomena
typical of fiber and textile composites such as: matrix microcracking,
micro-delamination, crack bridging, pullout, and debonding. The application of
the model to a twill 2×2 composite shows that it can realistically predict its
uniaxial as well as multi-axial behavior. Furthermore, the model shows
excellent agreement with experiments on the axial crushing of composite tubes,
a capability that makes it a valuable design tool for crashworthiness
applications. The formulation is computationally efficient, easy to calibrate
and adaptable to other kinds of composite architectures of great current
interest, such as 2D and 3D braids or 3D woven textiles.
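
The spectral-decomposition step can be illustrated with a short numpy sketch: an orthotropic stiffness matrix in Voigt notation is eigendecomposed, and any strain then splits into orthogonal eigenmodes on which stress acts mode by mode. The numbers below are hypothetical stand-ins, not calibrated textile-composite data.

```python
import numpy as np

# Hypothetical orthotropic stiffness in Voigt notation (GPa)
C = np.array([
    [60, 20, 15,  0,  0,  0],
    [20, 55, 12,  0,  0,  0],
    [15, 12, 14,  0,  0,  0],
    [ 0,  0,  0,  5,  0,  0],
    [ 0,  0,  0,  0,  4,  0],
    [ 0,  0,  0,  0,  0,  6],
], dtype=float)

# Spectral decomposition: C = sum_k lam_k * phi_k phi_k^T
lam, Phi = np.linalg.eigh(C)

# A strain vector splits into orthogonal eigenmodes; stress follows
# mode by mode, which is what lets microplane constitutive laws act
# on decoupled strain modes.
eps = np.array([1e-3, 0, 0, 0, 0, 2e-3])
modes = Phi.T @ eps                     # eigenstrain amplitudes
sigma = Phi @ (lam * modes)             # equals C @ eps
print(np.allclose(sigma, C @ eps))      # True
```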
Online adaptive basis refinement and compression for reduced-order models via vector-space sieving
In many applications, projection-based reduced-order models (ROMs) have
demonstrated the ability to provide rapid approximate solutions to
high-fidelity full-order models (FOMs). However, there is no a priori assurance
that these approximate solutions are accurate; their accuracy depends on the
ability of the low-dimensional trial basis to represent the FOM solution. As a
result, ROMs can generate inaccurate approximate solutions, e.g., when the FOM
solution at the online prediction point is not well represented by training
data used to construct the trial basis. To address this fundamental deficiency
of standard model-reduction approaches, this work proposes a novel
online-adaptive mechanism for efficiently enriching the trial basis in a manner
that ensures convergence of the ROM to the FOM, yet does not incur any FOM
solves. The mechanism is based on the previously proposed adaptive
h-refinement method for ROMs [12], but improves upon this work in two crucial
ways. First, the proposed method enables basis refinement with respect to any
orthogonal basis (not just the Kronecker basis), thereby generalizing the
refinement mechanism and enabling it to be tailored to the physics
characterizing the problem at hand. Second, the proposed method provides a fast
online algorithm for periodically compressing the enriched basis via an
efficient proper orthogonal decomposition (POD) method, which does not incur
any operations that scale with the FOM dimension. These two features allow the
proposed method to serve as (1) a failsafe mechanism for ROMs, as the method
enables the ROM to satisfy any prescribed error tolerance online (even in the
case of inadequate training), and (2) an efficient online basis-adaptation
mechanism, as the combination of basis enrichment and compression enables the
basis to adapt online while controlling its dimension.
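
For orientation only, here is a naive SVD-based sketch of the compression step: collapse a redundantly enriched trial basis back to its essential dimension by an energy criterion. Note that this plain SVD costs operations that scale with the FOM dimension, which is precisely what the paper's fast online algorithm avoids.

```python
import numpy as np

def compress_basis(V, tol=1e-8):
    """POD-style compression of an enriched trial basis V (N x n):
    keep the left singular vectors whose singular values carry all
    but `tol` of the energy, bounding the basis dimension."""
    U, s, _ = np.linalg.svd(V, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(energy, 1.0 - tol) + 1)
    return U[:, :r]

N = 10_000                                    # FOM dimension
V = np.linalg.qr(np.random.randn(N, 20))[0]   # current trial basis
# Enrichment that is nearly redundant with the existing basis
enrich = 0.5 * V[:, :5] + 1e-6 * np.random.randn(N, 5)
V_big = np.hstack([V, enrich])
V_new = compress_basis(V_big)
print(V_big.shape, "->", V_new.shape)         # back down to 20 columns
```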
Tensorizing Neural Networks
Deep neural networks currently demonstrate state-of-the-art performance in
several domains. At the same time, models of this class are very demanding in
terms of computational resources. In particular, a large amount of memory is
required by commonly used fully-connected layers, making it hard to use the
models on low-end devices and hindering further increases in model size.
In this paper we convert the dense weight matrices of the fully-connected
layers to the Tensor Train format such that the number of parameters is reduced
by a huge factor and at the same time the expressive power of the layer is
preserved. In particular, for the Very Deep VGG networks we report a
compression factor of up to 200,000× for the dense weight matrix of a single
fully-connected layer, which translates into a compression factor of up to 7×
for the whole network.
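
A minimal sketch of the underlying TT-SVD (the successive-SVD construction of the Tensor Train format) applied to a reshaped weight matrix. The paper actually uses the TT-matrix format, which factorizes input and output dimensions jointly, and a truncated TT of a random matrix is lossy, so this only illustrates the parameter count.

```python
import numpy as np

def tt_svd(T, max_rank):
    """TT-SVD: factor a d-way tensor into a train of 3-way cores
    by successive truncated SVDs."""
    shape = T.shape
    cores, r = [], 1
    M = T.reshape(r * shape[0], -1)
    for k in range(len(shape) - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r_new = min(max_rank, len(s))
        cores.append(U[:, :r_new].reshape(r, shape[k], r_new))
        # Carry the remainder forward and fold in the next mode
        M = (s[:r_new, None] * Vt[:r_new]).reshape(
            r_new * shape[k + 1], -1)
        r = r_new
    cores.append(M.reshape(r, shape[-1], 1))
    return cores

# A 1024x1024 dense-layer weight, reshaped to a 10-way tensor of 4s
W = np.random.randn(1024, 1024)
cores = tt_svd(W.reshape([4] * 10), max_rank=8)
tt_params = sum(c.size for c in cores)
print("dense params:", W.size, "TT params:", tt_params)
```

With the TT ranks capped, the parameter count of the cores grows only linearly in the number of modes, which is the source of the large compression factors the abstract reports.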