Compressed Modular Matrix Multiplication
We propose to store several integers modulo a small prime into a single
machine word. Modular addition is performed by addition and possibly
subtraction of a word containing several copies of the modulus. Modular
multiplication is not directly accessible, but a modular dot product can be
performed by an integer multiplication with the reversed integer. Modular
multiplication by a word containing a single residue is also possible.
Therefore, matrix multiplication can be performed on such a compressed storage.
We here give bounds on the sizes of primes and matrices for which such a
compression is possible. We also give explicit details of the required
compressed arithmetic routines.
Comment: Published in MICA'2008: Milestones in Computer Algebra, Tobago: Trinité-et-Tobago (2008)
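To make the core trick concrete, here is a minimal Python sketch (our own reconstruction, not the paper's routines; the slot width b and the helper pack are illustrative choices): residues are packed into b-bit slots of one integer, the second operand is packed in reverse order, and a single integer multiplication places the dot product in the middle slot of the result.

    def pack(vec, b):
        """Pack residues vec[i] into slot i of a single integer (b bits per slot)."""
        x = 0
        for i, a in enumerate(vec):
            x |= a << (i * b)
        return x

    def dot_mod(u, v, p):
        """Dot product of residue vectors u and v modulo p via one multiplication."""
        n = len(u)
        b = (n * (p - 1) ** 2).bit_length()   # slot width large enough that no slot overflows
        x = pack(u, b)                        # u packed in natural order
        y = pack(list(reversed(v)), b)        # v packed in reverse order
        prod = x * y                          # one integer multiplication = full convolution
        coeff = (prod >> ((n - 1) * b)) & ((1 << b) - 1)   # middle slot holds sum_i u[i]*v[i]
        return coeff % p

    # Example: two length-4 residue vectors modulo p = 7.
    u, v, p = [1, 3, 5, 2], [6, 4, 0, 2], 7
    assert dot_mod(u, v, p) == sum(a * c for a, c in zip(u, v)) % p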
Dictionary Learning for Blind One Bit Compressed Sensing
This letter proposes a dictionary learning algorithm for blind one bit
compressed sensing. In the blind one bit compressed sensing framework, the
original signal to be reconstructed from one bit linear random measurements is
sparse in an unknown domain. In this context, the product of the measurement
matrix $A$ and the sparse domain matrix $\Phi$, i.e., $D = A\Phi$, should be
learned. Hence, we use dictionary learning to train this matrix. Towards that
end, an appropriate continuous convex cost function is suggested for one bit
compressed sensing, and a simple steepest-descent method is exploited to learn
the rows of the matrix $D$. Experimental results show the effectiveness of
the proposed algorithm against the case of no dictionary learning, especially
as the number of training signals and the number of sign measurements increases.
Comment: 5 pages, 3 figures
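The abstract does not reproduce the cost function itself, so the sketch below is only an illustration of the general recipe in Python: a convex one-sided squared-hinge penalty on sign inconsistencies, minimized by steepest descent over the rows of D, with the sparse codes S assumed known and fixed; the loss, the row normalization, and all names are assumptions made for illustration.

    import numpy as np

    def hinge_sq_loss_grad(D, S, Y):
        """Loss sum_ij max(0, -Y_ij * (D S)_ij)^2 and its gradient w.r.t. D (assumed cost)."""
        Z = Y * (D @ S)                 # margins: positive where sign(D S) agrees with Y
        viol = np.minimum(Z, 0.0)       # negative only where a sign measurement is violated
        loss = np.sum(viol ** 2)
        grad = 2.0 * (viol * Y) @ S.T   # gradient of the loss with respect to D
        return loss, grad

    def learn_D(S, Y, n_iter=200, step=1e-2):
        """Steepest descent on D = A*Phi given sparse codes S and one-bit measurements Y."""
        m, n = Y.shape[0], S.shape[0]
        D = np.random.randn(m, n) / np.sqrt(n)
        for _ in range(n_iter):
            _, g = hinge_sq_loss_grad(D, S, Y)
            D -= step * g
            D /= np.linalg.norm(D, axis=1, keepdims=True)  # keep row norms fixed (assumption)
        return D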
Construction of a Large Class of Deterministic Sensing Matrices that Satisfy a Statistical Isometry Property
Compressed Sensing aims to capture attributes of $k$-sparse signals using
very few measurements. In the standard Compressed Sensing paradigm, the
$m \times n$ measurement matrix $A$ is required to act as a near isometry on
the set of all $k$-sparse signals (Restricted Isometry Property, or RIP).
Although it is known that certain probabilistic processes generate $m \times n$
matrices that satisfy RIP with high probability, there is no practical
algorithm for verifying whether a given sensing matrix $A$ has this property,
which is crucial for the feasibility of the standard recovery algorithms. In
contrast, this paper provides simple criteria that guarantee that a deterministic
sensing matrix satisfying these criteria acts as a near isometry on an
overwhelming majority of $k$-sparse signals; in particular, most such signals have a unique
representation in the measurement domain. Probability still plays a critical
role, but it enters the signal model rather than the construction of the
sensing matrix. We require the columns of the sensing matrix to form a group
under pointwise multiplication. The construction allows recovery methods for
which the expected performance is sub-linear in $n$, and only quadratic in
$m$; the focus on expected performance is more typical of mainstream signal
processing than the worst-case analysis that prevails in standard Compressed
Sensing. Our framework encompasses many families of deterministic sensing
matrices, including those formed from discrete chirps, Delsarte-Goethals codes,
and extended BCH codes.
Comment: 16 pages, 2 figures, to appear in IEEE Journal of Selected Topics in Signal Processing, the special issue on Compressed Sensing
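As an illustration of the group requirement on the columns, the following Python sketch builds the p x p^2 discrete chirp matrix for a prime p (one of the families named above; the normalization and the check are our own choices) and verifies that the pointwise product of two columns is again a column.

    import numpy as np

    def chirp_matrix(p):
        """Columns phi_{a,b}[t] = exp(2*pi*i*(a*t^2 + b*t)/p) for a, b, t in Z_p."""
        t = np.arange(p)
        cols = []
        for a in range(p):
            for b in range(p):
                cols.append(np.exp(2j * np.pi * (a * t * t + b * t) / p))
        return np.stack(cols, axis=1) / np.sqrt(p)   # p x p^2, unit-norm columns

    p = 7
    A = chirp_matrix(p)
    # Group property: column (a, b) times column (a2, b2), pointwise, equals
    # column (a + a2 mod p, b + b2 mod p) up to the 1/sqrt(p) normalization.
    a, b, a2, b2 = 2, 5, 3, 1
    lhs = A[:, a * p + b] * A[:, a2 * p + b2] * np.sqrt(p)
    rhs = A[:, ((a + a2) % p) * p + (b + b2) % p]
    assert np.allclose(lhs, rhs)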
Parallel structurally-symmetric sparse matrix-vector products on multi-core processors
We consider the problem of developing an efficient multi-threaded
implementation of the matrix-vector multiplication algorithm for sparse
matrices with structural symmetry. Matrices are stored using the compressed
sparse row-column format (CSRC), designed to exploit the symmetric
non-zero pattern observed in global finite element matrices. Unlike classical
compressed storage formats, performing the sparse matrix-vector product using
the CSRC requires thread-safe access to the destination vector. To avoid race
conditions, we have implemented two partitioning strategies. In the first one,
each thread allocates an array for storing its contributions, which are later
combined in an accumulation step. We analyze how to perform this accumulation
in four different ways. The second strategy employs a coloring algorithm for
grouping rows that can be concurrently processed by threads. Our results
indicate that, although incurring an increase in the working set size, the
former approach leads to the best performance improvements for most matrices.
Comment: 17 pages, 17 figures, reviewed related work section, fixed typo
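The sequential Python sketch below (our own illustration of a CSRC-like layout, under simplifying assumptions) shows why thread-safe access to the destination vector is needed: each stored off-diagonal position updates both y[i] and y[j], and the per-partition buffers mimic the first strategy, in which every thread accumulates into a private array that is summed in a final accumulation step.

    import numpy as np

    def csrc_spmv(row_ptr, col_idx, val_low, val_upp, diag, x, partitions):
        """Structurally symmetric SpMV: the strict lower-triangle pattern is stored once,
        with separate value arrays for a_ij (val_low) and its mirror a_ji (val_upp)."""
        n = len(diag)
        buffers = []
        for rows in partitions:               # one chunk of rows per "thread"
            y_loc = np.zeros(n)               # private destination buffer
            for i in rows:
                y_loc[i] += diag[i] * x[i]
                for k in range(row_ptr[i], row_ptr[i + 1]):
                    j = col_idx[k]
                    y_loc[i] += val_low[k] * x[j]   # contribution of a_ij to y[i]
                    y_loc[j] += val_upp[k] * x[i]   # contribution of a_ji to y[j] (the racy update)
            buffers.append(y_loc)
        return np.sum(buffers, axis=0)        # accumulation step combining thread buffers

    # 3x3 structurally symmetric example: [[4, 1, 0], [2, 5, 3], [0, 6, 7]]
    row_ptr = np.array([0, 0, 1, 2])          # strict lower triangle, by row
    col_idx = np.array([0, 1])                # columns of a_10 and a_21
    val_low = np.array([2.0, 6.0])            # lower-triangle values a_10, a_21
    val_upp = np.array([1.0, 3.0])            # mirrored upper values a_01, a_12
    diag    = np.array([4.0, 5.0, 7.0])
    x       = np.array([1.0, 2.0, 3.0])
    print(csrc_spmv(row_ptr, col_idx, val_low, val_upp, diag, x, [[0, 1], [2]]))
    # -> [ 6. 21. 33.], matching the dense product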
Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices
As Machine Learning (ML) techniques nowadays generate huge data
collections, the problem of how to efficiently engineer their storage and
operations is becoming of paramount importance.
In this article we propose a new lossless compression scheme for
real-valued matrices which achieves efficient performance in terms
of compression ratio and time for linear-algebra operations. Experiments
show that, as a compressor, our tool is clearly superior
to gzip and it is usually within 20% of xz in terms of compression
ratio. In addition, our compressed format supports matrix-vector
multiplications in time and space proportional to the size of the
compressed representation, unlike gzip and xz that require the full
decompression of the compressed matrix. To our knowledge our
lossless compressor is the first one achieving time and space complexities
which match the theoretical limit expressed by the k-th
order statistical entropy of the input.
To achieve further time/space reductions, we propose column-reordering
algorithms hinging on a novel column-similarity score.
Our experiments on various data sets of ML matrices show that our
column reordering can yield a further reduction of up to 16% in the
peak memory usage during matrix-vector multiplication.
Finally, we compare our proposal against the state-of-the-art Compressed
Linear Algebra (CLA) approach, showing that ours always runs at least twice
as fast (in a multi-threaded setting) and achieves better compressed space
occupancy and peak memory usage. This experimentally confirms the provably
effective theoretical bounds we show for our compressed-matrix approach.
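As a rough illustration of how a matrix-vector product can run directly on a grammar-compressed matrix (this is not the paper's format; the rule layout and names below are assumptions), repeated runs of non-zeros are replaced by grammar rules, and each rule's partial dot product with the fixed input vector is evaluated exactly once and then reused wherever the rule appears.

    from functools import lru_cache

    # Terminals are (column, value) pairs; rule names map to sequences of symbols,
    # which may be terminals or references to other rules.
    rules = {
        "R1": [(0, 2.0), (3, -1.0)],          # a run of non-zeros shared by several rows
        "R2": ["R1", (5, 4.0)],               # rules may reference other rules
    }
    rows = [                                   # compressed rows of a 3 x 6 matrix
        ["R2"],
        [(1, 7.0), "R1"],
        ["R2", (4, 1.0)],
    ]

    def matvec(rows, rules, x):
        @lru_cache(maxsize=None)
        def contrib(name):                     # partial dot product of one rule with x, memoized
            return sum(contrib(s) if isinstance(s, str) else x[s[0]] * s[1]
                       for s in rules[name])
        return [sum(contrib(s) if isinstance(s, str) else x[s[0]] * s[1] for s in row)
                for row in rows]

    x = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(matvec(rows, rules, x))              # -> [5.0, 8.0, 6.0]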
Common Subexpression-based Compression and Multiplication of Sparse Constant Matrices
In deep learning inference, model parameters are pruned and quantized to
reduce the model size. Compression methods and common subexpression (CSE)
elimination algorithms are applied on sparse constant matrices to deploy the
models on low-cost embedded devices. However, the state-of-the-art CSE
elimination methods do not scale well to large matrices. They reach
hours for extracting CSEs in a matrix, while their matrix
multiplication algorithms execute longer than the conventional matrix
multiplication methods. Moreover, no compression methods exist for
matrices that utilize CSEs. As a remedy to this problem, a random search-based
algorithm is proposed in this paper to extract CSEs in the column pairs of a
constant matrix. It produces an adder tree for a matrix in a
minute. To compress the adder tree, this paper presents a compression format by
extending the Compressed Sparse Row (CSR) to include CSEs. While compression
rates of more than can be achieved compared to the original CSR format,
simulations for a single-core embedded system show that the matrix
multiplication execution time can be reduced by
- …
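To make the column-pair idea concrete, here is a small Python sketch (a single greedy pass over a 0/1 matrix, not the paper's random-search algorithm or its extended-CSR format): the most frequent column pair is extracted as a common subexpression s = x[j1] + x[j2], computed once, and every row containing that pair is rewritten to use it, reducing the total adder count.

    from itertools import combinations
    from collections import Counter

    A = [            # constant 0/1 matrix; each row is the set of columns holding a 1
        {0, 2, 3},
        {0, 2, 5},
        {0, 1, 2},
        {3, 5},
    ]

    def adders(rows):
        """Additions needed to evaluate y = A x row by row (n non-zeros -> n - 1 adds)."""
        return sum(max(len(r) - 1, 0) for r in rows)

    # Count how often every column pair occurs together in a row.
    pair_count = Counter(p for r in A for p in combinations(sorted(r), 2))
    (j1, j2), hits = pair_count.most_common(1)[0]

    # A fresh virtual column s stands for the subexpression x[j1] + x[j2] (one extra add).
    s = max(c for r in A for c in r) + 1
    rewritten = [(r - {j1, j2}) | {s} if {j1, j2} <= r else r for r in A]

    print((j1, j2), "appears in", hits, "rows")
    print("adders before:", adders(A), "after:", adders(rewritten) + 1)   # 7 -> 5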