Fast Digital Convolutions using Bit-Shifts
An exact, one-to-one transform is presented that not only allows digital
circular convolutions, but is also free from multiplications and quantisation errors
for transform lengths of arbitrary powers of two. The transform is analogous to
the Discrete Fourier Transform, with the canonical harmonics replaced by a set
of cyclic integers computed using only bit-shifts and additions modulo a prime
number. The prime number may be selected to occupy contemporary word sizes or
to be very large for cryptographic or data hiding applications. The transform
is an extension of the Rader Transforms via Carmichael's Theorem. These
properties allow for exact convolutions that are impervious to numerical
overflow and that can utilise Fast Fourier Transform algorithms.
Comment: 4 pages, 2 figures, submitted to IEEE Signal Processing Letters
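To make the idea of an exact, shift-based convolution concrete, here is a minimal sketch of a classical Rader-style number-theoretic transform, which the abstract says the new transform extends. It is not the paper's transform. The modulus 257, the length 16, and all function names are assumptions chosen for illustration: 257 is a Fermat prime and 2 has multiplicative order 16 modulo 257, so every "harmonic" is a power of two and multiplying by it is a bit-shift followed by a reduction.

```python
P = 257    # Fermat prime 2**8 + 1 (illustrative choice)
N = 16     # 2 has multiplicative order 16 modulo 257
# The root of unity is 2, so each twiddle factor 2**e is applied as a left shift.


def ntt(x, inverse=False):
    """O(N^2) number-theoretic transform of a length-N list modulo P."""
    assert len(x) == N
    out = []
    for k in range(N):
        acc = 0
        for n, xn in enumerate(x):
            e = (n * k) % N
            if inverse:
                e = (N - e) % N          # use the inverse root 2**(-1) = 2**(N-1)
            acc = (acc + (xn << e)) % P  # multiply by 2**e with a shift
        out.append(acc)
    if inverse:
        n_inv = pow(N, P - 2, P)         # 1/N mod P via Fermat's little theorem
        out = [(v * n_inv) % P for v in out]
    return out


def circular_convolve(a, b):
    """Circular convolution modulo P via the transform."""
    fa, fb = ntt(a), ntt(b)
    # The pointwise product still uses ordinary modular multiplications.
    return ntt([(u * v) % P for u, v in zip(fa, fb)], inverse=True)


def circular_convolve_direct(a, b):
    """Reference circular convolution modulo P, straight from the definition."""
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) % P for n in range(N)]
```

The result equals the integer convolution only when every true output value is below the modulus; otherwise it is the convolution reduced modulo 257. A fast (FFT-style) factorisation would replace the double loop but is omitted to keep the sketch short.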
An Orthogonal 16-point Approximate DCT for Image and Video Compression
A low-complexity orthogonal multiplierless approximation for the 16-point
discrete cosine transform (DCT) was introduced. The proposed method was
designed to possess a very low computational cost. A fast algorithm based on
matrix factorization was proposed requiring only 60 additions. The proposed
architecture outperforms classical and state-of-the-art algorithms when
assessed as a tool for image and video compression. Digital VLSI hardware
implementations were also proposed, physically realized in FPGA technology
and implemented in 45 nm up to the synthesis and place-and-route levels. Additionally,
the proposed method was embedded into a high efficiency video coding (HEVC)
reference software for actual proof-of-concept. Obtained results show
negligible video degradation when compared to the Chen DCT algorithm in HEVC.
Comment: 18 pages, 7 figures, 6 tables
OpenDDA: A Novel High-Performance Computational Framework for the Discrete Dipole Approximation
This work presents a highly optimized computational framework for the
Discrete Dipole Approximation, a numerical method for calculating the optical
properties associated with a target of arbitrary geometry that is widely used
in atmospheric, astrophysical and industrial simulations. Core optimizations
include the bit-fielding of integer data and iterative methods that complement
a new Discrete Fourier Transform (DFT) kernel, which efficiently calculates the
matrix vector products required by these iterative solution schemes. The new
kernel performs the requisite 3-D DFTs as ensembles of 1-D transforms, and by
doing so, is able to reduce the number of constituent 1-D transforms by 60% and
the memory by over 80%. The optimizations also facilitate the use of parallel
techniques to further enhance the performance. Complete OpenMP-based
shared-memory and MPI-based distributed-memory implementations have been
created to take full advantage of the various architectures. Several benchmarks
of the new framework indicate extremely favorable performance and scalability.
OpenDDA is available following the usual open source regulations from
http://www.opendda.org
Comment: 29 pages, 5 figures
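The statement that a 3-D DFT can be performed as ensembles of 1-D transforms rests on the separability of the DFT. The sketch below illustrates only that property, in pure Python with hypothetical function names; it does not reproduce the OpenDDA kernel or its reported savings in transform count and memory.

```python
import cmath


def dft1d(v):
    """Naive 1-D DFT of a sequence (O(n^2))."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def dft3d_separable(a):
    """3-D DFT of a nested list a[i][j][k], done as passes of 1-D DFTs."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[complex(a[i][j][k]) for k in range(nz)]
            for j in range(ny)] for i in range(nx)]
    # Pass 1: transform every line along the last axis.
    for i in range(nx):
        for j in range(ny):
            out[i][j] = dft1d(out[i][j])
    # Pass 2: transform every line along the middle axis.
    for i in range(nx):
        for k in range(nz):
            line = dft1d([out[i][j][k] for j in range(ny)])
            for j in range(ny):
                out[i][j][k] = line[j]
    # Pass 3: transform every line along the first axis.
    for j in range(ny):
        for k in range(nz):
            line = dft1d([out[i][j][k] for i in range(nx)])
            for i in range(nx):
                out[i][j][k] = line[i]
    return out


def dft3d_direct(a):
    """Reference 3-D DFT evaluated directly from the triple-sum definition."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])

    def term(u, v, w):
        return sum(a[i][j][k]
                   * cmath.exp(-2j * cmath.pi * (u * i / nx + v * j / ny + w * k / nz))
                   for i in range(nx) for j in range(ny) for k in range(nz))

    return [[[term(u, v, w) for w in range(nz)]
             for v in range(ny)] for u in range(nx)]
```

In practice each `dft1d` call would be an FFT. Any lines known in advance to be entirely zero, for example because of zero padding, could be skipped in a pass, since the DFT of an all-zero line is zero.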
Improved 8-point Approximate DCT for Image and Video Compression Requiring Only 14 Additions
The demand for low-energy video processing systems such as HEVC in the
multimedia market has led to extensive development of fast algorithms
for the efficient approximation of 2-D DCT transforms. The DCT is employed in a
multitude of compression standards due to its remarkable energy compaction
properties. Multiplier-free approximate DCT transforms have been proposed that
offer superior compression performance at very low circuit complexity. Such
approximations can be realized in digital VLSI hardware using additions and
subtractions only, leading to significant reductions in chip area and power
consumption compared to conventional DCTs and integer transforms. In this
paper, we introduce a novel 8-point DCT approximation that requires only 14
addition operations and no multiplications. The proposed transform possesses
low computational complexity and is compared to state-of-the-art DCT
approximations in terms of both algorithm complexity and peak signal-to-noise
ratio. The proposed DCT approximation is a candidate for reconfigurable video
standards such as HEVC. The proposed transform and several other DCT
approximations are mapped to systolic-array digital architectures and
physically realized as digital prototype circuits using FPGA technology and
mapped to 45 nm CMOS technology.
Comment: 30 pages, 7 figures, 5 tables
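For readers unfamiliar with the quality metric named in the abstract, peak signal-to-noise ratio is 10 log10(peak^2 / MSE), where MSE is the mean squared error between a reference signal and its reconstruction. The helper below is a generic sketch of that formula; the function name, the flat-list input, and the default peak of 255 (8-bit samples) are assumptions, not details from the paper.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise
    return 10 * math.log10(peak ** 2 / mse)
```

For a 2-D image, the pixel rows would be flattened into a single list before calling the function.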
On the spectra of hypermatrix direct sum and Kronecker products constructions
Our main result is an elementary derivation of the spectral decomposition of
hypermatrices generated by arbitrary combinations of Kronecker products and
direct sums of cubic side length
A Class of DCT Approximations Based on the Feig-Winograd Algorithm
A new class of matrices based on a parametrization of the Feig-Winograd
factorization of 8-point DCT is proposed. Such parametrization induces a matrix
subspace, which unifies a number of existing methods for DCT approximation. By
solving a comprehensive multicriteria optimization problem, we identified
several new DCT approximations. Obtained solutions were sought to possess the
following properties: (i) low multiplierless computational complexity, (ii)
orthogonality or near orthogonality, (iii) low complexity invertibility, and
(iv) close proximity and performance to the exact DCT. Proposed approximations
were submitted to assessment in terms of proximity to the DCT, coding
performance, and suitability for image compression. Considering Pareto
efficiency, particular new proposed approximations could outperform various
existing methods archived in the literature.
Comment: 26 pages, 4 figures, 5 tables, fixed arithmetic complexity in Table I
Lossless Image and Intra-frame Compression with Integer-to-Integer DST
Video coding standards are primarily designed for efficient lossy
compression, but it is also desirable to support efficient lossless compression
within video coding standards using small modifications to the lossy coding
architecture. A simple approach is to skip transform and quantization, and
simply entropy code the prediction residual. However, this approach is
inefficient at compression. A more efficient and popular approach is to skip
transform and quantization but also process the residual block with DPCM, along
the horizontal or vertical direction, prior to entropy coding. This paper
explores an alternative approach based on processing the residual block with
integer-to-integer (i2i) transforms. I2i transforms can map integer pixels to
integer transform coefficients without increasing the dynamic range and can be
used for lossless compression. We focus on lossless intra coding and develop
novel i2i approximations of the odd type-3 DST (ODST-3). Experimental results
with the HEVC reference software show that the developed i2i approximations of
the ODST-3 improve lossless intra-frame compression efficiency with respect to
HEVC version 2, which uses the popular DPCM method, by an average of 2.7% without
a significant effect on computational complexity.
Comment: Draft consisting of 16 pages
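The defining property of an integer-to-integer transform, integers in, integers out, exactly invertible, can be shown with the classical two-point S-transform built from lifting steps. This is a swapped-in textbook example, not the paper's ODST-3 approximation, and the function names are hypothetical. Note that the difference output here needs one more bit than the inputs, so the sketch does not demonstrate the dynamic-range property claimed for the paper's transforms.

```python
def s_transform_forward(a, b):
    """Map two integers to an integer (average, difference) pair."""
    d = a - b           # difference
    s = b + (d >> 1)    # equals floor((a + b) / 2); >> floors for negatives too
    return s, d


def s_transform_inverse(s, d):
    """Undo s_transform_forward exactly by reversing the lifting steps."""
    b = s - (d >> 1)
    a = d + b
    return a, b
```

Because each lifting step adds a rounded function of one value to the other, the inverse subtracts the same rounded quantity, so the rounding never loses information.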
Efficient Quantum Transforms
Quantum mechanics requires the operation of quantum computers to be unitary,
and thus makes it important to have general techniques for developing fast
quantum algorithms for computing unitary transforms. A quantum routine for
computing a generalized Kronecker product is given. Applications include
re-development of the networks for computing the Walsh-Hadamard and the quantum
Fourier transform. New networks for two wavelet transforms are given. Quantum
computation of Fourier transforms for non-Abelian groups is defined. A slightly
relaxed definition is shown to simplify the analysis and the networks that
compute the transforms. Efficient networks for computing such transforms for a
class of metacyclic groups are introduced. A novel network for computing a
Fourier transform for a group used in quantum error-correction is also given.
Comment: 30 pages, LaTeX2e, 7 figures included
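The link between Kronecker products and fast transforms can be illustrated classically with the Walsh-Hadamard transform, whose matrix is an n-fold Kronecker power of a 2x2 matrix. The sketch below is a classical, unnormalised illustration with hypothetical function names, not the paper's quantum routine or its generalized Kronecker product; in the quantum setting each factor would carry a 1/sqrt(2) normalisation and act on one qubit.

```python
H2 = [[1, 1], [1, -1]]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def hadamard(n):
    """Unnormalised 2**n x 2**n Walsh-Hadamard matrix as a Kronecker power."""
    h = [[1]]
    for _ in range(n):
        h = kron(h, H2)
    return h


def fwht(v):
    """Fast Walsh-Hadamard transform (natural order) of a length-2**n list."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                # Butterfly: one 2x2 factor applied to a pair of entries.
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v
```

Each pass of the `while` loop applies one Kronecker factor, which is why the fast version needs n passes of 2**n additions or subtractions instead of a full matrix-vector product.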
Computing Hasse-Witt matrices of hyperelliptic curves in average polynomial time
We present an efficient algorithm to compute the Hasse-Witt matrix of a
hyperelliptic curve C/Q modulo all primes of good reduction up to a given bound
N, based on the average polynomial-time algorithm recently introduced by
Harvey. An implementation for hyperelliptic curves of genus 2 and 3 is more
than an order of magnitude faster than alternative methods for N = 2^26.
Comment: 17 pages
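As a point of reference for what is being computed, the sketch below builds the Hasse-Witt matrix naively for a single odd prime p. It assumes one common convention for a curve y^2 = f(x) of genus g: entry (i, j) is the coefficient of x^(ip - j) in f(x)^((p-1)/2) reduced modulo p, for 1 <= i, j <= g. This is the direct method, not the average polynomial-time algorithm of the paper, and all function names are hypothetical.

```python
def poly_mul_mod(a, b, p):
    """Multiply two polynomials (coefficient lists, low degree first) mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] = (out[i + j] + ai * bj) % p
    return out


def poly_pow_mod(f, e, p):
    """Raise a polynomial to the power e mod p by square-and-multiply."""
    result = [1]
    base = [c % p for c in f]
    while e:
        if e & 1:
            result = poly_mul_mod(result, base, p)
        e >>= 1
        if e:
            base = poly_mul_mod(base, base, p)
    return result


def hasse_witt(f, p):
    """Naive Hasse-Witt matrix of y^2 = f(x) at an odd prime p.

    f lists the coefficients low degree first and is assumed to have
    degree 2g + 1 or 2g + 2 with a nonzero leading coefficient.
    """
    g = (len(f) - 2) // 2
    h = poly_pow_mod(f, (p - 1) // 2, p)

    def coeff(k):
        return h[k] if 0 <= k < len(h) else 0

    return [[coeff(i * p - j) for j in range(1, g + 1)]
            for i in range(1, g + 1)]
```

For genus 1 the single entry agrees modulo p with the trace of Frobenius, which gives a simple sanity check against brute-force point counting. The cost of this direct approach grows quickly with p, which is the inefficiency the averaged algorithm is designed to avoid.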
Fractional integrals and Fourier transforms
This paper gives a short survey of some basic results related to estimates of
fractional integrals and Fourier transforms. It is closely related to our
previous survey papers \cite{K1998} and \cite{K2007}. The main methods used in
the paper are based on nonincreasing rearrangements. We give alternative proofs
of some results.
We also note that the paper represents the mini-course given by the author
at Barcelona University in October 2014.
Comment: 42 pages