Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS
Many modern video processing pipelines rely on edge-aware (EA) filtering
methods. However, recent high-quality methods are challenging to run in
real time on embedded hardware because of their computational load. To this end, we
propose an area-efficient and real-time capable hardware implementation of a
high-quality EA method. In particular, we focus on the recently proposed
permeability filter (PF) that delivers promising quality and performance in the
domains of HDR tone mapping, disparity and optical flow estimation. We present
an efficient hardware accelerator that implements a tiled variant of the PF
with low on-chip memory requirements and a significantly reduced external
memory bandwidth (a 6.4× reduction relative to the non-tiled PF). The design has been taped out
in 65 nm CMOS technology, can filter 720p grayscale video at 24.8 Hz, and
achieves a high compute density of 6.7 GFLOPS/mm² (12× higher than embedded
GPUs when scaled to the same technology node). The low area and bandwidth
requirements make the accelerator highly suitable for integration into SoCs
where the silicon area budget is constrained and external memory is typically a
heavily contended resource.
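To make the filtering step concrete, here is a minimal 1D sketch of a permeability-style edge-aware filter in Python. It assumes a permeability weight of the form w = (1 + |ΔI/σ|^α)^-1 between neighbouring samples of a guide signal, combined with a normalized left-to-right plus right-to-left recursive accumulation. The function names and the σ and α values are hypothetical, and the published PF (applied along rows and columns of an image, and tiled in Hydra) may differ in its details.

```python
def permeability_weights(guide, sigma=0.1, alpha=2.0):
    """Assumed permeability between neighbours i and i+1 of a 1D guide signal."""
    return [1.0 / (1.0 + abs((guide[i + 1] - guide[i]) / sigma) ** alpha)
            for i in range(len(guide) - 1)]


def permeability_filter_1d(x, guide, sigma=0.1, alpha=2.0):
    """Normalized forward/backward recursive pass acting as an edge-aware smoother."""
    n = len(x)
    w = permeability_weights(guide, sigma, alpha)
    # Left-to-right accumulation of data (l) and of the normalization weight (ln).
    l, ln = [0.0] * n, [0.0] * n
    for i in range(1, n):
        l[i] = w[i - 1] * (l[i - 1] + x[i - 1])
        ln[i] = w[i - 1] * (ln[i - 1] + 1.0)
    # Right-to-left accumulation of the same quantities.
    r, rn = [0.0] * n, [0.0] * n
    for i in range(n - 2, -1, -1):
        r[i] = w[i] * (r[i + 1] + x[i + 1])
        rn[i] = w[i] * (rn[i + 1] + 1.0)
    # Each output is a weighted average of all samples; weights decay across edges.
    return [(l[i] + x[i] + r[i]) / (ln[i] + 1.0 + rn[i]) for i in range(n)]


step = [0.0] * 8 + [1.0] * 8
smoothed = permeability_filter_1d(step, step)  # the step edge is preserved
```

With a flat guide every weight is 1 and each output equals the global mean; across a strong edge in the guide the weight is close to 0, so the two sides barely mix. Each pass is linear in the number of samples.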
Linear estimation in Krein spaces. Part II. Applications
We have shown that several interesting problems in H∞ filtering, quadratic game theory, and risk-sensitive control and estimation follow as special cases of the Krein-space linear estimation theory developed in Part I. We show that all these problems can be cast into the problem of calculating the stationary point of certain second-order forms, and that by considering the appropriate state-space models and error Gramians, we can use the Krein-space estimation theory to calculate the stationary points and study their properties. The approach discussed here allows for interesting generalizations, such as finite-memory adaptive filtering with varying sliding patterns.
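As a toy illustration of why the theory is phrased in terms of stationary points rather than minima, consider a quadratic form J(x) = x^T A x - 2 b^T x with a symmetric but indefinite weighting A. Its stationary point solves A x = b, but it is a minimum only when A is positive definite. The Python sketch below is a generic 2×2 example, not the paper's Krein-space state-space construction; the function names are hypothetical.

```python
def stationary_point_2x2(A, b):
    """Solve A x = b, the stationarity condition of J(x) = x^T A x - 2 b^T x."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular form: no unique stationary point")
    # Cramer's rule for the 2x2 system.
    return [(a22 * b[0] - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]


def gradient(A, b, x):
    """Gradient of J at x: 2 A x - 2 b (A symmetric)."""
    return [2 * (A[i][0] * x[0] + A[i][1] * x[1]) - 2 * b[i] for i in range(2)]


def classify(A):
    """Type of the stationary point for a symmetric 2x2 weighting A."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det < 0:
        return "saddle"  # indefinite: one positive and one negative eigenvalue
    if det > 0:
        return "minimum" if A[0][0] > 0 else "maximum"
    return "degenerate"


A = [[1.0, 0.0], [0.0, -1.0]]  # indefinite weighting, as in a Krein space
b = [2.0, 3.0]
x = stationary_point_2x2(A, b)
print(x, gradient(A, b, x), classify(A))  # [2.0, -3.0] [0.0, 0.0] saddle
```

The gradient vanishes at x, yet J decreases along the second coordinate, so x is a stationary point but not a minimizer.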
Learning detectors quickly using structured covariance matrices
The computer vision community is increasingly interested in the rapid estimation
of object detectors. Canonical hard-negative mining strategies are slow because they
require multiple passes over the large negative training set. Recent work has
demonstrated that if the distribution of negative examples is assumed to be
stationary, then Linear Discriminant Analysis (LDA) can learn comparable
detectors without ever revisiting the negative set. Even with this insight,
however, the time to learn a single object detector can still be on the order
of tens of seconds on a modern desktop computer. This paper proposes to
leverage the resulting structured covariance matrix to obtain detectors with
identical performance in orders of magnitude less time and memory. We elucidate
an important connection to the correlation filter literature, demonstrating
that these can also be trained without ever revisiting the negative set.
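The abstract does not spell out how the covariance structure is exploited. One common way to see the link to correlation filters, assumed here, is to approximate the stationary negative covariance by a circulant matrix: the LDA template w = Σ^-1(μ_pos - μ_neg) can then be computed with FFTs in O(n log n) instead of a dense solve. The 1D Python/NumPy sketch below is illustrative only; the function names, the ridge term and the toy autocorrelation are hypothetical, and real detectors work on 2D multi-channel features whose covariance is block Toeplitz rather than exactly circulant.

```python
import numpy as np


def lda_detector_circulant(mu_pos, mu_neg, autocorr, ridge=1e-2):
    """LDA template w = (Sigma + ridge*I)^{-1} (mu_pos - mu_neg), assuming a circulant Sigma.

    autocorr is the first column of Sigma, i.e. the (assumed stationary)
    autocorrelation of the negative features at lags 0..n-1, wrapped around.
    """
    delta = np.asarray(mu_pos, dtype=float) - np.asarray(mu_neg, dtype=float)
    c = np.array(autocorr, dtype=float)
    c[0] += ridge  # adds ridge * I to the covariance
    # The eigenvalues of a circulant matrix are the DFT of its first column,
    # so inverting it reduces to an element-wise division in the Fourier domain.
    w_hat = np.fft.fft(delta) / np.fft.fft(c)
    return np.real(np.fft.ifft(w_hat))


def dense_circulant(c):
    """Explicit circulant matrix with C[i, j] = c[(i - j) mod n], for checking."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)], dtype=float)


n, rho = 8, 0.5
autocorr = [rho ** min(k, n - k) for k in range(n)]  # symmetric, decaying correlation
mu_pos = np.linspace(0.0, 1.0, n)
mu_neg = np.zeros(n)
w = lda_detector_circulant(mu_pos, mu_neg, autocorr)
sigma = dense_circulant(autocorr) + 1e-2 * np.eye(n)
print(np.allclose(sigma @ w, mu_pos - mu_neg))  # True
```

The check against the explicit matrix confirms that the FFT route solves the same linear system as a dense LDA solve, without ever forming Σ.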
Convolutional Dictionary Learning through Tensor Factorization
Tensor methods have emerged as a powerful paradigm for consistent learning of
many latent variable models such as topic models, independent component
analysis and dictionary learning. Model parameters are estimated via CP
decomposition of the observed higher-order input moments. However, in many
domains, additional invariances such as shift invariances exist, enforced via
models such as convolutional dictionary learning. In this paper, we develop
novel tensor decomposition algorithms for parameter estimation of convolutional
models. Our algorithm is based on the popular alternating least squares method,
but with efficient projections onto the space of stacked circulant matrices.
Our method is embarrassingly parallel and consists of simple operations such as
fast Fourier transforms and matrix multiplications. Our algorithm converges to
the dictionary much faster and more accurately than alternating
minimization over filters and activation maps.
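The projection step mentioned in the abstract can be illustrated as follows: the Frobenius-nearest circulant matrix to a square matrix M is obtained by averaging M along each wrapped diagonal. In this Python/NumPy sketch we assume the "stacked" matrix consists of L square n×n blocks placed side by side, each projected independently; the stacking layout, the function names and where the projection sits inside the ALS loop are assumptions, and the authors' FFT-based implementation is not reproduced.

```python
import numpy as np


def project_circulant(M):
    """Frobenius-nearest circulant matrix: average each wrapped diagonal of square M."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    rows = np.arange(n)
    # c[k] averages all entries M[i, j] with (i - j) mod n == k.
    c = np.array([M[rows, (rows - k) % n].mean() for k in range(n)])
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])


def project_stacked_circulant(M, n):
    """Project each n x n block of a horizontally stacked n x (n*L) matrix."""
    M = np.asarray(M, dtype=float)
    blocks = [project_circulant(M[:, l * n:(l + 1) * n]) for l in range(M.shape[1] // n)]
    return np.hstack(blocks)


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 8))  # two stacked 4x4 blocks
P = project_stacked_circulant(M, 4)
print(np.allclose(project_stacked_circulant(P, 4), P))  # True: projection is idempotent
```

Because the circulant matrices form a linear subspace spanned by mutually orthogonal cyclic-shift patterns, averaging each wrapped diagonal is exactly the orthogonal projection, and the residual M - P sums to zero along every wrapped diagonal.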
A pseudospectral matrix method for time-dependent tensor fields on a spherical shell
We construct a pseudospectral method for the solution of time-dependent,
non-linear partial differential equations on a three-dimensional spherical
shell. The problem we address is the treatment of tensor fields on the sphere.
As a test case we consider the evolution of a single black hole in numerical
general relativity. A natural strategy would be the expansion in tensor
spherical harmonics in spherical coordinates. Instead, we consider the simpler
and potentially more efficient possibility of a double Fourier expansion on the
sphere for tensors in Cartesian coordinates. As usual for the double Fourier
method, we employ a filter to address time-step limitations and certain
stability issues. We find that a tensor filter based on spin-weighted spherical
harmonics is successful, while two simplified, non-spin-weighted filters do not
lead to stable evolutions. The derivatives and the filter are implemented by
matrix multiplication for efficiency. A key technical point is the construction
of a matrix multiplication method for the spin-weighted spherical harmonic
filter. As an example of the efficient parallelization of the double Fourier,
spin-weighted filter method, we discuss an implementation on a GPU, which
achieves a speed-up of up to a factor of 20 compared to a single-core CPU
implementation. Comment: 33 pages, 9 figures
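Since the abstract notes that derivatives are applied by matrix multiplication, here is a minimal Python sketch of that idea in one periodic direction (for instance longitude on a double Fourier grid): the standard Fourier differentiation matrix for an even number N of equispaced points is built once and then applied as a matrix-vector product. The function names are hypothetical, and the paper's treatment of the latitude direction, of tensor components and of the spin-weighted filter is not reproduced here.

```python
import math


def fourier_diff_matrix(N):
    """Spectral first-derivative matrix on x_j = j * 2*pi/N, j = 0..N-1 (N even)."""
    if N % 2:
        raise ValueError("this formula assumes an even number of points")
    h = 2 * math.pi / N
    D = [[0.0] * N for _ in range(N)]
    for j in range(N):
        for k in range(N):
            if j != k:
                # D[j][k] = (1/2) (-1)^(j-k) cot((j-k) h / 2); the diagonal stays zero.
                D[j][k] = 0.5 * (-1) ** (j - k) / math.tan((j - k) * h / 2)
    return D


def matvec(D, u):
    """Dense matrix-vector product: the derivative is one multiplication per field."""
    return [sum(d * v for d, v in zip(row, u)) for row in D]


N = 16
x = [2 * math.pi * j / N for j in range(N)]
du = matvec(fourier_diff_matrix(N), [math.sin(t) for t in x])
print(max(abs(a - math.cos(t)) for a, t in zip(du, x)) < 1e-10)  # True
```

For a band-limited function such as sin x the result matches the exact derivative to rounding error. A precomputed dense operator trades extra flops for simple, regular memory access, which plausibly contributes to how well such methods map onto GPUs.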