Totally parallel multilevel algorithms
Four totally parallel algorithms for the solution of a sparse linear system share common characteristics that become apparent when they are implemented on a highly parallel hypercube such as the CM-2. These four algorithms are the Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, the Robust Multigrid (RMG) of Hackbusch, the FFT-based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is then a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
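Parallel Cyclic Reduction is the easiest of the four to write down, and it shows concretely what "totally parallel" and "direct" mean here: every equation is updated independently at each level, and after about log2(n) levels the system is diagonal. The NumPy sketch below solves a tridiagonal system this way. It assumes a diagonally dominant matrix (no pivoting), the names `pcr_solve` and `_shift` are hypothetical, and it is the textbook method rather than the paper's TPMA formulation.

```python
import numpy as np


def _shift(v, k, fill):
    """Return w with w[i] = v[i + k] when 0 <= i + k < len(v), else fill."""
    w = np.full_like(v, fill)
    if k > 0:
        w[:-k] = v[k:]
    elif k < 0:
        w[-k:] = v[:k]
    else:
        w[:] = v
    return w


def pcr_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by parallel cyclic reduction.

    a[0] and c[-1] are ignored (treated as zero). No pivoting is done, so the
    matrix is assumed to be diagonally dominant.
    """
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    a[0], c[-1] = 0.0, 0.0
    pads = (0.0, 1.0, 0.0, 0.0)  # trivial equation (a, b, c, d) = (0, 1, 0, 0)
    n, s = b.size, 1
    while s < n:
        # Neighbour equations at distance s; missing ones are the trivial equation,
        # so the boundaries need no special case.
        am, bm, cm, dm = (_shift(v, -s, f) for v, f in zip((a, b, c, d), pads))
        ap, bp, cp, dp = (_shift(v, s, f) for v, f in zip((a, b, c, d), pads))
        alpha, gamma = -a / bm, -c / bp
        # One fully parallel step: every equation eliminates x[i-s] and x[i+s]
        # and becomes coupled to x[i-2s] and x[i+2s] instead.
        a, b, c, d = (alpha * am,
                      b + alpha * cm + gamma * ap,
                      gamma * cp,
                      d + alpha * dm + gamma * dp)
        s *= 2
    # All couplings now point outside the grid, so the system is diagonal.
    return d / b
```

The loop runs ceil(log2 n) times and the result is exact after the last pass, which is the sense in which such a multilevel scheme is a direct solver rather than an iteration.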
Fast, Dense Feature SDM on an iPhone
In this paper, we present our method for enabling dense SDM (Supervised Descent
Method) to run at over 90 FPS on a mobile device. Our contributions are
two-fold. First, drawing inspiration from the FFT, we propose a Sparse
Compositional Regression (SCR) framework, which enables a significant speed-up
over classical dense regressors. Second, we propose a binary approximation to
SIFT features: Binary Approximated SIFT (BASIFT) features are a computationally
efficient approximation to SIFT, a feature commonly used with SDM. We
demonstrate the performance of our algorithm on an iPhone 7 and show that we
achieve accuracy similar to that of SDM.
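The abstract does not describe how SCR is built. What it borrows from the FFT is the idea that a dense linear map can be written as a product of a few sparse factors, so applying it costs far less than a dense matrix-vector product. The sketch below shows only that FFT property, using the standard radix-2 Cooley-Tukey factorization of the DFT matrix; it is not the paper's SCR regressor, and `fft_factors`, `apply_factors` and `dense_from_factors` are hypothetical names.

```python
import numpy as np
from functools import reduce


def dft_matrix(n):
    """Dense n x n DFT matrix, F[j, k] = exp(-2*pi*i*j*k/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def fft_factors(n):
    """Sparse factors whose left-to-right product equals dft_matrix(n).

    Radix-2 identity: F_n = B_n (I_2 kron F_{n/2}) P_n, where P_n puts even
    indices before odd ones and B_n = [[I, D], [I, -D]], D = diag(w**k),
    w = exp(-2*pi*i/n). n must be a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if n == 1:
        return [np.ones((1, 1), dtype=complex)]
    half = n // 2
    twiddle = np.diag(np.exp(-2j * np.pi * np.arange(half) / n))
    eye = np.eye(half)
    butterfly = np.block([[eye, twiddle], [eye, -twiddle]])
    perm = np.concatenate([np.arange(0, n, 2), np.arange(1, n, 2)])
    even_odd = np.eye(n)[perm]  # (even_odd @ x)[i] = x[perm[i]]
    inner = [np.kron(np.eye(2), f) for f in fft_factors(half)]
    return [butterfly] + inner + [even_odd]


def apply_factors(factors, x):
    """Apply the factored map to x, rightmost factor first.

    Factors are stored densely here for clarity; in a sparse format each
    product costs O(nonzeros) instead of O(n^2).
    """
    y = np.asarray(x, dtype=complex)
    for f in reversed(factors):
        y = f @ y
    return y


def dense_from_factors(factors):
    """Multiply the factors back into one dense matrix (for checking only)."""
    return reduce(np.matmul, factors)
```

For n a power of two the factors hold n(3 log2 n + 1) nonzeros in total (1,216 for n = 64), against n² = 4,096 entries in the dense 64 × 64 matrix they reproduce.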
Convolutional Dictionary Learning through Tensor Factorization
Tensor methods have emerged as a powerful paradigm for consistent learning of
many latent variable models such as topic models, independent component
analysis and dictionary learning. Model parameters are estimated via CP
decomposition of the observed higher-order input moments. However, in many
domains additional invariances, such as shift invariance, exist and are enforced
via models such as convolutional dictionary learning. In this paper, we develop
novel tensor decomposition algorithms for parameter estimation of convolutional
models. Our algorithm is based on the popular alternating least squares method,
but with efficient projections onto the space of stacked circulant matrices.
Our method is embarrassingly parallel and consists of simple operations such as
fast Fourier transforms and matrix multiplications. Our algorithm converges to
the dictionary much faster and more accurately than alternating minimization
over filters and activation maps.
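The two building blocks the abstract names rest on standard facts about circulant matrices. First, the circulant matrix nearest to a square matrix A in Frobenius norm is obtained by averaging A along each wrapped diagonal. Second, a circulant matrix-vector product is a circular convolution, which the FFT evaluates in O(n log n). The NumPy sketch below illustrates both. The layout of the stacked matrix (L side-by-side n × n blocks) and all function names are assumptions for illustration; this is not the paper's alternating least squares algorithm.

```python
import numpy as np


def circulant(c):
    """Circulant matrix with first column c: C[i, j] = c[(i - j) % n]."""
    n = len(c)
    i, j = np.indices((n, n))
    return np.asarray(c)[(i - j) % n]


def project_circulant(A):
    """First column of the circulant matrix nearest to square A (Frobenius norm).

    c[k] is the mean of A over all (i, j) with (i - j) % n == k, i.e. the
    k-th wrapped diagonal A[(j + k) % n, j].
    """
    n = A.shape[0]
    j = np.arange(n)
    return np.array([A[(j + k) % n, j].mean() for k in range(n)])


def project_stacked_circulant(A, n):
    """Project an n x (n*L) matrix, assumed to be L side-by-side n x n blocks,
    onto matrices whose blocks are all circulant (each block independently)."""
    blocks = np.split(A, A.shape[1] // n, axis=1)
    return np.hstack([circulant(project_circulant(B)) for B in blocks])


def circulant_matvec(c, x):
    """circulant(c) @ x for real c and x, as a circular convolution via the FFT."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
```

Because the blocks do not interact, the projection splits into independent per-block averages, which fits the abstract's description of the method as embarrassingly parallel.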
Distributed and parallel sparse convex optimization for radio interferometry with PURIFY
Next generation radio interferometric telescopes are entering an era of big
data with extremely large data sets. While these telescopes can observe the sky
in higher sensitivity and resolution than before, computational challenges in
image reconstruction need to be overcome to realize the potential of
forthcoming telescopes. New methods in sparse image reconstruction and convex
optimization techniques (cf. compressive sensing) have been shown to produce
higher-fidelity reconstructions of simulations and real observations than
traditional methods. This article presents distributed and parallel algorithms
and implementations to perform sparse image reconstruction, with attention to
the practical considerations that are important when implementing these algorithms
for Big Data. We benchmark the algorithms presented, showing that they are
considerably faster than their serial equivalents. We then pre-sample gridding
kernels to scale the distributed algorithms to larger data sizes, showing
application times for 1 Gb to 2.4 Tb data sets over 25 to 100 nodes for up to
50 billion visibilities, and find that the run-times for the distributed
algorithms range from 100 milliseconds to 3 minutes per iteration. This work
presents an important step in working towards computationally scalable and
efficient algorithms and implementations that are needed to image observations
of both extended and compact sources from next generation radio interferometers
such as the SKA. The algorithms are implemented in the latest versions of the
SOPT (https://github.com/astro-informatics/sopt) and PURIFY
(https://github.com/astro-informatics/purify) software packages (versions
3.1.0), which have been released alongside this article.
Comment: 25 pages, 5 figures
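The abstract does not name the solver. As a hedged illustration of why this class of sparse convex problem distributes well, consider l1-regularized least squares solved by forward-backward splitting (ISTA). When the visibilities (rows of the measurement operator Φ and data y) are split across nodes, the data-fidelity gradient Φᵀ(Φx − y) is a sum of per-node terms. Each node then needs only its own rows, plus one reduction per iteration. The sketch simulates nodes as list entries and uses a real random Φ as a stand-in for the complex interferometric measurement operator. All names are hypothetical, and none of this is the SOPT or PURIFY API.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def local_gradient(phi_k, y_k, x):
    """Gradient of 0.5 * ||phi_k @ x - y_k||^2, using only one node's rows."""
    return phi_k.T @ (phi_k @ x - y_k)


def distributed_ista(blocks, lam, n_iter=500):
    """Forward-backward splitting (ISTA) for
        min_x 0.5 * ||Phi x - y||^2 + lam * ||x||_1
    where blocks is a list of (Phi_k, y_k) row blocks, one per simulated node.
    """
    n = blocks[0][0].shape[1]
    # Step 1/L with L = ||Phi||_2^2, the largest eigenvalue of sum_k Phi_k^T Phi_k.
    L = np.linalg.norm(sum(p.T @ p for p, _ in blocks), 2)
    x = np.zeros(n)
    for _ in range(n_iter):
        # Each node computes its partial gradient; this Python sum stands in
        # for the single all-reduce a real distributed run would perform.
        grad = sum(local_gradient(p, yk, x) for p, yk in blocks)
        # Forward (gradient) step on the data term, backward (prox) step on the l1 term.
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

Because the partitioned gradient equals the full one, splitting the rows across nodes changes only where the work is done, not the iterates (up to floating-point rounding).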