Search CORE

14,736 research outputs found

Totally parallel multilevel algorithms

Author: Frederickson Paul O.
Publication venue
Publication date
Field of study

Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which are referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance

NASA Technical Reports Server

Fast, Dense Feature SDM on an iPhone

Author: Fagg Ashton
Lucey Simon
Sridharan Sridha
Publication venue
Publication date: 15/12/2016
Field of study

In this paper, we present our method for enabling dense SDM to run at over 90 FPS on a mobile device. Our contributions are two-fold. Drawing inspiration from the FFT, we propose a Sparse Compositional Regression (SCR) framework, which enables a significant speed up over classical dense regressors. Second, we propose a binary approximation to SIFT features. Binary Approximated SIFT (BASIFT) features, which are a computationally efficient approximation to SIFT, a commonly used feature with SDM. We demonstrate the performance of our algorithm on an iPhone 7, and show that we achieve similar accuracy to SDM

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive

Convolutional Dictionary Learning through Tensor Factorization

Author: Anandkumar Animashree
Huang Furong
Publication venue
Publication date: 01/01/2015
Field of study

Tensor methods have emerged as a powerful paradigm for consistent learning of many latent variable models such as topic models, independent component analysis and dictionary learning. Model parameters are estimated via CP decomposition of the observed higher order input moments. However, in many domains, additional invariances such as shift invariances exist, enforced via models such as convolutional dictionary learning. In this paper, we develop novel tensor decomposition algorithms for parameter estimation of convolutional models. Our algorithm is based on the popular alternating least squares method, but with efficient projections onto the space of stacked circulant matrices. Our method is embarrassingly parallel and consists of simple operations such as fast Fourier transforms and matrix multiplications. Our algorithm converges to the dictionary much faster and more accurately compared to the alternating minimization over filters and activation maps

arXiv.org e-Print Archive

eScholarship - University of California

Caltech Authors

Distributed and parallel sparse convex optimization for radio interferometry with PURIFY

Author: Cai Xiaohao
Christidi Ilektra
d'Avezac Mayeul
Guichard Roland
McEwen Jason D.
Perez-Suarez David
Pratley Luke
Publication venue
Publication date: 11/03/2019
Field of study

Next generation radio interferometric telescopes are entering an era of big data with extremely large data sets. While these telescopes can observe the sky in higher sensitivity and resolution than before, computational challenges in image reconstruction need to be overcome to realize the potential of forthcoming telescopes. New methods in sparse image reconstruction and convex optimization techniques (cf. compressive sensing) have shown to produce higher fidelity reconstructions of simulations and real observations than traditional methods. This article presents distributed and parallel algorithms and implementations to perform sparse image reconstruction, with significant practical considerations that are important for implementing these algorithms for Big Data. We benchmark the algorithms presented, showing that they are considerably faster than their serial equivalents. We then pre-sample gridding kernels to scale the distributed algorithms to larger data sizes, showing application times for 1 Gb to 2.4 Tb data sets over 25 to 100 nodes for up to 50 billion visibilities, and find that the run-times for the distributed algorithms range from 100 milliseconds to 3 minutes per iteration. This work presents an important step in working towards computationally scalable and efficient algorithms and implementations that are needed to image observations of both extended and compact sources from next generation radio interferometers such as the SKA. The algorithms are implemented in the latest versions of the SOPT (https://github.com/astro-informatics/sopt) and PURIFY (https://github.com/astro-informatics/purify) software packages {(Versions 3.1.0)}, which have been released alongside of this article.Comment: 25 pages, 5 figure

arXiv.org e-Print Archive

Southampton (e-Prints Soton)