A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices

Author: Kühne Thomas D.
Lass Michael
Mohr Stephan
Plessl Christian
Wiebeler Hendrik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/04/2018

We present the submatrix method, a highly parallelizable method for the approximate calculation of inverse p-th roots of large sparse symmetric matrices, which are required in different scientific applications. We follow the idea of Approximate Computing, allowing imprecision in the final result in order to utilize the sparsity of the input matrix and to allow massively parallel execution. For an n × n matrix, the proposed algorithm allows the calculations to be distributed over n nodes with little communication overhead. The approximate result matrix exhibits the same sparsity pattern as the input matrix, allowing for efficient reuse of allocated data structures. We evaluate the algorithm with respect to the error that it introduces into calculated results, as well as its performance and scalability. We demonstrate that the error is relatively limited for well-conditioned matrices and that results are still valuable for error-resilient applications like preconditioning even for ill-conditioned matrices. We discuss the execution time and scaling of the algorithm on a theoretical level and present a distributed implementation of the algorithm using MPI and OpenMP. We demonstrate the scalability of this implementation by running it on a high-performance compute cluster comprising 1024 CPU cores, showing a speedup of 665x compared to single-threaded execution.
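The column-wise idea behind this abstract can be sketched in a few lines. This is a minimal illustration only, under two loud assumptions that the abstract itself does not confirm: the submatrix for column i is assumed to consist of the rows and columns in which column i is nonzero, and the sketch is restricted to p = 1 (the plain inverse) so that it needs nothing beyond the Python standard library. The names `invert_dense` and `submatrix_inverse` are hypothetical.

```python
def invert_dense(m):
    """Invert a small dense matrix (list of lists) by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(m)
    # Augment the matrix with the identity: [m | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def submatrix_inverse(a):
    """Approximate inverse (the p = 1 case) of a sparse symmetric matrix.

    Assumption: the submatrix for column i is built from the rows and
    columns where column i is nonzero. The diagonal must be nonzero.
    """
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):  # iterations are independent, so they could run on separate nodes
        idx = [r for r in range(n) if a[r][i] != 0]
        sub = [[a[r][c] for c in idx] for r in idx]
        sub_inv = invert_dense(sub)
        local = idx.index(i)  # position of column i inside its submatrix
        for k, r in enumerate(idx):
            # Write back only where the input was nonzero: same sparsity pattern.
            result[r][i] = sub_inv[k][local]
    return result
```

For a block-diagonal input the sketch reproduces the exact inverse, because every submatrix is a complete block. For the tridiagonal matrix `[[2, 1, 0], [1, 2, 1], [0, 1, 2]]` it returns 2/3 in the top-left entry, whereas the exact inverse has 3/4 there, which illustrates the kind of approximation error the abstract refers to.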

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Feasibility and performances of compressed-sensing and sparse map-making with Herschel/PACS data

Author: Bertin
Bobin
Bruckstein
Candes
Candès
Cantalupo
de Graauw
Donoho
Griffin
J.-L. Starck
Lustig
M. Sauvage
N. Barbey
P. Chanial
Pence
Pilbratt
Poglitsch
Pratt
R. Ottensamer
Skodras
Publication venue: 'EDP Sciences'
Publication date: 02/12/2010

The Herschel Space Observatory of ESA was launched in May 2009 and has been in operation since. From its distant orbit around L2 it needs to transmit a huge quantity of information through a very limited bandwidth. This is especially true for the PACS imaging camera, which needs to compress its data far more than what can be achieved with lossless compression. This is currently solved by including lossy averaging and rounding steps on board. Recently, a new theory called compressed-sensing emerged from the statistics community. This theory makes use of the sparsity of natural (or astrophysical) images to optimize the acquisition scheme of the data needed to estimate those images. Thus, it can lead to high compression factors. A previous article by Bobin et al. (2008) showed how the new theory could be applied to simulated Herschel/PACS data to solve the compression requirement of the instrument. In this article, we show that compressed-sensing theory can indeed be successfully applied to actual Herschel/PACS data and give significant improvements over the standard pipeline. In order to fully use the redundancy present in the data, we perform full sky map estimation and decompression at the same time, which cannot be done in most other compression methods. We also demonstrate that the various artifacts affecting the data (pink noise, glitches, whose behavior is a priori not well compatible with compressed-sensing) can be handled as well in this new framework. Finally, we make a comparison between the methods from the compressed-sensing scheme and data acquired with the standard compression scheme. We discuss improvements that can be made on ground for the creation of sky maps from the data.

Comment: 11 pages, 6 figures, 5 tables, peer-reviewed article

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Distributing the Kalman Filter for Large-Scale Systems

Author: Khan Usman A.
Moura Jose M. F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/02/2008

This paper derives a distributed Kalman filter to estimate a sparsely connected, large-scale, $n$-dimensional dynamical system monitored by a network of $N$ sensors. Local Kalman filters are implemented on the $n_l$-dimensional sub-systems (where $n_l \ll n$) that are obtained after spatially decomposing the large-scale system. The resulting sub-systems overlap, which, along with an assimilation procedure on the local Kalman filters, preserves an $L$th order Gauss-Markovian structure of the centralized error processes. The information loss due to the $L$th order Gauss-Markovian approximation is controllable, as it can be characterized by a divergence that decreases as $L \uparrow$. The order of the approximation, $L$, leads to a lower bound on the dimension of the sub-systems, hence providing a criterion for sub-system selection. The assimilation procedure is carried out on the local error covariances with a distributed iterate collapse inversion (DICI) algorithm that we introduce. The DICI algorithm computes the (approximated) centralized Riccati and Lyapunov equations iteratively with only local communication and low-order computation. We fuse the observations that are common among the local Kalman filters using bipartite fusion graphs and consensus averaging algorithms. The proposed algorithm achieves full distribution of the Kalman filter that is coherent with the centralized Kalman filter with an $L$th order Gauss-Markovian structure on the centralized error processes. Nowhere is storage, communication, or computation of $n$-dimensional vectors and matrices needed; only $n_l \ll n$ dimensional vectors and matrices are communicated or used in the computation at the sensors.
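The consensus averaging step mentioned in this abstract can be illustrated with one common textbook form of the iteration, in which each node repeatedly moves toward its neighbours' values. This is a hedged sketch, not the paper's fusion procedure: the update rule, the function name `consensus_average`, and the example graph are all assumptions chosen for illustration.

```python
def consensus_average(values, neighbors, step, iterations):
    """Run a simple linear consensus iteration on an undirected graph.

    values    -- initial value held by each node
    neighbors -- neighbors[i] lists the nodes adjacent to node i
    step      -- step size; the iteration converges on a connected graph
                 when step is below 1 / (maximum node degree)
    Each node uses only its neighbours' values (local communication).
    """
    x = [float(v) for v in values]
    for _ in range(iterations):
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


# Hypothetical example: four sensors connected in a line, 0 - 1 - 2 - 3.
line_graph = [[1], [0, 2], [1, 3], [2]]
estimates = consensus_average([1.0, 2.0, 3.0, 6.0], line_graph,
                              step=0.3, iterations=200)
```

Because the graph is undirected, every update preserves the sum of the node values, so all entries of `estimates` approach the global mean of the initial values (3.0 here) without any node ever seeing the full vector.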

arXiv.org e-Print Archive

Crossref

Parallel matrix inversion techniques

Author: Kumar M. J.
Lau K. K.
Venkatesh S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996

In this paper, we present techniques for inverting sparse, symmetric and positive definite matrices on parallel and distributed computers. We propose two algorithms, one for SIMD implementation and the other for MIMD implementation. These algorithms are modified versions of Gaussian elimination and they take into account the sparseness of the matrix. Our algorithms perform better than the general parallel Gaussian elimination algorithm. In order to demonstrate the usefulness of our technique, we implemented the snake problem using our sparse matrix algorithm. Our studies reveal that the proposed sparse matrix inversion algorithm significantly reduces the time taken for obtaining the solution of the snake problem. In this paper, we present the results of our experimental work.

Deakin Research Online

A linear algebra processor using Monte Carlo methods

Author: Alexandrov Vassil Nikolov
Cadenas Medina Jose Oswaldo
Megson Graham M
Plaks T P
Publication date: 11/09/2003

Central Archive at the University of Reading

GPU-Accelerated Algorithms for Compressed Signals Recovery with Application to Astronomical Imagery Deblurring

Author: Fiandrotti Attilio
Fosson Sophie M.
Magli Enrico
Ravazzi Chiara
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2017

Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting the parallel computation capabilities of GPUs to speed up the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signal recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
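One well-known property of circulant matrices that makes them attractive here is that a circulant matrix is fully described by its first column and that multiplying by it is a circular convolution, which a Fourier transform turns into an element-wise product. The sketch below shows that property only, on the CPU and in plain Python; it is not the GPU algorithm of the paper, and the function names and the power-of-two length restriction are assumptions of this illustration.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_matvec(first_column, x):
    """Multiply the circulant matrix C[i][j] = first_column[(i - j) % n] by x
    using FFTs, storing only the n entries of the first column."""
    n = len(x)
    if len(first_column) != n:
        raise ValueError("first_column and x must have the same length")
    spectrum = [a * b for a, b in zip(fft(first_column), fft(x))]
    # Inverse transform, normalized by n; inputs are real, so keep the real part.
    return [v.real / n for v in fft(spectrum, inverse=True)]


def circulant_matvec_dense(first_column, x):
    """Reference implementation that forms every matrix entry explicitly."""
    n = len(x)
    return [sum(first_column[(i - j) % n] * x[j] for j in range(n))
            for i in range(n)]
```

The FFT route needs memory proportional to n instead of n squared, which is consistent with the reduced memory requirements the abstract describes, although the paper's actual data layout is not given here.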

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Institutional Research Information System University of Turin

PORTO Publications Open Repository TOrino

Parallel computation of optimized arrays for 2-D electrical imaging surveys

Author: Chambers J.E.
Loke M.H.
Wilkinson P.B.
Publication venue: 'Wiley'
Publication date: 01/12/2010

Modern automatic multi-electrode survey instruments have made it possible to use non-traditional arrays to maximize the subsurface resolution from electrical imaging surveys. Previous studies have shown that one of the best methods for generating optimized arrays is to select the set of array configurations that maximizes the model resolution for a homogeneous earth model. The Sherman–Morrison Rank-1 update is used to calculate the change in the model resolution when a new array is added to a selected set of array configurations. This method had the disadvantage that it required several hours of computer time even for short 2-D survey lines. The algorithm was modified to calculate the change in the model resolution rather than the entire resolution matrix. This reduces the computer time and memory required as well as the computational round-off errors. The matrix–vector multiplications for a single add-on array were replaced with matrix–matrix multiplications for 28 add-on arrays to further reduce the computer time. The temporary variables were stored in the double-precision Single Instruction Multiple Data (SIMD) registers within the CPU to minimize computer memory access. A further reduction in the computer time is achieved by using the computer graphics card Graphics Processor Unit (GPU) as a highly parallel mathematical coprocessor. This makes it possible to carry out the calculations for 512 add-on arrays in parallel using the GPU. The changes reduce the computer time by more than two orders of magnitude. The algorithm used to generate an optimized data set adds a specified number of new array configurations after each iteration to the existing set. The resolution of the optimized data set can be increased by adding a smaller number of new array configurations after each iteration. Although this increases the computer time required to generate an optimized data set with the same number of data points, the new fast numerical routines have made this practical on commonly available microcomputers.
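The Sherman–Morrison rank-1 update named in this abstract is the identity (A + u vᵀ)⁻¹ = A⁻¹ − (A⁻¹ u vᵀ A⁻¹) / (1 + vᵀ A⁻¹ u). The sketch below implements only this generic identity on small list-of-lists matrices; how the paper applies it to the model resolution matrix is not shown in the abstract, and the function name `sherman_morrison` is a hypothetical choice.

```python
def sherman_morrison(a_inv, u, v):
    """Return the inverse of (A + u v^T), given a_inv = A^{-1}.

    Costs O(n^2) operations instead of the O(n^3) of a fresh inversion.
    """
    n = len(a_inv)
    # A^{-1} u (column vector) and v^T A^{-1} (row vector).
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    vt_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[k] * a_inv_u[k] for k in range(n))
    if denom == 0.0:
        raise ValueError("updated matrix is singular")
    return [[a_inv[i][j] - a_inv_u[i] * vt_a_inv[j] / denom
             for j in range(n)]
            for i in range(n)]


# Example: A = diag(2, 4), so A^{-1} = diag(0.5, 0.25); update with u = v = (1, 1).
updated = sherman_morrison([[0.5, 0.0], [0.0, 0.25]], [1.0, 1.0], [1.0, 1.0])
```

In the example, A + u vᵀ is [[3, 1], [1, 5]], whose inverse is (1/14)·[[5, −1], [−1, 3]], and `updated` matches that matrix up to rounding.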

NERC Open Research Archive

Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets

Author: Anderson Michael J.
Capotă Mihai
Chen Po-Hsuan
Manning Jeremy R.
Norman Kenneth A.
Ramadge Peter J.
Turek Javier S.
Wang Yida
Willke Theodore L.
Zhu Xia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

The scale of functional magnetic resonance image data is rapidly increasing as large multi-subject datasets are becoming widely available and high-resolution scanners are adopted. The inherent low-dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 1812x speedups on these two methods, and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x respectively with 20 nodes on real datasets. We also demonstrate weak scaling on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768 cores.
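The strong-scaling figures in this abstract can be put in perspective by converting them to parallel efficiency, that is, speedup divided by the number of workers. The snippet below is plain arithmetic under one assumption the abstract does not state: that the 3.3x and 5.5x speedups are measured relative to a single node. The function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup achieved on the given number of workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Reported strong-scaling speedups on 20 nodes (single-node baseline assumed).
efficiency_first = parallel_efficiency(3.3, 20)   # about 0.165
efficiency_second = parallel_efficiency(5.5, 20)  # about 0.275
```

Under that assumption the two methods reach roughly 16.5% and 27.5% of ideal linear scaling on 20 nodes.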

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref