A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices
We present the submatrix method, a highly parallelizable method for the
approximate calculation of inverse p-th roots of large sparse symmetric
matrices which are required in different scientific applications. We follow the
idea of Approximate Computing, allowing imprecision in the final result in
order to be able to utilize the sparsity of the input matrix and to allow
massively parallel execution. For an n x n matrix, the proposed algorithm
allows the calculations to be distributed over n nodes with little
communication overhead. The approximate result matrix exhibits the same
sparsity pattern as the input matrix, allowing for efficient reuse of allocated
data structures.
We evaluate the algorithm with respect to the error that it introduces into
calculated results, as well as its performance and scalability. We demonstrate
that the error is relatively limited for well-conditioned matrices and that
results are still valuable for error-resilient applications like
preconditioning even for ill-conditioned matrices. We discuss the execution
time and scaling of the algorithm on a theoretical level and present a
distributed implementation of the algorithm using MPI and OpenMP. We
demonstrate the scalability of this implementation by running it on a
high-performance compute cluster comprising 1024 CPU cores, showing a speedup
of 665x compared to single-threaded execution.
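As a rough illustration of the submatrix idea described above, the sketch below
is a minimal single-node Python version of our own devising (the names and the
eigendecomposition-based dense kernel are assumptions, not the authors'
implementation): for each column it builds the dense principal submatrix
indexed by that column's nonzeros, computes its inverse p-th root, and writes
the matching entries back into the input's sparsity pattern.

```python
import numpy as np
from scipy.sparse import csc_matrix

def inverse_pth_root_dense(S, p):
    """Dense inverse p-th root of a symmetric positive definite block via
    eigendecomposition: S^(-1/p) = V diag(w^(-1/p)) V^T."""
    w, V = np.linalg.eigh(S)
    return (V * w ** (-1.0 / p)) @ V.T

def submatrix_inverse_pth_root(A, p):
    """Approximate A^(-1/p) column by column: for each column j, extract the
    dense principal submatrix indexed by the nonzeros of that column, take its
    inverse p-th root, and write back only the entries matching the original
    sparsity pattern.  Columns are independent, so the loop is the part that
    can be distributed across nodes.  Assumes the diagonal entries are stored
    in the sparsity pattern (true for the SPD inputs considered here)."""
    A = csc_matrix(A)
    A.sort_indices()
    R = A.astype(float).copy()
    for j in range(A.shape[1]):
        idx = A.indices[A.indptr[j]:A.indptr[j + 1]]   # nonzero rows of column j
        S = A[idx, :][:, idx].toarray()                # dense principal submatrix
        F = inverse_pth_root_dense(S, p)
        k = int(np.searchsorted(idx, j))               # position of the diagonal entry
        R.data[R.indptr[j]:R.indptr[j + 1]] = F[:, k]  # write column j back
    return R
```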
Feasibility and performances of compressed-sensing and sparse map-making with Herschel/PACS data
The Herschel Space Observatory of ESA was launched in May 2009 and has been in
operation since. From its distant orbit around L2 it needs to transmit a huge
quantity of information through a very limited bandwidth. This is especially
true for the PACS imaging camera which needs to compress its data far more than
what can be achieved with lossless compression. This is currently solved by
including lossy averaging and rounding steps on board. Recently, a new theory
called compressed-sensing emerged from the statistics community. This theory
makes use of the sparsity of natural (or astrophysical) images to optimize the
acquisition scheme of the data needed to estimate those images. Thus, it can
lead to high compression factors.
A previous article by Bobin et al. (2008) showed how the new theory could be
applied to simulated Herschel/PACS data to solve the compression requirement of
the instrument. In this article, we show that compressed-sensing theory can
indeed be successfully applied to actual Herschel/PACS data and give
significant improvements over the standard pipeline. In order to fully use the
redundancy present in the data, we perform full sky map estimation and
decompression at the same time, which cannot be done in most other compression
methods. We also demonstrate that the various artifacts affecting the data
(pink noise, glitches), whose behavior is a priori not obviously compatible
with compressed-sensing, can also be handled in this new framework. Finally, we
compare the results obtained with the compressed-sensing scheme against those
from data acquired with the standard compression scheme. We discuss
improvements that can be made on the ground for the creation of sky maps from
the data.
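Since the abstract leans on sparsity-based recovery, a generic reconstruction
sketch may help fix ideas. This is plain ISTA on assumed random Gaussian
projections, with names of our choosing; it is not the PACS map-making
pipeline described in the paper.

```python
import numpy as np

def ista_recover(y, Phi, lam=0.05, step=None, n_iter=500):
    """Recover a sparse signal x from compressed measurements y = Phi @ x by
    iterative soft-thresholding (ISTA), i.e. by minimizing
    0.5 * ||Phi x - y||^2 + lam * ||x||_1."""
    if step is None:
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2       # 1 / Lipschitz constant
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)                   # gradient of the data term
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x

# Toy example: a 400-sample signal with 10 nonzeros, compressed to 100 measurements.
rng = np.random.default_rng(0)
x_true = np.zeros(400)
x_true[rng.choice(400, 10, replace=False)] = rng.normal(size=10)
Phi = rng.normal(size=(100, 400)) / np.sqrt(100)
y = Phi @ x_true
x_hat = ista_recover(y, Phi)
```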
Distributing the Kalman Filter for Large-Scale Systems
This paper derives a distributed Kalman filter to estimate a sparsely
connected, large-scale, n-dimensional, dynamical system monitored by a
network of N sensors. Local Kalman filters are implemented on the
n_l-dimensional (where n_l << n) sub-systems that are obtained after
spatially decomposing the large-scale system. The resulting sub-systems
overlap, which, along with an assimilation procedure on the local Kalman
filters, preserves an L-th order Gauss-Markovian structure of the centralized
error processes. The information loss due to the L-th order Gauss-Markovian
approximation is controllable, as it can be characterized by a divergence that
decreases as L increases. The order of the approximation, L, leads to a lower
bound on the dimension of the sub-systems, hence providing a criterion for
sub-system selection. The assimilation procedure is carried out on the local
error covariances with a distributed iterate collapse inversion (DICI)
algorithm that we introduce. The DICI algorithm computes the (approximated)
centralized Riccati and Lyapunov equations iteratively with only local
communication and low-order computation. We fuse the observations that are
common among the local Kalman filters using bipartite fusion graphs and
consensus averaging algorithms. The proposed algorithm achieves full
distribution of the Kalman filter that is coherent with the centralized Kalman
filter with an L-th order Gauss-Markovian structure on the centralized
error processes. Nowhere is storage, communication, or computation of
n-dimensional vectors and matrices needed; only n_l-dimensional
vectors and matrices are communicated or used in the computation at the
sensors.
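For orientation, these are the standard centralized recursions (in textbook
Kalman-filter notation of our choosing, not necessarily the paper's) that the
DICI algorithm evaluates distributively in approximated form:

```latex
% Prediction (Lyapunov-type) and information-form update (Riccati-type)
% of the centralized error covariance P, with state matrix F, process
% noise Q, observation matrix H, and observation noise R.
\begin{aligned}
  P_{k \mid k-1} &= F \, P_{k-1 \mid k-1} \, F^{\top} + Q, \\
  P_{k \mid k}^{-1} &= P_{k \mid k-1}^{-1} + H^{\top} R^{-1} H .
\end{aligned}
```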
Parallel matrix inversion techniques
In this paper, we present techniques for inverting sparse, symmetric and positive definite matrices on parallel and distributed computers. We propose two algorithms, one for SIMD implementation and the other for MIMD implementation. These algorithms are modified versions of Gaussian elimination and they take into account the sparseness of the matrix. Our algorithms perform better than the general parallel Gaussian elimination algorithm. In order to demonstrate the usefulness of our technique, we applied our sparse matrix algorithm to the snake problem. Our studies reveal that the proposed sparse matrix inversion algorithm significantly reduces the time taken for obtaining the solution of the snake problem. We also present the results of our experimental work.
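The abstract does not spell out the algorithms, but the basic serial idea they
parallelize, inverting a sparse SPD matrix by factorizing once and solving
against the columns of the identity, looks roughly as follows. This is a sketch
with our own names, using a sparse LU in place of the authors' modified
Gaussian elimination:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import factorized

def sparse_spd_inverse(A):
    """Invert a sparse symmetric positive definite matrix by factorizing it
    once (sparse LU here; a Cholesky factorization would also work) and then
    solving A x_j = e_j for every column e_j of the identity.  The column
    solves are independent, which is what makes the scheme easy to spread
    across processors."""
    n = A.shape[0]
    solve = factorized(sparse.csc_matrix(A))   # one factorization, reused n times
    inv = np.empty((n, n))
    for j in range(n):                         # independent -> parallelizable
        e = np.zeros(n)
        e[j] = 1.0
        inv[:, j] = solve(e)
    return inv
```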
GPU-Accelerated Algorithms for Compressed Signals Recovery with Application to Astronomical Imagery Deblurring
Compressive sensing promises to enable bandwidth-efficient on-board
compression of astronomical data by lifting the encoding complexity from the
source to the receiver. The signal is recovered off-line, exploiting GPUs'
parallel computation capabilities to speed up the reconstruction process.
However, inherent GPU hardware constraints limit the size of the recoverable
signal and the speedup practically achievable. In this work, we design parallel
algorithms that exploit the properties of circulant matrices for efficient
GPU-accelerated sparse signals recovery. Our approach reduces the memory
requirements, allowing us to recover very large signals with limited memory. In
addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc
parallelization of matrix-vector multiplications and matrix inversions.
Finally, we practically demonstrate our algorithms in a typical application of
circulant matrices: deblurring a sparse astronomical image in the compressed
domain.
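The memory saving claimed for circulant matrices comes from the fact that they
are diagonalized by the DFT, so a matrix-vector product never needs the full
matrix. A minimal NumPy illustration of this property (ours, not the paper's
GPU kernels):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix C (whose first column is c) by x without
    ever forming C: circulant matrices are diagonalized by the DFT, so
    C @ x = ifft(fft(c) * fft(x)).  Storage drops from O(n^2) to O(n) and the
    product costs O(n log n) instead of O(n^2)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Check against an explicitly built circulant matrix on a small example.
rng = np.random.default_rng(1)
c = rng.normal(size=8)
x = rng.normal(size=8)
C = np.array([np.roll(c, k) for k in range(8)]).T   # explicit circulant matrix
assert np.allclose(C @ x, circulant_matvec(c, x))
```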
Parallel computation of optimized arrays for 2-D electrical imaging surveys
Modern automatic multi-electrode survey instruments have made it possible to use non-traditional arrays to maximize the subsurface resolution from electrical imaging surveys. Previous studies have shown that one of the best methods for generating optimized arrays is to select the set of array configurations that maximizes the model resolution for a homogeneous earth model. The Sherman–Morrison Rank-1 update is used to calculate the change in the model resolution when a new array is added to a selected set of array configurations. This method had the disadvantage that it required several hours of computer time even for short 2-D survey lines. The algorithm was modified to calculate the change in the model resolution rather than the entire resolution matrix. This reduces the computer time and memory required as well as the computational round-off errors. The matrix–vector multiplications for a single add-on array were replaced with matrix–matrix multiplications for 28 add-on arrays to further reduce the computer time. The temporary variables were stored in the double-precision Single Instruction Multiple Data (SIMD) registers within the CPU to minimize computer memory access. A further reduction in the computer time is achieved by using the Graphics Processing Unit (GPU) of the computer graphics card as a highly parallel mathematical coprocessor. This makes it possible to carry out the calculations for 512 add-on arrays in parallel using the GPU. The changes reduce the computer time by more than two orders of magnitude. The algorithm used to generate an optimized data set adds a specified number of new array configurations after each iteration to the existing set. The resolution of the optimized data set can be increased by adding a smaller number of new array configurations after each iteration. Although this increases the computer time required to generate an optimized data set with the same number of data points, the new fast numerical routines have made this practical on commonly available microcomputers.
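The rank-1 update at the core of the method can be stated compactly. The sketch
below is a plain NumPy illustration of the Sherman–Morrison identity with our
own names, not the optimized SIMD/GPU code of the paper:

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Given A^-1, return (A + u v^T)^-1 via the Sherman-Morrison identity:
    (A + u v^T)^-1 = A^-1 - (A^-1 u v^T A^-1) / (1 + v^T A^-1 u).
    Updating an existing inverse costs O(n^2) instead of the O(n^3) needed to
    re-invert from scratch, which is why it pays off when array configurations
    are added one at a time to an optimized set."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

# Small consistency check against direct inversion.
rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5)) + 5 * np.eye(5)
u, v = rng.normal(size=5), rng.normal(size=5)
assert np.allclose(sherman_morrison_update(np.linalg.inv(A), u, v),
                   np.linalg.inv(A + np.outer(u, v)))
```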
Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets
The scale of functional magnetic resonance image data is rapidly increasing
as large multi-subject datasets are becoming widely available and
high-resolution scanners are adopted. The inherent low-dimensionality of the
information in this data has led neuroscientists to consider factor analysis
methods to extract and analyze the underlying brain activity. In this work, we
consider two recent multi-subject factor analysis methods: the Shared Response
Model and Hierarchical Topographic Factor Analysis. We perform analytical,
algorithmic, and code optimization to enable multi-node parallel
implementations to scale. Single-node improvements result in 99x and 1812x
speedups on these two methods and enable the processing of larger datasets.
Our distributed implementations show strong scaling of 3.3x and 5.5x
respectively with 20 nodes on real datasets. We also demonstrate weak scaling
on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768
cores.
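As a hedged sketch of the data-parallel layout such multi-subject analyses
typically use (hypothetical names, not the authors' SRM/HTFA code): subjects
are partitioned across MPI ranks, the heavy per-subject work stays local, and
only small k-dimensional aggregates cross the network.

```python
# Schematic of subject-level data parallelism with mpi4py; the per-subject
# projection stands in for a factor-analysis update step and is not the
# paper's algorithm.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_subjects, n_voxels, n_timepoints, k = 64, 1000, 200, 10
my_subjects = range(rank, n_subjects, size)          # round-robin assignment

# Each rank holds only its own subjects' data (voxels x time); simulated here.
local_data = [np.random.default_rng(s).normal(size=(n_voxels, n_timepoints))
              for s in my_subjects]

# Per-subject computation stays local: a k-dimensional projection of each
# subject's data, using the same basis on every rank.
rng = np.random.default_rng(0)
W = np.linalg.qr(rng.normal(size=(n_voxels, k)))[0]  # orthonormal projection
local_sum = sum(W.T @ X for X in local_data)         # k x time, small

# Only the small (k x time) aggregate crosses the network.
shared = comm.allreduce(local_sum, op=MPI.SUM) / n_subjects
```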