2,569 research outputs found
Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model
We present a method for parallel block-sparse matrix-matrix multiplication on
distributed memory clusters. By using a quadtree matrix representation, data
locality is exploited without prior information about the matrix sparsity
pattern. A distributed quadtree matrix representation is straightforward to
implement due to our recent development of the Chunks and Tasks programming
model [Parallel Comput. 40, 328 (2014)]. The quadtree representation combined
with the Chunks and Tasks model leads to favorable weak and strong scaling of
the communication cost with the number of processes, as shown both
theoretically and in numerical experiments.
Matrices are represented by sparse quadtrees of chunk objects. The leaves in
the hierarchy are block-sparse submatrices. Sparsity is dynamically detected by
the matrix library and may occur at any level in the hierarchy and/or within
the submatrix leaves. In case graphics processing units (GPUs) are available,
both CPUs and GPUs are used for leaf-level multiplication work, thus making use
of the full computing capacity of each node.
The performance is evaluated for matrices with different sparsity structures,
including examples from electronic structure calculations. Compared to methods
that do not exploit data locality, our locality-aware approach reduces
communication significantly, achieving essentially constant communication per
node in weak scaling tests.Comment: 35 pages, 14 figure
Distributed Finite Element Analysis Using a Transputer Network
The principal objective of this research effort was to demonstrate the extraordinarily cost effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed but additional acceleration appears likely. For the NASA selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the 15,000,000 Cray X-MP24 system
Parallelization for image processing algorithms based chain and mid-crack codes
Freeman chain code is a widely-used description for a contour image. Another mid-crack code algorithm was proposed as a more precise method for image representation. We have developed a coding algorithm which is suitable to generate either chain code description or mid-crack code description by switching between two different tables. Since there is a strong urge to use parallel processing in image related problems, a parallel coding algorithm is implemented. This algorithm is developed on a pyramid architecture and a N cube architecture. Using link-list data structure and neighbor identification, the algorithm gains efficiency because no sorting or neighborhood pairing is needed.
In this dissertation, the local symmetry deficiency (LSD) computation to calculate the local k-symmetry is embedded in the coding algorithm. Therefore, we can finish the code extraction and the LSD computation in one pass. The embedding process is not limited to the k-symmetry algorithm and has the capability of parallelism.
An adaptive quadtree to chain code conversion algorithm is also presented. This algorithm is designed for constructing the chain codes of the resulting quadtree from the boolean operation of two quadtrees by using the chain codes of the original one. The algorithm has the parallelism and is ready to be implemented on a pyramid architecture.
Our parallel processing approach can be viewed as a parallelization paradigm - a template to embed image processing algorithms in the chain coding process and to implement them in a parallel approach
Constellation Queries over Big Data
A geometrical pattern is a set of points with all pairwise distances (or,
more generally, relative distances) specified. Finding matches to such patterns
has applications to spatial data in seismic, astronomical, and transportation
contexts. For example, a particularly interesting geometric pattern in
astronomy is the Einstein cross, which is an astronomical phenomenon in which a
single quasar is observed as four distinct sky objects (due to gravitational
lensing) when captured by earth telescopes. Finding such crosses, as well as
other geometric patterns, is a challenging problem as the potential number of
sets of elements that compose shapes is exponentially large in the size of the
dataset and the pattern. In this paper, we denote geometric patterns as
constellation queries and propose algorithms to find them in large data
applications. Our methods combine quadtrees, matrix multiplication, and
unindexed join processing to discover sets of points that match a geometric
pattern within some additive factor on the pairwise distances. Our distributed
experiments show that the choice of composition algorithm (matrix
multiplication or nested loops) depends on the freedom introduced in the query
geometry through the distance additive factor. Three clearly identified blocks
of threshold values guide the choice of the best composition algorithm.
Finally, solving the problem for relative distances requires a novel
continuous-to-discrete transformation. To the best of our knowledge this paper
is the first to investigate constellation queries at scale
Recommended from our members
Using topological sweep to extract the boundaries of regions in maps represented by region quadtrees
A variant of the plane sweep paradigm known as topological sweep is adapted to solve geometric problems involving two-dimensional regions when the underlying representation is a region quadtree. The utility of this technique is illustrated by showing how it can be used to extract the boundaries of a map in O(M) space and O(Ma(M)) time, where M is the number of quad tree blocks in the map, and a(·) is the (extremely slowly growing) inverse of Ackerman's function. The algorithm works for maps that contain multiple regions as well as holes. The algorithm makes use of active objects (in the form of regions) and an active border. It keeps track of the current position in the active border so that at each step no search is necessary. The algorithm represents a considerable improvement over a previous approach whose worst-case execution time is proportional to the product of the number of blocks in the map and the resolution of the quad tree (i.e., the maximum level of decomposition). The algorithm works for many different quadtree representations including those where the quadtree is stored in external storage
- …