Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model

Author: Rubensson Emanuel H.
Rudberg Elias
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

We present a method for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. By using a quadtree matrix representation, data locality is exploited without prior information about the matrix sparsity pattern. A distributed quadtree matrix representation is straightforward to implement due to our recent development of the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. The quadtree representation combined with the Chunks and Tasks model leads to favorable weak and strong scaling of the communication cost with the number of processes, as shown both theoretically and in numerical experiments. Matrices are represented by sparse quadtrees of chunk objects. The leaves in the hierarchy are block-sparse submatrices. Sparsity is dynamically detected by the matrix library and may occur at any level in the hierarchy and/or within the submatrix leaves. In case graphics processing units (GPUs) are available, both CPUs and GPUs are used for leaf-level multiplication work, thus making use of the full computing capacity of each node. The performance is evaluated for matrices with different sparsity structures, including examples from electronic structure calculations. Compared to methods that do not exploit data locality, our locality-aware approach reduces communication significantly, achieving essentially constant communication per node in weak scaling tests.

Comment: 35 pages, 14 figures
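The quadtree idea in this abstract can be illustrated with a minimal single-process sketch. It is not the authors' Chunks and Tasks implementation: the names `build`, `add`, `multiply`, `to_dense` and the constant `LEAF_SIZE` are hypothetical, the leaves here are small dense blocks rather than block-sparse submatrices, and no distribution or GPU work is modeled. What it does show is how an all-zero subtree, stored as `None` at any level, lets whole branches of the recursive product be skipped.

```python
LEAF_SIZE = 2  # hypothetical leaf block dimension (assumption for illustration)

def build(dense, size=None, r0=0, c0=0):
    """Build a quadtree from a dense square matrix whose dimension is
    LEAF_SIZE times a power of two. None marks an all-zero subtree."""
    n = len(dense) if size is None else size
    if n == LEAF_SIZE:
        block = [row[c0:c0 + n] for row in dense[r0:r0 + n]]
        return block if any(x != 0 for row in block for x in row) else None
    h = n // 2
    # Children are ordered (top-left, top-right, bottom-left, bottom-right).
    kids = tuple(build(dense, h, r0 + i * h, c0 + j * h)
                 for i in (0, 1) for j in (0, 1))
    return kids if any(k is not None for k in kids) else None

def add(a, b):
    """Add two quadtrees of equal size; None acts as the zero matrix."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, tuple):
        return tuple(add(x, y) for x, y in zip(a, b))
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def multiply(a, b):
    """Recursive product; any zero operand prunes the whole branch."""
    if a is None or b is None:
        return None
    if isinstance(a, tuple):
        out = []
        for i in (0, 1):
            for j in (0, 1):
                acc = None
                for k in (0, 1):
                    acc = add(acc, multiply(a[2 * i + k], b[2 * k + j]))
                out.append(acc)
        return tuple(out) if any(c is not None for c in out) else None
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def to_dense(node, n):
    """Expand a quadtree back into an n x n list of lists."""
    if node is None:
        return [[0] * n for _ in range(n)]
    if isinstance(node, tuple):
        h = n // 2
        q = [to_dense(c, h) for c in node]
        top = [q[0][r] + q[1][r] for r in range(h)]
        bottom = [q[2][r] + q[3][r] for r in range(h)]
        return top + bottom
    return [row[:] for row in node]
```

For a 4 x 4 matrix with a single nonzero 2 x 2 block, `build` returns a root with three `None` children, and `multiply` performs only one leaf-level product.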

arXiv.org e-Print Archive

Crossref

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Distributed Finite Element Analysis Using a Transputer Network

Author: Baehmann Peggy
Danial Albert
Favenesi James
Reynolds Brian
Shephard Mark
Tombrello Joseph
Turrentine Ronald
Watson James
Yang Dabby

The principal objective of this research effort was to demonstrate the extraordinarily cost effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed but additional acceleration appears likely. For the NASA selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than the $15,000,000 Cray X-MP24 system.
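The "about 60 times" figure can be checked from the numbers quoted in the abstract. The short calculation below is a sketch that assumes the 80,000 and 15,000,000 figures are system costs in the same currency, and that cost-performance means cost multiplied by solution time (lower is better). The function name `cost_performance_advantage` is hypothetical.

```python
def cost_performance_advantage(cost_a, time_a, cost_b, time_b):
    """How many times better system A's cost x time product is than
    system B's (assumed definition of cost-performance)."""
    return (cost_b * time_b) / (cost_a * time_a)

# Figures quoted in the abstract.
transputer_cost, transputer_time = 80_000, 71.7   # seconds to solution
cray_cost, cray_time = 15_000_000, 23.9           # seconds to solution

slowdown = transputer_time / cray_time            # about 3x slower
cost_ratio = cray_cost / transputer_cost          # 187.5x cheaper
advantage = cost_performance_advantage(
    transputer_cost, transputer_time, cray_cost, cray_time)

print(f"slowdown {slowdown:.2f}, cost ratio {cost_ratio:.1f}, "
      f"advantage {advantage:.1f}")
```

Under these assumptions the advantage comes out near 62.5, consistent with the abstract's rounded "about 60 times" and with its statement that the network runs at about one third of the Cray's speed.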

NASA Technical Reports Server

Parallelization for image processing algorithms based chain and mid-crack codes

Author: Wong Wai-Tak
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1999

Freeman chain code is a widely used description for a contour image. Mid-crack code was later proposed as a more precise method for image representation. We have developed a coding algorithm that can generate either a chain code description or a mid-crack code description by switching between two different tables. Since there is strong demand for parallel processing in image-related problems, a parallel coding algorithm is implemented. This algorithm is developed on a pyramid architecture and an N-cube architecture. Using a linked-list data structure and neighbor identification, the algorithm gains efficiency because no sorting or neighborhood pairing is needed. In this dissertation, the local symmetry deficiency (LSD) computation to calculate the local k-symmetry is embedded in the coding algorithm. Therefore, we can finish the code extraction and the LSD computation in one pass. The embedding process is not limited to the k-symmetry algorithm and can be parallelized. An adaptive quadtree to chain code conversion algorithm is also presented. This algorithm is designed for constructing the chain codes of the quadtree resulting from a boolean operation on two quadtrees by using the chain codes of the original ones. The algorithm is parallelizable and is ready to be implemented on a pyramid architecture. Our parallel processing approach can be viewed as a parallelization paradigm: a template to embed image processing algorithms in the chain coding process and to implement them in a parallel approach.
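For readers unfamiliar with Freeman chain code, the following is a minimal sequential sketch of the basic 8-direction encoding only. It is not the dissertation's table-driven or parallel algorithm and does not cover mid-crack code. The direction numbering (0 = east, increasing counter-clockwise, y axis pointing up) is one common convention assumed here, and the names `DIRECTIONS`, `chain_code` and `decode` are hypothetical.

```python
# Map each unit step (dx, dy) to a Freeman code.
# Assumed convention: 0 = east, codes increase counter-clockwise, y points up.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}
STEPS = {code: step for step, code in DIRECTIONS.items()}

def chain_code(points):
    """Encode an ordered list of 8-connected contour points as chain codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(x0, y0)} and {(x1, y1)} are not 8-neighbors")
        codes.append(DIRECTIONS[step])
    return codes

def decode(start, codes):
    """Rebuild the contour points from a start point and its chain codes."""
    points = [start]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_code(square)
print(codes)
```

The contour is stored as one start point plus one small integer per step, and `decode` recovers the original point list exactly.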

Digital Commons @ New Jersey Institute of Technology (NJIT)

Constellation Queries over Big Data

Author: Khatibi Amir
Nobre João R.
Ogasawara Eduardo
Porto Fabio
Shasha Dennis
Valduriez Patrick
Publication date: 07/03/2017

A geometrical pattern is a set of points with all pairwise distances (or, more generally, relative distances) specified. Finding matches to such patterns has applications to spatial data in seismic, astronomical, and transportation contexts. For example, a particularly interesting geometric pattern in astronomy is the Einstein cross, which is an astronomical phenomenon in which a single quasar is observed as four distinct sky objects (due to gravitational lensing) when captured by earth telescopes. Finding such crosses, as well as other geometric patterns, is a challenging problem as the potential number of sets of elements that compose shapes is exponentially large in the size of the dataset and the pattern. In this paper, we denote geometric patterns as constellation queries and propose algorithms to find them in large data applications. Our methods combine quadtrees, matrix multiplication, and unindexed join processing to discover sets of points that match a geometric pattern within some additive factor on the pairwise distances. Our distributed experiments show that the choice of composition algorithm (matrix multiplication or nested loops) depends on the freedom introduced in the query geometry through the distance additive factor. Three clearly identified blocks of threshold values guide the choice of the best composition algorithm. Finally, solving the problem for relative distances requires a novel continuous-to-discrete transformation. To the best of our knowledge this paper is the first to investigate constellation queries at scale
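To make the problem statement concrete, here is a brute-force sketch of matching a pattern by pairwise distances within an additive tolerance. It deliberately omits the quadtrees, matrix multiplication and join processing the paper combines. It enumerates every ordered selection of points, which is exactly the exponential search space the abstract describes. The function name `find_matches` and the parameter `eps` are hypothetical.

```python
from itertools import permutations
from math import dist

def find_matches(points, pattern, eps):
    """Return index tuples into `points` whose pairwise distances equal the
    pattern's pairwise distances within an additive tolerance `eps`.
    Brute force: tries every ordered selection of len(pattern) points."""
    k = len(pattern)
    target = [[dist(pattern[i], pattern[j]) for j in range(k)] for i in range(k)]
    found = []
    for combo in permutations(range(len(points)), k):
        if all(abs(dist(points[combo[i]], points[combo[j]]) - target[i][j]) <= eps
               for i in range(k) for j in range(i + 1, k)):
            found.append(combo)
    return found

# A 3-4-5 right triangle hidden among the data points, plus one outlier.
points = [(0, 0), (3, 0), (0, 4), (10, 10)]
# The pattern is a translated copy; only relative geometry matters.
pattern = [(1, 1), (4, 1), (1, 5)]
print(find_matches(points, pattern, eps=0.01))
```

Because only distances are compared, the match is invariant to translation, rotation and reflection of the pattern, and a larger `eps` admits more candidate sets.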

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1

An advanced multi-element microcellular ray tracing model

Author: Ng KH
Nix AR
Tameh EK
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2004

Explore Bristol Research

Using topological sweep to extract the boundaries of regions in maps represented by region quadtrees

Author: Dillencourt Michael B.
Samet Hanan
Publication venue: eScholarship, University of California
Publication date: 01/01/1991

A variant of the plane sweep paradigm known as topological sweep is adapted to solve geometric problems involving two-dimensional regions when the underlying representation is a region quadtree. The utility of this technique is illustrated by showing how it can be used to extract the boundaries of a map in O(M) space and O(M α(M)) time, where M is the number of quadtree blocks in the map, and α(·) is the (extremely slowly growing) inverse of Ackermann's function. The algorithm works for maps that contain multiple regions as well as holes. The algorithm makes use of active objects (in the form of regions) and an active border. It keeps track of the current position in the active border so that at each step no search is necessary. The algorithm represents a considerable improvement over a previous approach whose worst-case execution time is proportional to the product of the number of blocks in the map and the resolution of the quadtree (i.e., the maximum level of decomposition). The algorithm works for many different quadtree representations, including those where the quadtree is stored in external storage.

eScholarship - University of California