6 research outputs found
The Reverse Cuthill-McKee Algorithm in Distributed-Memory
Ordering vertices of a graph is key to minimize fill-in and data structure
size in sparse direct solvers, maximize locality in iterative solvers, and
improve performance in graph algorithms. Except for naturally parallelizable
ordering methods such as nested dissection, many important ordering methods
have not been efficiently mapped to distributed-memory architectures. In this
paper, we present the first-ever distributed-memory implementation of the
reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse
matrix. Our parallelization uses a two-dimensional sparse matrix decomposition.
We achieve high performance by decomposing the problem into a small number of
primitives and utilizing optimized implementations of these primitives. Our
implementation shows strong scaling up to 1024 cores for smaller matrices and
up to 4096 cores for larger matrices
Bandwidth and Wavefront Reduction for Static Variable Ordering in Symbolic Model Checking
We demonstrate the applicability of bandwidth and wavefront reduction algorithms to static variable ordering. In symbolic model checking event locality plays a major role in time and memory usage. For example, in Petri nets event locality can be captured by dependency matrices, where nonzero entries indicate whether a transition modifies a place. The quality of event locality has been expressed as a metric called (weighted) event span. The bandwidth of a matrix is a metric indicating the distance of nonzero elements to the diagonal. Wavefront is a metric indicating the degree of nonzeros on one end of the diagonal of the matrix. Bandwidth and wavefront are well studied metrics used in sparse matrix solvers. \ud
In this work we prove that span is limited by twice the bandwidth of a matrix. This observation makes bandwidth reduction algorithms useful for obtaining good variable orders. One major issue we address is that the reduction algorithms can only be applied on symmetric matrices, while the dependency matrices are asymmetric. We show that the Sloan algorithm executed on the total graph of the adjacency graph gives the best variable orders. Practically, we demonstrate that our work allows to call standard sparse matrix operations in Boost and ViennaCL, computing very good static variable orders in milliseconds. Future work is promising, because a whole new spectrum of more off-the-shelf algorithms, including metaheuristic ones, become available for variable ordering
An Object-Oriented Algorithmic Laboratory for Ordering Sparse Matrices
We focus on two known NP-hard problems that have applications in sparse matrix computations: the envelope/wavefront reduction problem and the fill reduction problem. Envelope/wavefront reducing orderings have a wide range of applications including profile and frontal solvers, incomplete factorization preconditioning, graph reordering for cache performance, gene sequencing, and spatial databases. Fill reducing orderings are generally limited to—but an inextricable part of—sparse matrix factorization.
Our major contribution to this field is the design of new and improved heuristics for these NP-hard problems and their efficient implementation in a robust, cross-platform, object-oriented software package. In this body of research, we (1) examine current ordering algorithms, analyze their asymptotic complexity, and characterize their behavior in model problems, (2) introduce new and improved algorithms that address deficiencies found in previous heuristics, (3) implement an object-oriented library of these algorithms in a robust, modular fashion without significant loss of efficiency, and (4) extend our algorithms and software to address both generalized and constrained problems. We stress that the major contribution is the algorithms and the implementation; the whole being greater than the sum of its parts.
The initial motivation for implementing our algorithms in object-oriented software was to manage the inherent complexity. During our research came the realization that the object-oriented implementation enabled new possibilities for augmented algorithms that would not have been as natural to generalize from a procedural implementation. Some extensions are constructed from a family of related algorithmic components, thereby creating a poly-algorithm that can adapt its strategy to the properties of the specific problem instance dynamically. Other algorithms are tailored for special constraints by aggregating algorithmic components and having them collaboratively generate the global ordering.
Our software laboratory, “Spindle,” implements state-of-the-art ordering algorithms for sparse matrices and graphs. We have used it to examine and augment the behavior of existing algorithms and test new ones. Its 40,000+ lines of C++ code includes a base library test drivers, sample applications, and interfaces to C, C++, Matlab, and PETSc. Spindle is freely available and can be built on a variety of UNIX platforms as well as WindowsNT