6 research outputs found

    The Reverse Cuthill-McKee Algorithm in Distributed-Memory

    Full text link
    Ordering vertices of a graph is key to minimize fill-in and data structure size in sparse direct solvers, maximize locality in iterative solvers, and improve performance in graph algorithms. Except for naturally parallelizable ordering methods such as nested dissection, many important ordering methods have not been efficiently mapped to distributed-memory architectures. In this paper, we present the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix. Our parallelization uses a two-dimensional sparse matrix decomposition. We achieve high performance by decomposing the problem into a small number of primitives and utilizing optimized implementations of these primitives. Our implementation shows strong scaling up to 1024 cores for smaller matrices and up to 4096 cores for larger matrices

    Bandwidth and Wavefront Reduction for Static Variable Ordering in Symbolic Model Checking

    Get PDF
    We demonstrate the applicability of bandwidth and wavefront reduction algorithms to static variable ordering. In symbolic model checking event locality plays a major role in time and memory usage. For example, in Petri nets event locality can be captured by dependency matrices, where nonzero entries indicate whether a transition modifies a place. The quality of event locality has been expressed as a metric called (weighted) event span. The bandwidth of a matrix is a metric indicating the distance of nonzero elements to the diagonal. Wavefront is a metric indicating the degree of nonzeros on one end of the diagonal of the matrix. Bandwidth and wavefront are well studied metrics used in sparse matrix solvers. \ud In this work we prove that span is limited by twice the bandwidth of a matrix. This observation makes bandwidth reduction algorithms useful for obtaining good variable orders. One major issue we address is that the reduction algorithms can only be applied on symmetric matrices, while the dependency matrices are asymmetric. We show that the Sloan algorithm executed on the total graph of the adjacency graph gives the best variable orders. Practically, we demonstrate that our work allows to call standard sparse matrix operations in Boost and ViennaCL, computing very good static variable orders in milliseconds. Future work is promising, because a whole new spectrum of more off-the-shelf algorithms, including metaheuristic ones, become available for variable ordering

    An Object-Oriented Algorithmic Laboratory for Ordering Sparse Matrices

    Get PDF
    We focus on two known NP-hard problems that have applications in sparse matrix computations: the envelope/wavefront reduction problem and the fill reduction problem. Envelope/wavefront reducing orderings have a wide range of applications including profile and frontal solvers, incomplete factorization preconditioning, graph reordering for cache performance, gene sequencing, and spatial databases. Fill reducing orderings are generally limited to—but an inextricable part of—sparse matrix factorization. Our major contribution to this field is the design of new and improved heuristics for these NP-hard problems and their efficient implementation in a robust, cross-platform, object-oriented software package. In this body of research, we (1) examine current ordering algorithms, analyze their asymptotic complexity, and characterize their behavior in model problems, (2) introduce new and improved algorithms that address deficiencies found in previous heuristics, (3) implement an object-oriented library of these algorithms in a robust, modular fashion without significant loss of efficiency, and (4) extend our algorithms and software to address both generalized and constrained problems. We stress that the major contribution is the algorithms and the implementation; the whole being greater than the sum of its parts. The initial motivation for implementing our algorithms in object-oriented software was to manage the inherent complexity. During our research came the realization that the object-oriented implementation enabled new possibilities for augmented algorithms that would not have been as natural to generalize from a procedural implementation. Some extensions are constructed from a family of related algorithmic components, thereby creating a poly-algorithm that can adapt its strategy to the properties of the specific problem instance dynamically. Other algorithms are tailored for special constraints by aggregating algorithmic components and having them collaboratively generate the global ordering. Our software laboratory, “Spindle,” implements state-of-the-art ordering algorithms for sparse matrices and graphs. We have used it to examine and augment the behavior of existing algorithms and test new ones. Its 40,000+ lines of C++ code includes a base library test drivers, sample applications, and interfaces to C, C++, Matlab, and PETSc. Spindle is freely available and can be built on a variety of UNIX platforms as well as WindowsNT
    corecore