3,077 research outputs found

    Analyzing and enhancing OSKI for sparse matrix-vector multiplication

    Get PDF
    Sparse matrix-vector multiplication (SpMxV) is a kernel operation widely used in iterative linear solvers. The same sparse matrix is multiplied by a dense vector repeatedly in these solvers. Matrices with irregular sparsity patterns make it difficult to utilize cache locality effectively in SpMxV computations. In this work, we investigate single- and multiple-SpMxV frameworks for exploiting cache locality in SpMxV computations. For the single-SpMxV framework, we propose two cache-size-aware top-down row/column-reordering methods based on 1D and 2D sparse matrix partitioning by utilizing the column-net and enhancing the row-column-net hypergraph models of sparse matrices. The multiple-SpMxV framework depends on splitting a given matrix into a sum of multiple nonzero-disjoint matrices so that the SpMxV operation is performed as a sequence of multiple input- and output-dependent SpMxV operations. For an effective matrix splitting required in this framework, we propose a cache-size-aware top-down approach based on 2D sparse matrix partitioning by utilizing the row-column-net hypergraph model. The primary objective in all of the three methods is to maximize the exploitation of temporal locality. We evaluate the validity of our models and methods on a wide range of sparse matrices by performing actual runs through using OSKI. Experimental results show that proposed methods and models outperform state-of-the-art schemes.Comment: arXiv admin note: substantial text overlap with arXiv:1202.385

    Progress Toward Affordable High Fidelity Combustion Simulations Using Filtered Density Functions for Hypersonic Flows in Complex Geometries

    Get PDF
    Significant progress has been made in the development of subgrid scale (SGS) closures based on a filtered density function (FDF) for large eddy simulations (LES) of turbulent reacting flows. The FDF is the counterpart of the probability density function (PDF) method, which has proven effective in Reynolds averaged simulations (RAS). However, while systematic progress is being made advancing the FDF models for relatively simple flows and lab-scale flames, the application of these methods in complex geometries and high speed, wall-bounded flows with shocks remains a challenge. The key difficulties are the significant computational cost associated with solving the FDF transport equation and numerically stiff finite rate chemistry. For LES/FDF methods to make a more significant impact in practical applications a pragmatic approach must be taken that significantly reduces the computational cost while maintaining high modeling fidelity. An example of one such ongoing effort is at the NASA Langley Research Center, where the first generation FDF models, namely the scalar filtered mass density function (SFMDF) are being implemented into VULCAN, a production-quality RAS and LES solver widely used for design of high speed propulsion flowpaths. This effort leverages internal and external collaborations to reduce the overall computational cost of high fidelity simulations in VULCAN by: implementing high order methods that allow reduction in the total number of computational cells without loss in accuracy; implementing first generation of high fidelity scalar PDF/FDF models applicable to high-speed compressible flows; coupling RAS/PDF and LES/FDF into a hybrid framework to efficiently and accurately model the effects of combustion in the vicinity of the walls; developing efficient Lagrangian particle tracking algorithms to support robust solutions of the FDF equations for high speed flows; and utilizing finite rate chemistry parametrization, such as flamelet models, to reduce the number of transported reactive species and remove numerical stiffness. This paper briefly introduces the SFMDF model (highlighting key benefits and challenges), and discusses particle tracking for flows with shocks, the hybrid coupled RAS/PDF and LES/FDF model, flamelet generated manifolds (FGM) model, and the Irregularly Portioned Lagrangian Monte Carlo Finite Difference (IPLMCFD) methodology for scalable simulation of high-speed reacting compressible flows

    Efficient data restructuring and aggregation for I/O acceleration in PIDX

    Get PDF
    pre-printHierarchical, multiresolution data representations enable interactive analysis and visualization of large-scale simulations. One promising application of these techniques is to store high performance computing simulation output in a hierarchical Z (HZ) ordering that translates data from a Cartesian coordinate scheme to a one-dimensional array ordered by locality at different resolution levels. However, when the dimensions of the simulation data are not an even power of 2, parallel HZ ordering produces sparse memory and network access patterns that inhibit I/O performance. This work presents a new technique for parallel HZ ordering of simulation datasets that restructures simulation data into large (power of 2) blocks to facilitate efficient I/O aggregation. We perform both weak and strong scaling experiments using the S3D combustion application on both Cray-XE6 (65,536 cores) and IBM Blue Gene/P (131,072 cores) platforms. We demonstrate that data can be written in hierarchical, multiresolution format with performance competitive to that of native data-ordering methods

    Gunrock: GPU Graph Analytics

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU

    A novel framework for spatio-temporal prediction of environmental data using deep learning

    Get PDF
    As the role played by statistical and computational sciences in climate and environmental modelling and prediction becomes more important, Machine Learning researchers are becoming more aware of the relevance of their work to help tackle the climate crisis. Indeed, being universal nonlinear function approximation tools, Machine Learning algorithms are efficient in analysing and modelling spatially and temporally variable environmental data. While Deep Learning models have proved to be able to capture spatial, temporal, and spatio-temporal dependencies through their automatic feature representation learning, the problem of the interpolation of continuous spatio-temporal fields measured on a set of irregular points in space is still under-investigated. To fill this gap, we introduce here a framework for spatio-temporal prediction of climate and environmental data using deep learning. Specifically, we show how spatio-temporal processes can be decomposed in terms of a sum of products of temporally referenced basis functions, and of stochastic spatial coefficients which can be spatially modelled and mapped on a regular grid, allowing the reconstruction of the complete spatio-temporal signal. Applications on two case studies based on simulated and real-world data will show the effectiveness of the proposed framework in modelling coherent spatio-temporal fields.Comment: 11 pages, 8 figure

    RIACS

    Get PDF
    The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities that serves as a bridge between NASA and the academic community. Under a five-year co-operative agreement with NASA, research at RIACS is focused on areas that are strategically enabling to the Ames Research Center's role as NASA's Center of Excellence for Information Technology. The primary mission of RIACS is charted to carry out research and development in computer science. This work is devoted in the main to tasks that are strategically enabling with respect to NASA's bold mission in space exploration and aeronautics. There are three foci for this work: (1) Automated Reasoning. (2) Human-Centered Computing. and (3) High Performance Computing and Networking. RIACS has the additional goal of broadening the base of researcher in these areas of importance to the nation's space and aeronautics enterprises. Through its visiting scientist program, RIACS facilitates the participation of university-based researchers, including both faculty and students, in the research activities of NASA and RIACS. RIACS researchers work in close collaboration with NASA computer scientists on projects such as the Remote Agent Experiment on Deep Space One mission, and Super-Resolution Surface Modeling
    corecore