Search CORE

3,077 research outputs found

Recent Advances in Graph Partitioning

Author: A Buluç
A Felner
A George
A Lisser
A Pothen
A Trifunović
AB Kahng
AE Feldmann
AH Land
AJ Soper
B Brandfass
B Hendrickson
B Hendrickson
B Hendrickson
B Junker
B Monien
B Peng
BW Kernighan
C Aykanat
C Chevalier
C Chevalier
C Farhat
C Lanczos
C Walshaw
C Walshaw
C Walshaw
C Walshaw
C Walshaw
C Walshaw
CE Bichot
CE Ferreira
D Delling
D Delling
D Delling
D Drake
D Luxen
D Ron
D Ron
D Wagner
DA Papa
DE Drake Vinkemeier
E Jeannot
E Rolland
F Comellas
F Glover
F Glover
F Pellegrini
F Pellegrini
F Pellegrini
F Schulz
FT Leighton
G Even
G Karypis
G Karypis
G Karypis
G Zumbusch
H Li
H Meyerhenke
H Meyerhenke
H Meyerhenke
H Meyerhenke
H Meyerhenke
HD Simon
HD Simon
I Moulitsas
I Safro
I Safro
J Chen
J Cong
J Fietz
J Hromkovič
J Hungershöfer
J Maue
J Maue
J Shalf
JR Gilbert
K Andreev
K Lang
K Schloegel
K Schloegel
K Schloegel
KS Camilus
L Brunetta
L Grady
L Lovász
LA Sanchis
LR Ford
M Armbruster
M Bader
M Birn
M Fiedler
M Jerrum
M Newman
M Sellmann
M Zhou
MR Garey
N Sensen
O Goldschmidt
P Chardaire
P Galinier
P Korosec
P Sanders
P Sanders
R Diekmann
R Diekmann
R Glantz
R Preis
RD Williams
S Arora
S Huang
S Lafon
S Lloyd
S Pettie
SE Karisch
SY Chan
T Bui
T Kieritz
U Benlic
U Benlic
U Feige
V Osipov
WE Donath
WE Donath
WW Hager
WW Hager
X Sui
Y Low
YM Kim
Ü Çatalyürek
Publication venue
Publication date: 03/02/2015
Field of study

We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Analyzing and enhancing OSKI for sparse matrix-vector multiplication

Author: Agarwal R. C.
Al-Furaih I.
Berry M. W.
Cevdet Aykanat
Das R.
Davis T. A.
Enver Kayaaslan
Haque S. A.
Kadir Akbudak
Mirchandaney R.
Pinar A.
Strout M. M.
Temam O.
Çatalyürek Ü. V.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 13/03/2012
Field of study

Sparse matrix-vector multiplication (SpMxV) is a kernel operation widely used in iterative linear solvers. The same sparse matrix is multiplied by a dense vector repeatedly in these solvers. Matrices with irregular sparsity patterns make it difficult to utilize cache locality effectively in SpMxV computations. In this work, we investigate single- and multiple-SpMxV frameworks for exploiting cache locality in SpMxV computations. For the single-SpMxV framework, we propose two cache-size-aware top-down row/column-reordering methods based on 1D and 2D sparse matrix partitioning by utilizing the column-net and enhancing the row-column-net hypergraph models of sparse matrices. The multiple-SpMxV framework depends on splitting a given matrix into a sum of multiple nonzero-disjoint matrices so that the SpMxV operation is performed as a sequence of multiple input- and output-dependent SpMxV operations. For an effective matrix splitting required in this framework, we propose a cache-size-aware top-down approach based on 2D sparse matrix partitioning by utilizing the row-column-net hypergraph model. The primary objective in all of the three methods is to maximize the exploitation of temporal locality. We evaluate the validity of our models and methods on a wide range of sparse matrices by performing actual runs through using OSKI. Experimental results show that proposed methods and models outperform state-of-the-art schemes.Comment: arXiv admin note: substantial text overlap with arXiv:1202.385

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bilkent University Institutional Repository

Progress Toward Affordable High Fidelity Combustion Simulations Using Filtered Density Functions for Hypersonic Flows in Complex Geometries

Author: Drozda Tomasz G.
Pisciuneri Patrick H.
Quinlan Jesse R.
Yilmaz S. Levent
Publication venue
Publication date: 30/07/2012
Field of study

Significant progress has been made in the development of subgrid scale (SGS) closures based on a filtered density function (FDF) for large eddy simulations (LES) of turbulent reacting flows. The FDF is the counterpart of the probability density function (PDF) method, which has proven effective in Reynolds averaged simulations (RAS). However, while systematic progress is being made advancing the FDF models for relatively simple flows and lab-scale flames, the application of these methods in complex geometries and high speed, wall-bounded flows with shocks remains a challenge. The key difficulties are the significant computational cost associated with solving the FDF transport equation and numerically stiff finite rate chemistry. For LES/FDF methods to make a more significant impact in practical applications a pragmatic approach must be taken that significantly reduces the computational cost while maintaining high modeling fidelity. An example of one such ongoing effort is at the NASA Langley Research Center, where the first generation FDF models, namely the scalar filtered mass density function (SFMDF) are being implemented into VULCAN, a production-quality RAS and LES solver widely used for design of high speed propulsion flowpaths. This effort leverages internal and external collaborations to reduce the overall computational cost of high fidelity simulations in VULCAN by: implementing high order methods that allow reduction in the total number of computational cells without loss in accuracy; implementing first generation of high fidelity scalar PDF/FDF models applicable to high-speed compressible flows; coupling RAS/PDF and LES/FDF into a hybrid framework to efficiently and accurately model the effects of combustion in the vicinity of the walls; developing efficient Lagrangian particle tracking algorithms to support robust solutions of the FDF equations for high speed flows; and utilizing finite rate chemistry parametrization, such as flamelet models, to reduce the number of transported reactive species and remove numerical stiffness. This paper briefly introduces the SFMDF model (highlighting key benefits and challenges), and discusses particle tracking for flows with shocks, the hybrid coupled RAS/PDF and LES/FDF model, flamelet generated manifolds (FGM) model, and the Irregularly Portioned Lagrangian Monte Carlo Finite Difference (IPLMCFD) methodology for scalable simulation of high-speed reacting compressible flows

NASA Technical Reports Server

Efficient data restructuring and aggregation for I/O acceleration in PIDX

Author: Kumar Sidharth
Pascucci Valerio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

pre-printHierarchical, multiresolution data representations enable interactive analysis and visualization of large-scale simulations. One promising application of these techniques is to store high performance computing simulation output in a hierarchical Z (HZ) ordering that translates data from a Cartesian coordinate scheme to a one-dimensional array ordered by locality at different resolution levels. However, when the dimensions of the simulation data are not an even power of 2, parallel HZ ordering produces sparse memory and network access patterns that inhibit I/O performance. This work presents a new technique for parallel HZ ordering of simulation datasets that restructures simulation data into large (power of 2) blocks to facilitate efficient I/O aggregation. We perform both weak and strong scaling experiments using the S3D combustion application on both Cray-XE6 (65,536 cores) and IBM Blue Gene/P (131,072 cores) platforms. We demonstrate that data can be written in hierarchical, multiresolution format with performance competitive to that of native data-ordering methods

The University of Utah: J. Willard Marriott Digital Library

Gunrock: GPU Graph Analytics

Author: Davidson Andrew
Liu Weitang
Osama Muhammad
Owens John D.
Pan Yuechao
Riffel Andy T.
Wang Leyuan
Wang Yangzihao
Wu Yuduo
Yang Carl
Yuan Chenshan
Publication venue
Publication date: 04/01/2017
Field of study

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU

arXiv.org e-Print Archive

eScholarship - University of California

FigShare

A novel framework for spatio-temporal prediction of environmental data using deep learning

Author: Amato Federico
Guignard Fabian
Kanevski Mikhail
Robert Sylvain
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2020
Field of study

As the role played by statistical and computational sciences in climate and environmental modelling and prediction becomes more important, Machine Learning researchers are becoming more aware of the relevance of their work to help tackle the climate crisis. Indeed, being universal nonlinear function approximation tools, Machine Learning algorithms are efficient in analysing and modelling spatially and temporally variable environmental data. While Deep Learning models have proved to be able to capture spatial, temporal, and spatio-temporal dependencies through their automatic feature representation learning, the problem of the interpolation of continuous spatio-temporal fields measured on a set of irregular points in space is still under-investigated. To fill this gap, we introduce here a framework for spatio-temporal prediction of climate and environmental data using deep learning. Specifically, we show how spatio-temporal processes can be decomposed in terms of a sum of products of temporally referenced basis functions, and of stochastic spatial coefficients which can be spatially modelled and mapped on a regular grid, allowing the reconstruction of the complete spatio-temporal signal. Applications on two case studies based on simulated and real-world data will show the effectiveness of the proposed framework in modelling coherent spatio-temporal fields.Comment: 11 pages, 8 figure

arXiv.org e-Print Archive

Serveur académique lausannois

RIACS

Author: Moore Robert C.
Publication venue
Publication date
Field of study

The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities that serves as a bridge between NASA and the academic community. Under a five-year co-operative agreement with NASA, research at RIACS is focused on areas that are strategically enabling to the Ames Research Center's role as NASA's Center of Excellence for Information Technology. The primary mission of RIACS is charted to carry out research and development in computer science. This work is devoted in the main to tasks that are strategically enabling with respect to NASA's bold mission in space exploration and aeronautics. There are three foci for this work: (1) Automated Reasoning. (2) Human-Centered Computing. and (3) High Performance Computing and Networking. RIACS has the additional goal of broadening the base of researcher in these areas of importance to the nation's space and aeronautics enterprises. Through its visiting scientist program, RIACS facilitates the participation of university-based researchers, including both faculty and students, in the research activities of NASA and RIACS. RIACS researchers work in close collaboration with NASA computer scientists on projects such as the Remote Agent Experiment on Deep Space One mission, and Super-Resolution Surface Modeling

NASA Technical Reports Server