Search CORE

156 research outputs found

A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures

Author: D Chakrabarti
H Cougny De
MM Strout
MR Garey
MT Jones
ÜV Çatalyürek
Publication venue
Publication date: 18/05/2015
Field of study

Irregular computations on unstructured data are an important class of problems for parallel programming. Graph coloring is often an important preprocessing step, e.g. as a way to perform dependency analysis for safe parallel execution. The total run time of a coloring algorithm adds to the overall parallel overhead of the application whereas the number of colors used determines the amount of exposed parallelism. A fast and scalable coloring algorithm using as few colors as possible is vital for the overall parallel performance and scalability of many irregular applications that depend upon runtime dependency analysis. Catalyurek et al. have proposed a graph coloring algorithm which relies on speculative, local assignment of colors. In this paper we present an improved version which runs even more optimistically with less thread synchronization and reduced number of conflicts compared to Catalyurek et al.'s algorithm. We show that the new technique scales better on multi-core and many-core systems and performs up to 1.5x faster than its predecessor on graphs with high-degree vertices, while keeping the number of colors at the same near-optimal levels.Comment: To appear in the proceedings of Euro Par 201

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

SciDAC Institute: Combinatorial Scientific Computing and Petascale Simulations (CSCAPES). Final Report

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref

Fine-grained Locality-aware Parallel Scheme for Anisotropic Mesh Adaptation

Author: Ledoux Franck
Pommereau Franck
Rakotoarivelo Hoby
Publication venue: The Author(s). Published by Elsevier Ltd.
Publication date: 27/09/2016
Field of study

AbstractIn this paper, we provide a fine-grained parallel scheme for anisotropic mesh adaptation on NUMA11Non-Uniform Memory Access architectures.Data dependencies are expressed by a graph for each kernel, and concurrency is extracted through fine-grained graph coloring. Tasks are structured into bulk-synchronous steps to avoid data races and to aggregate shared-data accesses.To ensure performance prediction, time cost and load imbalance are theoretically characterized.The devised scheme was evaluated on a 4 NUMA node (2-socket) machine, and a mean efficiency of 70% was reached on 32 cores for 3 kernels out of 4. The impact of irregular degree distribution and data layout on scalability is highlighted

HAL Evry

Elsevier - Publisher Connector

HAL-CEA

Doctor of Philosophy

Author: Fu Zhisong
Publication venue: University of Utah
Publication date: 01/12/2013
Field of study

dissertationPartial differential equations (PDEs) are widely used in science and engineering to model phenomena such as sound, heat, and electrostatics. In many practical science and engineering applications, the solutions of PDEs require the tessellation of computational domains into unstructured meshes and entail computationally expensive and time-consuming processes. Therefore, efficient and fast PDE solving techniques on unstructured meshes are important in these applications. Relative to CPUs, the faster growth curves in the speed and greater power efficiency of the SIMD streaming processors, such as GPUs, have gained them an increasingly important role in the high-performance computing area. Combining suitable parallel algorithms and these streaming processors, we can develop very efficient numerical solvers of PDEs. The contributions of this dissertation are twofold: proposal of two general strategies to design efficient PDE solvers on GPUs and the specific applications of these strategies to solve different types of PDEs. Specifically, this dissertation consists of four parts. First, we describe the general strategies, the domain decomposition strategy and the hybrid gathering strategy. Next, we introduce a parallel algorithm for solving the eikonal equation on fully unstructured meshes efficiently. Third, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. Fourth, we propose a parallel algorithm for solving the levelset equation on fully unstructured 2D or 3D meshes or manifolds. This algorithm combines a narrowband scheme with domain decomposition for efficient levelset equation solving

The University of Utah: J. Willard Marriott Digital Library

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

A Novel Multithreaded Algorithm for Extracting Maximal Chordal Subgraphs

Author: Ali Hesham
Bhowmick Sanjukta
Cooper Kathryn Dempsey
Feo John
Halappanavar Mahantesh
Publication venue: DigitalCommons@UNO
Publication date: 01/01/2012
Field of study

Chordal graphs are triangulated graphs where any cycle larger than three is bisected by a chord. Many combinatorial optimization problems such as computing the size of the maximum clique and the chromatic number are NP-hard on general graphs but have polynomial time solutions on chordal graphs. In this paper, we present a novel multithreaded algorithm to extract a maximal chordal sub graph from a general graph. We develop an iterative approach where each thread can asynchronously update a subset of edges that are dynamically assigned to it per iteration and implement our algorithm on two different multithreaded architectures - Cray XMT, a massively multithreaded platform, and AMD Magny-Cours, a shared memory multicore platform. In addition to the proof of correctness, we present the performance of our algorithm using a test set of synthetical graphs with up to half-a-billion edges and real world networks from gene correlation studies and demonstrate that our algorithm achieves high scalability for all inputs on both types of architectures

The University of Nebraska, Omaha