
466 research outputs found

Efficient Core Utilization in a Hybrid Parallel Delaunay Meshing Algorithm on Distributed-Memory Cluster

Author: Chernikov Andrey N.
Chrisochoides Nikos P.
Feng Daming
Publication venue: ODU Digital Commons
Publication date: 01/01/2017

Most current supercomputer architectures consist of clusters of nodes that are shared by many users. A user wants a submitted job to be scheduled promptly. However, the resource-sharing and job-scheduling policies that the scheduling system uses to manage jobs are usually beyond users' control. Therefore, to reduce the waiting time of their jobs, it is increasingly important for users to implement algorithms that suit the system's scheduling policies and use the supercomputer's available resources effectively and efficiently. We propose a hybrid MPI+Threads parallel mesh generation algorithm for distributed-memory clusters with efficient core utilization. The algorithm takes the system scheduling information into account and can use nodes that are partially occupied by other users' jobs. The experimental results demonstrate that the algorithm utilizes available cores effectively and efficiently, which reduces its waiting time in the system job scheduling queue. It is up to 12.74 times faster than the traditional implementation without efficient core utilization when a mesh with 2.58 billion elements is created for 400 cores.

Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

Author: Domke Jens
Publication date: 16/06/2017

In the pursuit of ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on the total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable due to irregular topologies, which are either irregular by design or, most often, the result of hardware failures. Exchanging faulty network components potentially requires whole-system downtime, further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, in terms of both hardware and software management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. However, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system.
This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while the system is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property-preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to the routing-deadlock problem. Therefore, this thesis further advances the state of the art by introducing a novel concept of routing on the channel dependency graph, which allows the design of a universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock avoidance during path calculation, instead of solving both problems separately, as all previous solutions do.

Technische Universität Dresden: Qucosa

Topology Agnostic Methods for Routing, Reconfiguration and Virtualization of Interconnection Networks

Author: Solheim Åshild Grønstad
Publication date: 01/01/2012

Modern computing systems, such as supercomputers, data centers and multicore chips, generally require efficient communication between their different system units; tolerance towards component faults; flexibility to expand or merge; and a high utilization of their resources. Interconnection networks are used in a variety of such computing systems in order to enable communication between their diverse system units. The main objectives of this thesis are to investigate and propose new or improved solutions to topology agnostic routing and reconfiguration of interconnection networks. In addition, topology agnostic routing and reconfiguration algorithms are utilized in the development of new and flexible approaches to processor allocation. The thesis aims to present versatile solutions that can be used for the interconnection networks of a number of different computing systems. No particular routing algorithm was specified for an interconnection network technology which is now incorporated in Dolphin Express. The thesis states a set of criteria for a suitable routing algorithm, evaluates a number of existing routing algorithms, and recommends that one of the algorithms, which fulfils all of the criteria, be used. Further investigations demonstrate how this routing algorithm inherently supports fault-tolerance, and how it can be optimized for some network topologies. These considerations are also relevant for the InfiniBand interconnection network technology. Reconfiguration of interconnection networks (change of routing function) is a deadlock-prone process. Some existing reconfiguration strategies include deadlock avoidance mechanisms that significantly reduce the network service offered to running applications. The thesis expands the area of application for one of the most versatile and efficient reconfiguration algorithms available in the literature, and proposes an optimization of this algorithm that improves the network service offered to running applications.
Moreover, a new reconfiguration algorithm is presented that supports a replacement of the routing function without causing performance penalties. Processor allocation strategies that guarantee traffic-containment commonly pose strict requirements on the shape of partitions, and thus achieve only a limited utilization of a system’s computing resources. The thesis introduces two new approaches that are more flexible. Both approaches utilize the properties of a topology agnostic routing algorithm in order to enforce traffic-containment within arbitrarily shaped partitions. Consequently, high resource utilization as well as isolation of traffic between different partitions is achieved.

A grid-enabled problem solving environment for parallel computational engineering design

Author: Allen
Brandt
Brown
C.E. Goodyer
Dabdub
Dew
Dowson
Dowson
Fairlie
Foster
Fox
Giles
Goodyer
Goodyer
Goodyer
Haber
Inselberg
Jameson
Johnson
Johnson
Karonis
L.E. Scales
Linden
Llorente
M. Berzins
McBryan
Nelder
Nurgat
P.K. Jimack
Parkinson
Tuminaro
Venner
Walkley
Walton
Wang
Wood
Publication venue: 'Elsevier BV'
Publication date: 01/07/2006

This paper describes the development and application of a piece of engineering software that provides a problem solving environment (PSE) capable of launching, and interfacing with, computational jobs executing on remote resources on a computational grid. In particular, it is demonstrated how a complex, serial, engineering optimisation code may be efficiently parallelised, grid-enabled and embedded within a PSE. The environment is highly flexible, allowing remote users from different sites to collaborate, and permitting computational tasks to be executed in parallel across multiple grid resources, each of which may be a parallel architecture. A full working prototype has been built and successfully applied to a computationally demanding engineering optimisation problem. This particular problem stems from elastohydrodynamic lubrication and involves optimising the computational model for a lubricant based on the match between simulation results and experimentally observed data.

SpECTRE: A Task-based Discontinuous Galerkin Code for Relativistic Astrophysics

Author: Bohn Andy
Deppe Nils
Diener Peter
Field Scott E.
Foucart Francois
Hébert François
Kidder Lawrence E.
Lippuner Jonas
Miller Jonah
Ott Christian D.
Scheel Mark A.
Schnetter Erik
Teukolsky Saul A.
Vincent Trevor
Publication venue: 'Elsevier BV'
Publication date: 15/04/2017

We introduce a new relativistic astrophysics code, SpECTRE, that combines a discontinuous Galerkin method with a task-based parallelism model. SpECTRE's goal is to achieve more accurate solutions for challenging relativistic astrophysics problems such as core-collapse supernovae and binary neutron star mergers. The robustness of the discontinuous Galerkin method allows for the use of high-resolution shock capturing methods in regions where (relativistic) shocks are found, while exploiting high-order accuracy in smooth regions. A task-based parallelism model allows efficient use of the largest supercomputers for problems with a heterogeneous workload over disparate spatial and temporal scales. We argue that the locality and algorithmic structure of discontinuous Galerkin methods will exhibit good scalability within a task-based parallelism framework. We demonstrate the code on a wide variety of challenging benchmark problems in (non)-relativistic (magneto)-hydrodynamics. We demonstrate the code's scalability, including its strong scaling on the NCSA Blue Waters supercomputer up to the machine's full capacity of 22,380 nodes using 671,400 threads.

Comment: 41 pages, 13 figures, and 7 tables. Ancillary data contains simulation input file

arXiv.org e-Print Archive

Louisiana State University

Caltech Authors