23 research outputs found

    The Portals 4.0.1 network programming interface.

    This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system, and it is well suited to massively parallel processing and embedded systems. Portals 4.0 is an adaptation of the data movement layer developed for massively parallel processing platforms such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 targets the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
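    The report specifies a C API. As a hedged illustration of its flavor only, the sketch below initializes the library and brings up a network interface using call names from the Portals 4 reference implementation (PtlInit, PtlNIInit, PtlNIFini, PtlFini); the option flags, NULL limits, and error handling here are assumptions for this sketch, not text from the report.

        /* Minimal Portals 4 bring-up sketch; assumes the reference
         * implementation's <portals4.h>. Not taken from the report. */
        #include <portals4.h>
        #include <stdio.h>

        int main(void)
        {
            ptl_handle_ni_t ni;

            if (PtlInit() != PTL_OK) {               /* library-wide init */
                fprintf(stderr, "PtlInit failed\n");
                return 1;
            }

            /* Bring up a matching, physically addressed interface;
             * NULL limits request implementation defaults (assumption). */
            if (PtlNIInit(PTL_IFACE_DEFAULT,
                          PTL_NI_MATCHING | PTL_NI_PHYSICAL,
                          PTL_PID_ANY, NULL, NULL, &ni) != PTL_OK) {
                fprintf(stderr, "PtlNIInit failed\n");
                PtlFini();
                return 1;
            }

            /* ... PtlPut/PtlGet data movement would go here ... */

            PtlNIFini(ni);
            PtlFini();
            return 0;
        }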

    Computers working at the speed of light


    Introduction to RADR 2019

    The question of how independent libraries or runtime systems can efficiently and dynamically allocate compute-node resources, such as cores, can be a nightmare. Scientists writing application components have no way to efficiently specify and compose resource-hungry components. As application software stacks grow deeper and multiple runtime layers compete for resources from the operating system, it has become clear that intelligent cooperation is needed. Resources such as compute cores, in-package memory, and even electrical power must be orchestrated dynamically across application components, which must be able to query each other and respond appropriately. A more integrated solution would reduce intra-application resource competition and improve performance. Furthermore, application runtime systems could request and allocate specific hardware assets and adjust runtime tuning parameters up and down the software stack. The goal of this workshop is to gather and share the latest scholarly research from the community working on these issues at all levels of the HPC software stack, including thread allocation, resource arbitration and management, containers, and so on, from runtime systems to compilers. We will also use panel sessions and keynote talks to discuss these issues, share visions, and present solutions.

    Scope: Over the last five years, the number of nodes in large supercomputers has remained largely unchanged. In fact, the Oak Ridge National Laboratory computer leading the Top500 list, Summit, has fewer nodes than its predecessor, which is 20 times slower. Machines are getting faster not by adding nodes but by adding parallelism, cores, and hierarchical memory to each compute node. This shift in how computers are scaled up makes it imperative that parallel computing resources within a node be carefully orchestrated to achieve maximum performance. Dynamically allocating and managing threads, and mapping those threads to cores, is a challenge that requires cooperation and coordination between the different components of the software stack.
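    The closing point, mapping threads to cores, is concrete enough to sketch. The fragment below statically partitions cores among the worker threads of hypothetical runtime components using Linux's pthread_setaffinity_np, so the components stop contending for the same hardware threads; the core split and thread counts are assumptions, not a scheme from the workshop.

        /* Hedged sketch: pin each worker thread to its own core so runtime
         * components do not contend for the same hardware threads.
         * Core numbering is an assumption; compile with -pthread on Linux. */
        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>
        #include <stdio.h>

        static void *worker(void *arg)
        {
            int core = *(int *)arg;
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(core, &set);    /* this thread may run only on 'core' */
            pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
            /* ... the component's work loop runs here, pinned ... */
            return NULL;
        }

        int main(void)
        {
            /* Hypothetical split: cores 0-1 for one component, 2-3 for another. */
            int cores[4] = {0, 1, 2, 3};
            pthread_t t[4];
            for (int i = 0; i < 4; i++)
                pthread_create(&t[i], NULL, worker, &cores[i]);
            for (int i = 0; i < 4; i++)
                pthread_join(t[i], NULL);
            puts("all pinned workers finished");
            return 0;
        }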

    rMPI: increasing fault resiliency in a message-passing environment.


    Keeping checkpoint/restart viable for exascale systems

    Next-generation exascale systems, those capable of performing a quintillion operations per second, are expected to be delivered in the next 8-10 years. These systems, which will be 1,000 times faster than current systems, will be of unprecedented scale. As systems continue to grow in size, faults will become increasingly common, even over the course of small calculations, so issues such as fault tolerance and reliability will limit application scalability. Checkpoint/restart, the dominant fault tolerance mechanism of the last 25 years, becomes increasingly problematic at the scale of future systems due to its excessive overheads. In this work, we evaluate a number of techniques for decreasing the overhead of checkpoint/restart to keep the method viable for future exascale systems. Specifically, we evaluate state-machine replication to dramatically increase the checkpoint interval (the time between successive checkpoints) and hash-based, probabilistic incremental checkpointing using graphics processing units to decrease the checkpoint commit time (the time to save one checkpoint). Using a combination of empirical analysis, modeling, and simulation, we study the costs and benefits of these approaches across a wide range of parameters. These results, which cover a number of high-performance computing capability workloads, different failure distributions, hardware mean times to failure, and I/O bandwidths, show the potential of these techniques to meet the reliability demands of future exascale platforms.
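    To make the second technique concrete, the sketch below shows the idea behind hash-based incremental checkpointing: only pages whose hash changed since the previous checkpoint are written out. A plain CPU FNV-1a hash stands in for the paper's GPU-based probabilistic hashing, and the page size, file format, and region handling are assumptions for illustration.

        /* Hedged sketch of hash-based incremental checkpointing. */
        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        #define PAGE_SIZE 4096

        static uint64_t fnv1a(const unsigned char *p, size_t n)
        {
            uint64_t h = 1469598103934665603ULL;    /* FNV offset basis */
            for (size_t i = 0; i < n; i++) {
                h ^= p[i];
                h *= 1099511628211ULL;              /* FNV prime */
            }
            return h;
        }

        /* Write only dirty pages of 'mem' to 'out'; 'prev' holds the hashes
         * recorded at the previous checkpoint and is updated in place. */
        static size_t checkpoint_incremental(const unsigned char *mem,
                                             size_t npages, uint64_t *prev,
                                             FILE *out)
        {
            size_t written = 0;
            for (size_t i = 0; i < npages; i++) {
                uint64_t h = fnv1a(mem + i * PAGE_SIZE, PAGE_SIZE);
                if (h != prev[i]) {                 /* page changed: save it */
                    fwrite(&i, sizeof(i), 1, out);  /* page index, then data */
                    fwrite(mem + i * PAGE_SIZE, PAGE_SIZE, 1, out);
                    prev[i] = h;
                    written++;
                }
            }
            return written;                         /* pages actually saved */
        }

        int main(void)
        {
            static unsigned char mem[16 * PAGE_SIZE];
            static uint64_t prev[16];               /* zeros: all pages dirty */
            FILE *out = fopen("ckpt.bin", "wb");
            if (!out) return 1;

            memset(mem + 3 * PAGE_SIZE, 0xAB, PAGE_SIZE);   /* touch one page */
            printf("saved %zu pages\n", checkpoint_incremental(mem, 16, prev, out));
            printf("saved %zu pages\n", checkpoint_incremental(mem, 16, prev, out));
            fclose(out);
            return 0;
        }

    The first call writes a full checkpoint (every page differs from the empty hash table); the second writes nothing, since no page changed in between.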

    G-LOMARC-TS: Lookahead group matchmaking for time/space sharing on multi-core parallel machines

    Parallel machines with multi-core nodes are becoming increasingly popular. However, the performance of applications running on these machines can be limited by competition for shared resources within each node. Research has found that coscheduling applications with complementary resource characteristics on the same set of nodes (semi time sharing) may improve performance. We propose a scheduling algorithm, G-LOMARC-TS, which incorporates both space sharing and semi time sharing and, where possible, matches groups of jobs for coscheduling. Since matchmaking may select jobs from further down the waiting queue, jobs at the front of the queue may be delayed; fairness for each individual job is therefore monitored and the delay is kept within a limited bound. Several heuristics are used to solve the NP-complete problem of forming groups. Our experimental results show both utilization gains and average relative response time improvements for G-LOMARC-TS over several other scheduling policies.
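    As a toy illustration of lookahead matchmaking with a fairness bound: scan the queue for a partner whose dominant resource demand complements the head job's, but stop looking ahead once the head job has waited too long. The job fields, single-partner matching, and delay bound below are assumptions for illustration; the real G-LOMARC-TS heuristics are considerably richer.

        /* Hedged sketch of lookahead matchmaking with a fairness bound. */
        #include <stdio.h>

        #define DELAY_BOUND 100    /* max tolerated wait, arbitrary units */

        typedef struct {
            int id;
            int cpu_bound;         /* 1 = CPU-bound, 0 = memory-bound */
            int waited;            /* time spent in the queue so far */
        } job_t;

        /* Return the index of a coscheduling partner for queue[0], or -1. */
        static int find_partner(const job_t *queue, int n)
        {
            if (queue[0].waited > DELAY_BOUND)
                return -1;         /* head is overdue: run it alone now */
            for (int i = 1; i < n; i++)
                if (queue[i].cpu_bound != queue[0].cpu_bound)
                    return i;      /* complementary resource profile found */
            return -1;
        }

        int main(void)
        {
            job_t queue[] = { {1, 1, 10}, {2, 1, 5}, {3, 0, 0} };
            int p = find_partner(queue, 3);
            if (p >= 0)
                printf("coschedule job %d with job %d\n",
                       queue[0].id, queue[p].id);
            else
                printf("run job %d alone\n", queue[0].id);
            return 0;
        }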