netloc: Towards a Comprehensive View of the HPC System Topology

Author: Goglin Brice
Hursey Joshua
Squyres Jeffrey M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/09/2014

The increasing complexity of High Performance Computing (HPC) server architectures and networks has made topology- and affinity-awareness a critical component of HPC application optimization. Although there is a portable mechanism for accessing the server-internal topology, there is no such mechanism for accessing the network topology of modern HPC systems in an equally portable manner. The Network Locality (netloc) project provides mechanisms for portably discovering and abstractly representing the network topology of modern HPC systems. Additionally, netloc can merge the network topology with the server-internal topologies, resulting in a comprehensive map of the HPC system topology. Using a modular infrastructure, netloc supports a variety of network types and discovery techniques. By representing the network topology as a graph, netloc supports any network topology configuration. The netloc architecture hides the topology discovery mechanism from application developers, allowing them to focus on traversing and analyzing the resulting map of the HPC system topology.

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Oskar Bordeaux
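The netloc abstract describes representing the network topology as a graph and merging it with the server-internal topologies. The Python sketch below illustrates that idea under stated assumptions only; it is not netloc's actual interface, and the class `TopologyGraph`, its methods, and the `host/socketN/coreN` naming scheme are hypothetical. It builds an undirected graph of switches and hosts, attaches a simple socket/core tree below each host, and counts hops between two cores with breadth-first search.

```python
from collections import deque


class TopologyGraph:
    """Hypothetical undirected graph of switches, hosts, and host-internal objects."""

    def __init__(self):
        self.adj = {}   # vertex name -> set of neighbour names
        self.kind = {}  # vertex name -> "switch", "host", "socket", or "core"

    def add_vertex(self, name, kind):
        self.adj.setdefault(name, set())
        self.kind[name] = kind

    def add_link(self, a, b):
        # Links are undirected, so record both directions.
        self.adj[a].add(b)
        self.adj[b].add(a)

    def attach_host_topology(self, host, sockets, cores_per_socket):
        # Merge an assumed, simplified server-internal tree below the host vertex.
        for s in range(sockets):
            sock = f"{host}/socket{s}"
            self.add_vertex(sock, "socket")
            self.add_link(host, sock)
            for c in range(cores_per_socket):
                core = f"{sock}/core{c}"
                self.add_vertex(core, "core")
                self.add_link(sock, core)

    def hops(self, src, dst):
        # Breadth-first search: edge count of a shortest path, or None if unreachable.
        seen = {src}
        queue = deque([(src, 0)])
        while queue:
            v, d = queue.popleft()
            if v == dst:
                return d
            for w in self.adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
        return None


# Two hosts behind one switch, each with 2 sockets of 2 cores.
g = TopologyGraph()
g.add_vertex("sw0", "switch")
for h in ("node0", "node1"):
    g.add_vertex(h, "host")
    g.add_link("sw0", h)
    g.attach_host_topology(h, sockets=2, cores_per_socket=2)

print(g.hops("node0/socket0/core0", "node0/socket0/core1"))  # 2
print(g.hops("node0/socket0/core0", "node1/socket1/core0"))  # 6
```

Keeping network and server-internal objects in one graph is what lets a single traversal answer "how far apart are these two cores?" across the whole system.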

Traces Generation To Simulate Large-Scale Distributed Applications

Author: Dalle Olivier
Mancini Emilio
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2011

In order to study the performance of scheduling algorithms, simulators of parallel and distributed applications need accurate models of the application's behavior during execution. For this purpose, traces of low-level events collected during the actual execution of real applications are needed. Collecting such traces is difficult because of timing issues, interference from the instrumentation code, and the storage and transfer of the collected data. To address this problem, we propose a comprehensive software architecture that instruments the application's executables, gathers the traces hierarchically, and post-processes them to feed simulation models. We designed it to be scalable, modular, and extensible.

Crossref

INRIA a CCSD electronic archive server
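The trace-generation abstract mentions gathering traces hierarchically. One way to picture this, offered purely as an illustrative assumption and not as the authors' architecture, is a tree of collectors in which each collector merges the time-sorted event streams of its children. The sketch assumes events are `(timestamp, process, name)` tuples whose clocks are already synchronized; the tree encoding and the function `gather` are hypothetical.

```python
import heapq


def gather(node):
    """Return one time-ordered event list for the subtree rooted at `node`.

    Hypothetical node encoding:
      ("leaf", events)          events already sorted by timestamp
      ("collector", children)   an intermediate gatherer with child nodes
    """
    kind, payload = node
    if kind == "leaf":
        return list(payload)
    # A collector merges its children's sorted streams by timestamp (e[0]).
    return list(heapq.merge(*(gather(c) for c in payload), key=lambda e: e[0]))


tree = ("collector", [
    ("collector", [
        ("leaf", [(0.1, "p0", "send"), (0.5, "p0", "recv")]),
        ("leaf", [(0.2, "p1", "recv"), (0.4, "p1", "send")]),
    ]),
    ("leaf", [(0.3, "p2", "compute")]),
])

for event in gather(tree):
    print(event)
```

Because each collector only merges already-sorted inputs, no level of the tree has to re-sort the full trace; real deployments would also have to deal with clock skew, which this sketch ignores.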

Flexible multi-layer virtual machine design for virtual laboratory in distributed systems and grids.

Author: Kim Dohan
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2005

We propose a flexible Multi-layer Virtual Machine (MVM) design intended to improve efficiency in distributed and grid computing and to overcome known problems in traditional virtual machine architectures and in those used in distributed and grid systems. This thesis presents a novel approach to building a virtual laboratory to support e-science by adapting MVMs within distributed systems and grids, thereby providing enhanced flexibility and reconfigurability by raising the level of abstraction. The MVM consists of three layers: OS-level VMs, queue VMs, and component VMs. Together, the MVMs provide the virtualized resources, virtualized networks, and reconfigurable component layer for virtual laboratories. We demonstrate how our reconfigurable virtual machine allows software designers and developers to reuse parallel communication patterns. In our framework, virtual machines can be created on demand, and their applications can be distributed at the source-code level, then compiled and instantiated at runtime. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .K56. Source: Masters Abstracts International, Volume: 44-03, page: 1405. Thesis (M.Sc.)--University of Windsor (Canada), 2005

Scholarship at UWindsor

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

Author: Chidambaram Vijay
Cowan Meghan
Maleki Saeed
Musuvathi Madan
Mytkowicz Todd
Nelson Jacob
Saarikivi Olli
Shah Aashaka
Singh Rachee
Publication date: 10/07/2022

Machine learning models are increasingly being trained across multiple GPUs and multiple machines. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in large models. It is important to use efficient algorithms for collective communication. We introduce TACCL, a tool that allows algorithm designers to guide a synthesizer into automatically generating algorithms for a given hardware configuration and communication collective. TACCL uses the novel communication sketch abstraction to obtain crucial information from the designer that is used to significantly reduce the state space and guide the synthesizer towards better algorithms. TACCL also uses a novel encoding of the problem that allows it to scale beyond single-node topologies. We use TACCL to synthesize algorithms for three collectives and two hardware topologies: DGX-2 and NDv2. We demonstrate that the algorithms synthesized by TACCL outperform the NVIDIA Collective Communication Library (NCCL) by up to 6.7×. We also show that TACCL can speed up end-to-end training of Transformer-XL and BERT models by 11% to 2.3× for different batch sizes.

Comment: Accepted at NSDI'23. Contains 17 pages, 11 figures, including Appendix.

arXiv.org e-Print Archive
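The TACCL abstract concerns algorithms for collectives such as AllReduce. To show what such an algorithm looks like, here is a minimal single-process simulation of the classic ring AllReduce (a reduce-scatter followed by an all-gather). This is a hand-written textbook ring schedule, not an algorithm synthesized by TACCL; the function name `ring_allreduce` is hypothetical, and the sketch assumes the vector length is divisible by the number of ranks.

```python
def ring_allreduce(buffers):
    """Simulate a ring AllReduce (sum) over n ranks in one process.

    buffers[r] is rank r's input vector; its length must be divisible by n.
    Returns each rank's output; every rank ends with the element-wise sum.
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "vector length must be divisible by the number of ranks"
    step = size // n
    data = [list(b) for b in buffers]  # copy so the inputs are not mutated

    def chunk(i):
        # Index range of chunk i (mod n) within a rank's vector.
        c = i % n
        return range(c * step, (c + 1) * step)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) to rank r + 1,
    # which adds it to its own copy. Afterwards rank r holds the full sum of chunk r + 1.
    for s in range(n - 1):
        sends = [(r, [data[r][i] for i in chunk(r - s)]) for r in range(n)]
        for r, payload in sends:
            dst = (r + 1) % n
            for i, v in zip(chunk(r - s), payload):
                data[dst][i] += v

    # Phase 2, all-gather: in step s, rank r forwards its completed chunk (r + 1 - s)
    # to rank r + 1, which overwrites its copy.
    for s in range(n - 1):
        sends = [(r, [data[r][i] for i in chunk(r + 1 - s)]) for r in range(n)]
        for r, payload in sends:
            dst = (r + 1) % n
            for i, v in zip(chunk(r + 1 - s), payload):
                data[dst][i] = v
    return data


inputs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
out = ring_allreduce(inputs)
print(out[0])  # [111, 222, 333]
```

Each rank sends 2(n - 1) messages of size/n elements, which is why the ring schedule is bandwidth-efficient; tools like TACCL search for schedules better suited to a specific hardware topology than this fixed ring.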

Creating a Worldwide Network For the Global Environment for Network Innovations (GENI) and Related Experimental Environments

Author: A Abelem
A Jukan
B Ahlgren
B Belter
D Kim
D Schwerdel
D Trossen
J Jofre
J Mambretti
JJ Ham van der
M Berman
M Campanella
M Ghijsen
M Stanton
M Suñé
M-Y Luo
N McKeown
R Koning
R Strijkers
T Rakotoarivelo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Many important societal activities are global in scope, and as these activities continually expand worldwide, they are increasingly based on a foundation of advanced communication services and underlying innovative network architecture, technology, and core infrastructure. To continue progress in these areas, research activities cannot be limited to campus labs, small local testbeds, or even national testbeds. Researchers must be able to explore concepts at scale, conducting experiments on worldwide testbeds that approximate the attributes of the real world. Today, it is possible to take advantage of several macro information technology trends, especially virtualization and the ability to program technology resources at a highly granular level, to design, implement, and operate network research environments at a global scale. GENI is developing such an environment, as are research communities in a number of other countries. Recently, these communities have not only been investigating techniques for federating these research environments across multiple domains, but they have also been demonstrating prototypes of such federations. This chapter provides an overview of key topics and experimental activities related to GENI international networking and to related projects throughout the world.

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Creating a Worldwide Network For the Global Environment for Network Innovations (GENI) and Related Experimental Environments

Author: Chen J.
de Laat C.
Ge J.
Grosso P.
Li T.
Liu T.-L.
Luo M.-Y.
Mambretti J.
Müller P.
Nakao A.
Reed M.
Stanton M.
van der Pol R.
Yang C.-S.
Yeh F.
You J.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

International Migration, Integration and Social Cohesion online publications

Creation of the AVIDD Data Facility: A Distributed Facility for Managing, Analyzing and Visualizing Instrument-Driven Data (AVIDD)

Author: Bramley Randall
Huffaman John C.
McRobbie Michael
Stewart Craig
Publication venue: The Trustees of Indiana University
Publication date: 01/09/2003

IUScholarWorks (University of Indiana)

Designing, Building, and Modeling Maneuverable Applications within Shared Computing Resources

Author: Moody William Clay
Publication venue: Clemson University Libraries
Publication date: 01/05/2015

Extending the military principle of maneuver into the war-fighting domain of cyberspace, academic and military researchers have produced many theoretical and strategic works, though few have focused on researching actual applications and systems that apply this principle. We present our research in designing, building, and modeling maneuverable applications in order to gain the system advantages of resource provisioning, application optimization, and improved cybersecurity. We have coined the phrase “Maneuverable Applications” to describe distributed and parallel applications that take advantage of the modification, relocation, addition, or removal of computing resources, giving the perception of movement. Our work with maneuverable applications has been within shared computing resources, such as the Clemson University Palmetto cluster, where multiple users share access and time on a collection of inter-networked computers and servers. In this dissertation, we describe our implementation and analytic modeling of environments and systems to maneuver computational nodes, network capabilities, and security enhancements for overcoming challenges to a cyberspace platform. Specifically, we describe our work to create a system to provision a big data computational resource within academic environments. We also present a computing testbed built to allow researchers to study network optimizations of data centers. We discuss our Petri Net model of an adaptable system, which increases its cybersecurity posture in the face of varying levels of threat from malicious actors. Lastly, we present our work on integrating these technologies into a prototype resource manager for maneuverable applications and on validating our model using this implementation.

Clemson University: TigerPrints

Evaluation of technologies of parallel computers' communication networks for a real-time triggering application in a high-energy physics experiment at CERN

Author: Hörtnagl C
Publication venue: Union of Concerned Scientists
Publication date: 01/01/1997

CERN Document Server