134 research outputs found

    netloc: Towards a Comprehensive View of the HPC System Topology

    The increasing complexity of High Performance Computing (HPC) server architectures and networks has made topology- and affinity-awareness a critical component of HPC application optimization. Although there is a portable mechanism for accessing the server-internal topology, there is no equally portable mechanism for accessing the network topology of modern HPC systems. The Network Locality (netloc) project provides mechanisms for portably discovering and abstractly representing the network topology of modern HPC systems. Additionally, netloc can merge the network topology with the server-internal topologies, resulting in a comprehensive map of the HPC system topology. Using a modular infrastructure, netloc supports a variety of network types and discovery techniques. By representing the network topology as a graph, netloc supports any network topology configuration. The netloc architecture hides the topology discovery mechanism from application developers, allowing them to focus on traversing and analyzing the resulting map of the HPC system topology.
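
    The graph representation and merge step described in this abstract can be illustrated with a minimal Python sketch. It does not use the netloc API: the functions `build_graph`, `merge_graphs`, and `hop_distance`, and node names such as `switch0` and `node0/core0`, are hypothetical and chosen only for illustration. The sketch assumes an undirected, unweighted graph in which a compute node appears in both the network graph and its own server-internal graph, so that merging joins the two at that shared vertex.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def merge_graphs(*graphs):
    """Union several adjacency maps; shared vertex names join the graphs."""
    merged = {}
    for graph in graphs:
        for node, neighbours in graph.items():
            merged.setdefault(node, set()).update(neighbours)
    return merged


def hop_distance(graph, src, dst):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == dst:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical network topology: two switches and three compute nodes.
network = build_graph([
    ("switch0", "node0"),
    ("switch0", "node1"),
    ("switch0", "switch1"),
    ("switch1", "node2"),
])

# Hypothetical server-internal topologies, rooted at the compute node name.
node0_internal = build_graph([
    ("node0", "node0/socket0"),
    ("node0/socket0", "node0/core0"),
])
node2_internal = build_graph([
    ("node2", "node2/socket0"),
    ("node2/socket0", "node2/core0"),
])

# The merged map spans cores, sockets, nodes, and switches.
system_map = merge_graphs(network, node0_internal, node2_internal)
```

    With the merged map, a single traversal such as `hop_distance(system_map, "node0/core0", "node2/core0")` crosses both the server-internal and the network levels, which is the kind of analysis the abstract says the combined map is meant to enable.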

    Traces Generation To Simulate Large-Scale Distributed Applications

    In order to study the performance of scheduling algorithms, simulators of parallel and distributed applications need accurate models of the application's behavior during execution. For this purpose, traces of low-level events collected during the actual execution of real applications are needed. Collecting such traces is a difficult task because of timing issues, the interference of instrumentation code, and the storage and transfer of the collected data. To address this problem, we propose a comprehensive software architecture that instruments the application's executables, gathers the traces hierarchically, and post-processes them in order to feed simulation models. We designed it to be scalable, modular, and extensible.

    Flexible multi-layer virtual machine design for virtual laboratory in distributed systems and grids.

    We propose a flexible Multi-layer Virtual Machine (MVM) design intended to improve efficiency in distributed and grid computing and to overcome known problems in traditional virtual machine architectures and in those used in distributed and grid systems. This thesis presents a novel approach to building a virtual laboratory to support e-science by adapting MVMs within distributed systems and grids, thereby providing enhanced flexibility and reconfigurability by raising the level of abstraction. The MVM consists of three layers: an OS-level VM, queue VMs, and component VMs. The group of MVMs provides the virtualized resources, virtualized networks, and reconfigurable components layer for virtual laboratories. We demonstrate how our reconfigurable virtual machine allows software designers and developers to reuse parallel communication patterns. In our framework, virtual machines can be created on demand, and their applications can be distributed at the source-code level, then compiled and instantiated at runtime. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .K56. Source: Masters Abstracts International, Volume: 44-03, page: 1405. Thesis (M.Sc.)--University of Windsor (Canada), 2005

    TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

    Machine learning models are increasingly being trained across multiple GPUs and multiple machines. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in large models. It is important to use efficient algorithms for collective communication. We introduce TACCL, a tool that allows algorithm designers to guide a synthesizer into automatically generating algorithms for a given hardware configuration and communication collective. TACCL uses the novel communication sketch abstraction to obtain crucial information from the designer that is used to significantly reduce the state space and guide the synthesizer towards better algorithms. TACCL also uses a novel encoding of the problem that allows it to scale beyond single-node topologies. We use TACCL to synthesize algorithms for three collectives and two hardware topologies: DGX-2 and NDv2. We demonstrate that the algorithms synthesized by TACCL outperform the NVIDIA Collective Communication Library (NCCL) by up to 6.7×. We also show that TACCL can speed up end-to-end training of Transformer-XL and BERT models by 11% to 2.3× for different batch sizes. Comment: Accepted at NSDI'23. Contains 17 pages, 11 figures, including Appendi
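
    For readers unfamiliar with what an AllReduce algorithm looks like, the following Python sketch simulates the textbook ring AllReduce on in-memory lists. It is not an algorithm synthesized by TACCL and not NCCL's implementation; it only illustrates the kind of communication schedule such tools reason about. The function name `ring_allreduce` is hypothetical, and the sketch assumes, for simplicity, that each rank's buffer holds exactly one element per rank (one element per chunk) and that the reduction is a sum.

```python
def ring_allreduce(buffers):
    """Simulate a sum AllReduce over a ring of len(buffers) ranks.

    Assumption: every buffer has exactly one element per rank, so that
    chunk i is simply element i.
    """
    p = len(buffers)
    if any(len(b) != p for b in buffers):
        raise ValueError("each buffer must hold exactly one element per rank")
    data = [list(b) for b in buffers]  # copy so the inputs stay untouched

    # Phase 1, reduce-scatter: at each step, rank r sends chunk (r - step)
    # to rank r + 1, which adds it to its own copy of that chunk.
    for step in range(p - 1):
        # Collect all messages first so the sends are effectively simultaneous.
        messages = [((r - step) % p, data[r][(r - step) % p]) for r in range(p)]
        for r, (chunk, value) in enumerate(messages):
            data[(r + 1) % p][chunk] += value

    # After phase 1, rank r holds the fully reduced chunk (r + 1) mod p.

    # Phase 2, all-gather: at each step, rank r forwards chunk (r + 1 - step)
    # to rank r + 1, which overwrites its stale copy.
    for step in range(p - 1):
        messages = [((r + 1 - step) % p, data[r][(r + 1 - step) % p])
                    for r in range(p)]
        for r, (chunk, value) in enumerate(messages):
            data[(r + 1) % p][chunk] = value

    return data


# Three ranks, each contributing a three-element buffer.
result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
```

    Every rank ends with the same elementwise sum. The ring schedule takes 2(p - 1) steps in which each rank only ever talks to its neighbour; whether that is a good fit depends on the link topology, which is the design space the abstract says communication sketches help to narrow.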

    Creating a Worldwide Network For the Global Environment for Network Innovations (GENI) and Related Experimental Environments

    Many important societal activities are global in scope, and as these activities continually expand worldwide, they are increasingly based on a foundation of advanced communication services and underlying innovative network architecture, technology, and core infrastructure. To continue progress in these areas, research activities cannot be limited to campus labs and small local testbeds, or even to national testbeds. Researchers must be able to explore concepts at scale: to conduct experiments on worldwide testbeds that approximate the attributes of the real world. Today, it is possible to take advantage of several macro information technology trends, especially virtualization and capabilities for programming technology resources at a highly granular level, to design, implement, and operate network research environments at a global scale. GENI is developing such an environment, as are research communities in a number of other countries. Recently, these communities have not only been investigating techniques for federating these research environments across multiple domains, but they have also been demonstrating prototypes of such federations. This chapter provides an overview of key topics and experimental activities related to GENI international networking and to related projects throughout the world.

    Designing, Building, and Modeling Maneuverable Applications within Shared Computing Resources

    Extending the military principle of maneuver into the war-fighting domain of cyberspace, academic and military researchers have produced many theoretical and strategic works, though few have focused on researching actual applications and systems that apply this principle. We present our research in designing, building, and modeling maneuverable applications in order to gain the system advantages of resource provisioning, application optimization, and cybersecurity improvement. We have coined the phrase “Maneuverable Applications”, defined as distributed and parallel applications that take advantage of the modification, relocation, addition, or removal of computing resources, giving the perception of movement. Our work with maneuverable applications has been within shared computing resources, such as the Clemson University Palmetto cluster, where multiple users share access and time to a collection of inter-networked computers and servers. In this dissertation, we describe our implementation and analytic modeling of environments and systems to maneuver computational nodes, network capabilities, and security enhancements for overcoming challenges to a cyberspace platform. Specifically, we describe our work to create a system to provision a big data computational resource within academic environments. We also present a computing testbed built to allow researchers to study network optimizations of data centers. We discuss our Petri Net model of an adaptable system, which increases its cybersecurity posture in the face of varying levels of threat from malicious actors. Lastly, we present our work on integrating these technologies into a prototype resource manager for maneuverable applications and validating our model using this implementation.