    Using Pilot Systems to Execute Many Task Workloads on Supercomputers

    High performance computing systems have historically been designed to support applications composed mostly of monolithic, single-job workloads. Pilot systems decouple workload specification, resource selection, and task execution via job placeholders and late binding. Pilot systems help to satisfy the resource requirements of workloads composed of multiple tasks. RADICAL-Pilot (RP) is a modular and extensible Python-based pilot system. In this paper we describe RP's design, architecture and implementation, and characterize its performance. RP is capable of spawning more than 100 tasks/second and supports the steady-state execution of up to 16K concurrent tasks. RP can be used stand-alone, as well as integrated with other application-level tools as a runtime system.
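
    The pilot idea can be illustrated with a minimal sketch, assuming a local thread pool stands in for the resources that a batch-job placeholder would acquire on a supercomputer. The class `PilotSketch` and its methods are hypothetical and are not RADICAL-Pilot's API: resources are acquired once, independently of the workload, and each task is bound to whichever worker is free only when it is submitted (late binding).

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


class PilotSketch:
    """Hypothetical pilot: a placeholder that owns resources, decoupled from tasks.

    Resource acquisition (creating the pool) happens once, before any task is
    known; tasks are late-bound to free workers at submission time.
    """

    def __init__(self, n_workers: int) -> None:
        # Stand-in for the cores granted to a single batch job (assumption:
        # a real pilot would launch processes across compute nodes instead).
        self._pool = ThreadPoolExecutor(max_workers=n_workers)

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        # Late binding: the task waits in the pool's queue until a worker frees up.
        return self._pool.submit(fn, *args)

    def run_many(self, fn: Callable[[Any], Any], inputs: Iterable[Any]) -> List[Any]:
        # Submit the whole many-task workload, then gather results in submission order.
        futures = [self.submit(fn, x) for x in inputs]
        return [f.result() for f in futures]

    def shutdown(self) -> None:
        # Release the placeholder's resources once the workload is done.
        self._pool.shutdown(wait=True)


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    pilot = PilotSketch(n_workers=4)
    print(pilot.run_many(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
    pilot.shutdown()
```

    The design choice the sketch highlights is that the number of workers is fixed when the placeholder starts, while the number and type of tasks can change freely afterwards.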

    Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library

    Remote data access for data analysis in high performance computing is commonly done with specialized data access protocols and storage systems. These protocols are highly optimized for high throughput on very large datasets, multiple streams, high availability, low latency and efficient parallel I/O. The purpose of this paper is to describe how we have adapted a generic protocol, the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative for high performance I/O and data analysis applications in a global computing grid: the Worldwide LHC Computing Grid. In this work, we first analyze the design differences between HTTP and the most common high performance I/O protocols, pointing out the main performance weaknesses of HTTP. Then, we describe in detail how we solved these issues. Our solutions have been implemented in a toolkit called davix, available through several recent Linux distributions. Finally, we describe the results of our benchmarks, in which we compare the performance of davix against an HPC-specific protocol for a data analysis use case. Comment: Presented at: Very Large Data Bases (VLDB) 2014, Hangzho
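
    One generic way to reduce HTTP round trips for many small reads is to coalesce nearby byte ranges and request them together through the standard `Range` header, which allows several comma-separated ranges in a single request. The sketch below only builds the header value; the function names and the `max_gap` threshold are illustrative assumptions, and this is not presented as davix's internal implementation.

```python
from typing import List, Tuple


def coalesce_ranges(reads: List[Tuple[int, int]], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Merge (offset, length) reads into inclusive (first, last) byte ranges.

    Reads separated by at most `max_gap` unread bytes are merged, trading a
    little extra transferred data for fewer ranges in the request.
    """
    merged: List[Tuple[int, int]] = []
    for offset, length in sorted(reads):
        if length <= 0:
            continue  # skip empty reads
        first, last = offset, offset + length - 1
        if merged and first <= merged[-1][1] + 1 + max_gap:
            # Overlapping, adjacent, or within max_gap: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def range_header(reads: List[Tuple[int, int]], max_gap: int = 0) -> str:
    """Build a multi-range HTTP Range header value, e.g. 'bytes=0-99,200-299'."""
    parts = [f"{a}-{b}" for a, b in coalesce_ranges(reads, max_gap)]
    return "bytes=" + ",".join(parts)


if __name__ == "__main__":
    reads = [(200, 100), (0, 100), (100, 50), (1000, 10)]
    print(range_header(reads))              # bytes=0-149,200-299,1000-1009
    print(range_header(reads, max_gap=64))  # bytes=0-299,1000-1009
```

    A server that honors several ranges answers with status 206 and a `multipart/byteranges` body, which the client must then split back into the individual reads.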

    PMT: Power Measurement Toolkit

    Efficient use of energy is essential for today's supercomputing systems, as energy cost is generally a major component of their operational cost. Research into "green computing" is needed to reduce the environmental impact of running these systems. As such, several scientific communities are evaluating the trade-off between time-to-solution and energy-to-solution. While the runtime of an application is typically easy to measure, power consumption is not. Therefore, we present the Power Measurement Toolkit (PMT), a high-level software library capable of collecting power consumption measurements on various hardware. The library provides a standard interface to easily measure the energy use of devices such as CPUs and GPUs in critical application sections.
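
    Energy-to-solution is the integral of power over the measured interval, E = ∫ P(t) dt, so a tool that samples power can report the energy of a code section by integrating its samples. The sketch below applies the trapezoidal rule to timestamped samples; the `EnergyRegion` context manager and the injected `read_power_watts` and `clock` callables are hypothetical stand-ins and do not reflect PMT's actual interface or its hardware back-ends.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (time in seconds, power in watts)


def energy_joules(samples: List[Sample]) -> float:
    """Integrate power samples over time with the trapezoidal rule (W * s = J)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


class EnergyRegion:
    """Hypothetical helper: sample power at region entry, on demand, and at exit.

    `read_power_watts` and `clock` are injected so that any sensor backend
    (or a fake one, as in the usage example) can be plugged in.
    """

    def __init__(self, read_power_watts: Callable[[], float], clock: Callable[[], float]) -> None:
        self._read = read_power_watts
        self._clock = clock
        self.samples: List[Sample] = []

    def sample(self) -> None:
        # Record one (timestamp, power) pair.
        self.samples.append((self._clock(), self._read()))

    def __enter__(self) -> "EnergyRegion":
        self.sample()
        return self

    def __exit__(self, *exc: object) -> None:
        self.sample()  # returning None lets exceptions propagate

    @property
    def joules(self) -> float:
        return energy_joules(self.samples)


if __name__ == "__main__":
    # Fake sensor: constant 200 W; fake clock advancing 0.5 s per call.
    ticks = iter([0.0, 0.5, 1.0])
    with EnergyRegion(lambda: 200.0, lambda: next(ticks)) as region:
        region.sample()  # optional intermediate sample inside the critical section
    print(region.joules)  # 200.0
```

    More intermediate samples make the estimate track power fluctuations more closely; with only entry and exit samples, the result assumes power changes linearly across the section.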

    Monitoring Cluster on Online Compiler with Ganglia

    Ganglia is an open source monitoring system for high performance computing (HPC) that collects the status of a whole cluster and of each of its nodes and reports it to the user. We use Ganglia to monitor our cluster spasi.informatika.lipi.go.id (SPASI), a customized Fedora 10-based cluster used for our online cluster compiler, CLAW (cluster access through web). Our experience shows that Ganglia can display our cluster status and allows us to track it.
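
    Ganglia's `gmond` daemon reports cluster state as an XML document (by default on TCP port 8649) in which `HOST` elements contain `METRIC` elements carrying `NAME` and `VAL` attributes. The sketch below extracts one metric for every node from such a document; the sample XML is a heavily simplified, made-up excerpt (real output carries many more attributes and metrics), and the helper name `metric_per_host` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Simplified, illustrative excerpt of gmond XML output (made-up hosts and values).
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node01">
      <METRIC NAME="load_one" VAL="0.52" TYPE="float"/>
      <METRIC NAME="mem_free" VAL="1024000" TYPE="float"/>
    </HOST>
    <HOST NAME="node02">
      <METRIC NAME="load_one" VAL="1.75" TYPE="float"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metric_per_host(xml_text: str, metric: str) -> Dict[str, float]:
    """Return {host name: value} for one metric across every HOST element."""
    root = ET.fromstring(xml_text)
    values: Dict[str, float] = {}
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                values[host.get("NAME")] = float(m.get("VAL"))
    return values


if __name__ == "__main__":
    print(metric_per_host(SAMPLE_XML, "load_one"))  # {'node01': 0.52, 'node02': 1.75}
```

    In practice the XML text would be read from a socket connected to `gmond`; hosts that do not report the requested metric are simply absent from the result.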

    Machine Learning with Kay

    Computational power is very important when training Deep Learning (DL) models with large amounts of data (Wooldridge, 2021). Hence, High-Performance Computing (HPC) can be leveraged to reduce computational cost, and the Irish Centre for High-End Computing (ICHEC) provides significant infrastructure and services for research and development to both academia and industry. A portion of ICHEC's HPC system has been allocated for institutional access, and this paper presents a case study of how to use Kay (Ireland's national supercomputer) in the remote sensing domain. Specifically, this study uses clusters of Kay Graphics Processing Units (GPUs) for training DL models to extract buildings from satellite imagery using a large number of input data samples.

    Swarming the SC’17 Student Cluster Competition

    The Student Cluster Competition is a suite of challenges in which teams of undergraduates design a computer cluster and then compete against each other on various benchmark applications. This study provides a selective summary of the experiences of Team Swarm, which represented the Georgia Institute of Technology at the SC’17 Student Cluster Competition. The report first describes the competition and the members of Team Swarm. It then focuses on three major aspects of the experience: the hardware and software architecture of the team’s computer cluster, the team’s system administration workflow, and the team’s use of cloud resources. Additionally, the appendix briefly describes the team members and how they prepared. Undergraduat

    Containers for Portable, Productive, and Performant Scientific Computing

    Containers are an emerging technology that holds promise for improving productivity and code portability in scientific computing. The authors examine Linux container technology for the distribution of a nontrivial scientific computing software stack and its execution on a spectrum of platforms, from laptop computers to high-performance computing systems. For Python code run on large parallel computers, runtime is reduced inside a container because library imports are faster. The software distribution approach and data that the authors present will help developers and users decide whether container technology is appropriate for them. The article also advises vendors of HPC systems that rely on proprietary libraries for performance on how to make containers work seamlessly and without a performance penalty.