2,157 research outputs found

    Evaluating Emerging CXL-enabled Memory Pooling for HPC Systems

    Full text link
    Current HPC systems provide memory resources that are statically configured and tightly coupled with compute nodes. However, workloads on HPC systems are evolving. Diverse workloads lead to a need for configurable memory resources to achieve high performance and utilization. In this study, we evaluate a memory subsystem design leveraging CXL-enabled memory pooling. Two promising use cases of composable memory subsystems are studied -- fine-grained capacity provisioning and scalable bandwidth provisioning. We developed an emulator to explore the performance impact of various memory compositions. We also provide a profiler to identify the memory usage patterns in applications and their optimization opportunities. Seven scientific and six graph applications are evaluated on various emulated memory configurations. Three out of seven scientific applications had less than 10% performance impact when the pooled memory backed 75% of their memory footprint. The results also show that a dynamically configured high-bandwidth system can effectively support bandwidth-intensive unstructured mesh-based applications like OpenFOAM. Finally, we identify interference through shared memory pools as a practical challenge for adoption on HPC systems.Comment: 10 pages, 13 figures. Accepted for publication in Workshop on Memory Centric High Performance Computing (MCHPC'22) at SC2

    Power efficient job scheduling by predicting the impact of processor manufacturing variability

    Get PDF
    Modern CPUs suffer from performance and power consumption variability due to the manufacturing process. As a result, systems that do not consider such variability caused by manufacturing issues lead to performance degradations and wasted power. In order to avoid such negative impact, users and system administrators must actively counteract any manufacturing variability. In this work we show that parallel systems benefit from taking into account the consequences of manufacturing variability when making scheduling decisions at the job scheduler level. We also show that it is possible to predict the impact of this variability on specific applications by using variability-aware power prediction models. Based on these power models, we propose two job scheduling policies that consider the effects of manufacturing variability for each application and that ensure that power consumption stays under a system-wide power budget. We evaluate our policies under different power budgets and traffic scenarios, consisting of both single- and multi-node parallel applications, utilizing up to 4096 cores in total. We demonstrate that they decrease job turnaround time, compared to contemporary scheduling policies used on production clusters, up to 31% while saving up to 5.5% energy.Postprint (author's final draft

    A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems

    Full text link
    Memory disaggregation has recently been adopted in data centers to improve resource utilization, motivated by cost and sustainability. Recent studies on large-scale HPC facilities have also highlighted memory underutilization. A promising and non-disruptive option for memory disaggregation is rack-scale memory pooling, where shared memory pools supplement node-local memory. This work outlines the prospects and requirements for adoption and clarifies several misconceptions. We propose a quantitative method for dissecting application requirements on the memory system from the top down in three levels, moving from general, to multi-tier memory systems, and then to memory pooling. We provide a multi-level profiling tool and LBench to facilitate the quantitative approach. We evaluate a set of representative HPC workloads on an emulated platform. Our results show that prefetching activities can significantly influence memory traffic profiles. Interference in memory pooling has varied impacts on applications, depending on their access ratios to memory tiers and arithmetic intensities. Finally, in two case studies, we show the benefits of our findings at the application and system levels, achieving 50% reduction in remote access and 13% speedup in BFS, and reducing performance variation of co-located workloads in interference-aware job scheduling.Comment: Accepted to SC23 (The International Conference for High Performance Computing, Networking, Storage, and Analysis 2023

    Scheduling Heterogeneous HPC Applications in Next-Generation Exascale Systems

    Get PDF
    Next generation HPC applications will increasingly time-share system resources with emerging workloads such as in-situ analytics, resilience tasks, runtime adaptation services and power management activities. HPC systems must carefully schedule these co-located codes in order to reduce their impact on application performance. Among the techniques traditionally used to mitigate the performance effects of time- share systems is gang scheduling. This approach, however, leverages global synchronization and time agreement mechanisms that will become hard to support as systems increase in size. Alternative performance interference mitigation approaches must be explored for future HPC systems. This dissertation evaluates the impacts of workload concurrency in future HPC systems. It uses simulation and modeling techniques to study the performance impacts of existing and emerging interference sources on a selection of HPC benchmarks, mini-applications, and applications. It also quantifies the cost and benefits of different approaches to scheduling co-located workloads, studies performance interference mitigation solutions based on gang scheduling, and examines their synchronization requirements. To do so, this dissertation presents and leverages a new Extreme Value Theory- based model to characterize interference sources, and investigate their impact on Bulk Synchronous Parallel (BSP) applications. It demonstrates how this model can be used to analyze the interference attenuation effects of alternative fine-grained OS scheduling approaches based on periodic real time schedulers. This analysis can, in turn, guide the design of those mitigation techniques by providing tools to understand the tradeoffs of selecting scheduling parameters

    High Performance Embedded Computing

    Get PDF
    Nowadays, the prevalence of computing systems in our lives is so ubiquitous that we live in a cyber-physical world dominated by computer systems, from pacemakers to cars and airplanes. These systems demand for more computational performance to process large amounts of data from multiple data sources with guaranteed processing times. Actuating outside of the required timing bounds may cause the failure of the system, being vital for systems like planes, cars, business monitoring, e-trading, etc. High-Performance and Time-Predictable Embedded Computing presents recent advances in software architecture and tools to support such complex systems, enabling the design of embedded computing devices which are able to deliver high-performance whilst guaranteeing the application required timing bounds. Technical topics discussed in the book include: Parallel embedded platforms Programming models Mapping and scheduling of parallel computations Timing and schedulability analysis Runtimes and operating systemsThe work reflected in this book was done in the scope of the European project P SOCRATES, funded under the FP7 framework program of the European Commission. High-performance and time-predictable embedded computing is ideal for personnel in computer/communication/embedded industries as well as academic staff and master/research students in computer science, embedded systems, cyber-physical systems and internet-of-things

    High-Performance and Time-Predictable Embedded Computing

    Get PDF
    Nowadays, the prevalence of computing systems in our lives is so ubiquitous that we live in a cyber-physical world dominated by computer systems, from pacemakers to cars and airplanes. These systems demand for more computational performance to process large amounts of data from multiple data sources with guaranteed processing times. Actuating outside of the required timing bounds may cause the failure of the system, being vital for systems like planes, cars, business monitoring, e-trading, etc. High-Performance and Time-Predictable Embedded Computing presents recent advances in software architecture and tools to support such complex systems, enabling the design of embedded computing devices which are able to deliver high-performance whilst guaranteeing the application required timing bounds. Technical topics discussed in the book include: Parallel embedded platforms Programming models Mapping and scheduling of parallel computations Timing and schedulability analysis Runtimes and operating systems The work reflected in this book was done in the scope of the European project P SOCRATES, funded under the FP7 framework program of the European Commission. High-performance and time-predictable embedded computing is ideal for personnel in computer/communication/embedded industries as well as academic staff and master/research students in computer science, embedded systems, cyber-physical systems and internet-of-things.info:eu-repo/semantics/publishedVersio

    Scalability in the Presence of Variability

    Get PDF
    Supercomputers are used to solve some of the world’s most computationally demanding problems. Exascale systems, to be comprised of over one million cores and capable of 10^18 floating point operations per second, will probably exist by the early 2020s, and will provide unprecedented computational power for parallel computing workloads. Unfortunately, while these machines hold tremendous promise and opportunity for applications in High Performance Computing (HPC), graph processing, and machine learning, it will be a major challenge to fully realize their potential, because to do so requires balanced execution across the entire system and its millions of processing elements. When different processors take different amounts of time to perform the same amount of work, performance imbalance arises, large portions of the system sit idle, and time and energy are wasted. Larger systems incorporate more processors and thus greater opportunity for imbalance to arise, as well as larger performance/energy penalties when it does. This phenomenon is referred to as performance variability and is the focus of this dissertation. In this dissertation, we explain how to design system software to mitigate variability on large scale parallel machines. Our approaches span (1) the design, implementation, and evaluation of a new high performance operating system to reduce some classes of performance variability, (2) a new performance evaluation framework to holistically characterize key features of variability on new and emerging architectures, and (3) a distributed modeling framework that derives predictions of how and where imbalance is manifesting in order to drive reactive operations such as load balancing and speed scaling. Collectively, these efforts provide a holistic set of tools to promote scalability through the mitigation of variability
    • …
    corecore