24 research outputs found
ALBADross: active learning based anomaly diagnosis for production HPC systems
Sandia National Laboratories. Accepted manuscript.
Processor allocation on Cplant: Achieving general processor locality using one-dimensional allocation strategies.
Abstract: The Computational Plant, or Cplant, is a commodity-based supercomputer under development at Sandia National Laboratories. This paper describes resource-allocation strategies for achieving processor locality for parallel jobs on Cplant and other supercomputers. Users of Cplant and other Sandia supercomputers submit parallel jobs to a job queue. When a job is scheduled to run, it is assigned a set of processors. To obtain maximum throughput, jobs should be allocated to localized clusters of processors, minimizing communication costs and avoiding the bandwidth contention caused by overlapping jobs. This paper introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free-list strategy previously used on Cplant. These new allocation strategies are implemented in the new release of the Cplant System Software, Version 2.0.
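The core idea (map the machine's processors onto a space-filling curve, then treat allocation as a one-dimensional packing problem) can be sketched as follows. This is an illustrative sketch only: the function names, the 2-D mesh, and the choice of a Z-order (Morton) curve are assumptions, not Cplant's actual implementation.

```python
# Hypothetical sketch: curve-based allocation on a 2-D processor mesh.
# The Z-order (Morton) curve stands in for whichever space-filling curve
# the allocator uses; the real system may use a different curve.

def morton_index(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of (x, y) to get a Z-order curve position."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def allocate(free_procs, job_size):
    """One-dimensional packing over processors sorted by curve position.

    free_procs -- list of (x, y) mesh coordinates of free processors
    job_size   -- number of processors the job needs
    Returns a job_size-long list of processors, or None if too few are free.
    Consecutive curve positions tend to be spatially close, which is the
    locality goal described in the abstract.
    """
    ordered = sorted(free_procs, key=lambda p: morton_index(*p))
    if len(ordered) < job_size:
        return None
    # Choose the window of consecutive curve positions with the smallest
    # span, i.e. the most compact run of free processors.
    return min(
        (ordered[i:i + job_size] for i in range(len(ordered) - job_size + 1)),
        key=lambda w: morton_index(*w[-1]) - morton_index(*w[0]),
    )

# Example: pick 3 processors from a scattered free list on a 4x4 mesh.
print(allocate([(0, 0), (1, 0), (0, 1), (3, 3), (2, 3)], 3))
```

The contrast with a sorted free list is that the sort key encodes spatial position, so a contiguous slice of the list is also a contiguous patch of the mesh.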
Programming Abstractions for Data Locality
The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computation is the most expensive component, but we are rapidly moving into an era in which computation is cheap and massively parallel while data movement dominates energy and performance costs. To prepare for exascale systems (the next generation of high-performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm: applications must evolve to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models must support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume that all processing elements are equidistant from one another. To take advantage of emerging technologies, application developers need a set of programming abstractions that describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and should allow developers to describe how to decompose data and how to lay it out in memory. Fortunately, many relevant concepts are emerging, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries. There is an opportunity to identify commonalities among these strategies and combine the best of these concepts into a comprehensive approach to expressing and managing data locality in exascale programming systems.
These programming-model abstractions can expose crucial information about data locality to the compiler and runtime system, enabling performance-portable code. The research question is to identify the right level of abstraction, with candidate techniques ranging from template libraries all the way to completely new languages.
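One of the concepts the report names, tiling, can be illustrated concretely. The sketch below is a generic example of expressing a computation tile by tile so each tile's working set stays small; the function and parameter names are hypothetical and do not come from any particular exascale programming model.

```python
# Illustrative tiling sketch: traverse a matrix in small blocks so that
# both source and destination accesses stay within a compact working set.
# The "tile" parameter is the locality knob a data-centric programming
# model would let the developer (or runtime) control.

def tiled_transpose(a, n, tile=4):
    """Transpose an n x n matrix (list of lists), visiting it tile by tile.

    A straightforward row-by-row transpose strides across the destination
    with poor locality; blocking the loops keeps each step's data movement
    confined to a tile x tile region.
    """
    out = [[0] * n for _ in range(n)]
    for ti in range(0, n, tile):
        for tj in range(0, n, tile):
            # Process one tile completely before moving to the next.
            for i in range(ti, min(ti + tile, n)):
                for j in range(tj, min(tj + tile, n)):
                    out[j][i] = a[i][j]
    return out
```

In Python the benefit is purely illustrative; in C, Fortran, or a task-based runtime the same loop structure is what cache blocking and tile-affinity constructs make explicit.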
The undecidability of the modified edit distance
Given two strings X and Y over a finite alphabet Σ, the modified edit distance between X and Y is the minimal cost of an edit sequence that changes X into Y, where the cost of substituting a character of Y for a character of X is context-free, and the cost of deleting a substring from X or inserting a substring of Y into X is somewhat context-sensitive. Unlike the standard formulation, the modified edit distance does not require taking the minimum over all edit sequences in which the cost of substituting one character for another is context-free, the cost of deleting a substring is somewhat context-sensitive, and the cost of inserting a string Z into X to obtain a string X' equals the cost of deleting Z from X' to obtain X again. We show that if the minimum cost over all edit sequences must be obtained, the modified edit distance becomes undecidable.
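For contrast with the undecidable modified variant, the classic edit distance with fixed, position-independent unit costs is computable by a simple dynamic program. This is the standard Levenshtein algorithm, shown here only as the decidable baseline the abstract departs from.

```python
def edit_distance(x: str, y: str) -> int:
    """Classic Levenshtein distance with unit costs.

    Because substitution, insertion, and deletion costs are fixed and
    context-free, the distance is computable in O(|x| * |y|) time by
    dynamic programming, unlike the modified variant discussed above.
    """
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of x[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete x[i-1]
                          d[i][j - 1] + 1,        # insert y[j-1]
                          d[i - 1][j - 1] + cost) # substitute
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # → 3
```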
Simulation results for traffic signal control
In this paper, we discuss simulation results for the traffic signal control problem. Our algorithms are motivated by theoretical results from a model for scheduling jobs that may compete for mutually exclusive resources. The conflicts between jobs are modeled by a conflict graph, so the set of all concurrently running jobs must form an independent set in the graph. We focus on the problem of minimizing the maximum response time of any job that enters the system. For the specific graph that arises in the traffic intersection control problem, we have shown [14] a simple algorithm that achieves the optimal competitive ratio. We have also studied scheduling with conflicts under probabilistic assumptions about the input: each node i has a value p_i such that a job arrives at node i in any given time unit with probability p_i, and arrivals at different nodes and during different time periods are independent. Under reasonable assumptions on the input sequence, if the conflict graph is a perfect graph, we have given [15] an algorithm whose competitive ratio converges to 1. Using the methodology of Recker, Ramanathan, Yu, and McNally, and a modification of their software, we show that some of our algorithms achieve significant improvements over fully actuated control, the most advanced traffic signal control method in the public domain.
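The conflict-graph scheduling model described above can be sketched as follows. This is a hypothetical illustration of the model only, using a simple greedy independent-set scheduler; it is not the optimally competitive algorithm from [14], and the phase names and conflict sets for the intersection are assumptions.

```python
# Sketch of scheduling with conflicts: at each time step, the scheduler
# serves an independent set of nodes in the conflict graph, chosen here
# greedily by longest queue (a stand-in for "oldest job first").

def step(waiting, conflicts):
    """Serve one job from each node in a greedily chosen independent set.

    waiting   -- dict node -> number of queued jobs (mutated in place)
    conflicts -- dict node -> set of nodes it conflicts with
    Returns the set of nodes served this time step; by construction no
    two served nodes are adjacent in the conflict graph.
    """
    served = set()
    for node in sorted(waiting, key=waiting.get, reverse=True):
        if waiting[node] > 0 and served.isdisjoint(conflicts[node]):
            served.add(node)
            waiting[node] -= 1
    return served

# A toy intersection: north-south through traffic conflicts with
# east-west, and the two left-turn phases conflict with each other.
conflicts = {"NS": {"EW"}, "EW": {"NS"}, "NL": {"EL"}, "EL": {"NL"}}
queues = {"NS": 3, "EW": 1, "NL": 2, "EL": 2}
print(step(queues, conflicts))
```

In a real intersection the conflict graph is denser (turns conflict with opposing through movements), but the invariant is the same: the green phases active at any instant form an independent set.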