1,658 research outputs found
Using Pilot Systems to Execute Many Task Workloads on Supercomputers
High performance computing systems have historically been designed to support
applications comprised of mostly monolithic, single-job workloads. Pilot
systems decouple workload specification, resource selection, and task execution
via job placeholders and late-binding. Pilot systems help to satisfy the
resource requirements of workloads comprised of multiple tasks. RADICAL-Pilot
(RP) is a modular and extensible Python-based pilot system. In this paper we
describe RP's design, architecture and implementation, and characterize its
performance. RP is capable of spawning more than 100 tasks/second and supports
the steady-state execution of up to 16K concurrent tasks. RP can be used
stand-alone, as well as integrated with other application-level tools as a
runtime system
Reliable scalable symbolic computation: The design of SymGridPar2
Symbolic computation is an important area of both Mathematics and Computer Science, with many large computations that would benefit from parallel execution. Symbolic computations are, however, challenging to parallelise as they have complex data and control structures, and both dynamic and highly irregular parallelism. The SymGridPar framework (SGP) has been developed to address these challenges on small-scale parallel architectures. However the multicore revolution means that the number of cores and the number of failures are growing exponentially, and that the communication topology is becoming increasingly complex. Hence an improved parallel symbolic computation framework is required.
This paper presents the design and initial evaluation of SymGridPar2 (SGP2), a successor to SymGridPar that is designed to provide scalability onto 10^5 cores, and hence also provide fault tolerance. We present the SGP2 design goals, principles and architecture. We describe how scalability is achieved using layering and by allowing the programmer to control task placement. We outline how fault tolerance is provided by supervising remote computations, and outline higher-level fault tolerance abstractions.
We describe the SGP2 implementation status and development plans. We report the scalability and efficiency, including weak scaling to about 32,000 cores, and investigate the overheads of tolerating faults for simple symbolic computations
Probability of local bifurcation type from a fixed point: A random matrix perspective
Results regarding probable bifurcations from fixed points are presented in
the context of general dynamical systems (real, random matrices), time-delay
dynamical systems (companion matrices), and a set of mappings known for their
properties as universal approximators (neural networks). The eigenvalue spectra
is considered both numerically and analytically using previous work of Edelman
et. al. Based upon the numerical evidence, various conjectures are presented.
The conclusion is that in many circumstances, most bifurcations from fixed
points of large dynamical systems will be due to complex eigenvalues.
Nevertheless, surprising situations are presented for which the aforementioned
conclusion is not general, e.g. real random matrices with Gaussian elements
with a large positive mean and finite variance.Comment: 21 pages, 19 figure
Reliable scalable symbolic computation: The design of SymGridPar2
Symbolic computation is an important area of both Mathematics and Computer Science, with many large computations that would benefit from parallel execution. Symbolic computations are, however, challenging to parallelise as they have complex data and control structures, and both dynamic and highly irregular parallelism. The SymGridPar framework (SGP) has been developed to address these challenges on small-scale parallel architectures. However the multicore revolution means that the number of cores and the number of failures are growing exponentially, and that the communication topology is becoming increasingly complex. Hence an improved parallel symbolic computation framework is required.
This paper presents the design and initial evaluation of SymGridPar2 (SGP2), a successor to SymGridPar that is designed to provide scalability onto 10^5 cores, and hence also provide fault tolerance. We present the SGP2 design goals, principles and architecture. We describe how scalability is achieved using layering and by allowing the programmer to control task placement. We outline how fault tolerance is provided by supervising remote computations, and outline higher-level fault tolerance abstractions.
We describe the SGP2 implementation status and development plans. We report the scalability and efficiency, including weak scaling to about 32,000 cores, and investigate the overheads of tolerating faults for simple symbolic computations
Energy constrained sandpile models
We study two driven dynamical systems with conserved energy. The two automata
contain the basic dynamical rules of the Bak, Tang and Wiesenfeld sandpile
model. In addition a global constraint on the energy contained in the lattice
is imposed. In the limit of an infinitely slow driving of the system, the
conserved energy becomes the only parameter governing the dynamical
behavior of the system. Both models show scale free behavior at a critical
value of the fixed energy. The scaling with respect to the relevant
scaling field points out that the developing of critical correlations is in a
different universality class than self-organized critical sandpiles. Despite
this difference, the activity (avalanche) probability distributions appear to
coincide with the one of the standard self-organized critical sandpile.Comment: 4 pages including 3 figure
A method of evaluation of high-performance computing batch schedulers
According to Sterling et al., a batch scheduler, also called workload management, is an application or set of services that provide a method to monitor and manage the flow of work through the system [Sterling01]. The purpose of this research was to develop a method to assess the execution speed of workloads that are submitted to a batch scheduler. While previous research exists, this research is different in that more complex jobs were devised that fully exercised the scheduler with established benchmarks. This research is important because the reduction of latency even if it is miniscule can lead to massive savings of electricity, time, and money over the long term. This is especially important in the era of green computing [Reuther18]. The methodology used to assess these schedulers involved the execution of custom automation scripts. These custom scripts were developed as part of this research to automatically submit custom jobs to the schedulers, take measurements, and record the results. There were multiple experiments conducted throughout the course of the research. These experiments were designed to apply the methodology and assess the execution speed of a small selection of batch schedulers. Due to time constraints, the research was limited to four schedulers. x The measurements that were taken during the experiments were wall time, RAM usage, and CPU usage. These measurements captured the utilization of system resources of each of the schedulers. The custom scripts were executed using, 1, 2, and 4 servers to determine how well a scheduler scales with network growth. The experiments were conducted on local school resources. All hardware was similar and was co-located within the same data-center. While the schedulers that were investigated as part of the experiments are agnostic to whether the system is grid, cluster, or super-computer; the investigation was limited to a cluster architecture
High-Performance Computer Algebra: A Hecke Algebra Case Study
We describe the first ever parallelisation of an algebraic computation at modern HPC scale. Our case study poses challenges typical of the domain: it is a multi-phase application with dynamic task creation and irregular parallelism over complex control and data structures.
Our starting point is a sequential algorithm for finding invariant bilinear forms in the representation theory of Hecke algebras, implemented in the GAP computational group theory system. After optimising the sequential code we develop a parallel algorithm that exploits the new skeleton-based SGP2 framework to parallelise the three most computationally-intensive phases. To this end we develop a new domain-specific skeleton, parBufferTryReduce. We report good parallel performance both on a commodity cluster and on a national HPC, delivering speedups up to 548 over the optimised sequential implementation on 1024 cores
- …