Final Report: Performance Modeling Activities in PERC2
Progress in Performance Modeling for PERC2 resulted in:
• Automated modeling tools that are robust and able to characterize large applications running at scale while simultaneously simulating the memory hierarchies of multiple machines in parallel
• Porting of the requisite tracer tools to multiple platforms
• Improved performance models through higher-resolution memory models than ever before
• Addition of control-flow and data-dependency analysis to the tracers used in performance tools
• Exploration and development of several new modeling methodologies
• Use of modeling tools to develop performance models for strategic codes
• Application of the modeling methodology to make a large number of "blind" performance predictions for certain mission partner applications, targeting most currently available system architectures
• Error analysis to correct some systematic biases encountered during the large-scale blind prediction exercises
• Addition of instrumentation capabilities for communication libraries other than MPI
• Dissemination of the tools and modeling methods to several mission partners, including DoD HPCMO and two DARPA HPCS vendors (Cray and IBM), as well as to the wider HPC community via a series of tutorials
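One common way to frame tracer-driven performance modeling is to combine an application signature (how many memory references each part of the code makes and where in the cache hierarchy they are satisfied) with a machine profile (how fast each level delivers data). The sketch below is a deliberately simplified, hypothetical model of memory time only. The block format, the hit fractions, the bandwidth figures, the machine names, and the 8-byte word size are all assumptions for illustration, not the PERC2 tools or their data.

```python
# Hypothetical machine profiles: sustained bandwidth in bytes/s when a
# memory reference is served from each level of the hierarchy.
MACHINES = {
    "machine_a": {"L1": 80e9, "L2": 40e9, "L3": 20e9, "MEM": 8e9},
    "machine_b": {"L1": 64e9, "L2": 32e9, "L3": 16e9, "MEM": 4e9},
}


def predict_memory_time(blocks, bandwidth, word_bytes=8):
    """Estimate memory time in seconds by combining a trace with a machine profile.

    blocks:    list of (mem_refs, shares) pairs, one per traced basic block;
               shares maps each memory level to the fraction of the block's
               references served there (fractions sum to 1). In a real tool
               these fractions would come from simulating the target
               machine's caches on the trace, so they would differ per machine.
    bandwidth: machine profile mapping each level to bytes per second.
    """
    total = 0.0
    for mem_refs, shares in blocks:
        for level, share in shares.items():
            # Bytes served from this level divided by that level's bandwidth
            total += mem_refs * word_bytes * share / bandwidth[level]
    return total


if __name__ == "__main__":
    # One hypothetical basic block: 1e9 references, mostly L1 hits
    trace = [(1e9, {"L1": 0.9, "L2": 0.05, "L3": 0.03, "MEM": 0.02})]
    for name, profile in MACHINES.items():
        # Simplification: the same hit fractions are reused for both machines
        print(name, round(predict_memory_time(trace, profile), 4))
```

With these made-up numbers the sketch predicts about 0.132 s on machine_a and 0.18 s on machine_b; the 2% of references that reach main memory account for roughly 15% and 22% of those totals, which is why finer-grained memory modeling matters.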
Extending Scojo-PECT by migration based on system-level checkpointing
In recent years, a significant amount of research has been done on job scheduling in high-performance computing. Parallel jobs have different running times and require different numbers of processors, so jobs need to be scheduled and packed to improve system utilization. Scojo-PECT is a job scheduler that provides service guarantees by using coarse-grain time sharing. However, Scojo-PECT does not support process migration. We extend Scojo-PECT by migrating parallel jobs based on system-level checkpointing. We investigate different cases in the Scojo-PECT scheduling algorithm where migration based on system-level checkpointing can be used to improve resource utilization and reduce job response time. Our experimental results show reduced relative response times for medium jobs compared with the original Scojo-PECT scheduler, while long jobs suffer no disadvantage.
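The abstract does not spell out the mechanism, but the benefit of checkpoint-based migration can be illustrated with a single resume decision. The sketch assumes that, without migration, a job suspended at the end of its time slice must restart on exactly the nodes it used before, whereas a system-level checkpoint lets it restart on any idle nodes of the right count. The function names, the node-ID sets, and the first-fit node choice are hypothetical, and `relative_response_time` assumes the common definition of response time divided by run time; none of this is taken from the Scojo-PECT implementation.

```python
def resume_nodes(old_nodes, size, free_nodes, migration):
    """Pick the nodes on which a checkpointed job can restart, or None.

    old_nodes:  set of node IDs the job occupied before it was suspended
    size:       number of nodes the job needs
    free_nodes: set of node IDs that are currently idle
    migration:  False -> the job may only restart on exactly old_nodes
                True  -> (assumed) a system-level checkpoint lets it
                         restart on any `size` idle nodes
    """
    if not migration:
        # Without migration, every original node must be idle again
        return set(old_nodes) if old_nodes <= free_nodes else None
    if len(free_nodes) >= size:
        # Hypothetical first-fit choice: the lowest-numbered idle nodes
        return set(sorted(free_nodes)[:size])
    return None


def relative_response_time(submit, finish, run_time):
    # Assumed definition: (wait time + run time) / run time
    return (finish - submit) / run_time


if __name__ == "__main__":
    old = {0, 1, 2, 3}          # the job ran on nodes 0-3
    free = {0, 1, 3, 4, 5}      # node 2 has since been given to another job
    print(resume_nodes(old, 4, free, migration=False))  # cannot restart yet
    print(resume_nodes(old, 4, free, migration=True))   # restarts on other nodes
```

In this toy case the suspended job can restart immediately only when migration is allowed; shortening that wait is what lowers its relative response time.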
G-LOMARC-TS: Lookahead group matchmaking for time/space sharing on multi-core parallel machines
Parallel machines with multi-core nodes are becoming increasingly popular. The performance of applications running on these machines can be degraded by resource competition within each node. Research has found that coscheduling different applications with complementary resource characteristics on the same set of nodes (semi time sharing) may improve performance. We propose a scheduling algorithm, G-LOMARC-TS, that combines space sharing and semi time sharing and, where possible, matches groups of jobs for coscheduling. Because matchmaking may select jobs further down the waiting queue, jobs at the front of the queue may be delayed; fairness for each individual job is therefore monitored, and the delay is kept within a limited bound. Several heuristics are used to address the NP-complete problem of forming groups. Our experimental results show that G-LOMARC-TS improves both utilization and average relative response time over several other scheduling policies.
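As a rough illustration of lookahead matchmaking with a fairness bound, the sketch below scans a window at the front of the waiting queue, starts jobs that fit, and pairs each with a later job of the same size and a complementary resource profile so that the two share one set of nodes. It stops overtaking once a job that did not fit has been bypassed `max_skips` times. Everything here is an assumption for illustration: the `Job` fields, the two-label `cpu`/`mem` profile, pairs instead of larger groups, and the single greedy heuristic stand in for the paper's group formation and its several heuristics.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int          # nodes requested
    profile: str        # hypothetical label: "cpu" (compute-bound) or "mem" (memory-bound)
    skipped: int = 0    # times this job has been overtaken while waiting


def complementary(a, b):
    # Assumption: a compute-bound and a memory-bound job coexist well on shared nodes
    return a.profile != b.profile


def schedule_step(queue, free_nodes, lookahead=4, max_skips=2):
    """One greedy lookahead pass; returns (groups started, free nodes left).

    Started jobs are removed from `queue` in place.
    """
    window = queue[:lookahead]
    started, blocked, placed = [], [], set()
    for idx, job in enumerate(window):
        if job.name in placed:
            continue
        # Fairness bound: stop overtaking once a blocked job has waited long enough
        if any(b.skipped >= max_skips for b in blocked):
            break
        if job.nodes > free_nodes:
            blocked.append(job)
            continue
        # Look further ahead for a same-size partner with a complementary profile
        partner = next((j for j in window[idx + 1:]
                        if j.name not in placed and j.nodes == job.nodes
                        and complementary(job, j)), None)
        group = (job, partner) if partner is not None else (job,)
        free_nodes -= job.nodes          # a pair shares a single set of nodes
        placed.update(j.name for j in group)
        started.append(group)
        for b in blocked:
            b.skipped += 1               # every started group overtakes the blocked jobs
    queue[:] = [j for j in queue if j.name not in placed]
    return started, free_nodes


if __name__ == "__main__":
    q = [Job("A", 8, "cpu"), Job("B", 4, "cpu"), Job("C", 2, "mem"), Job("D", 4, "mem")]
    groups, free = schedule_step(q, free_nodes=6)
    print([[j.name for j in g] for g in groups], free, [j.name for j in q])
```

With 6 idle nodes, the 8-node job A cannot start, B and D are coscheduled on the same 4 nodes, and C takes the remaining 2. Lowering `max_skips` to 1 would stop the scan before C could overtake A a second time, trading some utilization for a tighter delay bound on A.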