3 research outputs found

    Optimization and Management of Large-scale Scientific Workflows in Heterogeneous Network Environments: From Theory to Practice

    Next-generation computation-intensive scientific applications feature large-scale computing workflows of various structures, which can be modeled as anything from simple linear pipelines to complex Directed Acyclic Graphs (DAGs). Supporting such computing workflows and optimizing their end-to-end network performance are crucial to the success of scientific collaborations that require fast system response, smooth data flow, and reliable distributed operation. We construct analytical cost models and formulate a class of workflow mapping problems with different mapping objectives and network constraints. The difficulty of these mapping problems arises essentially from the topological matching required in the spatial domain, which is further compounded by the complexity of resource sharing in the temporal dimension. We provide a detailed computational complexity analysis and design optimal or heuristic algorithms with rigorous correctness proofs or performance analysis. We decentralize the proposed mapping algorithms and also investigate these optimization problems in unreliable network environments for fault tolerance. To examine and evaluate the performance of the workflow mapping algorithms before actual deployment and implementation, we implement a simulation program that simulates the execution dynamics of distributed computing workflows. We also develop a scientific workflow automation and management platform based on an existing workflow engine for experimentation in real environments. The performance superiority of the proposed mapping solutions is illustrated by extensive simulation-based comparisons with existing algorithms and further verified by large-scale experiments on real-life scientific workflow applications through effective system implementation and deployment in real networks.

    An Intelligent Robust Mouldable Scheduler for HPC & Elastic Environments

    Traditional scheduling techniques belong to a bygone era and do not cater for the dynamism of new and emerging computing paradigms. Budget constraints now push researchers to migrate their workloads to public clouds or to buy into shared computing services, as funding for large capital expenditures is scarce. The sites still hosting large or shared computing infrastructure have to ensure that system utilisation and efficiency are as high as possible. However, efficiency cannot come at the cost of quality of service, as the availability of public clouds now means that users can move away. This thesis presents a novel scheduling system to improve job turnaround time. The Robust Mouldable Scheduler outlined in these pages utilises real application benchmarks to profile system performance and predict job execution times at different allocations, something no other scheduler does at present. The system is able to make allocation decisions that fit jobs into the spaces available on the system using fewer resources, without delaying job completion time. The results demonstrate significant improvement in workload turnaround times using real High Performance Computing (HPC) trace logs. Utilising three years of University of Huddersfield trace logs, the mouldable scheduler consistently achieved faster workload completion in simulation. Further, the results establish that, by not relying on the user to suggest resource allocations for jobs, the system is able to reduce bad-put in the system, leading to improved efficiency. A thorough investigation of Research Computing Systems (RCS), workload management systems, scheduling algorithms and strategies, benchmarking and profiling toolkits, and simulators is presented to establish the state of the art. Within this thesis, a method to profile applications and workloads that leverages common open-source tools on HPC systems is presented. The resultant toolkit is used to profile the University of Huddersfield workload. This workload forms the basis for evaluating the mouldable scheduler. The research includes advanced computing paradigms such as utilising Artificial Intelligence methods to improve the efficiency of the scheduler, or Surge Computing, where workloads are scaled beyond institutional firewalls through elastic compute systems.