2,214 research outputs found
Policy-based techniques for self-managing parallel applications
This paper presents an empirical investigation of policy-based self-management techniques for parallel applications executing in loosely-coupled environments. The dynamic and heterogeneous nature of these environments is discussed and the special considerations for parallel applications are identified. An adaptive strategy for the run-time deployment of tasks of parallel applications is presented. The strategy is based on embedding numerous policies which are informed by contextual and environmental inputs. The policies govern various aspects of behaviour, enhancing flexibility so that the goals of efficiency and performance are achieved despite high levels of environmental variability. A prototype self-managing parallel application is used as a vehicle to explore the feasibility and benefits of the strategy. In particular, several aspects of stability are investigated. The implementation and behaviour of three policies are discussed and sample results are examined.
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters system, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
Hybrid meta-heuristic algorithms for independent job scheduling in grid computing
The term ‘grid computing’ is used to describe an infrastructure that connects geographically distributed computers and heterogeneous platforms owned by multiple organizations, allowing their computational power, storage capabilities and other resources to be selected and shared. The job scheduling problem is recognized as being one of the most important and challenging issues in grid computing environments. This paper proposes two strongly coupled hybrid meta-heuristic schedulers. The first combines Ant Colony Optimisation and Variable Neighbourhood Search: the former acts as the primary algorithm and, during its execution, calls the latter as a supporting algorithm. The second merges the Genetic Algorithm with Variable Neighbourhood Search in the same fashion. Several experiments were carried out to analyse the performance of the proposed schedulers in terms of minimizing the makespan using well-known benchmarks. The experiments show that the proposed schedulers achieved impressive results compared to other selected approaches from the literature.
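As a rough illustration of the objective named in this abstract, the sketch below computes the makespan of an assignment of independent jobs to machines and improves it with a simple "move one job" local search, the kind of neighbourhood a Variable Neighbourhood Search might use as one of several. It is not the authors' scheduler: the `etc` matrix (expected time to compute job `j` on machine `m`), the function names, and the single neighbourhood are assumptions made for the example.

```python
def makespan(etc, assignment, n_machines):
    """Completion time of the most loaded machine.

    etc[j][m] is the (assumed) expected time to compute job j on machine m;
    assignment[j] is the machine chosen for job j.
    """
    loads = [0.0] * n_machines
    for job, machine in enumerate(assignment):
        loads[machine] += etc[job][machine]
    return max(loads)


def move_local_search(etc, assignment, n_machines):
    """First-improvement search over the 'move one job' neighbourhood."""
    best = list(assignment)
    best_cost = makespan(etc, best, n_machines)
    improved = True
    while improved:
        improved = False
        for job in range(len(best)):
            current = best[job]
            for machine in range(n_machines):
                if machine == current:
                    continue
                best[job] = machine
                cost = makespan(etc, best, n_machines)
                if cost < best_cost:
                    # Keep the move and continue from the new position.
                    best_cost = cost
                    current = machine
                    improved = True
                else:
                    # Undo the move.
                    best[job] = current
    return best, best_cost


if __name__ == "__main__":
    etc = [[3, 5], [2, 4], [4, 1]]   # 3 jobs, 2 machines (made-up numbers)
    start = [0, 0, 0]                # everything on machine 0
    plan, cost = move_local_search(etc, start, 2)
    print(makespan(etc, start, 2), "->", cost, plan)
```

A local search like this stops at the first local minimum it reaches, which is the limitation that hybridising with a population-based or constructive method is meant to address.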
Hybrid Meta-heuristic Algorithms for Static and Dynamic Job Scheduling in Grid Computing
The term ‘grid computing’ is used to describe an infrastructure that connects geographically
distributed computers and heterogeneous platforms owned by multiple organizations
allowing their computational power, storage capabilities and other resources to be selected
and shared. Allocating jobs to computational grid resources in an efficient manner is one
of the main challenges facing any grid computing system; this allocation is called job
scheduling in grid computing. This thesis studies the application of hybrid meta-heuristics
to the job scheduling problem in grid computing, which is recognized as being one of
the most important and challenging issues in grid computing environments. Similar to
job scheduling in traditional computing systems, this allocation is known to be an NP-hard
problem. Meta-heuristic approaches such as the Genetic Algorithm (GA), Variable
Neighbourhood Search (VNS) and Ant Colony Optimisation (ACO) have all proven their
effectiveness in solving different scheduling problems. However, hybridising two or more
meta-heuristics shows better performance than applying a stand-alone approach. The new
high level meta-heuristic will inherit the best features of the hybridised algorithms, increasing
the chances of escaping from local minima, and hence enhancing the overall
performance. In this thesis, the application of VNS for the job scheduling problem in grid
computing is introduced. Four new neighbourhood structures, together with a modified
local search, are proposed. The proposed VNS is hybridised using two meta-heuristic
methods, namely GA and ACO, in loosely and strongly coupled fashions, yielding four
new sequential hybrid meta-heuristic algorithms for the problem of static and dynamic
single-objective independent batch job scheduling in grid computing. For the static version
of the problem, several experiments were carried out to analyse the performance of the
proposed schedulers in terms of minimising the makespan using well known benchmarks.
The experiments show that the proposed schedulers achieved impressive results compared
to other traditional, heuristic and meta-heuristic approaches selected from the bibliography.
To model the dynamic version of the problem, a simple simulator, which uses
the rescheduling technique, is designed and new problem instances are generated, by
using a well-known methodology, to evaluate the performance of the proposed hybrid
schedulers. The experimental results show that the use of rescheduling provides significant
improvements in terms of the makespan compared to other non-rescheduling approaches.
A hyper-heuristic for adaptive scheduling in computational grids
In this paper we present the design and implementation of a hyper-heuristic for efficiently scheduling independent jobs in computational grids. An efficient scheduling of jobs to grid resources depends on many parameters, among others the characteristics of the resources and jobs (such as computing capacity, consistency of computing, workload, etc.). Moreover, these characteristics change over time due to the dynamic nature of the grid environment; therefore, the planning of jobs to resources should be done adaptively. Existing ad hoc scheduling methods (batch and immediate mode) have shown their efficacy for certain types of resource and job characteristics. However, as stand-alone methods, they are not able to produce the best planning of jobs to resources for different types of Grid resources and job characteristics. In this work we have designed and implemented a hyper-heuristic that uses a set of ad hoc (immediate and batch mode) scheduling methods to provide the scheduling of jobs to Grid resources according to the Grid and job characteristics. The hyper-heuristic is a high-level algorithm which examines the state and characteristics of the Grid system (jobs and resources), and selects and applies the ad hoc method that yields the best planning of jobs. The resulting hyper-heuristic based scheduler can thus be used to develop network-aware applications that need efficient planning of jobs to resources. The hyper-heuristic has been tested and evaluated in a dynamic setting through a prototype of a Grid simulator. The experimental evaluation showed the usefulness of the hyper-heuristic for planning of jobs to resources as compared to planning without knowledge of the resource and job characteristics.
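The selection idea in this abstract can be sketched as follows. The abstract does not say which ad hoc methods are used or how the selection is made, so everything here is an assumption: the two candidate methods are textbook MCT (immediate mode) and Min-Min (batch mode), and the selector simply runs each candidate on the current batch and keeps the plan with the lowest makespan. The names `mct`, `min_min`, `hyper_heuristic`, `etc` and `ready` are hypothetical.

```python
def mct(etc, ready):
    """Immediate mode: take jobs in arrival order, each goes to the machine
    that would complete it earliest (Minimum Completion Time)."""
    ready = list(ready)
    plan = []
    for row in etc:
        m = min(range(len(ready)), key=lambda k: ready[k] + row[k])
        ready[m] += row[m]
        plan.append(m)
    return plan, max(ready)


def min_min(etc, ready):
    """Batch mode: repeatedly schedule the pending job whose best
    completion time is smallest (Min-Min)."""
    ready = list(ready)
    plan = [None] * len(etc)
    pending = set(range(len(etc)))
    while pending:
        pairs = ((j, k) for j in sorted(pending) for k in range(len(ready)))
        job, m = min(pairs, key=lambda jk: ready[jk[1]] + etc[jk[0]][jk[1]])
        ready[m] += etc[job][m]
        plan[job] = m
        pending.remove(job)
    return plan, max(ready)


def hyper_heuristic(etc, ready, methods):
    """Assumed selection rule: try every low-level method on the current
    state and return the name and result of the one with the best makespan."""
    results = {name: fn(etc, ready) for name, fn in methods.items()}
    best = min(results, key=lambda name: results[name][1])
    return best, results[best]


if __name__ == "__main__":
    etc = [[10, 12], [2, 9], [3, 8]]   # made-up job/machine times
    name, (plan, span) = hyper_heuristic(
        etc, [0, 0], {"mct": mct, "min_min": min_min})
    print(name, plan, span)
```

A scheduler of the kind the abstract describes would more likely choose a method from observed characteristics of jobs and resources rather than by trying all of them; the exhaustive comparison is used here only because it is the simplest rule that is easy to check.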
Agentless robust load sharing strategy for utilising heterogeneous resources over wide area network
Resource monitoring and performance prediction services have always been regarded as important keys to improving the performance of load sharing strategies. However, the traditional methodologies usually require specific performance information, which can only be collected by installing proprietary agents on all participating resources. This requirement of implementing a single unified monitoring service may not be feasible because of the differences in the underlying systems and organisation policies. To address this problem, we define a new load sharing strategy which bases the load decision on a simple performance estimation that can be measured easily at the coordinator node. Our proposed strategy relies on a stage-based dynamic task allocation to handle the imprecision of our performance estimation and to correct load distribution on-the-fly. The simulation results showed that the performance of our strategy is comparable to or better than that of traditional strategies, especially when the performance information from the monitoring service is not accurate.
OGSA first impressions: a case study re-engineering a scientific application with the open grid services architecture
We present a case study of our experience re-engineering a scientific application using the Open Grid Services Architecture (OGSA), a new specification for developing Grid applications using web service technologies such as WSDL and SOAP. During the last decade, UCL's Chemistry department has developed a computational approach for predicting the crystal structures of small molecules. However, each search involves running large iterations of computationally expensive calculations and currently takes a few months to perform. Making use of early implementations of the OGSA specification we have wrapped the Fortran binaries into OGSI-compliant service interfaces to expose the existing scientific application as a set of loosely coupled web services. We show how the OGSA implementation facilitates the distribution of such applications across a large network, radically improving performance of the system through parallel CPU capacity, coordinated resource management and automation of the computational process. We discuss the difficulties that we encountered turning Fortran executables into OGSA services and delivering a robust, scalable system. One unusual aspect of our approach is the way we transfer input and output data for the Fortran codes. Instead of employing a file transfer service we transform the XML encoded data in the SOAP message to native file format, where possible using XSLT stylesheets. We also discuss a computational workflow service that enables users to distribute and manage parts of the computational process across different clusters and administrative domains. We examine how our experience re-engineering the polymorph prediction application led to this approach and to what extent our efforts have succeeded.
Execution time supports for adaptive scientific algorithms on distributed memory machines
Optimizations are considered that are required for efficient execution of code segments that consist of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
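The idea of addressing a distributed array through a global index set, with the communication pattern worked out at run time, can be sketched in a few lines. This is a single-process simulation and not the PARTI interface: the block distribution, the list-of-lists standing in for per-process memory, and the names `owner`, `build_schedule`, `gather` and `scatter_add` are all assumptions made for the example.

```python
def owner(global_index, block):
    """Map a global index to (process, local offset) under an assumed
    block distribution with `block` elements per process."""
    return global_index // block, global_index % block


def build_schedule(indices, block):
    """Derive the access pattern at run time: group the requested global
    indices by owning process, remembering where each value is needed."""
    schedule = {}
    for pos, g in enumerate(indices):
        proc, local = owner(g, block)
        schedule.setdefault(proc, []).append((pos, local))
    return schedule


def gather(parts, schedule, n):
    """Fetch the scheduled elements into a local buffer of length n.
    In a real system each (proc, requests) entry would become one message."""
    out = [None] * n
    for proc, requests in schedule.items():
        for pos, local in requests:
            out[pos] = parts[proc][local]
    return out


def scatter_add(parts, schedule, values):
    """Send local contributions back and accumulate them at the owners."""
    for proc, requests in schedule.items():
        for pos, local in requests:
            parts[proc][local] += values[pos]


if __name__ == "__main__":
    parts = [[10, 11, 12], [13, 14, 15]]   # two "processes", block size 3
    indices = [4, 0, 5, 4]                 # indirection array, global indices
    schedule = build_schedule(indices, block=3)
    print(gather(parts, schedule, len(indices)))
    scatter_add(parts, schedule, [1, 1, 1, 1])
    print(parts)
```

Because the schedule depends only on the indirection array, it can be built once and reused for every loop iteration that accesses the same indices.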
Software Support for Irregular and Loosely Synchronous Problems
A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of such problems. This classification has motivated us to enhance Fortran D to provide language support for irregular, loosely synchronous problems. We present techniques for parallelization of such problems in the context of Fortran D.