2,214 research outputs found

    Policy-based techniques for self-managing parallel applications

    Get PDF
    This paper presents an empirical investigation of policy-based self-management techniques for parallel applications executing in loosely-coupled environments. The dynamic and heterogeneous nature of these environments is discussed and the special considerations for parallel applications are identified. An adaptive strategy for the run-time deployment of tasks of parallel applications is presented. The strategy is based on embedding numerous policies which are informed by contextual and environmental inputs. The policies govern various aspects of behaviour, enhancing flexibility so that the goals of efficiency and performance are achieved despite high levels of environmental variability. A prototype self-managing parallel application is used as a vehicle to explore the feasibility and benefits of the strategy. In particular, several aspects of stability are investigated. The implementation and behaviour of three policies are discussed and sample results examined

    Many-Task Computing and Blue Waters

    Full text link
    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Hybrid meta-heuristic algorithms for independent job scheduling in grid computing

    Get PDF
    The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.The term ’grid computing’ is used to describe an infrastructure that connects geographically distributed computers and heterogeneous platforms owned by multiple organizations allowing their computational power, storage capabilities and other resources to be selected and shared. The job scheduling problem is recognized as being one of the most important and challenging issues in grid computing environments. This paper proposes two strongly coupled hybrid meta-heuristic schedulers. The first scheduler combines Ant Colony Optimisation and Variable Neighbourhood Search in which the former acts as the primary algorithm which, during its execution, calls the latter as a supporting algorithm, while the second merges the Genetic Algorithm with Variable Neighbourhood Search in the same fashion. Several experiments were carried out to analyse the performance of the proposed schedulers in terms of minimizing the makespan using well known benchmarks. The experiments show that the proposed schedulers achieved impressive results compared to other selected approaches from the bibliography

    Hybrid Meta-heuristic Algorithms for Static and Dynamic Job Scheduling in Grid Computing

    Get PDF
    The term ’grid computing’ is used to describe an infrastructure that connects geographically distributed computers and heterogeneous platforms owned by multiple organizations allowing their computational power, storage capabilities and other resources to be selected and shared. Allocating jobs to computational grid resources in an efficient manner is one of the main challenges facing any grid computing system; this allocation is called job scheduling in grid computing. This thesis studies the application of hybrid meta-heuristics to the job scheduling problem in grid computing, which is recognized as being one of the most important and challenging issues in grid computing environments. Similar to job scheduling in traditional computing systems, this allocation is known to be an NPhard problem. Meta-heuristic approaches such as the Genetic Algorithm (GA), Variable Neighbourhood Search (VNS) and Ant Colony Optimisation (ACO) have all proven their effectiveness in solving different scheduling problems. However, hybridising two or more meta-heuristics shows better performance than applying a stand-alone approach. The new high level meta-heuristic will inherit the best features of the hybridised algorithms, increasing the chances of skipping away from local minima, and hence enhancing the overall performance. In this thesis, the application of VNS for the job scheduling problem in grid computing is introduced. Four new neighbourhood structures, together with a modified local search, are proposed. The proposed VNS is hybridised using two meta-heuristic methods, namely GA and ACO, in loosely and strongly coupled fashions, yielding four new sequential hybrid meta-heuristic algorithms for the problem of static and dynamic single-objective independent batch job scheduling in grid computing. For the static version of the problem, several experiments were carried out to analyse the performance of the proposed schedulers in terms of minimising the makespan using well known benchmarks. The experiments show that the proposed schedulers achieved impressive results compared to other traditional, heuristic and meta-heuristic approaches selected from the bibliography. To model the dynamic version of the problem, a simple simulator, which uses the rescheduling technique, is designed and new problem instances are generated, by using a well-known methodology, to evaluate the performance of the proposed hybrid schedulers. The experimental results show that the use of rescheduling provides significant improvements in terms of the makespan compared to other non-rescheduling approaches

    A hyper-heuristic for adaptive scheduling in computational grids

    Get PDF
    In this paper we present the design and implementation of an hyper-heuristic for efficiently scheduling independent jobs in computational grids. An efficient scheduling of jobs to grid resources depends on many parameters, among others, the characteristics of the resources and jobs (such as computing capacity, consistency of computing, workload, etc.). Moreover, these characteristics change over time due to the dynamic nature of grid environment, therefore the planning of jobs to resources should be adaptively done. Existing ad hoc scheduling methods (batch and immediate mode) have shown their efficacy for certain types of resource and job characteristics. However, as stand alone methods, they are not able to produce the best planning of jobs to resources for different types of Grid resources and job characteristics. In this work we have designed and implemented a hyper-heuristic that uses a set of ad hoc (immediate and batch mode) scheduling methods to provide the scheduling of jobs to Grid resources according to the Grid and job characteristics. The hyper-heuristic is a high level algorithm, which examines the state and characteristics of the Grid system (jobs and resources), and selects and applies the ad hoc method that yields the best planning of jobs. The resulting hyper-heuristic based scheduler can be thus used to develop network-aware applications that need efficient planning of jobs to resources. The hyper-heuristic has been tested and evaluated in a dynamic setting through a prototype of a Grid simulator. The experimental evaluation showed the usefulness of the hyper-heuristic for planning of jobs to resources as compared to planning without knowledge of the resource and job characteristics.Peer ReviewedPostprint (author's final draft

    Agentless robust load sharing strategy for utilising hetero-geneous resources over wide area network

    Get PDF
    Resource monitoring and performance prediction services have always been regarded as important keys to improving the performance of load sharing strategy. However, the traditional methodologies usually require specific performance information, which can only be collected by installing proprietary agents on all participating resources. This requirement of implementing a single unified monitoring service may not be feasible because of the differences in the underlying systems and organisation policies. To address this problem, we define a new load sharing strategy which bases the load decision on a simple performance estimation that can be measured easily at the coordinator node. Our proposed strategy relies on a stage-based dynamic task allocation to handle the imprecision of our performance estimation and to correct load distribution on-the-fly. The simulation results showed that the performance of our strategy is comparable or better than traditional strategies, especially when the performance information from the monitoring service is not accurate

    OGSA first impressions: a case study re-engineering a scientific applicationwith the open grid services architecture

    Get PDF
    We present a case study of our experience re-engineeringa scientific application using the Open Grid Services Architecture(OGSA), a new specification for developing Gridapplications using web service technologies such as WSDLand SOAP. During the last decade, UCL?s Chemistry departmenthas developed a computational approach for predictingthe crystal structures of small molecules. However,each search involves running large iterations of computationallyexpensive calculations and currently takes a fewmonths to perform. Making use of early implementationsof the OGSA specification we have wrapped the Fortranbinaries into OGSI-compliant service interfaces to exposethe existing scientific application as a set of loosely coupledweb services. We show how the OGSA implementationfacilitates the distribution of such applications across alarge network, radically improving performance of the systemthrough parallel CPU capacity, coordinated resourcemanagement and automation of the computational process.We discuss the difficulties that we encountered turning Fortranexecutables into OGSA services and delivering a robust,scalable system. One unusual aspect of our approachis the way we transfer input and output data for the Fortrancodes. Instead of employing a file transfer service wetransform the XML encoded data in the SOAP message tonative file format, where possible using XSLT stylesheets.We also discuss a computational workflow service that enablesusers to distribute and manage parts of the computationalprocess across different clusters and administrativedomains. We examine how our experience re-engineeringthe polymorph prediction application led to this approachand to what extent our efforts have succeeded

    Execution time supports for adaptive scientific algorithms on distributed memory machines

    Get PDF
    Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated

    Software Support for Irregular and Loosely Synchronous Problems

    Get PDF
    A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of such problems. This classification has motivated us to enhance Fortran D to provide language support for irregular, loosely synchronous problems. We present techniques for parallelization of such problems in the context of Fortran D
    corecore