16 research outputs found

    Probabilistic grid scheduling based on job statistics and monitoring information

    This transfer thesis presents a novel, probabilistic approach to scheduling applications on computational Grids based on their historical behaviour, current state of the Grid and predictions of the future execution times and resource utilisation of such applications. The work lays a foundation for enabling a more intuitive, user-friendly and effective scheduling technique termed deadline scheduling. Initial work has established motivation and requirements for a more efficient Grid scheduler, able to adaptively handle dynamic nature of the Grid resources and submitted workload. Preliminary scheduler research identified the need for a detailed monitoring of Grid resources on the process level, and for a tool to simulate non-deterministic behaviour and statistical properties of Grid applications. A simulation tool, GridLoader, has been developed to enable modelling of application loads similar to a number of typical Grid applications. GridLoader is able to simulate CPU utilisation, memory allocation and network transfers according to limits set through command line parameters or a configuration file. Its specific strength is in achieving set resource utilisation targets in a probabilistic manner, thus creating a dynamic environment, suitable for testing the scheduler’s adaptability and its prediction algorithm. To enable highly granular monitoring of Grid applications, a monitoring framework based on the Ganglia Toolkit was developed and tested. The suite is able to collect resource usage information of individual Grid applications, integrate it into standard XML based information flow, provide visualisation through a Web portal, and export data into a format suitable for off-line analysis. The thesis also presents initial investigation of the utilisation of University College London Central Computing Cluster facility running Sun Grid Engine middleware. 
The feasibility of basic prediction concepts based on historical information and process meta-data has been successfully established, and possible scheduling improvements using such predictions have been identified. The thesis is structured as follows: Section 1 introduces Grid computing and its major concepts; Section 2 presents open research issues and the specific focus of the author’s research; Section 3 gives a survey of the related literature, schedulers, monitoring tools and simulation packages; Section 4 presents the platform for the author’s work – the Self-Organising Grid Resource management project; Sections 5 and 6 give detailed accounts of the monitoring framework and simulation tool developed; Section 7 presents the initial data analysis, while Section 8.4 concludes the thesis with appendices and references.

    Self-adaptive Grid Resource Monitoring and discovery

    The Grid provides a novel platform where the scientific and engineering communities can share data and computation across multiple administrative domains. There are several key services that must be offered by Grid middleware, one of them being the Grid Information Service (GIS). A GIS is a Grid middleware component which maintains information about hardware, software, services and people participating in a virtual organisation (VO). There is an inherent need in these systems for the delivery of reliable performance. This thesis describes a number of approaches which detail the development and application of a suite of benchmarks for the prediction of the process of resource discovery and monitoring on the Grid. A series of experimental studies of the characterisation of performance using benchmarking are carried out. Several novel predictive algorithms are presented and evaluated in terms of their predictive error. Furthermore, predictive methods are developed which describe the behaviour of MDS2 for a variable number of user requests. The MDS is also extended to include job information from a local scheduler; this information is queried using requests of greatly varying complexity. The response of the MDS to these queries is then assessed in terms of several performance metrics. The benchmarking of the dynamic nature of information within MDS3, which is based on the Open Grid Services Architecture (OGSA) and is the successor to MDS2, is also carried out. The performance of both the pull and push query mechanisms is analysed. GridAdapt (Self-adaptive Grid Resource Monitoring) is a new system that is proposed, built upon the Globus MDS3 benchmarking. It offers self-adaptation, autonomy and admission control at the Index Service, whilst ensuring that the MDS is not overloaded and can meet its quality-of-service, for example in terms of its average response time for servicing synchronous queries and the total number of queries returned per unit time.

    Self-adaptive Grid Resource Monitoring and discovery

    The Grid provides a novel platform where the scientific and engineering communities can share data and computation across multiple administrative domains. There are several key services that must be offered by Grid middleware, one of them being the Grid Information Service (GIS). A GIS is a Grid middleware component which maintains information about hardware, software, services and people participating in a virtual organisation (VO). There is an inherent need in these systems for the delivery of reliable performance. This thesis describes a number of approaches which detail the development and application of a suite of benchmarks for the prediction of the process of resource discovery and monitoring on the Grid. A series of experimental studies of the characterisation of performance using benchmarking are carried out. Several novel predictive algorithms are presented and evaluated in terms of their predictive error. Furthermore, predictive methods are developed which describe the behaviour of MDS2 for a variable number of user requests. The MDS is also extended to include job information from a local scheduler; this information is queried using requests of greatly varying complexity. The response of the MDS to these queries is then assessed in terms of several performance metrics. The benchmarking of the dynamic nature of information within MDS3, which is based on the Open Grid Services Architecture (OGSA) and is the successor to MDS2, is also carried out. The performance of both the pull and push query mechanisms is analysed. GridAdapt (Self-adaptive Grid Resource Monitoring) is a new system that is proposed, built upon the Globus MDS3 benchmarking. It offers self-adaptation, autonomy and admission control at the Index Service, whilst ensuring that the MDS is not overloaded and can meet its quality-of-service, for example in terms of its average response time for servicing synchronous queries and the total number of queries returned per unit time. EThOS - Electronic Theses Online Service; University of Warwick (UoW); GB; United Kingdom

    Resource management for data streaming applications

    This dissertation investigates novel middleware mechanisms for building streaming applications. Developing streaming applications is a challenging task because (i) they are continuous in nature; (ii) they require fusion of data coming from multiple sources to derive higher level information; (iii) they require efficient transport of data from/to distributed sources and sinks; (iv) they need access to heterogeneous resources spanning sensor networks and high performance computing; and (v) they are time critical in nature. My thesis is that an intuitive programming abstraction will make it easier to build dynamic, distributed, and ubiquitous data streaming applications. Moreover, such an abstraction will enable an efficient allocation of shared and heterogeneous computational resources, thereby making it easier for domain experts to build these applications. In support of the thesis, I present a novel programming abstraction, called DFuse, that makes it easier to develop these applications. A domain expert only needs to specify the input and output connections to fusion channels, and the fusion functions. The subsystems developed in this dissertation take care of instantiating the application, allocating resources for the application (via the scheduling heuristic developed in this dissertation) and dynamically managing the resources (via the dynamic scheduling algorithm presented in this dissertation). Through extensive performance evaluation, I demonstrate that the resources are allocated efficiently to optimize the throughput and latency constraints of an application. Ph.D. Committee Chair: Ramachandran, Umakishore; Committee Member: Chervenak, Ann; Committee Member: Cooper, Brian; Committee Member: Liu, Ling; Committee Member: Schwan, Karsten

    Grid'5000: a large scale and highly reconfigurable grid experimental testbed

    Large scale distributed systems such as Grids are difficult to study from theoretical models and simulators only. Most Grids deployed at large scale are production platforms that are inappropriate research tools because of their limited reconfiguration, control and monitoring capabilities. In this paper, we present Grid’5000, a 5000 CPU nation-wide infrastructure for research in Grid computing. Grid’5000 is designed to provide a scientific tool for computer scientists similar to the large-scale instruments used by physicists, astronomers, and biologists. We describe the motivations, design considerations, architecture, control, and monitoring infrastructure of this experimental platform. We present configuration examples and performance results for the reconfiguration subsystem.

    Autonomous grid scheduling using probabilistic job runtime scheduling

    Computational Grids are evolving into a global, service-oriented architecture – a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline, and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes, a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using the actual job trace collected from a production Grid facility. 
The best performing execution time predictor (the auto-regressive moving average method) coupled to workload partitioning based on three simultaneous job properties returned the median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow, making it well suited for implementation in the utility Grids of the future.
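
The queue-ordering idea in the abstract above (estimate the latest start time that still meets the deadline, then sort the queue by it) can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Job` fields, the single-resource back-to-back execution model and the `total_overrun` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float           # requested completion time, seconds from now (assumed unit)
    predicted_runtime: float  # forecast execution time in seconds (assumed to be given)


def latest_start(job: Job) -> float:
    """Latest start time that still meets the deadline, given the forecast runtime."""
    return job.deadline - job.predicted_runtime


def order_queue(jobs: list[Job]) -> list[Job]:
    """Sort the queue so that jobs with the earliest latest-start time run first."""
    return sorted(jobs, key=latest_start)


def total_overrun(jobs: list[Job]) -> float:
    """Total deadline overrun if the jobs run back to back on one resource.

    Hypothetical evaluation helper: a real cluster runs jobs in parallel.
    """
    clock = 0.0
    overrun = 0.0
    for job in jobs:
        clock += job.predicted_runtime
        overrun += max(0.0, clock - job.deadline)
    return overrun


if __name__ == "__main__":
    queue = [Job("a", 100.0, 20.0), Job("b", 30.0, 25.0), Job("c", 60.0, 10.0)]
    ordered = order_queue(queue)
    print([job.name for job in ordered])
    print(total_overrun(queue), total_overrun(ordered))
```

Sorting by latest start time is a simple greedy rule; the sketch does not cover monetary cost or prediction uncertainty, which the abstract mentions but does not detail.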

    Autonomous grid scheduling using probabilistic job runtime forecasting.

    Computational Grids are evolving into a global, service-oriented architecture, a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline, and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes, a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using the actual job trace collected from a production Grid facility. 
The best performing execution time predictor (the auto-regressive moving average method) coupled to workload partitioning based on three simultaneous job properties returned the median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow, making it well suited for implementation in the utility Grids of the future.
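
The accuracy figure quoted above is a median absolute percentage error. As a rough sketch of how such a metric is commonly computed, the function below takes the median of the per-job absolute percentage errors. The function name and the sample runtimes are hypothetical, and the "centroid" aggregation mentioned in the abstract is not described there, so it is not reproduced here.

```python
from statistics import median


def median_absolute_percentage_error(actual, predicted):
    """Median of |actual - predicted| / actual, expressed in percent.

    Jobs with a non-positive actual runtime are skipped because the
    percentage error is undefined for them (an assumption of this sketch).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [
        100.0 * abs(a - p) / a
        for a, p in zip(actual, predicted)
        if a > 0
    ]
    if not errors:
        raise ValueError("no jobs with a positive actual runtime")
    return median(errors)


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, for illustration only.
    actual_runtimes = [100.0, 200.0, 50.0]
    predicted_runtimes = [90.0, 210.0, 50.0]
    print(median_absolute_percentage_error(actual_runtimes, predicted_runtimes))
```

The median is less sensitive than the mean to a few badly mispredicted jobs, which is one plausible reason to report it for a highly variable workload.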

    Semantic-Based, Scalable, Decentralized and Dynamic Resource Discovery for Internet-Based Distributed System

    Resource Discovery (RD) is a key issue in Internet-based distributed systems such as grids. RD is about locating an appropriate resource/service type that matches the user's application requirements. This is very important, as resource reservation and task scheduling are based on it. Unfortunately, RD in grids is very challenging because resources and users are distributed, resources are heterogeneous in their platforms, the status of resources is dynamic (resources can join or leave the system without any prior notice), and, most recently, a new type of grid called the intergrid (a grid of grids) has been introduced, with the use of multiple middlewares. Such a situation requires an RD system that has rich interoperability, scalability, decentralization and dynamism features. However, existing grid RD systems have difficulty attaining these features. Moreover, they lack review and evaluation studies that could highlight the gaps in achieving the required features. Therefore, this work discusses the problem associated with intergrid RD from two perspectives: first, reviewing and classifying the current grid RD systems in a way that is useful for discussing and comparing them; second, proposing a novel RD framework that has the aforementioned required RD features. In the former, we mainly focus on studies that aim to achieve interoperability in the first place, namely RD systems that use semantic information (semantic technology). In particular, we classify such systems based on their qualitative use of semantic information. We evaluate the classified studies based on their degree of accomplishment of interoperability and the other RD requirements, and outline future research directions for this field. In the latter, we name the new framework semantic-based scalable decentralized dynamic RD. The framework contains two main components: the service description model, and the service registration and discovery model. 
The former consists of a set of ontologies and services. Ontologies are used as a data model for service description, whereas the services accomplish the description process. Service registration is also based on ontology: nodes of the service (service providers) are grouped into classes according to the ontology concepts, so that each class represents a concept in the ontology. Each class has a head, which is elected from among the class's own nodes/members. The head plays the role of a registry in its class and communicates with the heads of the other classes in a peer-to-peer manner during the discovery process. We further introduce two intelligent agents to automate the discovery process: the Request Agent (RA) and the Description Agent (DA). Each node is supposed to have both agents. The DA describes the service capabilities based on the ontology, and the RA carries the service requests, also based on the ontology. We design a service search algorithm for the RA that starts the service lookup in the class where the request originates and then moves to the other classes. We finally evaluate the performance of our framework with extensive simulation experiments, the results of which confirm the effectiveness of the proposed system in satisfying the required RD features (interoperability, scalability, decentralization and dynamism). In short, our main contributions are: a new key taxonomy for semantic-based grid RD studies; an interoperable semantic description RD component model for intergrid services metadata representation; a semantic distributed registry architecture for indexing service metadata; and an agent-based service search and selection algorithm.
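
The lookup order described in the abstract above (search the class where the request originates first, then the other classes) might be sketched as follows. The data layout is an assumption: each class head's registry is modelled as a plain dictionary mapping service names to sets of ontology concepts, and the agents, head election and peer-to-peer messaging are left out.

```python
from typing import Optional

# Hypothetical layout: class name -> {service name -> set of ontology concepts}.
Registries = dict[str, dict[str, set[str]]]


def find_service(
    requested_concept: str,
    origin_class: str,
    registries: Registries,
) -> Optional[tuple[str, str]]:
    """Return (class, service) for the first match, or None if nothing matches.

    The origin class's registry is consulted first; the remaining class
    registries are then tried in dictionary order (an arbitrary choice here).
    """
    search_order = [origin_class] + [c for c in registries if c != origin_class]
    for class_name in search_order:
        registry = registries.get(class_name, {})
        for service, concepts in registry.items():
            if requested_concept in concepts:
                return class_name, service
    return None


if __name__ == "__main__":
    registries = {
        "storage": {"s1": {"disk"}},
        "compute": {"c1": {"cpu"}, "c2": {"cpu", "gpu"}},
    }
    print(find_service("gpu", "storage", registries))
```

Checking the local class first keeps most lookups within one registry when requests tend to match services in their own class; the sketch uses exact concept membership, whereas an ontology-based matcher would presumably also consider related concepts.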