16 research outputs found
Probabilistic grid scheduling based on job statistics and monitoring information
This transfer thesis presents a novel, probabilistic approach to scheduling applications on computational Grids based on their historical behaviour, current state of the Grid and predictions of the future execution times and resource utilisation of such applications. The work lays a foundation for enabling a more intuitive, user-friendly and effective scheduling technique termed deadline scheduling.
Initial work has established motivation and requirements for a more efficient Grid scheduler, able to adaptively handle dynamic nature of the Grid resources and submitted workload. Preliminary scheduler research identified the need for a detailed monitoring of Grid resources on the process level, and for a tool to simulate non-deterministic behaviour and statistical properties of Grid applications.
A simulation tool, GridLoader, has been developed to enable modelling of application loads similar to a number of typical Grid applications. GridLoader is able to simulate CPU utilisation, memory allocation and network transfers according to limits set through command line parameters or a configuration file. Its specific strength is in achieving set resource utilisation targets in a probabilistic manner, thus creating a dynamic environment, suitable for testing the scheduler’s adaptability and its prediction algorithm.
To enable highly granular monitoring of Grid applications, a monitoring framework based on the Ganglia Toolkit was developed and tested. The suite is able to collect resource usage information of individual Grid applications, integrate it into standard XML based information flow, provide visualisation through a Web portal, and export data into a format suitable for off-line analysis.
The thesis also presents initial investigation of the utilisation of University College London Central Computing Cluster facility running Sun Grid Engine middleware. Feasibility of basic prediction concepts based on the historical information and process meta-data have been successfully established and possible scheduling improvements using such predictions identified.
The thesis is structured as follows: Section 1 introduces Grid computing and its major concepts; Section 2 presents open research issues and specific focus of the author’s research; Section 3 gives a survey of the related literature, schedulers, monitoring tools and simulation packages; Section 4 presents the platform for author’s work – the Self-Organising Grid Resource management project; Sections 5 and 6 give detailed accounts of the monitoring framework and simulation tool developed; Section 7 presents the initial data analysis while Section 8.4 concludes the thesis with appendices and references
Self-adaptive Grid Resource Monitoring and discovery
The Grid provides a novel platform where the scientific and engineering communities can share data and computation across multiple administrative domains. There are several key services that must be offered by Grid middleware; one of them being the Grid Information Service( GIS). A GIS is a Grid middleware component which maintains information about hardware, software, services and people participating in a virtual organisation( VO). There is an inherent need in these systems for the delivery of reliable performance. This thesis describes a number of approaches which detail the development and application of a suite of benchmarks for the prediction of the process of resource discovery and monitoring on the Grid. A series of experimental studies of the characterisation of performance using benchmarking, are carried out. Several novel predictive algorithms are presented and evaluated in terms of their predictive error. Furthermore, predictive methods are developed which describe the behaviour of MDS2 for a variable number of user requests. The MDS is also extended to include job information from a local scheduler; this information is queried using requests of greatly varying complexity. The response of the MDS to these queries is then assessed in terms of several performance metrics.
The benchmarking of the dynamic nature of information within MDS3 which is based on the Open Grid Services Architecture (OGSA), and also the successor to MDS2, is also carried out. The performance of both the pull and push query mechanisms is analysed. GridAdapt (Self-adaptive Grid Resource Monitoring) is a new system that is proposed, built upon the Globus MDS3 benchmarking. It offers self-adaptation, autonomy and admission control at the Index Service, whilst ensuring that the MIDS is not overloaded and can meet its quality-of-service,f or example,i n terms of its average response time for servicing synchronous queries and the total number of queries returned per unit time
Self-adaptive Grid Resource Monitoring and discovery
The Grid provides a novel platform where the scientific and engineering communities can share data and computation across multiple administrative domains. There are several key services that must be offered by Grid middleware; one of them being the Grid Information Service( GIS). A GIS is a Grid middleware component which maintains information about hardware, software, services and people participating in a virtual organisation( VO). There is an inherent need in these systems for the delivery of reliable performance. This thesis describes a number of approaches which detail the development and application of a suite of benchmarks for the prediction of the process of resource discovery and monitoring on the Grid. A series of experimental studies of the characterisation of performance using benchmarking, are carried out. Several novel predictive algorithms are presented and evaluated in terms of their predictive error. Furthermore, predictive methods are developed which describe the behaviour of MDS2 for a variable number of user requests. The MDS is also extended to include job information from a local scheduler; this information is queried using requests of greatly varying complexity. The response of the MDS to these queries is then assessed in terms of several performance metrics. The benchmarking of the dynamic nature of information within MDS3 which is based on the Open Grid Services Architecture (OGSA), and also the successor to MDS2, is also carried out. The performance of both the pull and push query mechanisms is analysed. GridAdapt (Self-adaptive Grid Resource Monitoring) is a new system that is proposed, built upon the Globus MDS3 benchmarking. It offers self-adaptation, autonomy and admission control at the Index Service, whilst ensuring that the MIDS is not overloaded and can meet its quality-of-service,f or example,i n terms of its average response time for servicing synchronous queries and the total number of queries returned per unit time.EThOS - Electronic Theses Online ServiceUniversity of Warwick (UoW)GBUnited Kingdo
Resource management for data streaming applications
This dissertation investigates novel middleware mechanisms for building streaming
applications. Developing streaming applications is a challenging task
because (i) they are continuous in nature; (ii) they require fusion of data coming from multiple sources to derive
higher level information; (iii) they require
efficient transport of data from/to distributed sources and sinks;
(iv) they need access to heterogeneous resources spanning sensor networks and high
performance computing; and (v) they are time critical in nature. My thesis is that an
intuitive programming abstraction will make it easier to build dynamic,
distributed, and ubiquitous data streaming applications. Moreover, such an abstraction will
enable an efficient allocation of shared and heterogeneous computational resources thereby making it easier for
domain experts to build these applications. In support of the thesis, I present a novel programming abstraction, called DFuse,
that makes it easier to develop these applications. A domain expert only needs to specify the input and output
connections to fusion channels, and the fusion functions. The subsystems developed in
this dissertation take care of instantiating the application,
allocating resources for the application (via the scheduling heuristic developed in this dissertation) and dynamically
managing the resources (via the dynamic scheduling algorithm presented in this dissertation). Through extensive
performance evaluation, I demonstrate that the resources are allocated efficiently to optimize the throughput and latency
constraints of an application.Ph.D.Committee Chair: Ramachandran, Umakishore; Committee Member: Chervenak, Ann; Committee Member: Cooper, Brian; Committee Member: Liu, Ling; Committee Member: Schwan, Karste
Grid'5000: a large scale and highly reconfigurable grid experimental testbed
Large scale distributed systems such as Grids are difficult to study from theoretical models and simulators only. Most Grids deployed at large scale are production plat-forms that are inappropriate research tools because of their limited reconfiguration, control and monitoring capa-bilities. In this paper, we present Grid’5000, a 5000 CPU nation-wide infrastructure for research in Grid computing. Grid’5000 is designed to provide a scientific tool for com-puter scientists similar to the large-scale instruments used by physicists, astronomers, and biologists. We describe the motivations, design considerations, architec-ture, control, and monitoring infrastructure of this experi-mental platform. We present configuration examples and performance results for the reconfiguration subsystem
Autonomous grid scheduling using probabilistic job runtime scheduling
Computational Grids are evolving into a global, service-oriented architecture –
a universal platform for delivering future computational services to a range of
applications of varying complexity and resource requirements. The thesis focuses
on developing a new scheduling model for general-purpose, utility clusters
based on the concept of user requested job completion deadlines. In such a
system, a user would be able to request each job to finish by a certain deadline,
and possibly to a certain monetary cost. Implementing deadline scheduling is
dependent on the ability to predict the execution time of each queued job, and
on an adaptive scheduling algorithm able to use those predictions to maximise
deadline adherence. The thesis proposes novel solutions to these two problems
and documents their implementation in a largely autonomous and self-managing
way.
The starting point of the work is an extensive analysis of a representative
Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and its properties commonly collected
by the Grid middleware for accounting purposes. An automated approach is
proposed to identify these dependencies and use them to partition the highly
variable workload into subsets of more consistent and predictable behaviour.
A range of time-series forecasting models, applied in this context for the first
time, were used to model the job execution times as a function of their historical
behaviour and associated properties. Based on the resulting predictions of job
runtimes a novel scheduling algorithm is able to estimate the latest job start
time necessary to meet the requested deadline and sort the queue accordingly to
minimise the amount of deadline overrun.
The testing of the proposed approach was done using the actual job trace
collected from a production Grid facility. The best performing execution time
predictor (the auto-regressive moving average method) coupled to workload
partitioning based on three simultaneous job properties returned the median
absolute percentage error centroid of only 4.75%. This level of prediction
accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler.
Overall, the thesis demonstrates that deadline scheduling of computational
jobs on the Grid is achievable using statistical forecasting of job execution times
based on historical information. The proposed approach is easily implementable,
substantially self-managing and better matched to the human workflow making
it well suited for implementation in the utility Grids of the future
Autonomous grid scheduling using probabilistic job runtime forecasting.
Computational Grids are evolving into a global, service-oriented architecture a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline. and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job. and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and its properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using the actual job trace collected from a production Grid facility. The best performing execution time predictor (the auto-regressive moving average method) coupled to workload partitioning based on three simultaneous job properties returned the median absolute percentage error eentroid of only 4.75CX. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow making it well suited for implementation in the utility Grids of the future
Semantic-Based, Scalable, Decentralized and Dynamic Resource Discovery for Internet-Based Distributed System
Resource Discovery (RD) is a key issue in Internet-based distributed sytems such as
grid. RD is about locating an appropriate resource/service type that matches the user's
application requirements. This is very important, as resource reservation and task
scheduling are based on it. Unfortunately, RD in grid is very challenging as resources
and users are distributed, resources are heterogeneous in their platforms, status of the
resources is dynamic (resources can join or leave the system without any prior notice)
and most recently the introduction of a new type of grid called intergrid (grid of grids)
with the use of multi middlewares. Such situation requires an RD system that has rich
interoperability, scalability, decentralization and dynamism features. However,
existing grid RD systems have difficulties to attain these features. Not only that, they
lack the review and evaluation studies, which may highlight the gap in achieving the
required features. Therefore, this work discusses the problem associated with intergrid
RD from two perspectives. First, reviewing and classifying the current grid RD
systems in such a way that may be useful for discussing and comparing them. Second,
propose a novel RD framework that has the aforementioned required RD features. In
the former, we mainly focus on the studies that aim to achieve interoperability in the
first place, which are known as RD systems that use semantic information (semantic
technology). In particular, we classify such systems based on their qualitative use of
the semantic information. We evaluate the classified studies based on their degree of
accomplishment of interoperability and the other RD requirements, and draw the
future research direction of this field. Meanwhile in the latter, we name the new
framework as semantic-based scalable decentralized dynamic RD. The framework
further contains two main components which are service description, and service
registration and discovery models. The earlier consists of a set of ontologies and
services. Ontologies are used as a data model for service description, whereas the
services are to accomplish the description process. The service registration is also based on ontology, where nodes of the service (service providers) are classified to
some classes according to the ontology concepts, which means each class represents a
concept in the ontology. Each class has a head, which is elected among its own class
I
nodes/members. Head plays the role of a registry in its class and communicates with
I
the other heads of the classes in a peer to peer manner during the discovery process.
We further introduce two intelligent agents to automate the discovery process which
are Request Agent (RA) and Description Agent (DA). Eaclj. node is supposed to have
both agents. DA describes the service capabilities based on the ontology, and RA
I
carries the service requests based on the ontology as well. We design a service search
I
algorithm for the RA that starts the service look up from the class of request origin
first, then to the other classes.
We finally evaluate the performance of our framework ~ith extensive simulation
experiments, the result of which confirms the effectiveness of the proposed system in
satisfying the required RD features (interoperability, scalability, decentralization and
dynamism). In short, our main contributions are outlined new key taxonomy for the
semantic-based grid RD studies; an interoperable semantic description RD component
model for intergrid services metadata representation; a semantic distributed registry
architecture for indexing service metadata; and an agent-qased service search and
selection algorithm.
Vll