15 research outputs found

    Dagstuhl News January - December 2001

    Get PDF
    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic

    A Policy-Based Resource Brokering Environment for Computational Grids

    Get PDF
    With the advances in networking infrastructure in general, and the Internet in particular, we can build grid environments that allow users to utilize a diverse set of distributed and heterogeneous resources. Since the focus of such environments is the efficient usage of the underlying resources, a critical component is the resource brokering environment that mediates the discovery, access and usage of these resources. With the consumer\u27s constraints, provider\u27s rules, distributed heterogeneous resources and the large number of scheduling choices, the resource brokering environment needs to decide where to place the user\u27s jobs and when to start their execution in a way that yields the best performance for the user and the best utilization for the resource provider. As brokering and scheduling are very complicated tasks, most current resource brokering environments are either specific to a particular grid environment or have limited features. This makes them unsuitable for large applications with heterogeneous requirements. In addition, most of these resource brokering environments lack flexibility. Policies at the resource-, application-, and system-levels cannot be specified and enforced to provide commitment to the guaranteed level of allocation that can help in attracting grid users and contribute to establishing credibility for existing grid environments. In this thesis, we propose and prototype a flexible and extensible Policy-based Resource Brokering Environment (PROBE) that can be utilized by various grid systems. In designing PROBE, we follow a policy-based approach that provides PROBE with the intelligence to not only match the user\u27s request with the right set of resources, but also to assure the guaranteed level of the allocation. PROBE looks at the task allocation as a Service Level Agreement (SLA) that needs to be enforced between the resource provider and the resource consumer. The policy-based framework is useful in a typical grid environment where resources, most of the time, are not dedicated. In implementing PROBE, we have utilized a layered architecture and façade design patterns. These along with the well-defined API, make the framework independent of any architecture and allow for the incorporation of different types of scheduling algorithms, applications and platform adaptors as the underlying environment requires. We have utilized XML as a base for all the specification needs. This provides a flexible mechanism to specify the heterogeneous resources and user\u27s requests along with their allocation constraints. We have developed XML-based specifications by which high-level internal structures of resources, jobs and policies can be specified. This provides interoperability in which a grid system can utilize PROBE to discover and use resources controlled by other grid systems. We have implemented a prototype of PROBE to demonstrate its feasibility. We also describe a test bed environment and the evaluation experiments that we have conducted to demonstrate the usefulness and effectiveness of our approach

    Distributed and Multiprocessor Scheduling

    Get PDF
    This chapter discusses CPU scheduling in parallel and distributed systems. CPU scheduling is part of a broader class of resource allocation problems, and is probably the most carefully studied such problem. The main motivation for multiprocessor scheduling is the desire for increased speed in the execution of a workload. Parts of the workload, called tasks, can be spread across several processors and thus be executed more quickly than on a single processor. In this chapter, we will examine techniques for providing this facility. The scheduling problem for multiprocessor systems can be generally stated as \How can we execute a set of tasks T on a set of processors P subject to some set of optimizing criteria C? The most common goal of scheduling is to minimize the expected runtime of a task set. Examples of other scheduling criteria include minimizing the cost, minimizing communication delay, giving priority to certain users\u27 processes, or needs for specialized hardware devices. The scheduling policy for a multiprocessor system usually embodies a mixture of several of these criteria. Section 2 outlines general issues in multiprocessor scheduling and gives background material, including issues specific to either parallel or distributed scheduling. Section 3 describes the best practices from prior work in the area, including a broad survey of existing scheduling algorithms and mechanisms. Section 4 outlines research issues and gives a summary. Section 5 lists the terms defined in this chapter, while sections 6 and 7 give references to important research publications in the area

    Autonomous grid scheduling using probabilistic job runtime scheduling

    Get PDF
    Computational Grids are evolving into a global, service-oriented architecture – a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline, and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and its properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using the actual job trace collected from a production Grid facility. The best performing execution time predictor (the auto-regressive moving average method) coupled to workload partitioning based on three simultaneous job properties returned the median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow making it well suited for implementation in the utility Grids of the future

    Load Balancing Scientific Applications

    Get PDF
    The largest supercomputers have millions of independent processors, and concurrency levels are rapidly increasing. For ideal efficiency, developers of the simulations that run on these machines must ensure that computational work is evenly balanced among processors. Assigning work evenly is challenging because many large modern parallel codes simulate behavior of physical systems that evolve over time, and their workloads change over time. Furthermore, the cost of imbalanced load increases with scale because most large-scale scientific simulations today use a Single Program Multiple Data (SPMD) parallel programming model, and an increasing number of processors will wait for the slowest one at the synchronization points. To address load imbalance, many large-scale parallel applications use dynamic load balance algorithms to redistribute work evenly. The research objective of this dissertation is to develop methods to decide when and how to load balance the application, and to balance it effectively and affordably. We measure and evaluate the computational load of the application, and develop strategies to decide when and how to correct the imbalance. Depending on the simulation, a fast, local load balance algorithm may be suitable, or a more sophisticated and expensive algorithm may be required. We developed a model for comparison of load balance algorithms for a specific state of the simulation that enables the selection of a balancing algorithm that will minimize overall runtime. Dynamic load balancing of parallel applications becomes more critical at scale, while also being expensive. To make the load balance correction affordable at scale, we propose a lazy load balancing strategy that evaluates the imbalance and computes the new assignment of work to processes asynchronously to the main application computation. We decouple the load balance algorithm from the application and run it on potentially fewer, separate processors. In this Multiple Program Multiple Data (MPMD) configuration, the load balance algorithm can execute concurrently with the application and with higher parallel efficiency than if it were run on the same processors as the simulation. Work is reassigned lazily as directions become available, and the application need not wait for the load balance algorithm to complete. We show that we can save resources by running a load balance algorithm at higher parallel efficiency on a smaller number of processors. Using our framework, we explore the trade-offs of load balancing configurations and demonstrate a performance improvement of up to 46%

    Economic-based Distributed Resource Management and Scheduling for Grid Computing

    Full text link
    Computational Grids, emerging as an infrastructure for next generation computing, enable the sharing, selection, and aggregation of geographically distributed resources for solving large-scale problems in science, engineering, and commerce. As the resources in the Grid are heterogeneous and geographically distributed with varying availability and a variety of usage and cost policies for diverse users at different times and, priorities as well as goals that vary with time. The management of resources and application scheduling in such a large and distributed environment is a complex task. This thesis proposes a distributed computational economy as an effective metaphor for the management of resources and application scheduling. It proposes an architectural framework that supports resource trading and quality of services based scheduling. It enables the regulation of supply and demand for resources and provides an incentive for resource owners for participating in the Grid and motives the users to trade-off between the deadline, budget, and the required level of quality of service. The thesis demonstrates the capability of economic-based systems for peer-to-peer distributed computing by developing users' quality-of-service requirements driven scheduling strategies and algorithms. It demonstrates their effectiveness by performing scheduling experiments on the World-Wide Grid for solving parameter sweep applications

    Autonomous grid scheduling using probabilistic job runtime forecasting.

    Get PDF
    Computational Grids are evolving into a global, service-oriented architecture a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline. and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job. and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and its properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using the actual job trace collected from a production Grid facility. The best performing execution time predictor (the auto-regressive moving average method) coupled to workload partitioning based on three simultaneous job properties returned the median absolute percentage error eentroid of only 4.75CX. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow making it well suited for implementation in the utility Grids of the future

    Evolving a secure grid-enabled, distributed data warehouse : a standards-based perspective

    Get PDF
    As digital data-collection has increased in scale and number, it becomes an important type of resource serving a wide community of researchers. Cross-institutional data-sharing and collaboration introduce a suitable approach to facilitate those research institutions that are suffering the lack of data and related IT infrastructures. Grid computing has become a widely adopted approach to enable cross-institutional resource-sharing and collaboration. It integrates a distributed and heterogeneous collection of locally managed users and resources. This project proposes a distributed data warehouse system, which uses Grid technology to enable data-access and integration, and collaborative operations across multi-distributed institutions in the context of HV/AIDS research. This study is based on wider research into OGSA-based Grid services architecture, comprising a data-analysis system which utilizes a data warehouse, data marts, and near-line operational database that are hosted by distributed institutions. Within this framework, specific patterns for collaboration, interoperability, resource virtualization and security are included. The heterogeneous and dynamic nature of the Grid environment introduces a number of security challenges. This study also concerns a set of particular security aspects, including PKI-based authentication, single sign-on, dynamic delegation, and attribute-based authorization. These mechanisms, as supported by the Globus Toolkit’s Grid Security Infrastructure, are used to enable interoperability and establish trust relationship between various security mechanisms and policies within different institutions; manage credentials; and ensure secure interactions

    Metacomputing on clusters augmented with reconfigurable hardware

    Get PDF
    corecore