    Online scheduling on a single machine with one restart for all jobs to minimize the weighted makespan

    In this paper, we consider the online scheduling problem on a single machine to minimize the weighted makespan. In this problem, all jobs arrive over time and they are allowed to be restarted only once. For the general case, when the processing times of all jobs are arbitrary, we show that there is no online algorithm with a competitive ratio of less than 2, which matches the lower bound of the problem without restart. That is, allowing only one restart for all jobs does not improve the competitive ratio in the general case. For the special case when all jobs have the same processing time, we present the best possible online algorithm, with a competitive ratio of 1.4656, which improves on the competitive ratio of $\frac{1+\sqrt{5}}{2} \approx 1.618$ for the problem without restart.
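
To make the objective concrete, here is a minimal sketch of how a weighted makespan could be evaluated for a fixed processing order on one machine. It assumes the usual reading of the term, the maximum over jobs of weight times completion time, which the abstract itself does not spell out. The function name and the `(release, processing, weight)` tuple layout are illustrative only, and restarts are not modelled.

```python
from typing import List, Tuple

Job = Tuple[float, float, float]  # (release time, processing time, weight)


def weighted_makespan(jobs_in_order: List[Job]) -> float:
    """Return max_j(weight_j * completion_j) for jobs run in the given order.

    Assumption: a single machine, no preemption and no restarts; a job
    cannot start before its release time.
    """
    clock = 0.0
    worst = 0.0
    for release, processing, weight in jobs_in_order:
        start = max(clock, release)   # wait for the job to arrive if needed
        clock = start + processing    # completion time of this job
        worst = max(worst, weight * clock)
    return worst


light_first = [(0.0, 2.0, 1.0), (1.0, 2.0, 3.0)]
heavy_first = [(1.0, 2.0, 3.0), (0.0, 2.0, 1.0)]
print(weighted_makespan(light_first), weighted_makespan(heavy_first))
```

In this toy instance, waiting one time unit for the heavier job and running it first gives the smaller objective value, which hints at why an online algorithm may benefit from being able to restart.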

    A survey of scheduling problems with setup times or costs

    Author name used in this publication: C. T. Ng. Author name used in this publication: T. C. E. Cheng. 2007-2008 > Academic research: refereed > Publication in refereed journal. Accepted Manuscript. Publishe

    Effective task assignment strategies for distributed systems under highly variable workloads

    Heavy-tailed workload distributions are commonly experienced in many areas of distributed computing. Such workloads are highly variable, where a small number of very large tasks make up a large proportion of the workload, making the load very hard to distribute effectively. Traditional task assignment policies are ineffective under these conditions as they were formulated based on the assumption of an exponentially distributed workload. Size-based task assignment policies have been proposed to handle heavy-tailed workloads, but their applications are limited by their static nature and assumption of prior knowledge of a task's service requirement. This thesis analyses existing approaches to load distribution under heavy-tailed workloads, and presents a new generalised task assignment policy that significantly improves performance for many distributed applications, by intelligently addressing the negative effects on performance that highly variable workloads cause. Many problems associated with the modelling and optimisation of systems under highly variable workloads were then addressed by a novel technique that approximated these workloads with simpler mathematical representations, without losing any of their pertinent original properties. Finally, we obtain advanced queueing metrics (such as the variance of key measurements like waiting time and slowdown that are difficult to obtain analytically) through rigorous simulation.
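
The claim that a few very large tasks carry much of the load can be illustrated with a small sketch. It compares the share of total work held by the largest 1% of tasks under a Pareto distribution and under an exponential one. The shape parameter 1.1, the task count, and all function names are assumptions chosen for illustration; they are not taken from the thesis. Evenly spaced quantiles are used instead of random sampling so the result is reproducible.

```python
import math
from typing import List


def pareto_quantiles(n: int, alpha: float, x_min: float = 1.0) -> List[float]:
    """Task sizes at n evenly spaced quantiles of a Pareto(alpha, x_min)."""
    return [x_min * (1.0 - (i + 0.5) / n) ** (-1.0 / alpha) for i in range(n)]


def exponential_quantiles(n: int, mean: float = 1.0) -> List[float]:
    """Task sizes at n evenly spaced quantiles of an exponential distribution."""
    return [-mean * math.log(1.0 - (i + 0.5) / n) for i in range(n)]


def top_share(sizes: List[float], fraction: float) -> float:
    """Fraction of total work contributed by the largest `fraction` of tasks."""
    ordered = sorted(sizes, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


heavy = top_share(pareto_quantiles(10_000, alpha=1.1), 0.01)
light = top_share(exponential_quantiles(10_000), 0.01)
print(f"top 1% share, Pareto: {heavy:.2f}; exponential: {light:.2f}")
```

Under these assumed parameters the largest 1% of Pareto tasks hold a far larger share of the work than the largest 1% of exponential tasks, which is the imbalance that size-based policies try to address.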

    A cyclic approach to large-scale short-term planning in chemical batch production

    We deal with the scheduling of processes on a multi-product chemical batch production plant. Such a plant contains a number of multi-purpose processing units and storage facilities of limited capacity. Given primary requirements for the final products, the problem consists in dividing the net requirements for the final and the intermediate products into batches and scheduling the processing of these batches. Due to the computational intractability of the problem, the monolithic MILP models proposed in the literature generally cannot be used for solving large-scale problem instances. The cyclic solution approach presented in this paper starts from the decomposition of the problem into a batching and a batch-scheduling problem. The complete production schedule is obtained by computing a cyclic subschedule, which is then repeated several times. In this way, good feasible schedules for large-scale problem instances are found within a short CPU time.

    Holistic Slowdown Driven Scheduling and Resource Management for Malleable Jobs

    In job scheduling, the concept of malleability has been explored for many years. Research shows that malleability improves system performance, but its use in HPC never became widespread. The causes are the difficulty of developing malleable applications and the lack of support and integration across the different layers of the HPC software stack. However, in recent years, malleability in job scheduling has become more critical because of the increasing complexity of hardware and workloads. In this context, using nodes in an exclusive mode is not always the most efficient solution, as it was for traditional HPC jobs, where applications were highly tuned for static allocations but offered zero flexibility for dynamic executions. This paper proposes a new holistic, dynamic job scheduling policy, Slowdown Driven (SD-Policy), which exploits the malleability of applications as the key technology to reduce the average slowdown and response time of jobs. SD-Policy is based on backfill and node sharing. It applies malleability to running jobs to make room for jobs that will run with a reduced set of resources, only when the estimated slowdown improves over the static approach. We implemented SD-Policy in SLURM and evaluated it in a real production environment, and with a simulator using workloads of up to 198K jobs. Results show better resource utilization, with reductions in makespan, response time, slowdown, and energy consumption of up to 7%, 50%, 70%, and 6%, respectively, for the evaluated workloads.
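
The decision rule described above (shrink resources only when the estimated slowdown improves on the static approach) can be sketched as follows. This is a simplified illustration, not the SD-Policy implementation: it assumes slowdown is measured as response time divided by the run time the job would have with its full static allocation, and every name and parameter is hypothetical.

```python
def slowdown(wait_time: float, run_time: float, nominal_run_time: float) -> float:
    """Response time relative to the run time under a full static allocation.

    Assumption: the nominal (static) run time is the common baseline, so a
    job stretched by running on fewer resources is penalised for it.
    """
    return (wait_time + run_time) / nominal_run_time


def prefer_malleable(static_wait: float, nominal_run: float,
                     shrunk_wait: float, shrunk_run: float) -> bool:
    """True when starting earlier on fewer resources lowers the slowdown."""
    static_sd = slowdown(static_wait, nominal_run, nominal_run)
    shrunk_sd = slowdown(shrunk_wait, shrunk_run, nominal_run)
    return shrunk_sd < static_sd


# Long queue: starting now at half speed beats waiting for a full allocation.
print(prefer_malleable(static_wait=100.0, nominal_run=50.0,
                       shrunk_wait=0.0, shrunk_run=100.0))
# Short queue: the stretched run time outweighs the saved wait.
print(prefer_malleable(static_wait=10.0, nominal_run=50.0,
                       shrunk_wait=0.0, shrunk_run=100.0))
```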

    Approaches to grid-based SAT solving

    In this work we develop techniques for using distributed computing resources to efficiently solve instances of the propositional satisfiability problem (SAT). The computing resources considered in this work are assumed to be geographically distributed and connected by a non-dedicated network. Such systems are typically referred to as computational grid environments. The time a modern SAT solver consumes while solving an instance varies according to a random distribution. Unlike many other methods for distributed SAT solving, this work identifies the random distribution as a valuable resource for solving-time reduction. The methods which use randomness in the run times of a search algorithm, such as the ones discussed in this work, are examples of multi-search. The main contribution of this work is in developing and analyzing the multi-search approach in SAT solving and showing its efficiency with several experiments. For the purpose of the analysis, the work introduces a grid simulation model which captures several of the properties of a grid environment which are not observed in more traditional parallel computing systems. The work develops two algorithmic frameworks for multi-search in SAT. The first, SDSAT, is based on using properties of the distribution of the solving time so that the expected time required to solve an instance is reduced. Based on the analysis of SDSAT, the work proposes an algorithm for efficiently using a large number of computing resources simultaneously to solve collections of SAT instances. The analysis of SDSAT also motivates the second algorithmic framework, CL-SDSAT. The framework is used to efficiently solve many industrial SAT instances by carefully combining information learned in the distributed SAT solvers. All methods described in the work are directly applicable in a wide range of grid environments and can be used together with virtually unmodified state-of-the-art SAT solvers.
The methods are experimentally verified using standard benchmark SAT instances in a production-level grid environment. The experiments show that using the relatively simple methods developed in the work, SAT instances which cannot be solved efficiently in sequential settings can be now solved in a grid environment
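
The idea that run-time randomness is itself a resource can be illustrated with a generic multi-search estimate: if k independent randomized runs are started and the first to finish wins, the solving time is the minimum of k draws from the run-time distribution. The sketch below computes the expected minimum from a list of observed run times, treating them as an empirical distribution sampled with replacement. It is an illustration of the general principle under that assumption, not the SDSAT algorithm, and the function name is hypothetical.

```python
from typing import Sequence


def expected_min_runtime(samples: Sequence[float], k: int) -> float:
    """Expected minimum of k independent draws from the empirical distribution.

    With xs sorted ascending and n = len(xs), the probability that the
    minimum equals xs[i] is ((n - i) / n) ** k - ((n - i - 1) / n) ** k.
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        p_at_least = ((n - i) / n) ** k       # all k draws are >= xs[i]
        p_greater = ((n - i - 1) / n) ** k    # all k draws are > xs[i]
        total += x * (p_at_least - p_greater)
    return total


observed = [1.0, 1.0, 1.0, 100.0]  # one run in four is very slow
print(expected_min_runtime(observed, 1))  # plain mean of the samples
print(expected_min_runtime(observed, 2))  # two parallel runs
```

With a high-variance run-time sample such as this one, the second parallel run cuts the expected solving time by much more than half, which is the effect a multi-search approach relies on.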

    Online scheduling in fault-prone systems: performance optimization and energy efficiency

    International Mention in the doctoral degree. Everyone is familiar with the problem of online scheduling (even if they are not aware of it), from the way we prioritize our everyday decisions to the way a delivery service must decide on the route to follow in order to cover the ongoing requests. In computer science, this is a problem of even greater importance. This thesis considers two main families of online scheduling problems in computer science, and aims to provide an extended clear framework for their analysis, presenting at the same time some common characteristics that connect these problems. The first and main family of online scheduling problems considered is task scheduling in fault-prone computing systems. As the number of clients and the possibilities offered by the rapid development of computing systems grow with time, the increase in demand for computationally intensive tasks is inevitable. Uniprocessors are no longer capable of coping with the escalation of these demands, which, among others, has led to the development of multicore-based parallel machines, Internet-based computing platforms and co-operational distributed systems. Nonetheless, the challenges of these systems, even of the simplest ones, are numerous: they have to deal with continuous dynamic requests from the clients, which are probably not of the same nature (they require different amounts of computational resources). The processing elements (i.e., machines) may suffer from unpredictable failures, either malicious or due to overload. Furthermore, depending on the size of these systems and the exact processing units, their power consumption may be significant, even equal to the electricity needed for a small town. Hence, limiting their power consumption is another challenge.
To analyze such a system one must consider the online nature of the problem: the dynamic task arrivals (client requests) of different sizes (computational demands), and the unpredictable machine crashes and restarts (failures). It is important to give guarantees for the performance of the algorithms used in these systems, thus the thesis conducts worst-case competitive analysis and covers a significant level of the three dimensions of the problem. More precisely, it studies the effects of the number of machines, the number of different task sizes and the speed of the machines – which, as will be explained through the thesis, affects the power consumption of the system – on the efficiency of online scheduling algorithms. As performance measures, this thesis uses the completed load, the pending load and the latency competitiveness of the algorithms. In some cases, it considers the long-term competitiveness versions of these measures as well. One of the most important results shown is that resource augmentation in the form of increasing the machine speedup is necessary in order to achieve some competitiveness, or to reach optimal competitiveness. The sufficient amount of speedup is found, and online algorithms that achieve the desired competitiveness are proposed and analyzed. Apart from the algorithms designed, some of the most widely used algorithms in scheduling are also analyzed in the model considered for the first time; namely, Longest In System (LIS), Shortest In System (SIS), Largest Processing Time (LPT), and Smallest Processing Time (SPT). Nonetheless, deciding on the best algorithm among them is not easy. Each algorithm behaves better with respect to a different evaluation metric and under different model parameters. The second family of problems considered is packet scheduling over an unreliable wireless communication link.
As claimed, these problems have a strong connection to the task scheduling problem, especially when considering one machine and no speedup, hence some of the results can be shared. A setting with a single pair of nodes is considered, connected through an unreliable wireless channel. The sending station transmits packets to a receiving station over the channel, which can be jammed and hence corrupt the packet being transmitted. First, worst-case scenarios are assumed for the channel jams, modeled by a malicious adversarial entity. The packet arrivals, however, follow a stochastic distribution, and competitive analysis of scheduling algorithms is pursued, giving matching bounds for the most pessimistic scenarios of channel jams. The aim of the algorithms is to find the schedule (or order of transmission of the arriving packets) that maximizes the asymptotic throughput, which corresponds to the long-term competitive ratio of total length of successfully transmitted packets. Then, a slightly different problem is considered, assuming an infinite amount of data to be transmitted over the same unreliable communication link. This time, however, an adversarial entity with constrained power is assumed for the channel jams. The constrained power is modeled by an Adversarial Queueing Theory (AQT) approach, defined with two main parameters: the error availability rate and the maximum batch of errors available to the adversary at any time. This is the first time AQT is used to model channel jams; it has been mostly used to model the packet arrivals in networking problems. In this problem, the scheduling algorithms must decide on the length of the packets to be transmitted, with the objective of maximizing the goodput rate, that is, the rate of successfully transmitted load.
It is seen that even for the simplest settings, the analysis and results are not trivial. This work has been supported by IMDEA Networks Institute. Official Doctoral Program in Telematics Engineering. President: María Serna Iglesias. Secretary: Vincenzo Mancuso. Member: Leszek Antoni Gasieni
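
The four classical rules named in this abstract (LIS, SIS, LPT, SPT) can be sketched as priority keys over the pending tasks. The interpretation below is read off the rule names and is an assumption, not the thesis's formal definition: LIS picks the task that arrived earliest, SIS the one that arrived most recently, LPT the largest task, and SPT the smallest. The `Task` fields and `pick_next` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Task:
    name: str
    arrival: float  # time the task entered the system
    size: float     # processing requirement


# Each rule maps a task to a sort key; the task with the smallest key runs next.
RULES: Dict[str, Callable[[Task], float]] = {
    "LIS": lambda t: t.arrival,    # Longest In System: earliest arrival first
    "SIS": lambda t: -t.arrival,   # Shortest In System: latest arrival first
    "LPT": lambda t: -t.size,      # Largest Processing Time first
    "SPT": lambda t: t.size,       # Smallest Processing Time first
}


def pick_next(pending: Sequence[Task], rule: str) -> Task:
    """Return the pending task that the named rule would schedule next."""
    return min(pending, key=RULES[rule])


pending = [Task("a", 0.0, 5.0), Task("b", 1.0, 2.0), Task("c", 2.0, 9.0)]
for rule in RULES:
    print(rule, pick_next(pending, rule).name)
```

The same pending set yields a different choice under most of the rules, which is consistent with the abstract's remark that no single rule is best across metrics and model parameters.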

    A study in grid simulation and scheduling

    Grid computing is emerging as an essential tool for large scale analysis and problem solving in scientific and business domains. Whilst the idea of stealing unused processor cycles is as old as the Internet, we are still far from reaching a position where many distributed resources can be seamlessly utilised on demand. One major issue preventing this vision is deciding how to effectively manage the remote resources and how to schedule the tasks amongst these resources. This thesis describes an investigation into Grid computing, specifically the problem of Grid scheduling. This complex problem has many unique features making it particularly difficult to solve and as a result many current Grid systems employ simplistic, inefficient solutions. This work describes the development of a simulation tool, G-Sim, which can be used to test the effectiveness of potential Grid scheduling algorithms under realistic operating conditions. This tool is used to analyse the effectiveness of a simple, novel scheduling technique in numerous scenarios. The results are positive and show that it could be applied to current procedures to enhance performance and decrease the negative effect of resource failure. Finally a conversion between the Grid scheduling problem and the classic computing problem SAT is provided. Such a conversion allows for the possibility of applying sophisticated SAT solving procedures to Grid scheduling providing potentially effective solutions

    Grid-centric scheduling strategies for workflow applications

    Grid computing faces a great challenge because the resources are not localized, but distributed, heterogeneous and dynamic. Thus, it is essential to provide a set of programming tools that execute an application on the Grid resources with as little input from the user as possible. The thesis of this work is that Grid-centric scheduling techniques of workflow applications can provide good usability of the Grid environment by reliably executing the application on a large scale distributed system with good performance. We support our thesis with new and effective approaches in the following five aspects. First, we modeled the performance of the existing scheduling approaches in a multi-cluster Grid environment. We implemented several widely-used scheduling algorithms and identified the best candidate. The study further introduced a new measurement, based on our experiments, which can improve the schedule quality of some scheduling algorithms as much as 20 fold in a multi-cluster Grid environment. Second, we studied the scalability of the existing Grid scheduling algorithms. To deal with Grid systems consisting of hundreds of thousands of resources, we designed and implemented a novel approach that performs explicit resource selection decoupled from scheduling. Our experimental evaluation confirmed that our decoupled approach can be scalable in such an environment without sacrificing the quality of the schedule by more than 10%. Third, we proposed solutions to address the dynamic nature of Grid computing with a new cluster-based hybrid scheduling mechanism. Our experimental results collected from real executions on production clusters demonstrated that this approach produces programs running 30% to 100% faster than the other scheduling approaches we implemented on both reserved and shared resources. Fourth, we improved the reliability of Grid computing by incorporating fault-tolerance and recovery mechanisms into the workflow application execution.
Our experiments on a simulated multi-cluster Grid environment demonstrated the effectiveness of our approach and also characterized the three-way trade-off between reliability, performance and resource usage when executing a workflow application. Finally, we improved the large batch-queue wait time often found in production Grid clusters. We developed a novel approach to partition the workflow application and submit the partitions judiciously to achieve less total batch-queue wait time. The experimental results derived from production site batch queue logs show that our approach can reduce total wait time by as much as 70%. Our approaches combined can greatly improve the usability of Grid computing while increasing the performance of workflow applications on a multi-cluster Grid environment.

    QoS-aware predictive workflow scheduling

    This research lays the foundation for QoS-aware predictive workflow scheduling. Its novel contributions will open up prospects for future research in handling complex big workflow applications with high uncertainty and dynamism. The results from the proposed workflow scheduling algorithm show significant improvement in the performance and reliability of the workflow applications.