
    Scheduling Algorithms for Cloud: A Survey and Analysis

    Cloud computing is a fast-growing computing paradigm because of the many benefits it offers its users. Because of the Cloud's pay-as-you-go model, scheduling has become one of its key aspects. The factors that determine which scheduling technique is appropriate vary from scenario to scenario: for instance, scheduling in hybrid clouds must take data transfer speed into account, whereas in mobile environments scheduling depends on context changes. Scheduling can also be improved on many fronts, such as energy efficiency, cost minimization and maximization of resource utilization. This paper surveys scheduling techniques in various Cloud Computing scenarios and, by comparing the techniques and the problems they address, cites the most efficient scheduling technique available for a particular set of user needs.

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’, held March 1–6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of enormous resource consumption and expense. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume about a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements?
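    The energy figures quoted above can be checked with a few lines of arithmetic. This sketch adds two assumptions that the abstract does not state: a sustained rate of 10^18 flop/s (one exaflop/s) and an electricity price of 0.10 EUR/kWh. Both constants are illustrative only.

```python
# Back-of-the-envelope check of the energy figures quoted in the abstract.
# Assumptions (NOT from the source): a sustained rate of 1e18 flop/s
# (one exaflop/s) and an electricity price of 0.10 EUR per kWh.

RUNTIME_H = 48            # run length in hours (from the abstract)
POWER_MW = 20             # system power draw in megawatts (from the abstract)
FLOPS_PER_S = 1e18        # assumed sustained exascale rate
PRICE_EUR_PER_KWH = 0.10  # assumed electricity price


def energy_kwh(runtime_h: float, power_mw: float) -> float:
    """Energy in kWh: power converted from MW to kW, times hours."""
    return power_mw * 1_000 * runtime_h


def total_flop(runtime_h: float, flops_per_s: float) -> float:
    """Floating-point operations executed over the whole run."""
    return flops_per_s * runtime_h * 3_600


if __name__ == "__main__":
    e = energy_kwh(RUNTIME_H, POWER_MW)
    print(f"energy: {e:,.0f} kWh")                                   # 960,000 kWh
    print(f"cost:   {e * PRICE_EUR_PER_KWH:,.0f} EUR")               # 96,000 EUR
    print(f"work:   {total_flop(RUNTIME_H, FLOPS_PER_S):.2e} flop")  # 1.73e+23 flop
```

    Under these assumptions the run uses 960,000 kWh, which is roughly the "million kWh" quoted. It costs about 96,000 EUR and performs about 1.7 × 10^23 operations, consistent with the orders of magnitude in the abstract.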
    While the analysis of use cases can help in understanding the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
    Peer Reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft).
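    The abstract warns that the mean time between failures (MTBF) may fall below the time needed to recover from a checkpoint. Young's classic first-order approximation for the checkpoint interval, T_opt ≈ sqrt(2·C·M), illustrates why this matters. Here C is the time to write one checkpoint and M is the MTBF. This model does not come from the article. It is a standard approximation that is only accurate when C is much smaller than M. The checkpoint cost, MTBF and recovery time R below are hypothetical values.

```python
import math

# Young's first-order checkpoint model (a standard approximation, NOT the
# article's method). Valid only when checkpoint cost C << MTBF M.


def young_interval(ckpt_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order optimum checkpoint interval: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s)


def waste_fraction(interval_s: float, ckpt_cost_s: float,
                   mtbf_s: float, recovery_s: float = 0.0) -> float:
    """First-order fraction of machine time lost to fault tolerance:
    checkpoint overhead C/T, plus, per failure (rate 1/M), on average
    half an interval of lost work and the recovery time R."""
    return ckpt_cost_s / interval_s + (interval_s / 2.0 + recovery_s) / mtbf_s


if __name__ == "__main__":
    C = R = 600.0  # hypothetical: 10 min to write and to restore a checkpoint
    for mtbf in (86_400.0, 1_800.0):  # hypothetical MTBF: 1 day, then 30 min
        t = young_interval(C, mtbf)
        w = waste_fraction(t, C, mtbf, R)
        print(f"MTBF {mtbf / 3600:5.2f} h: interval {t / 3600:.2f} h, waste {w:.1%}")
    # MTBF 24.00 h: interval 2.83 h, waste 12.5%
    # MTBF  0.50 h: interval 0.41 h, waste 115.0%
```

    With a 10-minute checkpoint and a one-day MTBF, about 12.5% of the machine time goes to fault tolerance. Shrinking the MTBF to 30 minutes pushes the estimated waste above 100%. The first-order model is no longer accurate in that regime, but the result signals the failure mode the abstract describes, in which synchronous checkpoint/restart alone makes no progress.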

    Fault Tolerant Approaches through Scheduling in Cloud Computing Environment - A State of Art

    Owing to its pay-per-use model, cloud computing is used extensively in scientific and other domains such as biomedical, healthcare and online financial applications. Fault tolerance is one of the biggest challenges in guaranteeing the reliability and availability of critical services: the system must remain available by minimizing the impact of failures. In this paper, we conduct a comparative analysis of various approaches for tolerating faults through scheduling in cloud computing environments, based on their policies. The goal of this paper is not only to analyze the existing methods but also to identify areas that need future research.

    Performance Evaluation of Scheduling Algorithms for Real Time Cloud Computing Systems

    Cloud computing shares data and offers services transparently among its users. As the number of cloud users grows, so does the number of tasks to be scheduled, and the performance of the cloud depends on the task scheduling algorithms used in its scheduling or brokering components. Scheduling tasks on cloud computing systems is an open research problem in which both the matching of tasks to machines and the completion time of the tasks must be considered. To see the size of the matching problem, assume there are Y active hosts with Z virtual machines (VMs) each. A single task can then be scheduled on any of Y*Z VMs, so scheduling X tasks has (Y*Z)^X possible assignments. This exponential search space rules out exhaustive search, and task scheduling is in general NP-hard: no polynomial-time algorithm for finding an optimal schedule is known, although a proposed schedule can be checked efficiently. Fault tolerance is key to establishing dependability in cloud computing systems. In task scheduling, a task that does not complete by its deadline constitutes one type of fault, and this thesis addresses faults of this type. We present a non-preemptive scheduling algorithm that inserts idle time to postpone a task while ensuring that the other tasks still complete within their deadlines. In simulation, the proposed algorithm increases profit by 25% and throughput by 25%, and reduces the penalty by 20%, compared with EDF (Earliest Deadline First).
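    Two points in this abstract lend themselves to a small sketch. The first is the size of the assignment search space, (Y*Z)^X. The second is deadline misses under non-preemptive EDF, the baseline the thesis compares against. The sketch models a single VM and uses a hypothetical three-task workload. It does not reproduce the thesis's idle-time insertion algorithm, and the names `Task`, `search_space` and `edf_nonpreemptive` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    release: float   # earliest time the task may start
    length: float    # execution time on the VM
    deadline: float  # absolute deadline


def search_space(hosts: int, vms_per_host: int, tasks: int) -> int:
    """Number of task-to-VM assignments: (Y*Z)**X."""
    return (hosts * vms_per_host) ** tasks


def edf_nonpreemptive(tasks):
    """Run tasks on one VM with non-preemptive Earliest Deadline First.
    Returns a list of (name, start, finish, missed_deadline) tuples."""
    pending = sorted(tasks, key=lambda t: t.release)
    ready, schedule = [], []
    time, i = 0.0, 0
    while i < len(pending) or ready:
        # Move every task released by now into the ready set.
        while i < len(pending) and pending[i].release <= time:
            ready.append(pending[i])
            i += 1
        if not ready:
            time = pending[i].release  # VM idles until the next release
            continue
        # Pick the ready task with the earliest deadline; run it to completion.
        task = min(ready, key=lambda t: t.deadline)
        ready.remove(task)
        start = time
        time += task.length
        schedule.append((task.name, start, time, time > task.deadline))
    return schedule


if __name__ == "__main__":
    print(search_space(10, 4, 20))  # 40**20, about 1.1e32 assignments
    demo = [Task("A", 0, 4, 10), Task("B", 1, 2, 4), Task("C", 2, 1, 12)]
    for name, start, finish, missed in edf_nonpreemptive(demo):
        print(name, start, finish, "MISSED" if missed else "ok")
    # A 0.0 4.0 ok
    # B 4.0 6.0 MISSED
    # C 6.0 7.0 ok
```

    In the demo, A starts at t=0 because it is the only released task. B, released at t=1 with deadline 4, therefore cannot start until t=4 and misses its deadline. Keeping the VM idle until t=1 and running B, A, C in that order would meet all three deadlines. Deliberately inserted idle time is meant to handle exactly this kind of case.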

    Computing at massive scale: Scalability and dependability challenges

    Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase of data volume, together with the heterogeneity of workloads and resources and the dynamic nature of massive user requests, the uncertainties and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, vulnerable system dependability, and user-perceived performance degradations. In this paper, we report our latest understanding of the current and future challenges in this particular area, and discuss both existing and potential solutions to the problems, especially those concerned with system efficiency, scalability and dependability. We first introduce a data-driven analysis methodology for characterizing the resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine and analyze several fundamental challenges and the solutions we are developing to tackle them, including, for example, incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology in order to establish an engineering approach that facilitates the optimization, tuning and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications.