Search CORE

147 research outputs found

HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

Author: Buyya Rajkumar
Calheiros Rodrigo N.
Cunha Renato L. F.
Netto Marco A. S.
Rodrigues Eduardo R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of the on-premise and cloud resources---steady (and sensitive) workloads can run on on-premise resources and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, which range from how to extract the best performance of an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper brings a survey and taxonomy of efforts in HPC cloud and a vision on what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast increasing wave of new HPC applications coming from big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR

arXiv.org e-Print Archive

Western Sydney ResearchDirect

A Study on Cloud Cost Efficiency by Exploiting Idle Billing Period Fractions

Author: Barbosa Jorge G.
Sampaio Altino M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

In most of the current commercial Clouds, resources are billed based on a time interval equal to one hour, as is the case of virtual machine (VM) instances on Amazon EC2. Such time interval is usually long, and yet the user has to pay for the whole last hour, even if he/she has only used a fraction of it, contradicting the pay-as-you-go model of Clouds. In this paper, we analyse the advantages of adopting alternative scheduling policies that exploit idle last time intervals, in terms of service cost to Cloud users and operating costs to Cloud providers. Using a real-life astronomy workflow application, constrained by user-defined Deadline and Budget quality of service (QoS) parameters, a set of online state-ofthe- art-based scheduling algorithms try different execution and resource provisioning plans. Our results show that exploitation of partially idle last time intervals can reduce the cost of service to the end user, and augments providers competitiveness up to 21.6% through energy efficiency improvement and consequent lowering of operational costs.info:eu-repo/semantics/publishedVersio

Repositório Científico do Instituto Politécnico do Porto

Crossref

Executing Large Scale Scientific Workflows in Public Clouds

Author: Jiang Qingye
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2015
Field of study

Scientists in different fields, such as high-energy physics, earth science, and astronomy are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) for the entire scientific analysis. As a workflow ensemble usually contains many sub-workflows in each of which hundreds or thousands of jobs exist with precedence constraints, the execution of such a workflow ensemble makes a great concern with cost even using elastic and pay-as-you-go cloud resources. In this thesis, we develop a set of methods to optimize the execution of large-scale scientific workflows in public clouds with both cost and deadline constraints with a two-step approach. Firstly, we present a set of methods to optimize the execution of scientific workflow in public clouds, with the Montage astronomical mosaic engine running on Amazon EC2 as an example. Secondly, we address three main challenges in realizing benefits of using public clouds when executing large-scale workflow ensembles: (1) execution coordination, (2) resource provisioning, and (3) data staging. To this end, we develop a new pulling-based workflow execution system with a profiling-based resource provisioning strategy. Our results show that our solution system can achieve 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. Besides, our evaluation using Montage workflow ensembles on around 1000-core Amazon EC2 clusters has demonstrated the efficacy of our resource provisioning strategy in terms of cost effectiveness within deadline

Sydney eScholarship

Scheduling Flexible Demand in Cloud Computing Spot Markets

Author: Keller Robert
Publication venue: AIS Electronic Library (AISeL)
Publication date: 21/01/2020
Field of study

The rapid standardization and specialization of cloud computing services have led to the development of cloud spot markets on which cloud service providers and customers can trade in near real-time. Frequent changes in demand and supply give rise to spot prices that vary throughout the day. Cloud customers often have temporal flexibility to execute their jobs before a specific deadline. In this paper, the authors apply real options analysis (ROA), which is an established valuation method designed to capture the flexibility of action under uncertainty. They adapt and compare multiple discrete-time approaches that enable cloud customers to quantify and exploit the monetary value of their short-term temporal flexibility. The paper contributes to the field by guaranteeing cloud job execution of variable-time requests in a single cloud spot market, whereas existing multi-market strategies may not fulfill requests when outbid. In a broad simulation of scenarios for the use of Amazon EC2 spot instances, the developed approaches exploit the existing savings potential up to 40 percent – a considerable extent. Moreover, the results demonstrate that ROA, which explicitly considers time-of-day-specific spot price patterns, outperforms traditional option pricing models and expectation optimization

AIS Electronic Library (AISeL)

Ad hoc cloud computing

Author: McGilvary Gary Andrew
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/11/2014
Field of study

Commercial and private cloud providers offer virtualized resources via a set of co-located and dedicated hosts that are exclusively reserved for the purpose of offering a cloud service. While both cloud models appeal to the mass market, there are many cases where outsourcing to a remote platform or procuring an in-house infrastructure may not be ideal or even possible. To offer an attractive alternative, we introduce and develop an ad hoc cloud computing platform to transform spare resource capacity from an infrastructure owner’s locally available, but non-exclusive and unreliable infrastructure, into an overlay cloud platform. The foundation of the ad hoc cloud relies on transferring and instantiating lightweight virtual machines on-demand upon near-optimal hosts while virtual machine checkpoints are distributed in a P2P fashion to other members of the ad hoc cloud. Virtual machines found to be non-operational are restored elsewhere ensuring the continuity of cloud jobs. In this thesis we investigate the feasibility, reliability and performance of ad hoc cloud computing infrastructures. We firstly show that the combination of both volunteer computing and virtualization is the backbone of the ad hoc cloud. We outline the process of virtualizing the volunteer system BOINC to create V-BOINC. V-BOINC distributes virtual machines to volunteer hosts allowing volunteer applications to be executed in the sandbox environment to solve many of the downfalls of BOINC; this however also provides the basis for an ad hoc cloud computing platform to be developed. We detail the challenges of transforming V-BOINC into an ad hoc cloud and outline the transformational process and integrated extensions. These include a BOINC job submission system, cloud job and virtual machine restoration schedulers and a periodic P2P checkpoint distribution component. Furthermore, as current monitoring tools are unable to cope with the dynamic nature of ad hoc clouds, a dynamic infrastructure monitoring and management tool called the Cloudlet Control Monitoring System is developed and presented. We evaluate each of our individual contributions as well as the reliability, performance and overheads associated with an ad hoc cloud deployed on a realistically simulated unreliable infrastructure. We conclude that the ad hoc cloud is not only a feasible concept but also a viable computational alternative that offers high levels of reliability and can at least offer reasonable performance, which at times may exceed the performance of a commercial cloud infrastructure

Edinburgh Research Archive

Climbing Up Cloud Nine: Performance Enhancement Techniques for Cloud Computing Environments

Author: Abusharkh Mohamed
Publication venue: Scholarship@Western
Publication date: 07/07/2016
Field of study

With the transformation of cloud computing technologies from an attractive trend to a business reality, the need is more pressing than ever for efficient cloud service management tools and techniques. As cloud technologies continue to mature, the service model, resource allocation methodologies, energy efficiency models and general service management schemes are not yet saturated. The burden of making this all tick perfectly falls on cloud providers. Surely, economy of scale revenues and leveraging existing infrastructure and giant workforce are there as positives, but it is far from straightforward operation from that point. Performance and service delivery will still depend on the providers’ algorithms and policies which affect all operational areas. With that in mind, this thesis tackles a set of the more critical challenges faced by cloud providers with the purpose of enhancing cloud service performance and saving on providers’ cost. This is done by exploring innovative resource allocation techniques and developing novel tools and methodologies in the context of cloud resource management, power efficiency, high availability and solution evaluation. Optimal and suboptimal solutions to the resource allocation problem in cloud data centers from both the computational and the network sides are proposed. Next, a deep dive into the energy efficiency challenge in cloud data centers is presented. Consolidation-based and non-consolidation-based solutions containing a novel dynamic virtual machine idleness prediction technique are proposed and evaluated. An investigation of the problem of simulating cloud environments follows. Available simulation solutions are comprehensively evaluated and a novel design framework for cloud simulators covering multiple variations of the problem is presented. Moreover, the challenge of evaluating cloud resource management solutions performance in terms of high availability is addressed. An extensive framework is introduced to design high availability-aware cloud simulators and a prominent cloud simulator (GreenCloud) is extended to implement it. Finally, real cloud application scenarios evaluation is demonstrated using the new tool. The primary argument made in this thesis is that the proposed resource allocation and simulation techniques can serve as basis for effective solutions that mitigate performance and cost challenges faced by cloud providers pertaining to resource utilization, energy efficiency, and client satisfaction

Scholarship@Western

Recommended from our members

Transiency-driven Resource Management for Cloud Computing Platforms

Author: Sharma Prateek
Publication venue: ScholarWorks@UMass Amherst
Publication date: 25/10/2018
Field of study

Modern distributed server applications are hosted on enterprise or cloud data centers that provide computing, storage, and networking capabilities to these applications. These applications are built using the implicit assumption that the underlying servers will be stable and normally available, barring for occasional faults. In many emerging scenarios, however, data centers and clouds only provide transient, rather than continuous, availability of their servers. Transiency in modern distributed systems arises in many contexts, such as green data centers powered using renewable intermittent sources, and cloud platforms that provide lower-cost transient servers which can be unilaterally revoked by the cloud operator. Transient computing resources are increasingly important, and existing fault-tolerance and resource management techniques are inadequate for transient servers because applications typically assume continuous resource availability. This thesis presents research in distributed systems design that treats transiency as a first-class design principle. I show that combining transiency-specific fault-tolerance mechanisms with resource management policies to suit application characteristics and requirements, can yield significant cost and performance benefits. These mechanisms and policies have been implemented and prototyped as part of software systems, which allow a wide range of applications, such as interactive services and distributed data processing, to be deployed on transient servers, and can reduce cloud computing costs by up to 90\%. This thesis makes contributions to four areas of computer systems research: transiency-specific fault-tolerance, resource allocation, abstractions, and resource reclamation. For reducing the impact of transient server revocations, I develop two fault-tolerance techniques that are tailored to transient server characteristics and application requirements. For interactive applications, I build a derivative cloud platform that masks revocations by transparently moving application-state between servers of different types. Similarly, for distributed data processing applications, I investigate the use of application level periodic checkpointing to reduce the performance impact of server revocations. For managing and reducing the risk of server revocations, I investigate the use of server portfolios that allow transient resource allocation to be tailored to application requirements. Finally, I investigate how resource providers (such as cloud platforms) can provide transient resource availability without revocation, by looking into alternative resource reclamation techniques. I develop resource deflation, wherein a server\u27s resources are fractionally reclaimed, allowing the application to continue execution albeit with fewer resources. Resource deflation generalizes revocation, and the deflation mechanisms and cluster-wide policies can yield both high cluster utilization and low application performance degradation

ScholarWorks@UMass Amherst