1,479 research outputs found

    HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    Full text link
    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of the on-premise and cloud resources---steady (and sensitive) workloads can run on on-premise resources and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, which range from how to extract the best performance of an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper brings a survey and taxonomy of efforts in HPC cloud and a vision on what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast increasing wave of new HPC applications coming from big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR

    Data Replication and Its Alignment with Fault Management in the Cloud Environment

    Get PDF
    Nowadays, the exponential data growth becomes one of the major challenges all over the world. It may cause a series of negative impacts such as network overloading, high system complexity, and inadequate data security, etc. Cloud computing is developed to construct a novel paradigm to alleviate massive data processing challenges with its on-demand services and distributed architecture. Data replication has been proposed to strategically distribute the data access load to multiple cloud data centres by creating multiple data copies at multiple cloud data centres. A replica-applied cloud environment not only achieves a decrease in response time, an increase in data availability, and more balanced resource load but also protects the cloud environment against the upcoming faults. The reactive fault tolerance strategy is also required to handle the faults when the faults already occurred. As a result, the data replication strategies should be aligned with the reactive fault tolerance strategies to achieve a complete management chain in the cloud environment. In this thesis, a data replication and fault management framework is proposed to establish a decentralised overarching management to the cloud environment. Three data replication strategies are firstly proposed based on this framework. A replica creation strategy is proposed to reduce the total cost by jointly considering the data dependency and the access frequency in the replica creation decision making process. Besides, a cloud map oriented and cost efficiency driven replica creation strategy is proposed to achieve the optimal cost reduction per replica in the cloud environment. The local data relationship and the remote data relationship are further analysed by creating two novel data dependency types, Within-DataCentre Data Dependency and Between-DataCentre Data Dependency, according to the data location. Furthermore, a network performance based replica selection strategy is proposed to avoid potential network overloading problems and to increase the number of concurrent-running instances at the same time

    SGA Model for Prediction in Cloud Environment

    Get PDF
    With virtual information, cloud computing has made applications available to users everywhere. Efficient asset workload forecasting could help the cloud achieve maximum resource utilisation. The effective utilization of resources and the reduction of datacentres power both depend heavily on load forecasting. The allocation of resources and task scheduling issues in clouds and virtualized systems are significantly impacted by CPU utilisation forecast. A resource manager uses utilisation projection to distribute workload between physical nodes, improving resource consumption effectiveness. When performing a virtual machine distribution job, a good estimation of CPU utilization enables the migration of one or more virtual servers, preventing the overflow of the real machineries. In a cloud system, scalability and flexibility are crucial characteristics. Predicting workload and demands would aid in optimal resource utilisation in a cloud setting. To improve allocation of resources and the effectiveness of the cloud service, workload assessment and future workload forecasting could be performed. The creation of an appropriate statistical method has begun. In this study, a simulation approach and a genetic algorithm were used to forecast workloads. In comparison to the earlier techniques, it is anticipated to produce results that are superior by having a lower error rate and higher forecasting reliability. The suggested method is examined utilizing statistics from the Bit brains datacentres. The study then analyses, summarises, and suggests future study paths in cloud environments

    CloudOps: Towards the Operationalization of the Cloud Continuum: Concepts, Challenges and a Reference Framework

    Get PDF
    The current trend of developing highly distributed, context aware, heterogeneous computing intense and data-sensitive applications is changing the boundaries of cloud computing. Encouraged by the growing IoT paradigm and with flexible edge devices available, an ecosystem of a combination of resources, ranging from high density compute and storage to very lightweight embedded computers running on batteries or solar power, is available for DevOps teams from what is known as the Cloud Continuum. In this dynamic context, manageability is key, as well as controlled operations and resources monitoring for handling anomalies. Unfortunately, the operation and management of such heterogeneous computing environments (including edge, cloud and network services) is complex and operators face challenges such as the continuous optimization and autonomous (re-)deployment of context-aware stateless and stateful applications where, however, they must ensure service continuity while anticipating potential failures in the underlying infrastructure. In this paper, we propose a novel CloudOps workflow (extending the traditional DevOps pipeline), proposing techniques and methods for applications’ operators to fully embrace the possibilities of the Cloud Continuum. Our approach will support DevOps teams in the operationalization of the Cloud Continuum. Secondly, we provide an extensive explanation of the scope, possibilities and future of the CloudOps.This research was funded by the European project PIACERE (Horizon 2020 Research and Innovation Programme, under grant agreement No. 101000162)

    Formulating and managing viable SLAs in cloud computing from a small to medium service provider's viewpoint: A state-of-the-art review

    Full text link
    © 2017 Elsevier Ltd In today's competitive world, service providers need to be customer-focused and proactive in their marketing strategies to create consumer awareness of their services. Cloud computing provides an open and ubiquitous computing feature in which a large random number of consumers can interact with providers and request services. In such an environment, there is a need for intelligent and efficient methods that increase confidence in the successful achievement of business requirements. One such method is the Service Level Agreement (SLA), which is comprised of service objectives, business terms, service relations, obligations and the possible action to be taken in the case of SLA violation. Most of the emphasis in the literature has, until now, been on the formation of meaningful SLAs by service consumers, through which their requirements will be met. However, in an increasingly competitive market based on the cloud environment, service providers too need a framework that will form a viable SLA, predict possible SLA violations before they occur, and generate early warning alarms that flag a potential lack of resources. This is because when a provider and a consumer commit to an SLA, the service provider is bound to reserve the agreed amount of resources for the entire period of that agreement – whether the consumer uses them or not. It is therefore very important for cloud providers to accurately predict the likely resource usage for a particular consumer and to formulate an appropriate SLA before finalizing an agreement. This problem is more important for a small to medium cloud service provider which has limited resources that must be utilized in the best possible way to generate maximum revenue. A viable SLA in cloud computing is one that intelligently helps the service provider to determine the amount of resources to offer to a requesting consumer, and there are number of studies on SLA management in the literature. The aim of this paper is two-fold. First, it presents a comprehensive overview of existing state-of-the-art SLA management approaches in cloud computing, and their features and shortcomings in creating viable SLAs from the service provider's viewpoint. From a thorough analysis, we observe that the lack of a viable SLA management framework renders a service provider unable to make wise decisions in forming an SLA, which could lead to service violations and violation penalties. To fill this gap, our second contribution is the proposal of the Optimized Personalized Viable SLA (OPV-SLA) framework which assists a service provider to form a viable SLA and start managing SLA violation before an SLA is formed and executed. The framework also assists a service provider to make an optimal decision in service formation and allocate the appropriate amount of marginal resources. We demonstrate the applicability of our framework in forming viable SLAs through experiments. From the evaluative results, we observe that our framework helps a service provider to form viable SLAs and later to manage them to effectively minimize possible service violation and penalties

    CloudOps: Towards the Operationalization of the Cloud Continuum: Concepts, Challenges and a Reference Framework

    Get PDF
    The current trend of developing highly distributed, context aware, heterogeneous computing intense and data-sensitive applications is changing the boundaries of cloud computing. Encouraged by the growing IoT paradigm and with flexible edge devices available, an ecosystem of a combination of resources, ranging from high density compute and storage to very lightweight embedded computers running on batteries or solar power, is available for DevOps teams from what is known as the Cloud Continuum. In this dynamic context, manageability is key, as well as controlled operations and resources monitoring for handling anomalies. Unfortunately, the operation and management of such heterogeneous computing environments (including edge, cloud and network services) is complex and operators face challenges such as the continuous optimization and autonomous (re-)deployment of context-aware stateless and stateful applications where, however, they must ensure service continuity while anticipating potential failures in the underlying infrastructure. In this paper, we propose a novel CloudOps workflow (extending the traditional DevOps pipeline), proposing techniques and methods for applications’ operators to fully embrace the possibilities of the Cloud Continuum. Our approach will support DevOps teams in the operationalization of the Cloud Continuum. Secondly, we provide an extensive explanation of the scope, possibilities and future of the CloudOps.This research was funded by the European project PIACERE (Horizon 2020 Research and Innovation Programme, under grant agreement No. 101000162)

    Application of Profile Prediction for Proactive Scheduling

    Get PDF
    Today, cloud environments are widely used as execution platforms for most applications. In these environments, virtualized applications often share computing resources. Although this increases hardware utilization, resources competition can cause performance degradation, and knowing which applications can run on the same host without causing too much interference is key to a better scheduling and performance. Therefore, it is important to predict the resource consumption profile of applications in their subsequent iterations. This work evaluates the use of machine learning techniques to predict the increase or decrease in computational resources consumption. The prediction models are evaluated through experiments using real and benchmark applications. Finally, we conclude that some models offer significantly better performance when compared to the current trend of resource usage. These models averaged up to 94% on the F1 metric for this task

    CILP: Co-simulation based imitation learner for dynamic resource provisioning in cloud computing environments

    Get PDF
    Intelligent Virtual Machine (VM) provisioning is central to cost and resource efficient computation in cloud computing environments. As bootstrapping VMs is time-consuming, a key challenge for latency-critical tasks is to predict future workload demands to provision VMs proactively. However, existing AI-based solutions tend to not holistically consider all crucial aspects such as provisioning overheads, heterogeneous VM costs and Quality of Service (QoS) of the cloud system. To address this, we propose a novel method, called CILP, that formulates the VM provisioning problem as two sub-problems of prediction and optimization, where the provisioning plan is optimized based on predicted workload demands. CILP leverages a neural network as a surrogate model to predict future workload demands with a co-simulated digital-twin of the infrastructure to compute QoS scores. We extend the neural network to also act as an imitation learner that dynamically decides the optimal VM provisioning plan. A transformer based neural model reduces training and inference overheads while our novel two-phase decision making loop facilitates in making informed provisioning decisions. Crucially, we address limitations of prior work by including resource utilization, deployment costs and provisioning overheads to inform the provisioning decisions in our imitation learning framework. Experiments with three public benchmarks demonstrate that CILP gives up to 22% higher resource utilization, 14% higher QoS scores and 44% lower execution costs compared to the current online and offline optimization based state-of-the-art methods
    • …
    corecore