624 research outputs found

    An autonomic prediction suite for cloud resource provisioning

    Get PDF
    One of the challenges of cloud computing is effective resource management due to its auto-scaling feature. Prediction techniques have been proposed for cloud computing to improve cloud resource management. This paper proposes an autonomic prediction suite to improve the prediction accuracy of the auto-scaling system in the cloud computing environment. Towards this end, this paper proposes that the prediction accuracy of the predictive auto-scaling systems will increase if an appropriate time-series prediction algorithm based on the incoming workload pattern is selected. To test the proposition, a comprehensive theoretical investigation is provided on different risk minimization principles and their effects on the accuracy of the time-series prediction techniques in the cloud environment. In addition, experiments are conducted to empirically validate the theoretical assessment of the hypothesis. Based on the theoretical and the experimental results, this paper designs a self-adaptive pred

    Characterizing Service Level Objectives for Cloud Services: Motivation of Short-Term Cache Allocation Performance Modeling

    Get PDF
    Service level objectives (SLOs) stipulate performance goals for cloud applications, microservices, and infrastructure. SLOs are widely used, in part, because system managers can tailor goals to their products, companies, and workloads. Systems research intended to support strong SLOs should target realistic performance goals used by system managers in the field. Evaluations conducted with uncommon SLO goals may not translate to real systems. Some textbooks discuss the structure of SLOs but (1) they only sketch SLO goals and (2) they use outdated examples. We mined real SLOs published on the web, extracted their goals and characterized them. Many web documents discuss SLOs loosely but few provide details and reflect real settings. Systematic literature review (SLR) prunes results and reduces bias by (1) modeling expected SLO structure and (2) detecting and removing outliers. We collected 75 SLOs where response time, query percentile and reporting period were specified. We used these SLOs to confirm and refute common perceptions. For example, we found few SLOs with response time guarantees below 10 ms for 90% or more queries. This reality bolsters perceptions that single digit SLOs face fundamental research challenges.This work was funded by NSF Grants 1749501 and 1350941.No embargoAcademic Major: Computer Science and EngineeringAcademic Major: Financ

    Towards Data-Driven Autonomics in Data Centers

    Get PDF
    Continued reliance on human operators for managing data centers is a major impediment for them from ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools, and at that point, the intervention of humans will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven autonomics, where management and control are based on holistic predictive models that are built and updated using generated data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating a predictive model for node failures. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing machine state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers each trained on these features, to predict if machines will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88% with precision varying between 50% and 72%. We discuss the practicality of including our predictive model as the central component of a data-driven autonomic manager and operating it on-line with live data streams (rather than off-line on data logs). All of the scripts used for BigQuery and classification analyses are publicly available from the authors' website.Comment: 12 pages, 6 figure

    Cloud Services Brokerage: A Survey and Research Roadmap

    Get PDF
    A Cloud Services Brokerage (CSB) acts as an intermediary between cloud service providers (e.g., Amazon and Google) and cloud service end users, providing a number of value adding services. CSBs as a research topic are in there infancy. The goal of this paper is to provide a concise survey of existing CSB technologies in a variety of areas and highlight a roadmap, which details five future opportunities for research.Comment: Paper published in the 8th IEEE International Conference on Cloud Computing (CLOUD 2015

    Towards Operator-less Data Centers Through Data-Driven, Predictive, Proactive Autonomics

    Get PDF
    Continued reliance on human operators for managing data centers is a major impediment for them from ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools, and at that point, the intervention of humans will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven autonomics, where management and control are based on holistic predictive models that are built and updated using live data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating predictive models for node failures. Our results support the practicality of a data-driven approach by showing the effectiveness of predictive models based on data found in typical data center logs. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing node state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers each trained on these features, to predict if nodes will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88% with precision varying between 50% and 72%.This level of performance allows us to recover large fraction of jobs' executions (by redirecting them to other nodes when a failure of the present node is predicted) that would otherwise have been wasted due to failures. [...

    Detection of Malware Attacks on Virtual Machines for a Self-Heal Approach in Cloud Computing using VM Snapshots

    Get PDF
    Cloud Computing strives to be dynamic as a service oriented architecture. The services in the SoA are rendered in terms of private, public and in many other commercial domain aspects. These services should be secured and thus are very vital to the cloud infrastructure. In order, to secure and maintain resilience in the cloud, it not only has to have the ability to identify the known threats but also to new challenges that target the infrastructure of a cloud. In this paper, we introduce and discuss a detection method of malwares from the VM logs and corresponding VM snapshots are classified into attacked and non-attacked VM snapshots. As snapshots are always taken to be a backup in the backup servers, especially during the night hours, this approach could reduce the overhead of the backup server with a self-healing capability of the VMs in the local cloud infrastructure. A machine learning approach at the hypervisor level is projected, the features being gathered from the API calls of VM instances in the IaaS level of cloud service. Our proposed scheme can have a high detection accuracy of about 93% while having the capability to classify and detect different types of malwares with respect to the VM snapshots. Finally the paper exhibits an algorithm using snapshots to detect and thus to self-heal using the monitoring components of a particular VM instances applied to cloud scenarios. The self-healing approach with machine learning algorithms can determine new threats with some prior knowledge of its functionality

    Cloud Computing Systems Exploration over Workload Prediction Factor in Distributed Applications

    Get PDF
    This paper highlights the different techniques of workload prediction in cloud computing. Cloud computing resources have a special kind of arrangement in which resources are made available on demand to the customers. Today, most of the organizations are using cloud computing that results in reduction of the operational cost. Cloud computing also reduces the overhead of any organization due to implementation of many hardware and software platforms. These services are being provided by cloud provider on the basis of pay per use. There are lots of cloud service providers in the modern era. In this competitive era, every cloud provider works to provide better services to the customer. To fulfill the customer?s requirements, dynamic provisioning can serve the purpose in cloud system where resources can be released and allocated on later stage as per needs. That?s why resource scaling becomes a great challenge for the cloud providers. There are many approaches to scale the number of instances of any resource. Two main approaches namely: proactive and reactive are used in cloud systems. Reactive approach reacts at later stage while proactive approach predicts resources in advance. Cloud provider needs to predict the number of resources in advance that an application is intended to use. Historical data and patterns can be used for the workload prediction. The benefit of the proactive approach lies in advance number of instances of a resource available for the future use. This results in improved performance for the cloud systems

    Feedback Autonomic Provisioning for Guaranteeing Performance in MapReduce Systems

    No full text
    International audienceCompanies have a fast growing amounts of data to process and store, a data explosion is happening next to us. Currentlyone of the most common approaches to treat these vast data quantities are based on the MapReduce parallel programming paradigm.While its use is widespread in the industry, ensuring performance constraints, while at the same time minimizing costs, still providesconsiderable challenges. We propose a coarse grained control theoretical approach, based on techniques that have already provedtheir usefulness in the control community. We introduce the first algorithm to create dynamic models for Big Data MapReduce systems,running a concurrent workload. Furthermore we identify two important control use cases: relaxed performance - minimal resourceand strict performance. For the first case we develop two feedback control mechanism. A classical feedback controller and an evenbasedfeedback, that minimises the number of cluster reconfigurations as well. Moreover, to address strict performance requirements afeedforward predictive controller that efficiently suppresses the effects of large workload size variations is developed. All the controllersare validated online in a benchmark running in a real 60 node MapReduce cluster, using a data intensive Business Intelligenceworkload. Our experiments demonstrate the success of the control strategies employed in assuring service time constraints
    • …
    corecore