325 research outputs found

    Proactive software rejuvenation solution for web enviroments on virtualized platforms

    Get PDF
    The availability of the Information Technologies for everything, from everywhere, at all times is a growing requirement. We use information Technologies from common and social tasks to critical tasks like managing nuclear power plants or even the International Space Station (ISS). However, the availability of IT infrastructures is still a huge challenge nowadays. In a quick look around news, we can find reports of corporate outage, affecting millions of users and impacting on the revenue and image of the companies. It is well known that, currently, computer system outages are more often due to software faults, than hardware faults. Several studies have reported that one of the causes of unplanned software outages is the software aging phenomenon. This term refers to the accumulation of errors, usually causing resource contention, during long running application executions, like web applications, which normally cause applications/systems to hang or crash. Gradual performance degradation could also accompany software aging phenomena. The software aging phenomena are often related to memory bloating/ leaks, unterminated threads, data corruption, unreleased file-locks or overruns. We can find several examples of software aging in the industry. The work presented in this thesis aims to offer a proactive and predictive software rejuvenation solution for Internet Services against software aging caused by resource exhaustion. To this end, we first present a threshold based proactive rejuvenation to avoid the consequences of software aging. This first approach has some limitations, but the most important of them it is the need to know a priori the resource or resources involved in the crash and the critical condition values. Moreover, we need some expertise to fix the threshold value to trigger the rejuvenation action. Due to these limitations, we have evaluated the use of Machine Learning to overcome the weaknesses of our first approach to obtain a proactive and predictive solution. Finally, the current and increasing tendency to use virtualization technologies to improve the resource utilization has made traditional data centers turn into virtualized data centers or platforms. We have used a Mathematical Programming approach to virtual machine allocation and migration to optimize the resources, accepting as many services as possible on the platform while at the same time, guaranteeing the availability (via our software rejuvenation proposal) of the services deployed against the software aging phenomena. The thesis is supported by an exhaustive experimental evaluation that proves the effectiveness and feasibility of our proposals for current systems

    Reboot-based Recovery of Performance Anomalies in Adaptive Bitrate Video-Streaming Services

    Get PDF
    Performance anomalies represent one common type of failures in Internet servers. Overcoming these failures without introducing server downtimes is of the utmost importance in video-streaming services. These services have large user abandon- ment costs when failures occur after users watch a significant part of a video. Reboot is the most popular and effective technique for overcoming performance anomalies but it takes several minutes from start until the server is warmed-up again to run at its full capacity. During that period, the server is unavailable or provides limited capacity to process end-users’ requests. This paper presents a recovery technique for performance anomalies in HTTP Streaming services, which relies on Container-based Virtualization to implement an efficient multi-phase server reboot technique that minimizes the service downtime. The recovery process includes analysis of variance of request-response times to delimit the server warm-up period, after which the server is running at its full capacity. Experimental results show that the Virtual Container recovery process completes in 72 seconds, which contrasts with the 434 seconds required for full operating system recovery. Both recovery types generate service downtimes imperceptible to end-users

    DEPENDABILITY IN CLOUD COMPUTING

    Get PDF
    The technological advances and success of Service-Oriented Architectures and the Cloud computing paradigm have produced a revolution in the Information and Communications Technology (ICT). Today, a wide range of services are provisioned to the users in a flexible and cost-effective manner, thanks to the encapsulation of several technologies with modern business models. These services not only offer high-level software functionalities such as social networks or e-commerce but also middleware tools that simplify application development and low-level data storage, processing, and networking resources. Hence, with the advent of the Cloud computing paradigm, today's ICT allows users to completely outsource their IT infrastructure and benefit significantly from the economies of scale. At the same time, with the widespread use of ICT, the amount of data being generated, stored and processed by private companies, public organizations and individuals is rapidly increasing. The in-house management of data and applications is proving to be highly cost intensive and Cloud computing is becoming the destination of choice for increasing number of users. As a consequence, Cloud computing services are being used to realize a wide range of applications, each having unique dependability and Quality-of-Service (Qos) requirements. For example, a small enterprise may use a Cloud storage service as a simple backup solution, requiring high data availability, while a large government organization may execute a real-time mission-critical application using the Cloud compute service, requiring high levels of dependability (e.g., reliability, availability, security) and performance. Service providers are presently able to offer sufficient resource heterogeneity, but are failing to satisfy users' dependability requirements mainly because the failures and vulnerabilities in Cloud infrastructures are a norm rather than an exception. This thesis provides a comprehensive solution for improving the dependability of Cloud computing -- so that -- users can justifiably trust Cloud computing services for building, deploying and executing their applications. A number of approaches ranging from the use of trustworthy hardware to secure application design has been proposed in the literature. The proposed solution consists of three inter-operable yet independent modules, each designed to improve dependability under different system context and/or use-case. A user can selectively apply either a single module or combine them suitably to improve the dependability of her applications both during design time and runtime. Based on the modules applied, the overall proposed solution can increase dependability at three distinct levels. In the following, we provide a brief description of each module. The first module comprises a set of assurance techniques that validates whether a given service supports a specified dependability property with a given level of assurance, and accordingly, awards it a machine-readable certificate. To achieve this, we define a hierarchy of dependability properties where a property represents the dependability characteristics of the service and its specific configuration. A model of the service is also used to verify the validity of the certificate using runtime monitoring, thus complementing the dynamic nature of the Cloud computing infrastructure and making the certificate usable both at discovery and runtime. This module also extends the service registry to allow users to select services with a set of certified dependability properties, hence offering the basic support required to implement dependable applications. We note that this module directly considers services implemented by service providers and provides awareness tools that allow users to be aware of the QoS offered by potential partner services. We denote this passive technique as the solution that offers first level of dependability in this thesis. Service providers typically implement a standard set of dependability mechanisms that satisfy the basic needs of most users. Since each application has unique dependability requirements, assurance techniques are not always effective, and a pro-active approach to dependability management is also required. The second module of our solution advocates the innovative approach of offering dependability as a service to users' applications and realizes a framework containing all the mechanisms required to achieve this. We note that this approach relieves users from implementing low-level dependability mechanisms and system management procedures during application development and satisfies specific dependability goals of each application. We denote the module offering dependability as a service as the solution that offers second level of dependability in this thesis. The third, and the last, module of our solution concerns secure application execution. This module considers complex applications and presents advanced resource management schemes that deploy applications with improved optimality when compared to the algorithms of the second module. This module improves dependability of a given application by minimizing its exposure to existing vulnerabilities, while being subject to the same dependability policies and resource allocation conditions as in the second module. Our approach to secure application deployment and execution denotes the third level of dependability offered in this thesis. The contributions of this thesis can be summarized as follows.The contributions of this thesis can be summarized as follows. \u2022 With respect to assurance techniques our contributions are: i) de finition of a hierarchy of dependability properties, an approach to service modeling, and a model transformation scheme; ii) de finition of a dependability certifi cation scheme for services; iii) an approach to service selection that considers users' dependability requirements; iv) de finition of a solution to dependability certifi cation of composite services, where the dependability properties of a composite service are calculated on the basis of the dependability certi ficates of component services. \u2022 With respect to off ering dependability as a service our contributions are: i) de finition of a delivery scheme that transparently functions on users' applications and satisfi es their dependability requirements; ii) design of a framework that encapsulates all the components necessary to o er dependability as a service to the users; iii) an approach to translate high level users' requirements to low level dependability mechanisms; iv) formulation of constraints that allow enforcement of deployment conditions inherent to dependability mechanisms and an approach to satisfy such constraints during resource allocation; v) a resource management scheme that masks the a ffect of system changes by adapting the current allocation of the application. \u2022 With respect to security management our contributions are: i) an approach that deploys users' applications in the Cloud infrastructure such that their exposure to vulnerabilities is minimized; ii) an approach to build interruptible elastic algorithms whose optimality improves as the processing time increases, eventually converging to an optimal solution

    Dynamic resource provisioning and secured file sharing using virtualization in cloud azure

    Get PDF
    Virtual machines (VMs) are preferred by the majority of organizations due to their high performance. VMs allow for reduced overhead with multiple systems running from the same console at the same time. A physical server is a bare-metal system whose hardware is controlled by the host operating system. A physical server runs on a single instance of OS and application. A virtual server or virtual machine encapsulates the underlying hardware and networking resources. With the existing physical server, it is difficult to migrate the tasks from one platform to another platform or to a datacentre. Centralized security is difficult to setup. But with Hypervisor the virtual machine can be deployed, for instance, with automation. Virtualization cost increases as well as a decrease in hardware and infrastructure space costs. We propose an efficient Azure cloud framework for the utilization of physical server resources at remote VM servers. The proposed framework is implemented in two phases first by integrating physical servers into virtual ones by creating virtual machines, and then by integrating virtual servers into cloud service providers in a cost-effective manner. We create a virtual network in the Azure datacenter using the local host physical server to set up the various virtual machines. Two virtual machine instances, VM1 and VM2, are created using Microsoft Hyper-V with the server Windows 2016 R. The desktop application is deployed and VM performance is monitored using the PowerShell script. Tableau is used to evaluate the physical server functionality of the worksheet for the deployed application. The proposed Physical to Virtual to Cloud model (P2V2C) model is being tested, and the performance result shows that P2V2C migration is more successful in dynamic provisioning than direct migration to cloud platform infrastructure. The research work was carried out in a secure way through the migration process from P2V2C.Web of Science111art. no. 4

    Improved self-management of datacenter systems applying machine learning

    Get PDF
    Autonomic Computing is a Computer Science and Technologies research area, originated during mid 2000's. It focuses on optimization and improvement of complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, like multi-datacenter systems in cloud computing, the system operators and architects need more help to understand, design and optimize manually these systems, even more when these systems are distributed along the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, a very important issue when resources have a cost, by obtaining, running or maintaining them. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can find accurate models from system behaviors and often intelligible explanations to them, also predict and infer system states and values. These models obtained from automatic learning have the advantage of being easily updated to workload or configuration changes by re-taking examples and re-training the predictors. So employing automatic modeling and predictive abilities, we can find new methods for making "intelligent" decisions and discovering new information and knowledge from systems. This thesis departs from the state of the art, where management is based on administrators expertise, well known data, ad-hoc studied algorithms and models, and elements to be studied from computing machine point of view; to a novel state of the art where management is driven by models learned from the same system, providing useful feedback, making up for incomplete, missing or uncertain data, from a global network of datacenters point of view. - First of all, we cover the scenario where the decision maker works knowing all pieces of information from the system: how much will each job consume, how is and will be the desired quality of service, what are the deadlines for the workload, etc. All of this focusing on each component and policy of each element involved in executing these jobs. -Then we focus on the scenario where instead of fixed oracles that provide us information from an expert formula or set of conditions, machine learning is used to create these oracles. Here we look at components and specific details while some part of the information is not known and must be learned and predicted. - We reduce the problem of optimizing resource allocations and requirements for virtualized web-services to a mathematical problem, indicating each factor, variable and element involved, also all the constraints the scheduling process must attend to. The scheduling problem can be modeled as a Mixed Integer Linear Program. Here we face an scenario of a full datacenter, further we introduce some information prediction. - We complement the model by expanding the predicted elements, studying the main resources (this is CPU, Memory and IO) that can suffer from noise, inaccuracy or unavailability. Once learning predictors for certain components let the decision making improve, the system can become more ¿expert-knowledge independent¿ and research can focus on an scenario where all the elements provide noisy, uncertainty or private information. Also we introduce to the management optimization new factors as for each datacenter context and costs may change, turning the model as "multi-datacenter" - Finally, we review of the cost of placing datacenters depending on green energy sources, and distribute the load according to green energy availability

    Reliable and energy efficient resource provisioning in cloud computing systems

    Get PDF
    Cloud Computing has revolutionized the Information Technology sector by giving computing a perspective of service. The services of cloud computing can be accessed by users not knowing about the underlying system with easy-to-use portals. To provide such an abstract view, cloud computing systems have to perform many complex operations besides managing a large underlying infrastructure. Such complex operations confront service providers with many challenges such as security, sustainability, reliability, energy consumption and resource management. Among all the challenges, reliability and energy consumption are two key challenges focused on in this thesis because of their conflicting nature. Current solutions either focused on reliability techniques or energy efficiency methods. But it has been observed that mechanisms providing reliability in cloud computing systems can deteriorate the energy consumption. Adding backup resources and running replicated systems provide strong fault tolerance but also increase energy consumption. Reducing energy consumption by running resources on low power scaling levels or by reducing the number of active but idle sitting resources such as backup resources reduces the system reliability. This creates a critical trade-off between these two metrics that are investigated in this thesis. To address this problem, this thesis presents novel resource management policies which target the provisioning of best resources in terms of reliability and energy efficiency and allocate them to suitable virtual machines. A mathematical framework showing interplay between reliability and energy consumption is also proposed in this thesis. A formal method to calculate the finishing time of tasks running in a cloud computing environment impacted with independent and correlated failures is also provided. The proposed policies adopted various fault tolerance mechanisms while satisfying the constraints such as task deadlines and utility values. This thesis also provides a novel failure-aware VM consolidation method, which takes the failure characteristics of resources into consideration before performing VM consolidation. All the proposed resource management methods are evaluated by using real failure traces collected from various distributed computing sites. In order to perform the evaluation, a cloud computing framework, 'ReliableCloudSim' capable of simulating failure-prone cloud computing systems is developed. The key research findings and contributions of this thesis are: 1. If the emphasis is given only to energy optimization without considering reliability in a failure prone cloud computing environment, the results can be contrary to the intuitive expectations. Rather than reducing energy consumption, a system ends up consuming more energy due to the energy losses incurred because of failure overheads. 2. While performing VM consolidation in a failure prone cloud computing environment, a significant improvement in terms of energy efficiency and reliability can be achieved by considering failure characteristics of physical resources. 3. By considering correlated occurrence of failures during resource provisioning and VM allocation, the service downtime or interruption is reduced significantly by 34% in comparison to the environments with the assumption of independent occurrence of failures. Moreover, measured by our mathematical model, the ratio of reliability and energy consumption is improved by 14%

    Workload Prediction for Efficient Performance Isolation and System Reliability

    Get PDF
    In large-scaled and distributed systems, like multi-tier storage systems and cloud data centers, resource sharing among workloads brings multiple benefits while introducing many performance challenges. The key to effective workload multiplexing is accurate workload prediction. This thesis focuses on how to capture the salient characteristics of the real-world workloads to develop workload prediction methods and to drive scheduling and resource allocation policies, in order to achieve efficient and in-time resource isolation among applications. For a multi-tier storage system, high-priority user work is often multiplexed with low-priority background work. This brings the challenge of how to strike a balance between maintaining the user performance and maximizing the amount of finished background work. In this thesis, we propose two resource isolation policies based on different workload prediction methods: one is a Markovian model-based and the other is a neural networks-based. These policies aim at, via workload prediction, discovering the opportune time to schedule background work with minimum impact on user performance. Trace-driven simulations verify the efficiency of the two pro- posed resource isolation policies. The Markovian model-based policy successfully schedules the background work at the appropriate periods with small impact on the user performance. The neural networks-based policy adaptively schedules user and background work, resulting in meeting both performance requirements consistently. This thesis also proposes an accurate while efficient neural networks-based pre- diction method for data center usage series, called PRACTISE. Different from the traditional neural networks for time series prediction, PRACTISE selects the most informative features from the past observations of the time series itself. Testing on a large set of usage series in production data centers illustrates the accuracy (e.g., prediction error) and efficiency (e.g., time cost) of PRACTISE. The superiority of the usage prediction also allows a proactive resource management in the highly virtualized cloud data centers. In this thesis, we analyze on the performance tickets in the cloud data centers, and propose an active sizing algorithm, named ATM, that predicts the usage workloads and re-allocates capacity to work- loads to avoid VM performance tickets. Moreover, driven by cheap prediction of usage tails, we also present TailGuard in this thesis, which dynamically clones VMs among co-located boxes, in order to efficiently reduce the performance violations of physical boxes in cloud data centers

    Autonomous migration of vertual machines for maximizing resource utilization

    Get PDF
    Virtualization of computing resources enables multiple virtual machines to run on a physical machine. When many virtual machines are deployed on a cluster of PCs, some physical machines will inevitably experience overload while others are under-utilized over time due to varying computational demands. This computational imbalance across the cluster undermines the very purpose of maximizing resource utilization through virtualization. To solve this imbalance problem, virtual machine migration has been introduced, where a virtual machine on a heavily loaded physical machine is selected and moved to a lightly loaded physical machine. The selection of the source virtual machine and the destination physical machine is based on a single fixed threshold value. Key to such threshold-based VM migration is to determine when to move which VM to what physical machine, since wrong or inadequate decisions can cause unnecessary migrations that would adversely affect the overall performance. The fixed threshold may not necessarily work for different computing infrastructures. Finding the optimal threshold is critical. In this research, a virtual machine migration framework is presented that autonomously finds and adjusts variable thresholds at runtime for different computing requirements to improve and maximize the utilization of computing resources. Central to this approach is the previous history of migrations and their effects before and after each migration in terms of standard deviation of utilization. To broaden this research, a proactive learning methodology is introduced that not only accumulates the past history of computing patterns and resulting migration decisions but more importantly searches all possibilities for the most suitable decisions. This research demonstrates through experimental results that the learning approach autonomously finds thresholds close to the optimal ones for different computing scenarios and that such varying thresholds yield an optimal number of VM migrations for maximizing resource utilization. The proposed framework is set up on a cluster of 8 and 16 PCs, each of which has multiple User-Mode Linux (UML)-based virtual machines. An extensive set of benchmark programs is deployed to closely resemble a real-world computing environment. Experimental results indicate that the proposed framework indeed autonomously finds thresholds close to the optimal ones for different computing scenarios, balances the load across the cluster through autonomous VM migration, and improves the overall performance of the dynamically changing computing environment
    • …
    corecore