
    Dynamic Virtual Network Restoration with Optimal Standby Virtual Router Selection

    Title from PDF of title page, viewed on September 4, 2015. Dissertation advisor: Deep Medhi. Vita. Includes bibliographic references (pages 141-157). Thesis (Ph.D.)--School of Computing and Engineering and Department of Mathematics and Statistics. University of Missouri--Kansas City, 2015.
    Network virtualization technologies allow service providers to request partitioned, QoS-guaranteed, and fault-tolerant virtual networks provisioned by the substrate network provider (i.e., the physical infrastructure provider). A virtualized networking environment (VNE) offers features such as partitioning and flexibility, but fault tolerance requires additional effort to provide survivability against failures in either the virtual networks or the substrate network. Two common survivability paradigms are protection (proactive) and restoration (reactive). In a protection scheme, the substrate network provider (SNP) allocates redundant resources (e.g., nodes, paths, and bandwidth) to guard against potential failures in the VNE. In a restoration scheme, the SNP dynamically allocates resources to restore the networks, usually after a failure has been detected. In this dissertation, we design a restoration scheme that an SNP can run dynamically, in a centralized manner, to achieve survivability against node failures in the VNE. The proposed restoration scheme is designed to be integrated with a protection scheme in which the SNP allocates spare virtual routers (VRs) as standbys for the virtual networks (VNs); these standbys are ready to serve in the restoration scheme once a node failure has been identified. The standby virtual routers (S-VRs) are reserved as a shared backup for any single node failure, and during the restoration procedure one S-VR is selected to replace the failed VR. In this work, we present an optimal S-VR selection approach to simultaneously restore multiple VNs affected by failed VRs, where a VR may fail on its own or because of its substrate host (e.g., power outage, hardware failure, or maintenance). Furthermore, the restoration scheme is embedded in a dynamic reconfiguration scheme (DRS), so that the affected VNs can be dynamically restored by a centralized virtual network manager (VNM). We first introduce the DRS against node failures in a VNE and then present an experimental study that implements it over a realistic VNE on the GpENI testbed. In this study, we ran the DRS to restore one VN with a single-VR failure, and the results showed that with a proper S-VR selection the performance of the affected VN can be well restored. Next, we propose a Mixed-Integer Linear Programming (MILP) model with dual goals: optimally select S-VRs to restore all VNs affected by VR failures while balancing load across the substrate. We also present a heuristic algorithm based on the model, along with numerical studies that show how a number of factors affect the optimal selection. The results show that the heuristic performs close to the optimization model when there are sufficient standby virtual routers for each virtual network and the substrate nodes can support multiple standby virtual routers in service simultaneously.
    Finally, we present the design of a software-defined resilient VNE with the optimal S-VR selection model and discuss a prototype implementation on the GENI testbed.
    Introduction -- Literature survey -- Dynamic reconfiguration scheme in a VNE -- An experimental study on GpENI-VNI -- Optimal standby virtual router selection model -- Prototype design and implementation on GENI -- Conclusion and future work -- Appendix A. Resource Specification (RSpec) in GENI -- Appendix B. Optimal S-VR Selection Model in AMP
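    To make the selection step concrete, the following is a minimal sketch of an S-VR selection MILP written with PuLP. It is not the dissertation's formulation: the sets, the host map, the capacity bounds, and the min-max load-balancing objective are illustrative assumptions, and the actual model (Appendix B) considers more factors.

```python
# Hedged sketch of an optimal S-VR selection MILP using PuLP (not the
# dissertation's formulation). All sets and parameters below are hypothetical.
import pulp

failed_vrs = ["vr1", "vr2"]                  # VRs affected by node failures
standbys = ["s1", "s2", "s3"]                # available standby virtual routers
host = {"s1": "n1", "s2": "n1", "s3": "n2"}  # substrate node hosting each S-VR
nodes = ["n1", "n2"]
cap = {"n1": 2, "n2": 1}                     # max S-VRs a substrate node can activate

prob = pulp.LpProblem("svr_selection", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (failed_vrs, standbys), cat="Binary")
load_max = pulp.LpVariable("load_max", lowBound=0)
prob += load_max                             # objective: minimize the heaviest node load

for v in failed_vrs:                         # restore every affected VN: one S-VR per failed VR
    prob += pulp.lpSum(x[v][s] for s in standbys) == 1
for s in standbys:                           # a standby can replace at most one failed VR
    prob += pulp.lpSum(x[v][s] for v in failed_vrs) <= 1
for n in nodes:                              # substrate capacity and the load-balancing bound
    node_load = pulp.lpSum(x[v][s] for v in failed_vrs
                           for s in standbys if host[s] == n)
    prob += node_load <= cap[n]
    prob += node_load <= load_max

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selection = {v: next(s for s in standbys if pulp.value(x[v][s]) > 0.5)
             for v in failed_vrs}
print(selection)                             # e.g. {'vr1': 's1', 'vr2': 's3'}
```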

    Two levels autonomic resource management in virtualized IaaS

    Virtualized cloud infrastructures are popular because they allow resources to be pooled and therefore reduce costs. For cloud providers, minimizing the number of resources in use is one of the main goals such environments must meet. Cloud customers are likewise concerned with minimizing the resources they use in the cloud, since they want to reduce their bill. Resource management in the cloud should therefore be handled by the cloud provider at the virtualization level and by the cloud customers at the application level. Many research works investigate resource management strategies at these two levels. Most of them study virtual machine consolidation (driven by the utilization rate of the virtualized infrastructure) at the virtualization level and dynamic application sizing (driven by the application's workload) at the application level, but these strategies are usually studied separately. In this article, we show that virtual machine consolidation and dynamic application sizing are complementary, and that combining them is effective at reducing resource usage while preserving an application's quality of service. Our demonstration compares three resource management strategies (implemented at the virtualization level only, at the application level only, or at both levels) in a private cloud infrastructure hosting typical JEE web applications, evaluated with the RUBiS benchmark.
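    As a rough illustration of how the two levels can interact, the sketch below first sizes an application from its workload and then consolidates the resulting VMs with first-fit-decreasing packing. The function names and numbers (size_application, capacity_per_replica_rps, host_cpu) are hypothetical; the article's strategies and its JEE/RUBiS setup are more elaborate.

```python
# Hedged sketch of combining application sizing with VM consolidation.
import math
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    cpu: float  # requested CPU share

def size_application(workload_rps: float, capacity_per_replica_rps: float) -> int:
    """Application level: choose the replica count that covers the current workload."""
    return max(1, math.ceil(workload_rps / capacity_per_replica_rps))

def consolidate(vms, host_cpu):
    """Virtualization level: first-fit-decreasing packing of VMs onto hosts."""
    hosts = []
    for vm in sorted(vms, key=lambda v: v.cpu, reverse=True):
        for h in hosts:
            if sum(v.cpu for v in h) + vm.cpu <= host_cpu:
                h.append(vm)
                break
        else:
            hosts.append([vm])
    return hosts

# Combined loop: size the web tier from its workload, then pack all VMs.
replicas = size_application(workload_rps=900, capacity_per_replica_rps=250)
vms = [VM(f"web-{i}", cpu=1.0) for i in range(replicas)] + [VM("db", cpu=2.0)]
print(len(consolidate(vms, host_cpu=4.0)), "hosts used")
```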

    Multi-dimensional optimization for cloud based multi-tier applications

    Emerging trends toward cloud computing and virtualization have been opening new avenues to meet the enormous demands for space, resource utilization, and energy efficiency in modern data centers. By hosting many multi-tier applications in consolidated environments, cloud infrastructure providers enable resources to be shared among these applications at a very fine granularity. Meanwhile, resource virtualization has gained considerable attention in the design of computer systems and has become a key ingredient of cloud computing. It significantly improves aggregate power efficiency and resource utilization by enabling resource consolidation, and it allows infrastructure providers to manage their resources in an agile way under highly dynamic conditions. However, these trends also raise significant challenges for researchers and practitioners seeking agile resource management in consolidated environments. First, they must deal with the very different responsiveness of different applications while handling dynamic changes in resource demands as applications' workloads change over time. Second, when provisioning resources, they must consider management costs such as power consumption and adaptation overheads (i.e., overheads incurred by dynamically reconfiguring resources). Dynamic provisioning of virtual resources entails an inherent performance-power tradeoff, and indiscriminate adaptations can impose significant overheads on power consumption and end-to-end performance. Hence, to achieve agile resource management, it is important to thoroughly investigate the performance characteristics of deployed applications, precisely account for the costs caused by adaptations, and then balance benefits against costs. Fundamentally, the research question is how to dynamically provision available resources for all deployed applications to maximize overall utility under time-varying workloads while considering such management costs. Given the scope of the problem space, this dissertation aims to develop an optimization system that not only meets the performance requirements of deployed applications but also addresses the tradeoffs between performance, power consumption, and adaptation overheads. To this end, this dissertation makes two distinct contributions. First, I show that adaptations applied to cloud infrastructures can cause significant overheads not only in end-to-end response time but also in server power consumption, and that such costs can vary in intensity and time scale with the workload, the adaptation type, and the performance characteristics of the hosted applications. Second, I address the multi-dimensional optimization between server power consumption, performance benefit, and the transient costs incurred by various adaptations. Additionally, I incorporate the overhead of the optimization procedure itself into the problem formulation: system optimization approaches typically entail intensive computation and potentially long delays to deal with the huge search space of cloud computing infrastructures, so this cost cannot be ignored when adaptation plans are designed.
    In this multi-dimensional optimization work, a scalable optimization algorithm and a hierarchical adaptation architecture are developed to handle many applications, hosting servers, and adaptations, and to support adaptation decisions at various time scales.
    Ph.D. Committee Chair: Pu, Calton; Committee Member: Liu, Ling; Committee Member: Liu, Xue; Committee Member: Schlichting, Richard; Committee Member: Schwan, Karsten; Committee Member: Yalamanchili, Sudhaka
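    A toy version of the multi-dimensional trade-off might look like the following: enumerate candidate configurations and pick the one that maximizes performance utility minus power and adaptation costs. The utility and cost functions (perf_utility, power_cost, adaptation_cost) are invented placeholders, not the dissertation's models, and a real system would also bound the optimization's own overhead rather than brute-force the search space.

```python
# Hedged toy of the performance/power/adaptation-cost trade-off.
from itertools import product

VM_COUNTS = [1, 2, 3]      # hypothetical knob: VMs allocated to a tier
CPU_CAPS = [0.5, 1.0]      # hypothetical knob: CPU cap per VM

def perf_utility(vms, cap, workload):
    """Toy saturating benefit: more capacity relative to workload -> more utility."""
    return min(1.0, vms * cap / workload)

def power_cost(vms, cap):
    """Toy power model: an idle share per VM plus a dynamic share per allocated CPU."""
    return 0.05 * vms + 0.02 * vms * cap

def adaptation_cost(current, candidate):
    """Transient reconfiguration overhead, proportional to how much the plan changes."""
    return 0.03 * (abs(current[0] - candidate[0]) + abs(current[1] - candidate[1]))

def best_plan(current, workload):
    """Pick the configuration with the best net utility (benefit minus both costs)."""
    plans = list(product(VM_COUNTS, CPU_CAPS))
    return max(plans, key=lambda p: perf_utility(*p, workload)
                                    - power_cost(*p)
                                    - adaptation_cost(current, p))

print(best_plan(current=(1, 0.5), workload=2.0))
```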

    Developing resource consolidation frameworks for moldable virtual machines in clouds

    This paper considers the scenario where multiple clusters of Virtual Machines (termed Virtual Clusters) are hosted in a Cloud system consisting of a cluster of physical nodes. Multiple Virtual Clusters (VCs) cohabit in the physical cluster, with each VC offering a particular type of service for the incoming requests. In this context, VM consolidation, which strives to use a minimal number of nodes to accommodate all VMs in the system, plays an important role in reducing resource consumption. Most existing consolidation methods in the literature regard VMs as "rigid" during consolidation, i.e., the VMs' resource capacities remain unchanged. In VC environments, QoS is usually delivered by a VC as a single entity, so there is no reason why a VM's resource capacity cannot be adjusted as long as the whole VC can still maintain the desired QoS. Treating VMs as "moldable" during consolidation may therefore allow them to be packed onto even fewer nodes. This paper investigates this issue and develops a Genetic Algorithm (GA) to consolidate moldable VMs. The GA evolves an optimized system state, which represents the VM-to-node mapping and the resource capacity allocated to each VM. After the new system state is calculated by the GA, the Cloud transitions from the current system state to the new one. The transition time represents overhead and should be minimized. In this paper, a cost model is formalized to capture the transition overhead, and a reconfiguration algorithm is developed to move the Cloud to the optimized system state with low transition overhead. Experiments have been conducted to evaluate the performance of the GA and the reconfiguration algorithm.
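    The moldable-VM idea can be sketched as a small GA whose chromosome holds, for each VM, a node assignment and a capacity chosen from a discrete set; fitness rewards using few nodes and penalizes node overload and per-VC capacity shortfalls. Everything below (population size, penalty weights, VC_MIN) is an assumption for illustration, not the paper's actual encoding or cost model.

```python
# Hedged GA sketch for moldable-VM consolidation.
import random

NUM_VMS, NUM_NODES, NODE_CAP = 8, 5, 4.0
CAPS = [0.5, 1.0, 1.5, 2.0]          # allowed per-VM capacities ("moldable" sizing)
VC_OF = [0, 0, 0, 0, 1, 1, 1, 1]     # which Virtual Cluster each VM belongs to
VC_MIN = {0: 4.0, 1: 3.0}            # aggregate capacity each VC needs for its QoS

def random_individual():
    """Chromosome: one (node, capacity) gene per VM."""
    return [(random.randrange(NUM_NODES), random.choice(CAPS)) for _ in range(NUM_VMS)]

def fitness(ind):
    """Lower is better: nodes in use, plus penalties for overload and QoS shortfall."""
    node_load = [0.0] * NUM_NODES
    vc_cap = {vc: 0.0 for vc in VC_MIN}
    for vm, (node, cap) in enumerate(ind):
        node_load[node] += cap
        vc_cap[VC_OF[vm]] += cap
    used = sum(1 for load in node_load if load > 0)
    overload = sum(max(0.0, load - NODE_CAP) for load in node_load)
    shortfall = sum(max(0.0, VC_MIN[vc] - vc_cap[vc]) for vc in VC_MIN)
    return used + 10 * (overload + shortfall)

def crossover(a, b):
    cut = random.randrange(1, NUM_VMS)
    return a[:cut] + b[cut:]

def mutate(ind):
    ind[random.randrange(NUM_VMS)] = (random.randrange(NUM_NODES), random.choice(CAPS))
    return ind

pop = [random_individual() for _ in range(50)]
for _ in range(200):
    pop.sort(key=fitness)
    elite = pop[:10]
    pop = elite + [mutate(crossover(*random.sample(elite, 2))) for _ in range(40)]

best = min(pop, key=fitness)
print("nodes used:", len({node for node, _ in best}))
```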

    Autonomic management of virtualized resources in cloud computing

    The last five years have witnessed a rapid growth of cloud computing in business, governmental, and educational IT deployments. The success of cloud services depends critically on the effective management of virtualized resources, and a key requirement is the ability to dynamically match resource allocations to actual demands. To this end, we aim to design and implement a cloud resource management mechanism that manages the underlying complexity, automates resource provisioning, and controls client-perceived quality of service (QoS) while still achieving resource efficiency. The design of automatic resource management centers on two questions: when to adjust resource allocations and by how much. In a cloud, applications have different definitions of capacity, and cloud dynamics make it difficult to determine a static resource-to-performance relationship. In this dissertation, we propose a generic metric that measures application capacity, design model-independent and adaptive approaches to manage resources, and build a cloud management system that scales to a cluster of machines. To understand web system capacity, we propose a productivity index (PI), defined as the ratio of yield to cost, to measure system processing capability online. PI is a generic concept that can be applied at different levels to monitor system progress and identify whether more capacity is needed. We applied the concept of PI to the problem of overload prevention in multi-tier websites; the overload predictor built on the PI metric shows more accurate and responsive overload prevention than conventional approaches. To address the lack of an accurate server model, we propose a model-independent, fuzzy-control-based approach for CPU allocation. For adaptive and stable control performance, we embed the controller with self-tuning output amplification and flexible rule selection. We then build a QoS provisioning framework that supports multi-objective QoS control and service differentiation; experiments on a virtual cluster with two service classes show the effectiveness of our approach in both performance and power control. To address the complex interplay between resources and the process delays in fine-grained multi-resource allocation, we treat capacity management as a decision-making problem and employ reinforcement learning (RL) to optimize the process. The optimization depends on trial-and-error interactions with the cloud system. To improve the initial management performance, we propose a model-based RL algorithm: a neural-network-based environment model, learned from previous management history, generates simulated resource allocations for the RL agent. Experimental results on heterogeneous applications show that our approach makes efficient use of limited interactions and finds near-optimal resource configurations within 7 steps. Finally, we present a distributed reinforcement learning approach to cluster-wide cloud resource management. We decompose the cluster-wide resource allocation problem into sub-problems concerning individual VM resource configurations; the cluster-wide allocation is optimized when individual VMs meet their SLAs with high resource utilization. For scalability, we develop an efficient reinforcement learning approach with a continuous state space, and for adaptability, we use low-level VM runtime statistics to accommodate workload dynamics.
    Prototyped in the iBalloon system, the distributed learning approach successfully manages 128 VMs on a 16-node closely correlated cluster.
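    The productivity index is described as the ratio of yield to cost; a minimal sketch of using such a ratio for overload prevention could look like the following, where the specific yield and cost definitions and the drop-ratio threshold are hypothetical placeholders rather than the dissertation's.

```python
# Hedged sketch of a productivity-index (PI = yield / cost) overload check.
def productivity_index(yield_value: float, cost: float) -> float:
    """PI = yield / cost, e.g. reward from completed requests over the cost to produce it."""
    return yield_value / cost if cost > 0 else float("inf")

def needs_more_capacity(pi_history, drop_ratio=0.8) -> bool:
    """Flag an approaching overload when PI falls noticeably below its recent peak."""
    if len(pi_history) < 2:
        return False
    return pi_history[-1] < drop_ratio * max(pi_history[:-1])

# Per-interval (yield, cost) samples for a hypothetical multi-tier website.
samples = [productivity_index(y, c) for y, c in [(95, 100), (97, 100), (60, 100)]]
print(needs_more_capacity(samples))  # True: productivity dropped, act before overload
```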

    Proactive cloud management for highly heterogeneous multi-cloud infrastructures

    Various studies in the literature have demonstrated that the cloud computing paradigm can help improve the availability and performance of applications subject to software anomalies. Indeed, the cloud resource provisioning model enables users to rapidly obtain new processing resources, even distributed over different geographical regions, that can be promptly used in the case of, e.g., crashes or hangs of running machines, as well as to balance the load in the case of overloaded machines. Nevertheless, managing a complex, geographically distributed cloud deployment can be a complex and time-consuming task. The Autonomic Cloud Manager (ACM) Framework is an autonomic framework for the proactive management of applications deployed over multiple cloud regions. It uses machine learning models to predict failures of virtual machines and to proactively redirect load to healthy machines and cloud regions. In this paper, we study different policies for performing efficient proactive load balancing across cloud regions in order to mitigate the effect of software anomalies. These policies use predictions of the mean time to failure of virtual machines. We consider the case of heterogeneous cloud regions, i.e., regions with different amounts of resources, and we provide an experimental assessment of these policies in the context of the ACM Framework.
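    One possible shape of an MTTF-driven redirection policy, purely as a sketch: weight each region by its capacity, scale by predicted health, and send nothing to regions expected to fail soon. The Region fields, the mttf_floor_s cut-off, and the weighting rule are assumptions, not the policies evaluated in the paper.

```python
# Hedged sketch of one possible MTTF-driven load-redirection policy.
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    capacity: float          # relative resources in the (heterogeneous) region
    predicted_mttf_s: float  # predicted mean time to failure of its VMs

def weight(region, mttf_floor_s=300.0):
    """Send no traffic to regions predicted to fail soon; otherwise weight capacity by health."""
    if region.predicted_mttf_s < mttf_floor_s:
        return 0.0
    return region.capacity * min(1.0, region.predicted_mttf_s / 3600.0)

def load_shares(regions):
    """Normalize per-region weights into the fraction of load each region receives."""
    weights = {r.name: weight(r) for r in regions}
    total = sum(weights.values()) or 1.0
    return {name: w / total for name, w in weights.items()}

regions = [Region("eu", 8, 7200), Region("us", 4, 1800), Region("asia", 4, 120)]
print(load_shares(regions))  # "asia" gets 0: its VMs are predicted to fail shortly
```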

    Self-Learning Cloud Controllers: Fuzzy Q-Learning for Knowledge Evolution

    Cloud controllers aim to respond to application demands by automatically scaling compute resources at runtime to meet performance guarantees and minimize resource costs. Existing cloud controllers often resort to scaling strategies that are codified as a set of adaptation rules. However, for a cloud provider, applications running on top of the cloud infrastructure are more or less black boxes, making it difficult at design time to define optimal or pre-emptive adaptation rules. Thus, the burden of making adaptation decisions is often delegated to the cloud application. Yet, in most cases, application developers in turn have limited knowledge of the cloud infrastructure. In this paper, we propose learning adaptation rules at runtime. To this end, we introduce FQL4KE, a self-learning fuzzy cloud controller that learns and modifies fuzzy rules at runtime. The benefit is that designing cloud controllers no longer relies solely on precise design-time knowledge, which may be difficult to acquire. FQL4KE lets users specify cloud controllers by simply adjusting weights that represent priorities among system goals, instead of specifying complex adaptation rules. The applicability of FQL4KE has been experimentally assessed as part of the cloud application framework ElasticBench. The experimental results indicate that FQL4KE outperforms our previously developed fuzzy controller without learning mechanisms as well as the native Azure auto-scaling.
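    Fuzzy Q-learning of the kind the abstract describes roughly combines fuzzy rule firing strengths with per-rule Q-values. The sketch below shows one common formulation (per-rule epsilon-greedy choice, firing-strength-weighted global action and TD update); the membership functions, action set, and constants are illustrative and not FQL4KE's actual design.

```python
# Hedged sketch of a fuzzy Q-learning update for a scaling controller.
import random

ACTIONS = [-2, -1, 0, 1, 2]     # change in the number of VMs
ALPHA, GAMMA, EPS = 0.1, 0.8, 0.1

def memberships(workload: float) -> dict:
    """Toy triangular fuzzy sets over a normalized workload in [0, 1]."""
    low = max(0.0, 1.0 - 2.0 * workload)
    high = max(0.0, 2.0 * workload - 1.0)
    med = max(0.0, 1.0 - abs(2.0 * workload - 1.0))
    return {"low": low, "med": med, "high": high}

Q = {rule: {a: 0.0 for a in ACTIONS} for rule in ("low", "med", "high")}

def choose(workload: float):
    """Per-rule epsilon-greedy choice; the global action is the firing-strength-weighted mix."""
    mu = memberships(workload)
    local = {r: (random.choice(ACTIONS) if random.random() < EPS
                 else max(Q[r], key=Q[r].get)) for r in Q}
    global_action = sum(mu[r] * local[r] for r in Q) / (sum(mu.values()) or 1.0)
    return local, mu, global_action

def update(local, mu, reward: float, next_workload: float):
    """Distribute the TD update over rules in proportion to their firing strengths."""
    mu_next = memberships(next_workload)
    v_next = sum(mu_next[r] * max(Q[r].values()) for r in Q) / (sum(mu_next.values()) or 1.0)
    q_fired = sum(mu[r] * Q[r][local[r]] for r in Q) / (sum(mu.values()) or 1.0)
    td = reward + GAMMA * v_next - q_fired
    for r in Q:
        Q[r][local[r]] += ALPHA * mu[r] * td

local, mu, act = choose(0.7)
update(local, mu, reward=1.0, next_workload=0.5)
```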