1,903 research outputs found
Stochastic Reward Net-based Modeling Approach for Availability Quantification of Data Center Systems
Availability quantification and prediction of IT infrastructure in data centers are of paramount importance for online business enterprises. In this chapter, we present comprehensive availability models for practical case studies in order to demonstrate a state-space stochastic reward net model for typical data center systems for quantitative assessment of system availability. We present stochastic reward net models of a virtualized server system, a data center network based on DCell topology, and a conceptual data center for disaster tolerance. The systems are then evaluated against various metrics of interest, including steady state availability, downtime and downtime cost, and sensitivity analysis
Scheduling of data-intensive workloads in a brokered virtualized environment
Providing performance predictability guarantees is increasingly important in cloud platforms, especially for data-intensive applications, for which performance depends greatly on the available rates of data transfer between the various computing/storage hosts underlying the virtualized resources assigned to the application. With the increased prevalence of brokerage services in cloud platforms, there is a need for resource management solutions that consider the brokered nature of these workloads, as well as the special demands of their intra-dependent components. In this paper, we present an offline mechanism for scheduling batches of brokered data-intensive workloads, which can be extended to an online setting. The objective of the mechanism is to decide on a packing of the workloads in a batch that minimizes the broker's incurred costs, Moreover, considering the brokered nature of such workloads, we define a payment model that provides incentives to these workloads to be scheduled as part of a batch, which we analyze theoretically. Finally, we evaluate the proposed scheduling algorithm, and exemplify the fairness of the payment model in practical settings via trace-based experiments
Availability model for virtualized platforms
Network virtualization is a method of providing virtual instances of physical networks. Virtualized networks are widely used with virtualized servers, forming a powerful dynamically reconfigurable platform. In this paper we discuss the impact of network virtualization on the overall system availability. We describe a system reflecting the network architecture usually deployed in today’s data centres. The proposed system is modelled using Markov chains and fault trees. We compare the availability of virtualized system using standard physique network with the availability of virtualized system using virtualized network. Network virtualization introduces a new software layer to the network architecture. The proposed availability model integrates software failures in addition to the hardware failures. Based on the estimated numerical failure rates, we analyse system’s availability
Modeling and analysis of high availability techniques in a virtualized system
Availability evaluation of a virtualized system is critical to the wide deployment of cloud computing services. Time-based, prediction-based rejuvenation of virtual machines (VM) and virtual machine monitors, VM failover and live VM migration are common high-availability (HA) techniques in a virtualized system. This paper investigates the effect of combination of these availability techniques on VM availability in a virtualized system where various software and hardware failures may occur. For each combination, we construct analytic models rejuvenation mechanisms to improve VM availability; (2) prediction-based rejuvenation enhances VM availability much more than time-based VM rejuvenation when prediction successful probability is above 70%, regardless failover and/or live VM migration is also deployed; (3) failover mechanism outperforms live VM migration, although they can work together for higher availability of VM. In addition, they can combine with software rejuvenation mechanisms for even higher availability; (4) and time interval setting is critical to a time-based rejuvenation mechanism. These analytic results provide guidelines for deploying and parameter setting of HA techniques in a virtualized system
Technical Report on Deploying a highly secured OpenStack Cloud Infrastructure using BradStack as a Case Study
Cloud computing has emerged as a popular paradigm and an attractive model for
providing a reliable distributed computing model.it is increasing attracting
huge attention both in academic research and industrial initiatives. Cloud
deployments are paramount for institution and organizations of all scales. The
availability of a flexible, free open source cloud platform designed with no
propriety software and the ability of its integration with legacy systems and
third-party applications are fundamental. Open stack is a free and opensource
software released under the terms of Apache license with a fragmented and
distributed architecture making it highly flexible. This project was initiated
and aimed at designing a secured cloud infrastructure called BradStack, which
is built on OpenStack in the Computing Laboratory at the University of
Bradford. In this report, we present and discuss the steps required in
deploying a secured BradStack Multi-node cloud infrastructure and conducting
Penetration testing on OpenStack Services to validate the effectiveness of the
security controls on the BradStack platform. This report serves as a practical
guideline, focusing on security and practical infrastructure related issues. It
also serves as a reference for institutions looking at the possibilities of
implementing a secured cloud solution.Comment: 38 pages, 19 figures
Straggler Root-Cause and Impact Analysis for Massive-scale Virtualized Cloud Datacenters
Increased complexity and scale of virtualized distributed systems has resulted in the manifestation of emergent phenomena substantially affecting overall system performance. This phenomena is known as “Long Tail”, whereby a small proportion of task stragglers significantly impede job completion time. While work focuses on straggler detection and mitigation, there is limited work that empirically studies straggler root-cause and quantifies its impact upon system operation. Such analysis is critical to ascertain in-depth knowledge of straggler occurrence for focusing developmental and research efforts towards solving the Long Tail challenge. This paper provides an empirical analysis of straggler root-cause within virtualized Cloud datacenters; we analyze two large-scale production systems to quantify the frequency and impact stragglers impose, and propose a method for conducting root-cause analysis. Results demonstrate approximately 5% of task stragglers impact 50% of total jobs for batch processes, and 53% of stragglers occur due to high server resource utilization. We leverage these findings to propose a method for extreme straggler detection through a combination of offline execution patterns modeling and online analytic agents to monitor tasks at runtime. Experiments show the approach is capable of detecting stragglers less than 11% into their execution lifecycle with 95% accuracy for short duration jobs
- …