666 research outputs found

    Quantifying cloud performance and dependability:Taxonomy, metric design, and emerging challenges

    Get PDF
    In only a decade, cloud computing has emerged from a pursuit for a service-driven information and communication technology (ICT), becoming a significant fraction of the ICT market. Responding to the growth of the market, many alternative cloud services and their underlying systems are currently vying for the attention of cloud users and providers. To make informed choices between competing cloud service providers, permit the cost-benefit analysis of cloud-based systems, and enable system DevOps to evaluate and tune the performance of these complex ecosystems, appropriate performance metrics, benchmarks, tools, and methodologies are necessary. This requires re-examining old system properties and considering new system properties, possibly leading to the re-design of classic benchmarking metrics such as expressing performance as throughput and latency (response time). In this work, we address these requirements by focusing on four system properties: (i) elasticity of the cloud service, to accommodate large variations in the amount of service requested, (ii) performance isolation between the tenants of shared cloud systems and resulting performance variability, (iii) availability of cloud services and systems, and (iv) the operational risk of running a production system in a cloud environment. Focusing on key metrics for each of these properties, we review the state-of-the-art, then select or propose new metrics together with measurement approaches. We see the presented metrics as a foundation toward upcoming, future industry-standard cloud benchmarks

    End-to-End Trust Fulfillment of Big Data Workflow Provisioning over Competing Clouds

    Get PDF
    Cloud Computing has emerged as a promising and powerful paradigm for delivering data- intensive, high performance computation, applications and services over the Internet. Cloud Computing has enabled the implementation and success of Big Data, a relatively recent phenomenon consisting of the generation and analysis of abundant data from various sources. Accordingly, to satisfy the growing demands of Big Data storage, processing, and analytics, a large market has emerged for Cloud Service Providers, offering a myriad of resources, platforms, and infrastructures. The proliferation of these services often makes it difficult for consumers to select the most suitable and trustworthy provider to fulfill the requirements of building complex workflows and applications in a relatively short time. In this thesis, we first propose a quality specification model to support dual pre- and post-cloud workflow provisioning, consisting of service provider selection and workflow quality enforcement and adaptation. This model captures key properties of the quality of work at different stages of the Big Data value chain, enabling standardized quality specification, monitoring, and adaptation. Subsequently, we propose a two-dimensional trust-enabled framework to facilitate end-to-end Quality of Service (QoS) enforcement that: 1) automates cloud service provider selection for Big Data workflow processing, and 2) maintains the required QoS levels of Big Data workflows during runtime through dynamic orchestration using multi-model architecture-driven workflow monitoring, prediction, and adaptation. The trust-based automatic service provider selection scheme we propose in this thesis is comprehensive and adaptive, as it relies on a dynamic trust model to evaluate the QoS of a cloud provider prior to taking any selection decisions. It is a multi-dimensional trust model for Big Data workflows over competing clouds that assesses the trustworthiness of cloud providers based on three trust levels: (1) presence of the most up-to-date cloud resource verified capabilities, (2) reputational evidence measured by neighboring users and (3) a recorded personal history of experiences with the cloud provider. The trust-based workflow orchestration scheme we propose aims to avoid performance degradation or cloud service interruption. Our workflow orchestration approach is not only based on automatic adaptation and reconfiguration supported by monitoring, but also on predicting cloud resource shortages, thus preventing performance degradation. We formalize the cloud resource orchestration process using a state machine that efficiently captures different dynamic properties of the cloud execution environment. In addition, we use a model checker to validate our monitoring model in terms of reachability, liveness, and safety properties. We evaluate both our automated service provider selection scheme and cloud workflow orchestration, monitoring and adaptation schemes on a workflow-enabled Big Data application. A set of scenarios were carefully chosen to evaluate the performance of the service provider selection, workflow monitoring and the adaptation schemes we have implemented. The results demonstrate that our service selection outperforms other selection strategies and ensures trustworthy service provider selection. The results of evaluating automated workflow orchestration further show that our model is self-adapting, self-configuring, reacts efficiently to changes and adapts accordingly while enforcing QoS of workflows

    Service Quality Assessment for Cloud-based Distributed Data Services

    Full text link
    The issue of less-than-100% reliability and trust-worthiness of third-party controlled cloud components (e.g., IaaS and SaaS components from different vendors) may lead to laxity in the QoS guarantees offered by a service-support system S to various applications. An example of S is a replicated data service to handle customer queries with fault-tolerance and performance goals. QoS laxity (i.e., SLA violations) may be inadvertent: say, due to the inability of system designers to model the impact of sub-system behaviors onto a deliverable QoS. Sometimes, QoS laxity may even be intentional: say, to reap revenue-oriented benefits by cheating on resource allocations and/or excessive statistical-sharing of system resources (e.g., VM cycles, number of servers). Our goal is to assess how well the internal mechanisms of S are geared to offer a required level of service to the applications. We use computational models of S to determine the optimal feasible resource schedules and verify how close is the actual system behavior to a model-computed \u27gold-standard\u27. Our QoS assessment methods allow comparing different service vendors (possibly with different business policies) in terms of canonical properties: such as elasticity, linearity, isolation, and fairness (analogical to a comparative rating of restaurants). Case studies of cloud-based distributed applications are described to illustrate our QoS assessment methods. Specific systems studied in the thesis are: i) replicated data services where the servers may be hosted on multiple data-centers for fault-tolerance and performance reasons; and ii) content delivery networks to geographically distributed clients where the content data caches may reside on different data-centers. The methods studied in the thesis are useful in various contexts of QoS management and self-configurations in large-scale cloud-based distributed systems that are inherently complex due to size, diversity, and environment dynamicity

    Formulating and managing viable SLAs in cloud computing from a small to medium service provider's viewpoint: A state-of-the-art review

    Full text link
    © 2017 Elsevier Ltd In today's competitive world, service providers need to be customer-focused and proactive in their marketing strategies to create consumer awareness of their services. Cloud computing provides an open and ubiquitous computing feature in which a large random number of consumers can interact with providers and request services. In such an environment, there is a need for intelligent and efficient methods that increase confidence in the successful achievement of business requirements. One such method is the Service Level Agreement (SLA), which is comprised of service objectives, business terms, service relations, obligations and the possible action to be taken in the case of SLA violation. Most of the emphasis in the literature has, until now, been on the formation of meaningful SLAs by service consumers, through which their requirements will be met. However, in an increasingly competitive market based on the cloud environment, service providers too need a framework that will form a viable SLA, predict possible SLA violations before they occur, and generate early warning alarms that flag a potential lack of resources. This is because when a provider and a consumer commit to an SLA, the service provider is bound to reserve the agreed amount of resources for the entire period of that agreement – whether the consumer uses them or not. It is therefore very important for cloud providers to accurately predict the likely resource usage for a particular consumer and to formulate an appropriate SLA before finalizing an agreement. This problem is more important for a small to medium cloud service provider which has limited resources that must be utilized in the best possible way to generate maximum revenue. A viable SLA in cloud computing is one that intelligently helps the service provider to determine the amount of resources to offer to a requesting consumer, and there are number of studies on SLA management in the literature. The aim of this paper is two-fold. First, it presents a comprehensive overview of existing state-of-the-art SLA management approaches in cloud computing, and their features and shortcomings in creating viable SLAs from the service provider's viewpoint. From a thorough analysis, we observe that the lack of a viable SLA management framework renders a service provider unable to make wise decisions in forming an SLA, which could lead to service violations and violation penalties. To fill this gap, our second contribution is the proposal of the Optimized Personalized Viable SLA (OPV-SLA) framework which assists a service provider to form a viable SLA and start managing SLA violation before an SLA is formed and executed. The framework also assists a service provider to make an optimal decision in service formation and allocate the appropriate amount of marginal resources. We demonstrate the applicability of our framework in forming viable SLAs through experiments. From the evaluative results, we observe that our framework helps a service provider to form viable SLAs and later to manage them to effectively minimize possible service violation and penalties

    Reducing Run-Time Adaptation Space via Analysis of Possible Utility Bounds

    Get PDF
    Self-adaptive systems often employ dynamic programming or similar techniques to select optimal adaptations at run-time. These techniques suffer from the “curse of dimensionality , increasing the cost of run-time adaptation decisions. We propose a novel approach that improves upon the state-of-the-art proactive self-adaptation techniques to reduce the number of possible adaptations that need be considered for each run-time adaptation decision. The approach, realized in a tool called Thallium, employs a combination of automated formal modeling techniques to (i) analyze a structural model of the system showing which configurations are reachable from other configurations and (ii) compute the utility that can be generated by the optimal adaptation over a bounded horizon in both the best- and worst-case scenarios. It then constructs triangular possibility values using those optimized bounds to automatically compare adjacent adaptations for each configuration, keeping only the alternatives with the best range of potential results. The experimental results corroborate Thallium’s ability to significantly reduce the number of states that need to be considered with each adaptation decision, freeing up vital resources at run-time

    Distributed Power Generation Scheduling, Modelling and Expansion Planning

    Get PDF
    Distributed generation is becoming more important in electrical power systems due to the decentralization of energy production. Within this new paradigm, new approaches for the operation and planning of distributed power generation are yet to be explored. This book deals with distributed energy resources, such as renewable-based distributed generators and energy storage units, among others, considering their operation, scheduling, and planning. Moreover, other interesting aspects such as demand response, electric vehicles, aggregators, and microgrid are also analyzed. All these aspects constitute a new paradigm that is explored in this Special Issue

    Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks

    Get PDF
    Technology scaling accompanied with higher operating frequencies and the ability to integrate more functionality in the same chip has been the driving force behind delivering higher performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as Multiprocessor System-on-Chip). However these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing is increased. In this thesis, we take on these challenges within the context of streaming applications running in network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote memory access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at system-level, in particular by exploiting redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on the lifetime reliability. We propose two recovery schemes based on a checkpoint-and-rollback and a rollforward technique. For the latter, we propose two variants of a monitor-controller- adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that it can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture

    Enabling Technologies for Cognitive Optical Networks

    Get PDF
    • …
    corecore