
    Mitigating the Effects of Partial Resource Failures for Cloud Providers

    Competition for users in a global market is fierce, forcing enterprises to provide better, faster services while offering them more cheaply. At the same time, users choose to remain oblivious to the infrastructure behind the service, demanding only that it works. Cloud service failures and inefficient management of such failures can result in significant financial cost and loss of reputation for providers, and can drive key customers away. Yet failure situations can never be completely avoided. To mitigate their effects, we present a decision model that helps providers decide which jobs to keep running and which to cancel in order to minimize the loss of revenue and key customers during partial resource failures. The evaluation of the model and its extension shows that it can significantly improve revenue. Furthermore, the model can also help to reduce the number of cancelled jobs.
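
    To make the kind of decision this abstract describes concrete, the sketch below keeps the jobs with the highest weighted revenue per core until the surviving capacity is exhausted. It is only an illustrative greedy heuristic, not the paper's actual decision model; the job attributes and the key-customer weighting are assumptions made for the example.

        # Minimal sketch of a revenue-aware job-retention decision during a partial
        # resource failure. NOT the paper's model; attributes and weights are
        # illustrative assumptions.
        from dataclasses import dataclass

        @dataclass
        class Job:
            job_id: str
            cores: int          # capacity the job occupies
            revenue: float      # revenue earned if the job completes
            key_customer: bool  # cancelling a key customer's job weighs more

        def select_jobs_to_keep(jobs, remaining_cores, key_weight=2.0):
            """Greedily keep jobs with the highest (weighted) revenue per core
            until the surviving capacity is used up; the rest are cancelled."""
            def score(job):
                weight = key_weight if job.key_customer else 1.0
                return weight * job.revenue / job.cores

            keep, cancel, used = [], [], 0
            for job in sorted(jobs, key=score, reverse=True):
                if used + job.cores <= remaining_cores:
                    keep.append(job)
                    used += job.cores
                else:
                    cancel.append(job)
            return keep, cancel

        if __name__ == "__main__":
            jobs = [Job("a", 4, 120.0, False), Job("b", 2, 80.0, True), Job("c", 8, 150.0, False)]
            keep, cancel = select_jobs_to_keep(jobs, remaining_cores=8)
            print([j.job_id for j in keep], [j.job_id for j in cancel])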

    Reliable Provisioning of Spot Instances for Compute-intensive Applications

    Cloud computing providers now offer their unused resources for leasing in the spot market, which has been considered the first step towards a full-fledged market economy for computational resources. Spot instances are virtual machines (VMs) available at lower prices than their standard on-demand counterparts. These VMs run for as long as the current spot price stays below the maximum bid price users are willing to pay per hour. Spot instances have been increasingly used for executing compute-intensive applications. Despite the apparent economic advantage, the intermittent nature of biddable resources means that application execution times may be prolonged or that applications may not finish at all. This paper proposes a resource allocation strategy that addresses the problem of running compute-intensive jobs on a pool of intermittent virtual machines, while also aiming to run applications in a fast and economical way. To mitigate potential unavailability periods, a multifaceted fault-aware resource provisioning policy is proposed. Our solution employs price and runtime estimation mechanisms, as well as three fault tolerance techniques, namely checkpointing, task duplication, and migration. We evaluate our strategies using trace-driven simulations, which take as input real price variation traces as well as an application trace from the Parallel Workload Archive. Our results demonstrate the effectiveness of executing applications on spot instances, respecting QoS constraints despite occasional failures.
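
    The sketch below illustrates one of the three fault-tolerance techniques the abstract names, periodic checkpointing, on a toy spot-price trace: the instance computes while the price stays at or below the bid, and a revocation rolls progress back to the last checkpoint. The trace, hourly granularity, and parameters are assumptions for illustration, not data or policies from the paper.

        # Toy simulation of checkpointing on a spot instance that is revoked
        # whenever the spot price rises above the user's bid. Illustrative only.

        def run_with_checkpoints(price_trace, bid, work_hours, ckpt_interval=2):
            """Return (elapsed_hours, cost) to finish `work_hours` of compute,
            checkpointing every `ckpt_interval` hours of completed work."""
            done = 0.0          # completed work (hours of compute)
            last_ckpt = 0.0     # work preserved by the most recent checkpoint
            elapsed, cost = 0, 0.0
            for price in price_trace:
                elapsed += 1
                if price <= bid:                  # instance stays in-bid this hour
                    cost += price
                    done += 1
                    if done - last_ckpt >= ckpt_interval:
                        last_ckpt = done          # checkpoint: persist progress
                else:                             # out-of-bid: instance revoked
                    done = last_ckpt              # roll back to the last checkpoint
                if done >= work_hours:
                    return elapsed, cost
            return None, cost                     # trace ended before completion

        if __name__ == "__main__":
            trace = [0.03, 0.04, 0.09, 0.03, 0.03, 0.04, 0.03, 0.03]
            print(run_with_checkpoints(trace, bid=0.05, work_hours=5))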

    MACHS: Mitigating the Achilles Heel of the Cloud through High Availability and Performance-aware Solutions

    Cloud computing is continuously growing as a business model for hosting information and communication technology applications. However, many concerns arise regarding the quality of service (QoS) offered by the cloud. One major challenge is the high availability (HA) of cloud-based applications. The key to achieving availability requirements is to develop an approach that is immune to cloud failures while minimizing service level agreement (SLA) violations. To this end, this thesis addresses the HA of cloud-based applications from different perspectives. First, the thesis proposes a component HA-aware scheduler (CHASE) to manage the deployments of carrier-grade cloud applications while maximizing their HA and satisfying their QoS requirements. Second, a Stochastic Petri Net (SPN) model is proposed to capture the stochastic characteristics of cloud services and quantify the expected availability offered by an application deployment. The SPN model is then associated with an extensible policy-driven cloud scoring system that integrates other cloud challenges (i.e., green and cost concerns) with HA objectives. The proposed HA-aware solutions are extended with a live virtual machine migration model that provides a trade-off between migration time and downtime while maintaining the HA objectives. Furthermore, the thesis proposes a generic input template for cloud simulators, GITS, to facilitate the creation of cloud scenarios while ensuring reusability, simplicity, and portability. Finally, an availability-aware CloudSim extension, ACE, is proposed. ACE extends the CloudSim simulator with failure injection, computational paths, repair, failover, load balancing, and other availability-based modules.
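
    As a back-of-the-envelope companion to the availability quantification described above, the sketch below computes steady-state availability from MTTF/MTTR and composes it for series and redundant deployments. The thesis uses a Stochastic Petri Net rather than these closed-form formulas, and the failure and repair figures are invented for the example.

        # Illustrative availability arithmetic; not the thesis's SPN model.

        def availability(mttf_hours, mttr_hours):
            """Steady-state availability of a single component."""
            return mttf_hours / (mttf_hours + mttr_hours)

        def series(*avails):
            """All components must be up (e.g. an app depends on its VM and host)."""
            result = 1.0
            for a in avails:
                result *= a
            return result

        def parallel(*avails):
            """At least one redundant replica must be up (e.g. a 1+1 failover pair)."""
            unavail = 1.0
            for a in avails:
                unavail *= (1.0 - a)
            return 1.0 - unavail

        if __name__ == "__main__":
            host = availability(mttf_hours=2000, mttr_hours=4)
            vm = availability(mttf_hours=800, mttr_hours=0.5)
            single = series(host, vm)                # one instance on one host
            redundant = parallel(single, single)     # active/standby across two hosts
            print(f"single: {single:.5f}, redundant: {redundant:.7f}")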

    From cloud computing security towards homomorphic encryption: A comprehensive review

    “Cloud computing” is a new technology that has revolutionized the world of communications and information technology. It brings together a large number of capabilities, facilities, and developments, combining various earlier inventions into something new and compelling. Despite all its features, cloud computing faces major challenges in preserving data confidentiality and privacy, and it has been subjected to numerous attacks and security breaches that make people hesitant to adopt it. This article provides a comprehensive review of cloud computing concepts with a primary focus on cloud computing security, its top threats, and the protection against each of them. Data security and privacy in the cloud environment are also discussed, and homomorphic encryption (HE) is highlighted as a popular technique for preserving the privacy of sensitive data in many cloud computing applications. The article aims to provide an adequate overview for both researchers and practitioners already working in the field of cloud computing security, and for newcomers who are not yet fully equipped to understand its detailed and complex technical aspects.
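
    To show the property the review refers to, the toy below uses textbook (unpadded) RSA, whose ciphertexts can be multiplied to obtain an encryption of the product of the plaintexts, so a cloud can compute on data it cannot read. This is a deliberately tiny, completely insecure illustration and is not drawn from the article; production systems use schemes such as Paillier or lattice-based fully homomorphic encryption.

        # Toy multiplicative homomorphism with textbook RSA (insecure, for
        # illustration only; requires Python 3.8+ for pow(e, -1, phi)).

        p, q = 61, 53                 # toy primes, never this small in practice
        n = p * q
        phi = (p - 1) * (q - 1)
        e = 17
        d = pow(e, -1, phi)           # private exponent

        def encrypt(m):
            return pow(m, e, n)

        def decrypt(c):
            return pow(c, d, n)

        if __name__ == "__main__":
            m1, m2 = 7, 11
            c1, c2 = encrypt(m1), encrypt(m2)
            c_product = (c1 * c2) % n          # computed without knowing m1 or m2
            assert decrypt(c_product) == (m1 * m2) % n
            print("E(7) * E(11) decrypts to", decrypt(c_product))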

    Dependability where the mobile world meets the enterprise world

    As we move toward increasingly larger scales of computing, the complexity of systems and networks has increased manifold, leading to massive failures of cloud providers (Amazon CloudFront, November 2014) and geographically localized outages of cellular services (T-Mobile, June 2014). In this dissertation, we investigate the dependability aspects of two of the most prevalent computing platforms today, namely smartphones and cloud computing. These two seemingly disparate platforms are part of a cohesive story: they interact to provide end-to-end services that are increasingly delivered over mobile platforms, examples being iCloud, Google Drive, and their smartphone counterparts iPhone and Android. In one of the early works on characterizing failures in dominant mobile OSes, we analyzed the bug repositories of Android and Symbian and found similarities in their failure modes [ISSRE2010]. We also presented a classification of root causes and quantified the impact of the ease of customizing smartphones on system reliability. Our evaluation of Inter-Component Communication in Android [DSN2012] shows an alarming number of exception handling errors, where a phone may be crashed by passing it malformed component invocation messages, even from unprivileged applications. In this work, we also suggest language extensions that can mitigate these problems. Mobile applications today are increasingly used to interact with enterprise-class web services commonly hosted in virtualized environments. Virtualization suffers from the problem of imperfect performance isolation, where contention for low-level hardware resources can impact application performance. Through a set of rigorous experiments in a private cloud testbed and in EC2, we show that interference-induced performance degradation is a reality. Our experiments have also shown that the optimal configuration settings for web servers change during such phases of interference. Based on this observation, we design and implement the IC2 engine, which can mitigate the effects of interference by reconfiguring web server parameters [MW2014]. We further improve IC2 by incorporating it into a two-level configuration engine, named ICE, for managing web server clusters [ICAC2015]. Our evaluations show that, compared to an interference-agnostic configuration, IC2 can improve the response time of web servers by up to 40%, while ICE can improve response time by up to 94% during phases of interference.
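
    The sketch below captures the general idea behind interference-aware reconfiguration: when observed response times drift well above an interference-free baseline, switch the web server to a configuration that tolerates contention better. The parameter names, thresholds, and configurations are illustrative assumptions, not the actual IC2 or ICE controllers.

        # Illustrative interference-aware configuration switch; not IC2/ICE.

        BASELINE_CONFIG = {"MaxClients": 256, "KeepAliveTimeout": 5}
        INTERFERENCE_CONFIG = {"MaxClients": 96, "KeepAliveTimeout": 1}

        def choose_config(recent_latencies_ms, baseline_ms, degradation_factor=1.5):
            """Pick a config from a sliding window of observed response times."""
            avg = sum(recent_latencies_ms) / len(recent_latencies_ms)
            if avg > degradation_factor * baseline_ms:
                return INTERFERENCE_CONFIG    # a co-located VM is contending for resources
            return BASELINE_CONFIG

        if __name__ == "__main__":
            print(choose_config([42, 40, 45], baseline_ms=40))    # normal operation
            print(choose_config([95, 110, 120], baseline_ms=40))  # interference phase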

    Computing at massive scale: Scalability and dependability challenges

    Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase of data volume, together with the heterogeneity of workloads and resources and the dynamic nature of massive user requests, the uncertainty and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, vulnerable system dependability, and user-perceived performance degradation. In this paper we report our latest understanding of the current and future challenges in this area, and discuss both existing and potential solutions to the problems, especially those concerning system efficiency, scalability, and dependability. We first introduce a data-driven analysis methodology for characterizing resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine and analyze several fundamental challenges and the solutions we are developing to tackle them, including, for example, incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology in order to establish an engineering approach that facilitates the optimization, tuning, and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications.

    Resilience Analysis of Service Oriented Collaboration Process Management systems

    Collaborative business process management allows for the automated coordination of processes involving human and computer actors. In modern economies, this coordination increasingly needs to take place not only within organizations but also across organizational boundaries. However, dependence on the performance of other organizations should be limited, and control over one's own processes is required from a competitiveness perspective. The main objective of this work is to propose an evaluation model for measuring the resilience of a Service Oriented Architecture (SOA) collaborative process management system. In this paper, we propose resilience analysis perspectives for SOA collaborative process systems, i.e., the overall system, individual process model, individual process instance, service, and resource perspectives. A collaborative incident and maintenance notification process system is reviewed to illustrate our resilience analysis. This research contributes to extending SOA collaborative business process management systems with resilience support, not only by quantifying and identifying resilience factors, but also by considering ways of improving the resilience of SOA collaborative process systems through measures at design time and run time.

    Maximising revenue in cloud computing markets by means of economically enhanced SLA management

    This paper proposes a bidirectional communication between market brokers and resource managers in Cloud Computing Markets. This communication is implemented by means of an Economically Enhanced Resource Manager (EERM), which supports the negotiation process by deciding which tasks can be allocated and which cannot, and under which economic and technical conditions. The EERM also uses the economic information that it collects from the market layers to manage resources according to concrete business-level objectives (BLOs). This paper presents several business policies and rules for maximizing the revenue of a Cloud provider that sells its services and resources in a market. Their validity is demonstrated through several experiments that show how applying these rules can positively influence revenue and minimize violations of Service-Level Agreements.
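
    The sketch below gives one possible shape for the kind of revenue-aware admission rule such a resource manager could apply: accept a task only if capacity allows it and its expected revenue outweighs the expected SLA-violation penalty. The task fields and the crude utilization-based risk model are assumptions for illustration, not the EERM's actual business policies.

        # Illustrative revenue-vs-SLA-risk admission rule; not the EERM's policies.
        from dataclasses import dataclass

        @dataclass
        class TaskOffer:
            cores: int
            revenue: float          # price agreed with the market broker
            sla_penalty: float      # penalty paid if the SLA is violated

        def violation_probability(used_cores, total_cores, task_cores):
            """Crude risk proxy: the closer the provider runs to full capacity,
            the likelier an SLA violation becomes."""
            utilization = (used_cores + task_cores) / total_cores
            return max(0.0, (utilization - 0.7) / 0.3)   # 0 below 70% load, 1 at 100%

        def admit(task, used_cores, total_cores):
            if used_cores + task.cores > total_cores:
                return False
            risk = violation_probability(used_cores, total_cores, task.cores)
            return task.revenue > risk * task.sla_penalty

        if __name__ == "__main__":
            offer = TaskOffer(cores=8, revenue=10.0, sla_penalty=40.0)
            print(admit(offer, used_cores=40, total_cores=64))   # moderate load: accept
            print(admit(offer, used_cores=56, total_cores=64))   # near saturation: reject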