1,834 research outputs found

    MACHS: Mitigating the Achilles Heel of the Cloud through High Availability and Performance-aware Solutions

    Cloud computing is continuously growing as a business model for hosting information and communication technology applications. However, many concerns arise regarding the quality of service (QoS) offered by the cloud. One major challenge is the high availability (HA) of cloud-based applications. The key to achieving availability requirements is to develop an approach that is immune to cloud failures while minimizing service level agreement (SLA) violations. To this end, this thesis addresses the HA of cloud-based applications from different perspectives. First, the thesis proposes a component HA-aware scheduler (CHASE) to manage the deployments of carrier-grade cloud applications while maximizing their HA and satisfying their QoS requirements. Second, a Stochastic Petri Net (SPN) model is proposed to capture the stochastic characteristics of cloud services and quantify the expected availability offered by an application deployment. The SPN model is then associated with an extensible policy-driven cloud scoring system that integrates other cloud challenges (i.e., green and cost concerns) with HA objectives. The proposed HA-aware solutions are extended with a live virtual machine migration model that trades off migration time against downtime while maintaining the HA objectives. Furthermore, the thesis proposes a generic input template for cloud simulators, GITS, to facilitate the creation of cloud scenarios while ensuring reusability, simplicity, and portability. Finally, an availability-aware CloudSim extension, ACE, is proposed. ACE extends the CloudSim simulator with failure injection, computational paths, repair, failover, load balancing, and other availability-based modules.
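
    As a minimal illustration of the kind of quantity such an SPN model estimates, the sketch below computes the steady-state availability of a single component under exponential failure and repair times, the simplest special case of a stochastic Petri net; the function name and figures are illustrative, not the thesis's model.

```python
# Steady-state availability of a single component with exponential
# failure and repair times: A = MTTF / (MTTF + MTTR). This two-state
# model is the simplest special case of what an SPN can express;
# the numbers below are illustrative, not from the thesis.
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    return mttf_hours / (mttf_hours + mttr_hours)

# e.g., a VM that fails once every 1,000 hours and takes 2 hours to repair
print(f"{steady_state_availability(1000.0, 2.0):.5f}")  # 0.99800
```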

    Doctor of Philosophy

    We propose a collective approach for harnessing the idle resources (CPU, storage, and bandwidth) of nodes (e.g., home desktops) distributed across the Internet. Instead of a purely peer-to-peer (P2P) approach, we organize participating nodes to act collectively through collective managers (CMs). Participating nodes provide idle resources to CMs, which unify these resources to run meaningful distributed services for external clients. We do not assume altruistic users or employ a barter-based incentive model; instead, participating nodes provide resources to CMs for long durations and are compensated in proportion to their contribution. In this dissertation we discuss the challenges faced by collective systems, present a design that addresses these challenges, and study the effect of selfish nodes. We believe that the collective service model is a useful alternative to the dominant pure P2P and centralized work-queue models: it utilizes idle resources more effectively, has a more meaningful economic model, and is better suited for building legal and commercial distributed services. We demonstrate the value of our work by building two distributed services using the collective approach: a collective content distribution service and a collective data backup service.
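
    The proportional-compensation idea above can be made concrete with a hypothetical sketch; the function name, the use of CPU-hours as the contribution unit, and the payout scheme are our assumptions, not the dissertation's actual accounting mechanism.

```python
# Hypothetical sketch: split one accounting period's revenue among
# participating nodes in proportion to the resources each contributed.
# Names and units are illustrative, not the dissertation's mechanism.
def settle_period(cpu_hours: dict[str, float], revenue: float) -> dict[str, float]:
    total = sum(cpu_hours.values())
    if total == 0.0:
        return {node: 0.0 for node in cpu_hours}
    return {node: revenue * hours / total for node, hours in cpu_hours.items()}

print(settle_period({"node-a": 120.0, "node-b": 60.0}, revenue=90.0))
# {'node-a': 60.0, 'node-b': 30.0}
```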

    A predictive fault-tolerance framework for IoT systems

    As Internet of Things (IoT) systems scale, attributes such as availability, reliability, safety, maintainability, security, and performance become increasingly important. A key challenge in realising the IoT is how to provide a dependable infrastructure for the billions of expected IoT devices. A dependable IoT system is one that can defensibly be trusted to deliver its intended service within a given time period. For a fault-tolerance (FT) support solution to be applicable to all IoT systems, error definition must be a generic, language-agnostic process, so that FT can be applied as a software pattern. It must also be interoperable, so that FT support can be easily 'plugged into' any existing IoT system, which is facilitated by adherence to standards and protocols. Lastly, FT support must itself be fault tolerant, so that it can be depended on to provide correct support for IoT systems. The work in this thesis considers how real-time and historical data analysis techniques can be combined to monitor an IoT environment and analyse its short- and long-term data to make the system as resilient to failure as possible. Specifically, complex event processing (CEP) is proposed for real-time error detection based on the analysis of stream data in an IoT system, where errors are defined as nondeterministic finite automata (NFAs). For long-term error analysis, machine learning (ML) is proposed to predict when an error is likely to occur and to mitigate imminent system faults based on previous experience of erroneous system behaviour. The contribution is threefold: (1) a language-agnostic approach to error definition using NFAs, designed to provide 'FT as a service' for easy deployment and integration into existing IoT systems; (2) an implementation of NFAs on a bespoke CEP system, BoboCEP, that provides distributed, resilient event processing at the network edge via active replication; and (3) a ML approach to intelligent FT that can learn from system errors over time to ensure correct long-term FT support. The proposed solution was evaluated using two vertical-farming testbeds and a dataset from a real-world vertical farm. Results showed that the proposed solution could detect erroneous system behaviours, predict imminent ones, and recover from them successfully. A performance analysis of BoboCEP was conducted with favourable results.
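
    To make the NFA-based error-definition idea concrete, here is a minimal sketch of an automaton run over an event stream; the event names, states, and transition table are hypothetical, and this is not BoboCEP's actual API.

```python
# Illustrative error definition as a finite automaton over an event
# stream (a deterministic special case of the NFAs the thesis uses).
# Event names and states are hypothetical; this is not BoboCEP's API.
ERROR_NFA = {
    ("ok", "temp_high"): "warn",      # first high reading: move to warning
    ("warn", "temp_high"): "error",   # second in a row: error detected
    ("warn", "temp_normal"): "ok",    # reading recovered: reset
}

def detect(events):
    """Yield the index of each event at which the error state is reached."""
    state = "ok"
    for i, event in enumerate(events):
        state = ERROR_NFA.get((state, event), state)
        if state == "error":
            yield i
            state = "ok"  # resume monitoring after reporting

stream = ["temp_normal", "temp_high", "temp_high", "temp_normal"]
print(list(detect(stream)))  # [2]
```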

    Dvé: Improving DRAM reliability and performance on-demand via coherent replication


    Big Data Testing Techniques: Taxonomy, Challenges and Future Trends

    Big Data is reforming many industrial domains by providing decision support through the analysis of large data volumes. Big Data testing aims to ensure that Big Data systems run smoothly and error-free while maintaining performance and data quality. However, because of the diversity and complexity of the data, testing Big Data is challenging. Though numerous research efforts deal with Big Data testing, a comprehensive review addressing the testing techniques and challenges of Big Data is not yet available. Therefore, we have systematically reviewed the evidence on Big Data testing techniques published in the period 2010-2021. This paper discusses the testing of data processing by highlighting the techniques used in every processing phase. Furthermore, we discuss the challenges and future directions. Our findings show that diverse functional, non-functional, and combined (functional and non-functional) testing techniques have been used to solve specific problems related to Big Data. Most of the testing challenges arise during the MapReduce validation phase. In addition, combinatorial testing is one of the most applied techniques, used in combination with other techniques (i.e., random testing, mutation testing, input space partitioning, and equivalence testing) to find various functional faults through Big Data testing.
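
    As a minimal sketch of the combinatorial testing technique the review highlights, the code below greedily builds a pairwise (2-way) test suite; the parameters model a hypothetical Big Data pipeline configuration and are not taken from the surveyed studies.

```python
# Greedy sketch of pairwise (2-way) combinatorial test generation.
# The parameters and values are hypothetical pipeline settings.
from itertools import combinations, product

params = {
    "storage": ["hdfs", "s3"],
    "format": ["avro", "parquet", "csv"],
    "compression": ["none", "snappy"],
}
names = list(params)
candidates = [dict(zip(names, vals)) for vals in product(*params.values())]

# Every pair of parameter values that some test must exercise.
uncovered = {
    ((a, va), (b, vb))
    for a, b in combinations(names, 2)
    for va, vb in product(params[a], params[b])
}

def gain(config):
    """Pairs that this configuration would newly cover."""
    pairs = {((a, config[a]), (b, config[b])) for a, b in combinations(names, 2)}
    return pairs & uncovered

suite = []
while uncovered:
    best = max(candidates, key=lambda c: len(gain(c)))  # most new pairs first
    uncovered -= gain(best)
    suite.append(best)

print(f"{len(suite)} tests cover all pairs; full factorial needs {len(candidates)}")
```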

    Systems support for genomics computing in cloud environments

    Genomics research has enormous applications in many areas such as health care, forensics, and agriculture. Most recent achievements in this field come from the availability of unprecedented genomic data. However, new sequencing technologies keep producing data at an ever faster pace, yielding very large data volumes. This poses great challenges in storing, managing, processing, and analyzing the data efficiently. To cope, genomics research groups often equip themselves with a small-scale server room composed of machines with high storage capacity and computing power. This solution is not only costly and unscalable but also inefficient. A better solution is Cloud Computing, with its elasticity and pay-as-you-go economic model. Nevertheless, Cloud Computing only provides the potential infrastructure solution; to address the high-throughput processing challenges, we also need a suitable programming model. The fundamental idea is to process data in parallel. Among existing models, MapReduce appears to be the best candidate because of its extreme scalability. In this work, we develop a domain-specific system to support data management and analysis in genomics using Cloud Computing and MapReduce. Starting from the application layer, we developed a fundamental alignment tool called CloudAligner, based on the MapReduce framework, that outperformed its counterparts. We then sought solutions to improve the system at the infrastructure level. Observing that scientists spend too much time accessing data from low-speed archives (tapes), we developed the Distributed Disk Cache (DiSK), covered in a Master's thesis. Another challenge is to enable the system to support the differentiated services that are prevalent in Cloud Computing. To address this, we proposed a Differentiated Replication (DiR) mechanism allowing data to be inserted and retrieved with different availability levels. Another problem that greatly reduces the performance of the system is the heterogeneity of the Cloud. To tame it, we created an open reputation model called Opera, which employs vectors to record the behaviors (reputations) of nodes from different aspects. We modified the Hadoop MapReduce scheduler to make use of this information. The results proved that under heterogeneous environments our system beats the original Hadoop in terms of job execution time, number of failed/killed tasks, and energy consumption. The last challenge we addressed is data movement, since the data in our target domain (genomics) is extremely large and generated at an exponential rate. We divided the issue into two categories, internal and external movement, and successfully developed a cached system to minimize internal data movement and an easy-to-use tool called SPBD to handle external data movement with minimal response time.
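
    For readers unfamiliar with the MapReduce model the abstract builds on, here is a minimal single-process sketch of its map and reduce phases, counting k-mers in sequencing reads; it only illustrates the programming model and is not CloudAligner's alignment algorithm.

```python
# Minimal MapReduce-style sketch in plain Python: counting k-mers in
# sequencing reads. Illustrates the programming model only; this is
# not CloudAligner's alignment algorithm.
from collections import defaultdict

def map_phase(read, k=3):
    """Emit a (k-mer, 1) pair for every k-length substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def reduce_phase(pairs):
    """Sum the counts emitted for each distinct k-mer."""
    counts = defaultdict(int)
    for kmer, n in pairs:
        counts[kmer] += n
    return dict(counts)

reads = ["ACGTAC", "GTACGT"]
print(reduce_phase(p for read in reads for p in map_phase(read)))
# {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```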