13 research outputs found

    COMBINATION JOB-DRIVEN ORDERING FOR PRAGMATIC MAPREDUCE METHODS

    It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (Joss for short) from a tenant's perspective. Joss provides not only job-level scheduling, but also map-task level scheduling and reduce-task level scheduling. Joss classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of Joss are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce-workload scenarios and provide the best job performance among all tested algorithms.

    Feedback-Based Resource Allocation in MapReduce-Based Systems

    Shadow replication: An energy-aware, fault-tolerant computational model for green cloud computing

    As the demand for cloud computing continues to increase, cloud service providers face the daunting challenge of meeting the negotiated SLA, in terms of reliability and timely performance, while achieving cost-effectiveness. This challenge is compounded by the growing likelihood of failure in large-scale clouds and the rising impact of energy consumption and CO2 emissions on the environment. This paper proposes Shadow Replication, a novel fault-tolerance model for cloud computing, which seamlessly addresses failure at scale while minimizing energy consumption and reducing its impact on the environment. The basic tenet of the model is to associate with the main process a suite of shadow processes that execute concurrently, but initially at a much reduced execution speed, to overcome failures as they occur. Two computationally feasible schemes are proposed to achieve Shadow Replication. A performance evaluation framework is developed to analyze these schemes and compare their performance to traditional replication-based fault tolerance methods, focusing on the inherent tradeoff between fault tolerance, the specified SLA and profit maximization. The results show that Shadow Replication leads to significant energy reduction, and is better suited for compute-intensive execution models, where up to 30% more profit can be achieved due to reduced energy consumption.
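
As a rough illustration of the basic tenet in this abstract, the following Python sketch computes when a job finishes with one main process and one shadow. The function name, the parameters (`work`, `sigma_before`, `sigma_after`, `failure_time`), and the assumption that the main process runs at unit speed are illustrative choices, not taken from the paper. Energy, profit, and multiple shadows are not modeled.

```python
from typing import Optional


def shadow_completion_time(
    work: float,
    sigma_before: float,
    sigma_after: float,
    failure_time: Optional[float] = None,
) -> float:
    """Completion time of a job protected by a single shadow process.

    Assumptions (hypothetical, for illustration only):
    - the main process runs at speed 1, so it needs `work` time units;
    - the shadow runs at the reduced speed `sigma_before` until the main
      process fails, then continues alone at speed `sigma_after`;
    - the shadow itself never fails.
    """
    if work <= 0:
        raise ValueError("work must be positive")
    if not 0 <= sigma_before <= 1:
        raise ValueError("sigma_before must lie in [0, 1]")
    if sigma_after <= 0:
        raise ValueError("sigma_after must be positive")

    # No failure before the main process finishes: the main process wins.
    if failure_time is None or failure_time >= work:
        return work

    # Work the slow shadow has already completed when the main process fails.
    done_by_shadow = sigma_before * failure_time
    remaining = work - done_by_shadow
    # The shadow finishes the remaining work at its post-failure speed.
    return failure_time + remaining / sigma_after
```

A slower initial shadow speed delays recovery after a failure, which is the tradeoff against the SLA that the abstract refers to; the energy side of that tradeoff is outside this sketch.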

    Optimizing MapReduce for Highly Distributed Environments

    MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are available in a single central location, however, no longer holds for many emerging applications in commercial, scientific and social networking domains, where the data is generated in a geographically distributed manner. Further, the computational resources needed for carrying out the data analysis may be distributed across multiple data centers or community resources such as Grids. In this paper, we develop a modeling framework to capture MapReduce execution in a highly distributed environment comprising distributed data sources and distributed computational resources. This framework is flexible enough to capture several design choices and performance optimizations for MapReduce execution. We propose a model-driven optimization that has two key features: (i) it is end-to-end as opposed to myopic optimizations that may only make locally optimal but globally suboptimal decisions, and (ii) it can control multiple MapReduce phases to achieve low runtime, as opposed to single-phase optimizations that may control only individual phases. Our model results show that our optimization can provide nearly 82% and 64% reduction in execution time over myopic and single-phase optimizations, respectively. We have modified Hadoop to implement our model outputs, and using three different MapReduce applications over an 8-node emulated PlanetLab testbed, we show that our optimized Hadoop execution plan achieves 31-41% reduction in runtime over a vanilla Hadoop execution. Our model-driven optimization also provides several insights into the choice of techniques and execution parameters based on application and platform characteristics

    Resource Allocation in Vehicular Cloud Computing

    Recently, we have witnessed the emergence of Cloud Computing, a paradigm shift adopted by information technology (IT) companies with a large installed infrastructure base that often goes under-utilized. The unmistakable appeal of cloud computing is that it provides scalable access to computing resources and to a multitude of IT services. Cloud computing and cloud IT services have seen and continue to see a phenomenal adoption rate around the world. Recently, Professor Olariu and his coworkers, through a series of studies, introduced a new concept, Vehicular Cloud Computing. A Vehicular Cloud (VC) is a network of vehicles in a parking lot that can provide computation services to users. In this model each vehicle is a computation node. Some of the applications of a VC include a datacenter at the airport, a data cloud in a parking lot, and a datacenter at the mall. The defining difference between vehicular and conventional clouds lies in the distributed ownership and, consequently, the unpredictable availability of computational resources. As cars enter and leave the parking lot, new computational resources become available while others depart, creating a dynamic environment where the task of efficiently assigning jobs to cars becomes very challenging. Our main contribution is a number of scheduling and fault-tolerant job assignment strategies, based on redundancy, that mitigate the effect of resource volatility in vehicular clouds. We offer a theoretical analysis of the expected job completion time in the case where cars do not leave during a checkpoint operation and also in the case where cars may leave while checkpointing is in progress, leading to system failure. A comprehensive set of simulations has shown that our theoretical predictions are accurate. We considered two different environments for the scheduling strategy: deterministic and stochastic. In a deterministic environment the arrival and departure of cars are known. 
This scenario is for environments like universities where employees should be present at work with known schedules and the university rents out its employees' cars as computation nodes to provide services as a vehicular cloud. We presented a scheduling model for a vehicular cloud based on mixed integer linear programming. This work investigates a job scheduling problem involving non-preemptive tasks with known processing time where job migration is allowed. Assigning a job to resources is valid if the job has been executed fully and continuously (no interruption). A job cannot be executed in parallel. In our approach, the determination of an optimal job schedule can be formulated as maximizing the utilization of the VC and minimizing the number of job migrations. Utilization can be calculated as the time period during which vehicles have been used as computation resources. For a dynamic environment in terms of resource availability, we presented a stochastic model for job assignment. We proposed to make job assignment in a VC fault tolerant by using a variant of the checkpointing strategy. Rather than saving the state of the computation at regular intervals, the state of the computation is recorded only as needed. Also, since we do not assume a central server that stores checkpointed images, as conventional cloud providers do, in our strategy checkpointing is performed by a car and the resulting image is stored by the car itself. Once the car leaves, the image is lost. We consider two scenarios: in the first one, cars do not leave during checkpointing; in the second one, cars may leave during checkpointing, leading to system failure. Our main contribution is to offer theoretical predictions of the job execution time in both scenarios mentioned above. A comprehensive set of simulations has shown that our theoretical predictions are accurate.

    Llama: Leveraging columnar storage for scalable join processing in mapreduce.

    Master's, MASTER OF SCIENCE

    Checkpointing strategies to protect parallel tasks against errors with general distributions (Stratégies de checkpoint pour protéger les tâches parallèles contre des erreurs ayant des distributions générales)

    This paper studies checkpointing strategies for parallel jobs subject to fail-stop errors. The optimal strategy is well known when failure inter-arrival times obey an Exponential law, but it is unknown for non-memoryless failure distributions. We explain why the latter fact is misunderstood in recent literature. We propose a general strategy that maximizes the expected efficiency until the next failure, and we show that this strategy is asymptotically optimal for very long jobs. Through extensive simulations, we show that the new strategy is always at least as good as the Young/Daly strategy for various failure distributions. For distributions with a high infant mortality (such as LogNormal 2.51 or Weibull 0.5), the execution time is divided by a factor of 1.9 on average, and by up to a factor of 4.2 for recently deployed platforms.
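
For reference, the Young/Daly strategy used as the baseline in this abstract checkpoints periodically, with the period given by the well-known first-order approximation W = sqrt(2 * mu * C), where mu is the platform mean time between failures and C is the checkpoint cost. The Python sketch below, with hypothetical function and parameter names, computes that period and the matching first-order waste C/W + W/(2 * mu). It assumes Exponential failures and a checkpoint cost much smaller than mu, and it does not implement the new strategy proposed in the paper.

```python
import math


def young_daly_period(mtbf: float, checkpoint_cost: float) -> float:
    """First-order Young/Daly checkpointing period, sqrt(2 * mu * C).

    mtbf            -- platform mean time between failures (mu)
    checkpoint_cost -- time needed to take one checkpoint (C)
    Both values must use the same time unit.
    """
    if mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtbf and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


def first_order_waste(period: float, mtbf: float, checkpoint_cost: float) -> float:
    """First-order fraction of time lost for a given checkpointing period.

    The first term is the checkpointing overhead per period; the second is
    the expected re-execution after a failure, which on average strikes
    halfway through a period.
    """
    if period <= 0 or mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("all arguments must be positive")
    return checkpoint_cost / period + period / (2.0 * mtbf)
```

With an MTBF of 200 time units and a checkpoint cost of 1, the period is 20 and the first-order waste is 0.1; shorter or longer periods increase the waste under this approximation.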

    Towards An Efficient Cloud Computing System: Data Management, Resource Allocation and Job Scheduling

    Cloud computing is an emerging technology in distributed computing, and it has proved to be an effective infrastructure for providing services to users. The cloud is developing day by day and faces many challenges. One challenge is to build a cost-effective data management system that can ensure high data availability while maintaining consistency. Another challenge is efficient resource allocation that ensures high resource utilization and high SLO availability. Scheduling for high throughput, where scheduling refers to a set of policies that control the order of the work to be performed by a computer system, is another challenge. In this dissertation, we study how to manage data and improve data availability while reducing cost (i.e., consistency maintenance cost and storage cost); how to efficiently manage the resources for processing jobs and increase resource utilization with high SLO availability; and how to design an efficient scheduling algorithm that provides high throughput and low overhead while satisfying the demands on job completion time. Replication is a common approach to enhancing data availability in cloud storage systems. Previously proposed replication schemes cannot effectively handle both correlated and non-correlated machine failures while increasing data availability with limited resources. The schemes for correlated machine failures must create a constant number of replicas for each data object, which neglects diverse data popularities and cannot utilize the resources to maximize the expected data availability. Also, the previous schemes neglect the consistency maintenance cost and the storage cost caused by replication. It is critical for cloud providers to maximize data availability, and hence minimize SLA (Service Level Agreement) violations, while minimizing the cost caused by replication in order to maximize revenue. 
In this dissertation, we build a nonlinear programming model to maximize data availability under both types of failures and minimize the cost caused by replication. Based on the model's solution for the replication degree of each data object, we propose a low-cost multi-failure resilient replication scheme (MRR). MRR can effectively handle both correlated and non-correlated machine failures, considers data popularities to enhance data availability, and also tries to minimize consistency maintenance and storage cost. In the current cloud, providers still need to reserve resources to allow users to scale on demand. The capacity offered by cloud offerings is in the form of pre-defined virtual machine (VM) configurations. This incurs resource wastage and results in low resource utilization when users actually consume much less resource than the VM capacity. Existing works either reallocate the unused resources with no Service Level Objectives (SLOs) for availability (availability refers to the probability that an allocated resource remains operational and accessible during the validity of the contract [CarvalhoCirne14]) or consider SLOs to reallocate the unused resources for long-running service jobs. This approach increases the allocated resource whenever it detects that an SLO is violated, in order to achieve the SLO in the long term, neglecting the frequent fluctuations of jobs' resource requirements in real-time applications, especially for short-term jobs that require fast responses and decision making for resource allocation. Thus, this approach cannot fully utilize the resources to process data because it cannot quickly adjust the resource allocation strategy to deal with the fluctuations of jobs' resource requirements. 
What's more, the previous opportunistic-based resource allocation approach aims at providing long-term availability SLOs with good QoS for long-running jobs, which ensures that the jobs can be finished within weeks or months by providing slightly degraded resources with moderate availability guarantees. However, it ignores deadline constraints in defining Quality of Service (QoS) for short-lived jobs requiring online responses in real-time applications, and thus it cannot truly guarantee the QoS and long-term availability SLOs. To overcome the drawbacks of previous works, we adequately consider the fluctuations of unused resources caused by bursts of jobs' resource demands, and present a cooperative opportunistic resource provisioning (CORP) scheme to dynamically allocate resources to jobs. CORP leverages the complementarity of jobs' requirements on different resource types and utilizes job packing to reduce resource wastage and increase resource utilization. An increasing number of large-scale data analytics frameworks move towards larger degrees of parallelism, aiming at high throughput. Scheduling, which assigns tasks to workers, and preemption, which suspends low-priority tasks and runs high-priority tasks, are two important functions in such frameworks. There are many existing works on scheduling and preemption in the literature that aim to provide high throughput. However, previous works do not substantially consider dependency when increasing throughput in scheduling or preemption. Considering dependency is crucial to increasing the overall throughput. Besides, extensive task evictions for preemption increase context switches, which may decrease the throughput. To address the above problems, we propose an efficient scheduling system, Dependency-aware Scheduling and Preemption (DSP), to achieve high throughput in scheduling and preemption. 
First, we build a mathematical model to minimize the makespan with the consideration of task dependency, and derive the target workers for tasks which can minimize the makespan; second, we utilize task dependency information to determine tasks' priorities for preemption; finally, we present a probabilistic-based preemption to reduce the number of preemptions while satisfying the demands on completion time of jobs. We conduct trace-driven simulations on a real cluster and real-world experiments on Amazon S3/EC2 to demonstrate the efficiency and effectiveness of our proposed system in comparison with other systems. The experimental results show the superior performance of our proposed system. In the future, we will further consider data update frequency to reduce consistency maintenance cost, and we will consider the effects of node joining and node leaving. We will also consider the energy consumption of machines and design an optimal replication scheme to improve data availability while saving power. For resource allocation, we will consider using the greedy approach for deep learning to reduce the computation overhead caused by the deep neural network. We will additionally consider the heterogeneity of jobs (i.e., short jobs and long jobs), and use a hybrid resource allocation strategy to provide SLO availability customization for different job types while increasing resource utilization. For scheduling, we will aim to handle tasks with partial dependency and worker failures, and to make our DSP fully distributed to increase its scalability. Finally, we plan to use different workloads and real-world experiments to fully test the performance of our methods and make our preliminary system design more mature.

    MACHS: Mitigating the Achilles Heel of the Cloud through High Availability and Performance-aware Solutions

    Cloud computing is continuously growing as a business model for hosting information and communication technology applications. However, many concerns arise regarding the quality of service (QoS) offered by the cloud. One major challenge is the high availability (HA) of cloud-based applications. The key to achieving availability requirements is to develop an approach that is immune to cloud failures while minimizing service level agreement (SLA) violations. To this end, this thesis addresses the HA of cloud-based applications from different perspectives. First, the thesis proposes a component's HA-aware scheduler (CHASE) to manage the deployments of carrier-grade cloud applications while maximizing their HA and satisfying the QoS requirements. Second, a Stochastic Petri Net (SPN) model is proposed to capture the stochastic characteristics of cloud services and quantify the expected availability offered by an application deployment. The SPN model is then associated with an extensible policy-driven cloud scoring system that integrates other cloud challenges (i.e., green and cost concerns) with HA objectives. The proposed HA-aware solutions are extended to include a live virtual machine migration model that provides a trade-off between the migration time and the downtime while maintaining the HA objective. Furthermore, the thesis proposes a generic input template for cloud simulators, GITS, to facilitate the creation of cloud scenarios while ensuring reusability, simplicity, and portability. Finally, an availability-aware CloudSim extension, ACE, is proposed. ACE extends the CloudSim simulator with failure injection, computational paths, repair, failover, load balancing, and other availability-based modules.