318 research outputs found

    Virtual Machine Deployment Strategy Based on Improved PSO in Cloud Computing

    Energy consumption is a major cost driven by the growth of computing power, so energy conservation has become one of the central problems faced by cloud systems. Maximizing the utilization of physical machines, reducing the number of virtual machine migrations, and maintaining load balance under physical machine resource thresholds are effective ways to save energy in a data center. In this paper, we propose a multi-objective physical model for virtual machine deployment and apply an improved multi-objective particle swarm optimization algorithm (TPSO) to the problem. Compared to other algorithms, TPSO has better ergodicity in the initial stage and improves the optimization precision and efficiency of the particle swarm. Experimental results on the CloudSim simulation platform show that the algorithm is effective at improving physical machine resource utilization, reducing resource waste, and improving system load balance.
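
    As a rough illustration of the approach, the sketch below runs a generic discrete particle swarm search for VM placement, minimizing overload and load imbalance. It is a minimal sketch under assumed toy parameters (unit-capacity hosts, random CPU demands, fixed attraction probabilities), not the paper's TPSO variant or its multi-objective model.

    ```python
    import random

    # Toy instance (illustrative values, not from the paper): place V VMs with
    # given CPU demands onto P physical machines of unit capacity.
    V, P = 10, 4
    vm_cpu = [random.uniform(0.1, 0.4) for _ in range(V)]
    pm_cap = [1.0] * P

    def fitness(assign):
        """Lower is better: penalize overload and load imbalance across PMs."""
        load = [0.0] * P
        for vm, pm in enumerate(assign):
            load[pm] += vm_cpu[vm]
        overload = sum(max(0.0, l - c) for l, c in zip(load, pm_cap))
        mean = sum(load) / P
        imbalance = sum((l - mean) ** 2 for l in load) / P
        return 10.0 * overload + imbalance

    def discrete_pso(iters=200, swarm=20):
        # Each particle is a placement vector; "velocity" is modeled as
        # per-VM probabilities of jumping toward the personal/global best.
        parts = [[random.randrange(P) for _ in range(V)] for _ in range(swarm)]
        pbest = [p[:] for p in parts]
        gbest = min(pbest, key=fitness)[:]
        for _ in range(iters):
            for k in range(swarm):
                p = parts[k]
                for i in range(V):
                    r = random.random()
                    if r < 0.4:                    # attraction to personal best
                        p[i] = pbest[k][i]
                    elif r < 0.7:                  # attraction to global best
                        p[i] = gbest[i]
                    elif r < 0.8:                  # random exploration (inertia)
                        p[i] = random.randrange(P)
                if fitness(p) < fitness(pbest[k]):
                    pbest[k] = p[:]
                    if fitness(pbest[k]) < fitness(gbest):
                        gbest = pbest[k][:]
        return gbest

    best = discrete_pso()
    print("placement:", best, "fitness:", round(fitness(best), 4))
    ```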

    Distributed Simulated Annealing with MapReduce

    Simulated annealing’s high computational intensity has stimulated researchers to experiment with various parallel and distributed simulated annealing algorithms for shared-memory, message-passing, and hybrid-parallel platforms. MapReduce is an emerging distributed computing framework for large-scale data processing on clusters of commodity servers; to our knowledge, MapReduce has not yet been used for simulated annealing. In this paper, we investigate the applicability of MapReduce to distributed simulated annealing in general, and to the traveling salesman problem (TSP) in particular. We (i) design six algorithmic patterns of distributed simulated annealing with MapReduce, (ii) instantiate the patterns into MapReduce implementations to solve a sample TSP instance, and (iii) evaluate the solution quality and the speedup of the implementations on a cloud computing platform, Amazon’s Elastic MapReduce. Some of our patterns integrate simulated annealing with genetic algorithms. The paper can benefit those interested in the potential of MapReduce for computationally intensive nature-inspired methods in general and simulated annealing in particular.
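
    To make the general idea concrete, the sketch below shows an independent-runs scheme in the spirit of the simplest such pattern: each "mapper" performs a full simulated annealing run on the same TSP instance with a different seed, and a "reduce" step keeps the best tour. The instance, cooling schedule, and 2-opt move are illustrative assumptions, not a reconstruction of the paper's six patterns.

    ```python
    import math, random

    def tour_length(tour, pts):
        return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
                   for i in range(len(tour)))

    def anneal(pts, seed, T=10.0, cooling=0.995, steps=20000):
        """One independent SA run: what each mapper would execute."""
        rng = random.Random(seed)
        n = len(pts)
        tour = list(range(n)); rng.shuffle(tour)
        best, best_len = tour[:], tour_length(tour, pts)
        cur_len = best_len
        for _ in range(steps):
            i, j = sorted(rng.sample(range(n), 2))
            cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # 2-opt reversal
            cand_len = tour_length(cand, pts)
            # Accept improvements always; accept worse tours with a
            # temperature-dependent probability.
            if cand_len < cur_len or rng.random() < math.exp((cur_len - cand_len) / T):
                tour, cur_len = cand, cand_len
                if cur_len < best_len:
                    best, best_len = tour[:], cur_len
            T *= cooling
        return best_len, best

    # "Reduce" step: keep the best tour over all independent runs.
    pts = [(random.random(), random.random()) for _ in range(30)]
    results = [anneal(pts, seed) for seed in range(8)]  # 8 simulated mappers
    print("best tour length:", min(results)[0])
    ```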

    Analysis of power consumption in heterogeneous virtual machine environments

    Reduction of energy consumption in Cloud computing datacenters is a hot research topic today, as these datacenters consume large amounts of energy. Furthermore, much of that energy is used inefficiently because of improper usage of computational resources such as CPU, storage, and network; a good balance between the computing resources and the performed workload is mandatory. In the context of data-intensive applications, a significant portion of energy is consumed just to keep virtual machines alive or to move data around without performing useful computation. Moreover, heterogeneity of resources makes energy efficiency harder to achieve. Power consumption optimization requires identifying these inefficiencies in the underlying system and applications. Based on the relation between server load and energy consumption, we study the efficiency of data-intensive applications and the penalties, in terms of power consumption, introduced by different degrees of heterogeneity in the virtual machine characteristics of a cluster.
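
    A minimal sketch of the commonly used linear server power model may help make the load/energy relation concrete. The model form, the wattage figures, and the consolidation comparison below are assumptions for illustration, not the paper's measurements.

    ```python
    # Common linear server power model (an assumption; the paper's model may
    # differ): P(u) = P_idle + (P_max - P_idle) * u, with utilization u in [0, 1].
    def power(u, p_idle=100.0, p_max=250.0):  # watts, illustrative values
        return p_idle + (p_max - p_idle) * u

    # Keeping a host alive costs p_idle even at u = 0, so spreading the same
    # load over many lightly used hosts wastes the idle draw of each one.
    hosts_spread = [0.25, 0.25, 0.25, 0.25]   # same work spread over 4 hosts
    print(sum(power(u) for u in hosts_spread))  # 4 hosts on: 550.0 W
    print(power(1.0))                           # consolidated, 1 host on: 250.0 W
    ```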

    QoS-guaranteed resource provisioning for cloud-based MapReduce

    This PhD project has investigated how to guarantee the quality of MapReduce services in cloud computing while minimizing their operational cost through dynamic resource provisioning. A framework for dynamic resource provisioning has been developed, theoretical results for it have been derived, and a set of efficient and effective algorithms used in the framework has been proposed.
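
    As a toy illustration of deadline-driven provisioning (an assumption; the thesis's actual models and algorithms are not described in the abstract), the sketch below computes the minimum number of VMs needed to finish a job of known size before its deadline, which meets the QoS target with the least rented capacity.

    ```python
    import math

    # Hypothetical sizing rule: with per-VM throughput 'rate' (work units per
    # second) and a job of 'work' units due in 'deadline' seconds, the minimum
    # VM count is n = ceil(work / (rate * deadline)).
    def min_vms(work, rate, deadline):
        return math.ceil(work / (rate * deadline))

    print(min_vms(work=10_000, rate=2.0, deadline=600))  # -> 9 VMs
    ```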

    A Survey on Automatic Parameter Tuning for Big Data Processing Systems

    Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise open research problems for automatic parameter tuning.
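
    For a sense of the knobs such tuners target, the sketch below sets a handful of real Spark configuration parameters spanning the surveyed dimensions (parallelism, memory, shuffle, compression); the chosen values are illustrative assumptions, not recommendations from the survey.

    ```python
    # Requires pyspark to be installed; the values shown are placeholders an
    # experiment-driven tuner would vary, benchmarking a job per configuration
    # and keeping the best-performing setting.
    from pyspark import SparkConf

    conf = (SparkConf()
            .setAppName("tuning-example")
            .set("spark.executor.cores", "4")             # parallelism
            .set("spark.executor.memory", "8g")           # memory settings
            .set("spark.sql.shuffle.partitions", "200")   # shuffle parallelism
            .set("spark.io.compression.codec", "lz4"))    # compression
    ```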

    Improvement of Data-Intensive Applications Running on Cloud Computing Clusters

    MapReduce, designed by Google, is the most popular distributed programming model in cloud environments. Hadoop, an open-source implementation of MapReduce, is a data management framework that runs on large clusters of commodity machines to handle data-intensive applications. Many well-known enterprises, including Facebook, Twitter, and Adobe, use Hadoop for their data-intensive processing needs. Task stragglers in MapReduce jobs dramatically impede job execution on massive datasets in cloud computing systems. This impedance is due to the uneven distribution of input data and computation load among cluster nodes, heterogeneous data nodes, data skew in the reduce phase, resource contention, and network configurations, all of which can cause delays, failures, and violations of job completion time. One of the key issues that can significantly affect the performance of cloud computing is computational load balancing among cluster nodes. Replica placement in the Hadoop distributed file system (HDFS) plays a significant role in data availability and the balanced utilization of clusters. Under the current replica placement policy (RPP) of HDFS, replicas of data blocks cannot be evenly distributed across the cluster's nodes, so HDFS must rely on a load balancing utility, which incurs extra overhead in time and resources. This dissertation addresses the data load balancing problem and presents an innovative replica placement policy for HDFS that balances the data load evenly among the cluster's nodes. Because the heterogeneity of cluster nodes exacerbates computational load balancing, another replica placement algorithm is proposed for heterogeneous cluster environments. The timing of identifying a straggler map task is critical for straggler mitigation in data-intensive cloud computing; to this end, the Present progress and Feedback based Speculative Execution (PFSE) algorithm is proposed. PFSE is a new straggler identification scheme that identifies straggler map tasks based on feedback information received from completed tasks as well as the progress of the currently running task. Straggler reduce tasks aggravate violations of MapReduce job completion time and are typically the result of bad data partitioning during the reduce phase: the hash partitioner employed by Hadoop may cause intermediate data skew, which produces straggler reduce tasks. A new partitioning scheme, named Balanced Data Clusters Partitioner (BDCP), is therefore proposed to mitigate straggler reduce tasks. BDCP is based on sampling of the input data and feedback information about the currently processing task; it can assist in straggler mitigation during the reduce phase and minimize job completion time in MapReduce jobs. Extensive experiments corroborate that the algorithms and policies proposed in this dissertation improve the performance of data-intensive applications running on cloud platforms.
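
    The sketch below illustrates the general idea behind sampling-based partitioning for reduce-skew mitigation: estimate key frequencies from a sample, greedily assign heavy keys to the least-loaded reducer, and fall back to hashing for unseen keys. It is a hypothetical simplification in the spirit of BDCP, not the dissertation's actual algorithm.

    ```python
    from collections import Counter
    import random

    # Default Hadoop-style partitioning is hash(key) % R, which skews when a
    # few keys dominate. This alternative builds a routing table from a sample.
    def sampled_partition_table(keys, num_reducers, sample_rate=0.1):
        sample = [k for k in keys if random.random() < sample_rate]
        freq = Counter(sample)
        load = [0] * num_reducers
        table = {}
        for key, count in freq.most_common():  # heaviest keys first
            r = min(range(num_reducers), key=load.__getitem__)
            table[key] = r
            load[r] += count
        return table

    def partition(key, table, num_reducers):
        # Keys not seen in the sample (rare ones) fall back to hashing.
        return table.get(key, hash(key) % num_reducers)

    keys = ["a"] * 800 + ["b"] * 150 + list("cdefghij") * 6  # skewed key stream
    table = sampled_partition_table(keys, num_reducers=4)
    print([partition(k, table, 4) for k in ("a", "b", "z")])
    ```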

    Task Scheduling in Big Data Platforms: A Systematic Literature Review

    Context: Hadoop, Spark, Storm, and Mesos are well-known frameworks in both the research and industrial communities that allow expressing and processing distributed computations on massive amounts of data. Multiple scheduling algorithms have been proposed to ensure that short interactive jobs, large batch jobs, and guaranteed-capacity production jobs running on these frameworks deliver results quickly while maintaining high throughput. However, only a few works have examined the effectiveness of these algorithms. Objective: The Evidence-based Software Engineering (EBSE) paradigm and its core tool, the Systematic Literature Review (SLR), were introduced to the Software Engineering community in 2004 to help researchers systematically and objectively gather and aggregate research evidence about different topics. In this paper, we conduct an SLR of task scheduling algorithms that have been proposed for big data platforms. Method: We analyse the design decisions of different scheduling models proposed in the literature for Hadoop, Spark, Storm, and Mesos over the period between 2005 and 2016. We provide a research taxonomy for succinct classification of these scheduling models, and we compare the algorithms in terms of performance, resource utilization, and failure recovery mechanisms. Results: Our search identified 586 studies from journals, conferences, and workshops having the highest quality in this field. This SLR reports on different types of scheduling models (dynamic, constrained, and adaptive) and the main motivations behind them (including data locality, workload balancing, resource utilization, and energy efficiency). A discussion of open issues and future challenges for improving the current studies is provided.
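
    For a flavor of the simplest scheduling policies covered by such surveys, the sketch below implements a minimal fairness-oriented pool picker (an illustrative assumption, not any specific Hadoop, Spark, Storm, or Mesos scheduler): among pools with pending work, the one with the fewest running tasks gets the next slot.

    ```python
    # running/demand: dict pool -> number of running / pending tasks.
    def pick_pool(running, demand):
        candidates = [p for p, d in demand.items() if d > 0]
        if not candidates:
            return None
        # Grant the next slot to the pool furthest below its share.
        return min(candidates, key=lambda p: running.get(p, 0))

    running = {"interactive": 1, "batch": 5, "production": 3}
    demand = {"interactive": 4, "batch": 10, "production": 0}
    print(pick_pool(running, demand))  # -> "interactive" (fewest running tasks)
    ```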

    Bio-inspired computation for big data fusion, storage, processing, learning and visualization: state of the art and future directions

    This overview focuses on research achievements that have recently emerged from the confluence of Big Data technologies and bio-inspired computation. A manifold of reasons can be identified for the profitable synergy between these two paradigms, all rooted in the adaptability, intelligence, and robustness that biologically inspired principles can provide to technologies aimed at managing, retrieving, fusing, and processing Big Data efficiently. We delve into this research field by first analyzing the existing literature in depth, with a focus on advances reported in the last few years. This literature analysis is complemented by an identification of the new trends and open challenges in Big Data that remain unsolved to date and that can be effectively addressed by bio-inspired algorithms. As a second contribution, this work elaborates on how bio-inspired algorithms need to be adapted for use in a Big Data context, in which data fusion becomes crucial as a preliminary step enabling the processing and mining of several, potentially heterogeneous, data sources. This analysis allows exploring and comparing the scope and efficiency of existing approaches across different problems and domains, with the purpose of identifying new potential applications and research niches. Finally, this survey highlights open issues that remain unsolved to date in this research avenue, alongside a prescription of recommendations for future research.

    Funding: This work has received funding support from the Basque Government (Eusko Jaurlaritza) through the Consolidated Research Group MATHMODE (IT1294-19) and the EMAITEK and ELK ARTEK programs. D. Camacho also acknowledges support from the Spanish Ministry of Science and Education under the PID2020-117263GB-100 grant (FightDIS), the Comunidad Autonoma de Madrid under the S2018/TCS-4566 grant (CYNAMON), and the CHIST-ERA 2017 BDSI PACMEL Project (PCI2019-103623, Spain).