90 research outputs found

    Power Management in Heterogeneous MapReduce Cluster

    Get PDF
    The growing expenses of power in data centers as compared to the operation costs has been a concern for the past several decades. It has been predicted that without an intervention, the energy cost will soon outgrow the infrastructure and operation cost. Therefore, it is of great importance to make data center clusters more energy efficient which is critical for avoiding system overheating and failures. In addition, energy inefficiency causes not only the loss of capital but also environmental pollution. Various Power Management(PM) strategies have been developed over the years to make system more energy efficient and to counteract the sharply rising cost of electricity. However, it is still a challenge to make the system both power efficient and computation efficient due to many underlying system constraints. In this thesis, we investigate the Power Management technique in heterogeneous MapReduce clusters while also maintaining the required system QoS (Quality of Service). For a cluster that supports MapReduce jobs, it is necessary to develop a PM technique that also considers the data availability. We develop our PM strategy by exploiting the fact that the servers in the system are underutilized most of the time. Hence, we first develop a model of our testbed and study how the server utilization levels affect the power consumption and the system throughput. With the established models, we form and solve the power optimization problem for heterogeneous MadReduce clusters where we control the server utilization levels intelligently to minimize the total power consumption. We have conducted simulations and shown the power savings achieved using our PM technique. Then we validate some of our simulation results by running experiments in a real testbed. Our simulation and experimental data have shown that our PM strategy works well for heterogeneous MapReduce clusters which consists of different power efficient and inefficient servers. Adviser: Ying L

    High-Performance Modelling and Simulation for Big Data Applications

    Get PDF
    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)“ project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction raises to have a better discernment of the domain at hand, their representation gets increasingly demanding for computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. It is then arguably required to have a seamless interaction of High Performance Computing with Modelling and Simulation in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for their members and distinguished guests to openly discuss novel perspectives and topics of interests for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications

    PREDON Scientific Data Preservation 2014

    Get PDF
    LPSC14037Scientific data collected with modern sensors or dedicated detectors exceed very often the perimeter of the initial scientific design. These data are obtained more and more frequently with large material and human efforts. A large class of scientific experiments are in fact unique because of their large scale, with very small chances to be repeated and to superseded by new experiments in the same domain: for instance high energy physics and astrophysics experiments involve multi-annual developments and a simple duplication of efforts in order to reproduce old data is simply not affordable. Other scientific experiments are in fact unique by nature: earth science, medical sciences etc. since the collected data is "time-stamped" and thereby non-reproducible by new experiments or observations. In addition, scientific data collection increased dramatically in the recent years, participating to the so-called "data deluge" and inviting for common reflection in the context of "big data" investigations. The new knowledge obtained using these data should be preserved long term such that the access and the re-use are made possible and lead to an enhancement of the initial investment. Data observatories, based on open access policies and coupled with multi-disciplinary techniques for indexing and mining may lead to truly new paradigms in science. It is therefore of outmost importance to pursue a coherent and vigorous approach to preserve the scientific data at long term. The preservation remains nevertheless a challenge due to the complexity of the data structure, the fragility of the custom-made software environments as well as the lack of rigorous approaches in workflows and algorithms. To address this challenge, the PREDON project has been initiated in France in 2012 within the MASTODONS program: a Big Data scientific challenge, initiated and supported by the Interdisciplinary Mission of the National Centre for Scientific Research (CNRS). PREDON is a study group formed by researchers from different disciplines and institutes. Several meetings and workshops lead to a rich exchange in ideas, paradigms and methods. The present document includes contributions of the participants to the PREDON Study Group, as well as invited papers, related to the scientific case, methodology and technology. This document should be read as a "facts finding" resource pointing to a concrete and significant scientific interest for long term research data preservation, as well as to cutting edge methods and technologies to achieve this goal. A sustained, coherent and long term action in the area of scientific data preservation would be highly beneficial
    • …
    corecore