399 research outputs found

    An Approach to Ad hoc Cloud Computing

    We consider how underused computing resources within an enterprise may be harnessed to improve utilization and create an elastic computing infrastructure. Most current cloud provision involves a data center model, in which clusters of machines are dedicated to running cloud infrastructure software. We propose an additional model, the ad hoc cloud, in which infrastructure software is distributed over resources harvested from machines already in existence within an enterprise. In contrast to the data center cloud model, resource levels are not established a priori, nor are resources dedicated exclusively to the cloud while in use. A participating machine is not dedicated to the cloud; it has some other primary purpose, such as running interactive processes for a particular user. We outline the major implementation challenges and one approach to tackling them.
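    The harvesting idea above can be illustrated with a minimal sketch: a participating machine offers cycles to the ad hoc cloud only while its primary workload leaves enough headroom. The thresholds and the use of the psutil library are assumptions for illustration, not the paper's mechanism.

        # Minimal sketch (assumed thresholds, assumed psutil-based probing):
        # a host joins the ad hoc cloud only when its primary user leaves headroom.
        import psutil

        CPU_BUSY_LIMIT = 25.0   # donate cycles only if CPU usage is below 25%
        MEM_FREE_FLOOR = 30.0   # and at least 30% of RAM is unused

        def host_can_contribute() -> bool:
            """Return True when the machine's primary workload leaves enough headroom."""
            cpu_busy = psutil.cpu_percent(interval=1.0)          # % CPU busy over 1 s
            mem_free = 100.0 - psutil.virtual_memory().percent   # % RAM unused
            return cpu_busy < CPU_BUSY_LIMIT and mem_free > MEM_FREE_FLOOR

        if __name__ == "__main__":
            print("offer resources" if host_can_contribute() else "stay out")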

    Software-Defined Cloud Computing: Architectural Elements and Open Challenges

    The variety of existing cloud services creates a challenge for service providers to enforce reasonable Service Level Agreements (SLAs) stating the Quality of Service (QoS) and penalties in case QoS is not achieved. To avoid such penalties while operating the infrastructure with minimum energy and resource wastage, constant monitoring and adaptation of the infrastructure are needed. We refer to Software-Defined Cloud Computing, or simply Software-Defined Clouds (SDC), as an approach for automating the process of optimal cloud configuration by extending the virtualization concept to all resources in a data center. An SDC enables easy reconfiguration and adaptation of physical resources in a cloud infrastructure, to better accommodate the demand on QoS through software that can describe and manage the various aspects comprising the cloud environment. In this paper, we present an architecture for SDCs on data centers with emphasis on mobile cloud applications. We present an evaluation showcasing the potential of SDC in two use cases, QoS-aware bandwidth allocation and bandwidth-aware, energy-efficient VM placement, and discuss the research challenges and opportunities in this emerging area. Comment: Keynote Paper, 3rd International Conference on Advances in Computing, Communications and Informatics (ICACCI 2014), September 24-27, 2014, Delhi, India
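    The second use case above, bandwidth-aware, energy-efficient VM placement, can be sketched as a greedy consolidation heuristic: prefer hosts that are already powered on and have enough residual CPU and bandwidth, and wake an idle host only as a last resort. The data structures and the heuristic below are assumptions for illustration, not the algorithm evaluated in the paper.

        # Minimal sketch of bandwidth-aware, energy-efficient VM placement
        # (greedy consolidation; all names and fields are illustrative).
        from dataclasses import dataclass
        from typing import List, Optional

        @dataclass
        class Host:
            name: str
            free_cpu: int      # available vCPUs
            free_bw: float     # available network bandwidth (Mbps)
            powered_on: bool

        @dataclass
        class VM:
            name: str
            cpu: int
            bw: float          # requested bandwidth (Mbps)

        def place(vm: VM, hosts: List[Host]) -> Optional[Host]:
            """Prefer powered-on hosts (consolidation saves energy) and the tightest
            bandwidth fit; wake a sleeping host only if no active host can satisfy
            both the CPU and the bandwidth request."""
            candidates = [h for h in hosts if h.free_cpu >= vm.cpu and h.free_bw >= vm.bw]
            candidates.sort(key=lambda h: (not h.powered_on, h.free_bw))
            if not candidates:
                return None                     # would trigger admission control or scale-out
            chosen = candidates[0]
            chosen.free_cpu -= vm.cpu
            chosen.free_bw -= vm.bw
            chosen.powered_on = True
            return chosen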

    Performance Evaluation of Structured and Unstructured Data in PIG/HADOOP and MONGO-DB Environments

    The exponential growth of data initially presented difficulties for prominent organizations such as Google, Yahoo, Amazon, Microsoft, Facebook, and Twitter. The size of the information that needs to be handled by cloud applications is growing significantly faster than storage capacity, and this growth demands new approaches for managing and analyzing data. The term Big Data refers to large volumes of structured, semi-structured, and unstructured data created by applications, messages, weblogs, and online social networks. Big Data is data whose size, variety, and uncertainty require new models, procedures, algorithms, and research to manage it and to extract value and hidden knowledge from it. To process such information efficiently, parallelism is used for analysis, and NoSQL databases have been introduced to handle unstructured and semi-structured data. Hadoop serves Big Data analysis requirements well: it is designed to scale from a single server to a large cluster of machines with a high degree of fault tolerance. Many companies and research institutes such as Facebook, Yahoo, and Google have had a growing need to import, store, and analyze dynamic semi-structured data and its metadata. The significant growth of semi-structured data inside large web-based organizations has led to the creation of NoSQL data stores for flexible storage and of MapReduce for scalable parallel analysis. These organizations have evaluated, used, and modified Hadoop, the most popular open-source implementation of MapReduce, to address the needs of a variety of analytics problems, and they also use MongoDB, a document-oriented NoSQL store. However, there is limited understanding of the performance trade-offs of using these two technologies. This paper evaluates the performance, scalability, and fault tolerance of MongoDB and Hadoop, with the goal of identifying the right software environment for data analytics and research. Recently, an increasing number of organizations have adopted various kinds of non-relational databases (such as MongoDB, Cassandra, Hypertable, HBase/Hadoop, and CouchDB), commonly referred to as NoSQL databases. The enormous amount of data generated requires an effective system for analyzing it in various scenarios and under various limits. The objective of this paper is to find the break-even point of Hadoop/Pig and MongoDB and to develop a robust environment for data analytics.
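    One data point in such a break-even comparison can be sketched as follows: time the same group-and-count analysis as a MongoDB aggregation pipeline and as a Pig job on Hadoop, then repeat while scaling the input size. The connection string, database and collection names, and the grouped field below are assumptions for illustration.

        # Minimal sketch of one measurement: a group-and-count in MongoDB that
        # mirrors a Pig GROUP ... GENERATE COUNT(...) job (assumed data layout).
        import time
        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")   # assumed local instance
        weblogs = client["benchmark_db"]["weblogs"]         # assumed collection

        pipeline = [
            {"$group": {"_id": "$status_code", "hits": {"$sum": 1}}},
            {"$sort": {"hits": -1}},
        ]

        start = time.perf_counter()
        result = list(weblogs.aggregate(pipeline, allowDiskUse=True))
        elapsed = time.perf_counter() - start
        print(f"MongoDB aggregation took {elapsed:.2f}s; top groups: {result[:3]}")

        # Rough Pig Latin equivalent, run on Hadoop against the same data:
        #   logs   = LOAD 'weblogs' USING PigStorage('\t') AS (status_code:chararray);
        #   groups = GROUP logs BY status_code;
        #   counts = FOREACH groups GENERATE group, COUNT(logs);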

    Mapreduce and Heterogeneity: Power-Aware Bag-of-Tasks, Framework Parameter Sensitivity, and Dynamic Cluster Aware Framework Configuration

    This dissertation presents techniques for adapting MapReduce frameworks to incorporate heterogeneity-aware scheduling algorithms, an inspection of cluster configurations and how they impact these scheduling algorithms, an analysis of how cluster configuration and heterogeneity-aware scheduling can work together to minimize turnaround time and/or power consumption of the cluster when executing MapReduce applications, and a discussion of how these lessons can be applied more broadly to Big Data infrastructure beyond MapReduce that supports multiple Big Data frameworks simultaneously. Heterogeneity exists in various capacities in any given cluster, from static (Physical and Platform) heterogeneity to dynamic heterogeneity (Transient Data, Transient Applications, and Irregular Hardware Behavior). Historically there have been several types of mitigation strategies for each of these kinds of heterogeneity, each with its pros and cons. We discuss these mitigation strategies, the types of heterogeneity each strategy is able to address, and the history of related work in the field. We then consider taking host-level metrics and using them to schedule tasks in real time, with the aim of addressing cluster-wide energy usage. To do this, we consider estimators for power consumption that are available on-chip, namely temperature. We establish a correlation between CPU temperature and power consumption, then derive a scheduling algorithm that removes nodes consuming too much power from the pool of schedulable resources. This relies on the ability of MapReduce frameworks, as constructed in this thesis, to delay the binding of tasks to specific workers. We analyze the impact this has on the turnaround time of a MapReduce application, including how to set the threshold so that power consumption is shifted away from over-consuming nodes while the impact on turnaround time remains small. We also address concerns with upgrading a cluster in stages, which introduces more Physical Heterogeneity at various levels, and the adjustments that need to be made to MapReduce configurations to cope with the increased heterogeneity. In particular, we look at MapReduce platform mis-configuration and its impact on turnaround time, and analyze the ways these errors can be mitigated between incremental platform upgrades. To address this, we introduce a Dynamic Heterogeneity Awareness (DHA) module into our MapReduce framework to handle these upgrades and allow better spreading of tasks by the framework, further improving turnaround time and resource utilization. Finally, we consider the implications of framework and application co-tenancy and describe the state of the art in these areas. We focus on what co-tenancy is, why it's important, and how the state of the art can be expanded to leverage the findings of this thesis so that co-tenant clusters improve application and framework performance while also improving energy efficiency.
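    The temperature-driven step described above can be sketched as a filter applied before tasks are bound to workers: nodes whose CPU temperature (used as an on-chip proxy for power draw) exceed a threshold are temporarily removed from the schedulable pool. The threshold and the linear temperature-to-power model below are assumptions for illustration, not the dissertation's calibrated values.

        # Minimal sketch: delay task binding and skip nodes that are running hot
        # (all constants and the linear power model are assumed).
        from typing import Dict, List

        TEMP_LIMIT_C = 75.0   # assumed per-node CPU temperature ceiling

        def estimated_power_w(temp_c: float, a: float = 2.1, b: float = -60.0) -> float:
            """Assumed linear model fitted offline: watts ~ a * temperature + b."""
            return a * temp_c + b

        def schedulable_nodes(node_temps: Dict[str, float]) -> List[str]:
            """Keep only nodes currently under the temperature limit."""
            return [node for node, t in node_temps.items() if t <= TEMP_LIMIT_C]

        if __name__ == "__main__":
            temps = {"worker-1": 68.0, "worker-2": 81.5, "worker-3": 73.2}
            pool = schedulable_nodes(temps)
            print("bindable workers:", pool)
            print({n: round(estimated_power_w(temps[n]), 1) for n in pool})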

    Towards efficient resource provisioning in MapReduce

    The paper presents a novel approach and algorithm, with a mathematical formulation, for obtaining the exact optimal number of task resources for any workload running on Hadoop MapReduce. In the era of Big Data, energy efficiency has become an important issue for the ubiquitous Hadoop MapReduce framework. However, the question of how many tasks a job needs to get the most efficient performance from MapReduce still has no definite answer. Our algorithm for optimal resource provisioning allows users to identify the best trade-off point between performance and energy efficiency on the runtime elbow curve fitted from sampled executions on the target cluster, for subsequent behavioral replication. Our verification and comparison show that the currently well-known rules of thumb for calculating the required number of reduce tasks are inaccurate and can lead to significant waste of computing resources and energy with no further improvement in execution time.
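    A common way to locate such a trade-off point on an elbow curve is to take the sampled point farthest from the straight line joining the curve's endpoints; the sketch below uses that criterion with invented samples. The paper derives its own formula, so both the criterion and the numbers here are assumptions for illustration.

        # Minimal elbow-point sketch: maximum distance from the endpoint chord
        # (task counts and runtimes are invented for illustration).
        import numpy as np

        def elbow_index(x: np.ndarray, y: np.ndarray) -> int:
            """Index of the point farthest from the line joining the endpoints."""
            p1 = np.array([x[0], y[0]], dtype=float)
            p2 = np.array([x[-1], y[-1]], dtype=float)
            chord = (p2 - p1) / np.linalg.norm(p2 - p1)
            pts = np.column_stack([x, y]).astype(float) - p1
            proj = np.outer(pts @ chord, chord)          # projection onto the chord
            dist = np.linalg.norm(pts - proj, axis=1)    # perpendicular distance
            return int(np.argmax(dist))

        if __name__ == "__main__":
            tasks = np.array([8, 16, 32, 64, 128, 256])          # sampled reduce-task counts
            runtime = np.array([980, 560, 340, 250, 235, 230])   # seconds (assumed samples)
            i = elbow_index(tasks, runtime)
            print(f"trade-off point: {tasks[i]} reduce tasks, ~{runtime[i]} s")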

    Energy Efficient Data-Intensive Computing With Mapreduce

    Power and energy consumption are critical constraints in data center design and operation. In data centers, MapReduce data-intensive applications demand significant resources and energy. Recognizing the importance and urgency of optimizing the energy usage of MapReduce applications, this work aims to provide instrumental tools to measure and evaluate MapReduce energy efficiency, and techniques to conserve energy without impacting performance. Energy conservation for data-intensive computing requires enabling technology that provides detailed and systemic energy information and identifies inefficiencies in the underlying system hardware and software. To address this need, we present eTune, a fine-grained, scalable energy profiling framework for data-intensive computing on large-scale distributed systems. eTune leverages performance monitoring counters (PMCs) on modern computer components and statistically builds power-performance correlation models. Using the learned models, eTune augments direct measurement with a software-based power estimator that runs on compute nodes and reports power at multiple levels, including node, core, memory, and disk, with high accuracy. Data-intensive computing differs from traditional high performance computing in that most execution time is spent moving data between storage devices, nodes, and components. Since data movements are potential performance and energy bottlenecks, we propose an analysis framework with methods and metrics for evaluating and characterizing the costly built-in MapReduce data movements. The revealed data movement energy characteristics can be exploited in system design and resource allocation to improve the energy efficiency of data-intensive computing. Finally, we present an optimization technique that targets inefficient built-in MapReduce data movements to conserve energy without impacting performance. The technique allocates the optimal number of compute nodes to an application and dynamically schedules processor frequency during its execution based on data movement characteristics. Experimental results show significant energy savings, though the improvements depend on both workload characteristics and the policies for resource and dynamic voltage and frequency scheduling. As data volume doubles every two years and more data centers are put into production, energy consumption is expected to grow further. We expect these studies to provide direction and insight for building more energy-efficient data-intensive systems and applications, and the tools and techniques to be adopted by other researchers for their energy efficiency studies.
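    The model-building step can be sketched as an ordinary least-squares fit of measured node power against sampled performance monitoring counters, after which the fitted model serves as a software power estimator on each compute node. The counter selection, the sample values, and the plain linear fit are assumptions for illustration, not eTune's actual models.

        # Minimal sketch: fit watts ~ w . PMCs + intercept, then estimate power in software
        # (counter choice and all numbers are invented for illustration).
        import numpy as np

        # Each row: [instructions_retired, LLC_misses, memory_bandwidth_GBps]
        pmc_samples = np.array([
            [2.1e9, 3.0e6, 4.2],
            [3.8e9, 9.5e6, 8.1],
            [1.2e9, 1.1e6, 2.0],
            [4.6e9, 1.4e7, 10.3],
        ])
        measured_watts = np.array([62.0, 95.0, 48.0, 110.0])   # from an external power meter

        X = np.column_stack([pmc_samples, np.ones(len(pmc_samples))])
        coeffs, *_ = np.linalg.lstsq(X, measured_watts, rcond=None)

        def estimate_power(pmcs: np.ndarray) -> float:
            """Software power estimate for a fresh PMC sample from this node."""
            return float(np.append(pmcs, 1.0) @ coeffs)

        print(f"estimated node power: {estimate_power(np.array([2.9e9, 5.2e6, 6.0])):.1f} W")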

    Towards Efficient Resource Provisioning in Hadoop

    Considering the recent exponential growth in the amount of information processed in Big Data, the high energy consumed by data processing engines in datacenters has become a major issue, underlining the need for efficient resource allocation and better energy-efficient computing. This thesis proposes the Best Trade-off Point (BToP) method, which provides a general approach and techniques, based on an algorithm with mathematical formulas, to find the best trade-off point on an elbow curve of performance versus resources for efficient resource provisioning in Hadoop MapReduce and Apache Spark. Our novel BToP method is expected to work for any application or system that relies on a trade-off curve with an elbow shape, non-inverted or inverted, for making good decisions. This method for optimal resource provisioning was not previously available in the scientific, computing, and economic communities. To illustrate the effectiveness of the BToP method on the ubiquitous Hadoop MapReduce, our Terasort experiment shows that the number of task resources recommended by the BToP algorithm is consistently accurate and optimal when compared with the recommendations of three popular rules of thumb. We also test the BToP method on the emerging cluster computing framework Apache Spark running in YARN cluster mode. Despite the effectiveness of Spark's robust and sophisticated built-in dynamic resource allocation mechanism, which is not available in MapReduce, the BToP method still consistently outperforms it in our Spark-Bench Terasort test results. The performance efficiency gained from the BToP method not only leads to significant energy savings but also improves overall system throughput and prevents cluster underutilization in a multi-tenancy environment. In general, the BToP method is preferable for workloads with identical resource consumption signatures in production environments, where job profiling for behavioral replication leads to the most efficient resource provisioning.
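    The comparison against rules of thumb can be sketched as follows: compute the widely cited "0.95 (or 1.75) x total reduce slots" heuristic and contrast it with the count read off the measured elbow curve (see the elbow-detection sketch after the earlier MapReduce resource-provisioning abstract). The cluster figures below are assumptions for illustration; the BToP algorithm itself is defined in the thesis.

        # Minimal sketch: rule-of-thumb reduce count vs. a BToP-style recommendation
        # (node and slot counts are invented for illustration).
        def rule_of_thumb_reduces(nodes: int, reduce_slots_per_node: int,
                                  multiplier: float = 0.95) -> int:
            """Common heuristic: ~0.95 (or 1.75) x total reduce slots in the cluster."""
            return int(multiplier * nodes * reduce_slots_per_node)

        def pick_reduces(btop_recommendation: int, nodes: int,
                         reduce_slots_per_node: int) -> int:
            heuristic = rule_of_thumb_reduces(nodes, reduce_slots_per_node)
            print(f"rule of thumb: {heuristic} reduces, BToP: {btop_recommendation} reduces")
            return btop_recommendation      # prefer the measured trade-off point

        if __name__ == "__main__":
            reduces = pick_reduces(btop_recommendation=64, nodes=16, reduce_slots_per_node=8)
            # The chosen value would then be passed to the job, e.g. via
            # -D mapreduce.job.reduces=64 on the Hadoop command line.
            print("mapreduce.job.reduces =", reduces)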

    A hierarchic task-based programming model for distributed heterogeneous computing

    Distributed computing platforms are evolving into heterogeneous ecosystems, with Clusters, Grids, and Clouds introducing into their computing nodes processors with different core architectures, accelerators (e.g. GPUs, FPGAs), and different memory and storage devices, in order to achieve better performance with lower energy consumption. As a consequence of this heterogeneity, programming applications for these distributed heterogeneous platforms becomes a complex task. In addition to the complexity of developing an application for distributed platforms, developers must now also deal with the complexity of the different computing devices inside each node. In this article, we present a programming model that aims to facilitate the development and execution of applications on current and future distributed heterogeneous parallel architectures. This programming model is based on the hierarchical composition of the COMP Superscalar and Omp Superscalar programming models, which allow developers to implement infrastructure-agnostic applications. The underlying runtime enables applications to adapt to the infrastructure without the need to maintain different versions of the code. Our programming model proposal has been evaluated on real platforms in terms of heterogeneous resource usage, performance, and adaptation. This work has been supported by the European Commission through the Horizon 2020 Research and Innovation programme under contract 687584 (TANGO project), by the Spanish Government under contract TIN2015-65316 and grant SEV-2015-0493 (Severo Ochoa Program), and by Generalitat de Catalunya under contracts 2014-SGR-1051 and 2014-SGR-1272.
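    The hierarchical idea can be illustrated, very loosely, with two nested task levels: a coarse-grained outer task (the distributed, COMP Superscalar level) that internally spawns fine-grained inner tasks (the node, Omp Superscalar level). The sketch below uses plain concurrent.futures as a stand-in for both runtimes and is not the COMPSs/OmpSs API.

        # Minimal stand-in sketch of hierarchical task composition
        # (concurrent.futures only; not the COMPSs/OmpSs programming model).
        from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

        def inner_task(chunk):
            """Fine-grained unit a node-level runtime could map to a core or accelerator;
            here it just sums a chunk of numbers."""
            return sum(chunk)

        def outer_task(block):
            """Coarse-grained unit scheduled across the distributed platform; it
            decomposes its block and runs the pieces with node-level parallelism."""
            chunks = [block[i:i + 4] for i in range(0, len(block), 4)]
            with ThreadPoolExecutor() as node_pool:
                return sum(node_pool.map(inner_task, chunks))

        if __name__ == "__main__":
            blocks = [list(range(i, i + 16)) for i in range(0, 64, 16)]
            with ProcessPoolExecutor() as cluster_pool:   # stands in for the cluster level
                partials = list(cluster_pool.map(outer_task, blocks))
            print("total =", sum(partials))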