181 research outputs found

    Dynamically Iterative MapReduce

    MapReduce is a distributed, parallel computing model for data-intensive tasks, with features such as optimized scheduling, flexibility, high availability, and high manageability. MapReduce can work on various platforms; however, it is not well suited to iterative programs, whose performance may be degraded by frequent disk I/O operations. In order to improve system performance and resource utilization, we propose a novel MapReduce framework named Dynamically Iterative MapReduce (DIMR), which reduces the number of disk I/O operations and the consumption of network bandwidth by means of dynamic task allocation and a memory management mechanism. We show that DIMR is promising through detailed discussions in this paper.
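
    The performance problem DIMR targets can be seen in how an iterative job is usually driven. Below is a minimal sketch of the contrast, with a hypothetical driver API (an illustration of the idea, not DIMR's actual interface):

        def iterate_naive(job, input_path, iterations):
            # Classic Hadoop-style loop: every iteration writes its output to
            # the distributed file system and the next iteration reads it back,
            # so disk I/O grows linearly with the iteration count.
            current = input_path
            for i in range(iterations):
                output = "/tmp/iter_%d" % i        # hypothetical HDFS path
                job.run(map_input=current, reduce_output=output)
                current = output
            return current

        def iterate_cached(job, records, iterations):
            # The DIMR-style idea: load loop-invariant data into memory once
            # and iterate on the cached state, avoiding the per-iteration disk
            # round-trip and the associated network traffic.
            state = job.load_into_memory(records)
            for _ in range(iterations):
                state = job.run_in_memory(state)
            return state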

    BigFCM: Fast, Precise and Scalable FCM on Hadoop

    Clustering plays an important role in mining big data, both as a modeling technique and as a preprocessing step in many data mining pipelines. Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing each data record to belong to more than one cluster to some degree. However, a serious challenge in fuzzy clustering is the lack of scalability. Massive datasets in emerging fields such as geosciences, biology, and networking require parallel and distributed computation with high performance to solve real-world problems. Although some clustering methods have already been adapted to execute on big data platforms, their execution time increases sharply for large datasets. In this paper, a scalable Fuzzy C-Means (FCM) clustering method named BigFCM is proposed and designed for the Hadoop distributed data platform. Based on the map-reduce programming model, it exploits several mechanisms, including an efficient caching design, to achieve several orders of magnitude reduction in execution time. Extensive evaluation over multi-gigabyte datasets shows that BigFCM is scalable while preserving the quality of clustering.
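
    For context, the membership update at the heart of FCM, which each pass of such a system parallelizes over chunks of the data, fits in a few lines of NumPy. This is a generic single-node FCM step for illustration; BigFCM's map-reduce partitioning and caching design are not reproduced here:

        import numpy as np

        def fcm_step(X, centers, m=2.0):
            # One Fuzzy C-Means iteration on an (n, d) data matrix X with
            # (c, d) centroids; m > 1 is the fuzzifier.
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            d = np.fmax(d, 1e-12)                  # guard against zero distances
            # u[i, j]: degree to which point i belongs to cluster j; each row
            # sums to 1, but a point may belong to several clusters at once
            # (the "fuzzy" flexibility the abstract refers to).
            u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)),
                             axis=2)
            w = u ** m                             # membership weights
            new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
            return u, new_centers

    Iterating fcm_step until the centroids stop moving yields the final fuzzy partition.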

    Performance Model of MapReduce Iterative Applications for Hybrid Cloud Bursting

    Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost the overall capacity during peak utilization) can be a cost-effective way to deal with the increasing complexity of big data analytics, especially for iterative applications. However, the low-throughput, high-latency network link between the on-premise and off-premise resources (the “weak link”) makes maintaining scalability difficult. While several data locality techniques have been designed for big data bursting on hybrid clouds, their effectiveness is difficult to estimate in advance. Yet such estimations are critical, because they help users decide whether the extra pay-as-you-go cost incurred by using the off-premise resources justifies the runtime speed-up. To this end, the current paper presents a performance model and methodology to estimate the runtime of iterative MapReduce applications in a hybrid cloud-bursting scenario. The paper focuses on the overhead incurred by the weak link at fine granularity, for both the map and the reduce phases. This approach enables high estimation accuracy, as demonstrated by extensive experiments at scale using a mix of real-world iterative MapReduce applications from standard big data benchmarking suites that cover a broad spectrum of data patterns. Not only are the produced estimations accurate in absolute terms compared with experimental results, but they are also up to an order of magnitude more accurate than state-of-the-art estimation approaches originally designed for single-site MapReduce deployments.
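
    As a toy illustration of the kind of per-phase, weak-link-aware estimate such a model produces (hypothetical parameters and a deliberately simplified formula, not the paper's actual model):

        def phase_time(local_bytes, cross_bytes, slots, slot_bps, weak_link_bps):
            # Duration of one map or reduce phase: bounded by local processing
            # across the available slots and by the data that must cross the
            # low-throughput on-/off-premise "weak link". Assumes compute and
            # transfer overlap; all parameters are hypothetical.
            compute = local_bytes / (slots * slot_bps)
            transfer = cross_bytes / weak_link_bps
            return max(compute, transfer)

        def iterative_job_time(iterations, map_args, reduce_args):
            # An iterative MapReduce job repeats both phases every iteration,
            # so weak-link overhead accumulates linearly with iteration count.
            return iterations * (phase_time(*map_args) + phase_time(*reduce_args))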

    Distributed processing of large remote sensing images using MapReduce - A case of Edge Detection

    Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies. Advances in sensor technology and the ever-increasing repositories of collected data are revolutionizing the mechanisms by which remotely sensed data are collected, stored, and processed. This exponential growth of data archives and the increasing user demand for real- and near-real-time remote sensing data products have pressured remote sensing service providers to deliver the required services. The remote sensing community has recognized the challenge of processing large and complex satellite datasets to derive customized products. To address this high demand for computational resources, several efforts have been made in the past few years toward incorporating high-performance computing models in remote sensing data collection, management, and analysis. This study adds impetus to these efforts by introducing a recent advancement in distributed computing, the MapReduce programming paradigm, to the area of remote sensing. The MapReduce model, developed by Google Inc., encapsulates the machinery of distributed computing in a highly simplified single library; this simple but powerful programming model provides a distributed environment without requiring deep knowledge of parallel programming. This thesis presents MapReduce-based processing of large satellite images through a use-case scenario of edge detection. Drawing on massive remote sensing image processing applications, a prototype of several edge detection methods was implemented on the MapReduce framework using its open-source implementation, the Apache Hadoop environment, and the experience of implementing MapReduce versions of the Sobel, Laplacian, and Canny edge detection methods is presented. The thesis also evaluates the effect of parallelization with MapReduce on the quality of the output, along with execution time performance tests based on various metrics. The MapReduce algorithms were executed on a heterogeneous test cluster running the Apache Hadoop open-source software. The successful implementation of the MapReduce algorithms in a distributed environment demonstrates that MapReduce has great potential for scaling the processing of large remotely sensed images and for tackling more complex geospatial problems.
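
    To make the use case concrete, the per-tile work of a Sobel mapper looks roughly as follows. This is a generic Sobel sketch, not the thesis code; Hadoop input splitting and the halo of border pixels each tile needs from its neighbors are omitted:

        import numpy as np

        def sobel_tile(tile):
            # Gradient magnitude for one image tile; in the MapReduce setting,
            # each map task would apply this to its split of the image.
            gx_k = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
            gy_k = gx_k.T                          # vertical-gradient kernel
            h, w = tile.shape
            out = np.zeros((h, w))
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    win = tile[y - 1:y + 2, x - 1:x + 2]
                    out[y, x] = np.hypot(np.sum(gx_k * win), np.sum(gy_k * win))
            return out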

    3rd Many-core Applications Research Community (MARC) Symposium. (KIT Scientific Reports ; 7598)

    This manuscript includes recent scientific work regarding the Intel Single-Chip Cloud Computer and describes novel approaches for programming and run-time organization.