MAPREDUCE CHALLENGES ON PERVASIVE GRIDS

Author: Cassales Guilherme
Cogorno Matías
Flauzac Olivier
Kirsch Pinheiro Manuele
Nesmachnow Sergio
Pitthan Barcelos Patricia
Rey J.
Schwertner Charão Andrea
Souveyet Carine
Steffenel Luiz Angelo
Stein Benhur
Publication venue: 'Science Publications'
Publication date: 24/07/2014

This study presents advances in designing and implementing scalable techniques to support the development and execution of MapReduce applications in pervasive distributed computing infrastructures, in the context of the PER-MARE project. A pervasive framework for MapReduce applications is very useful in practice, especially in scientific, enterprise, and educational centers that have many unused or underused computing resources, which can be fully exploited to solve relevant problems that demand large computing power, such as scientific computing applications and big data processing. In this study, we propose multiple techniques to support volatility and heterogeneity in MapReduce by applying two complementary approaches: improving the Apache Hadoop middleware by including context-awareness and fault-tolerance features, and providing an alternative pervasive grid implementation fully adapted to dynamic environments. The main design and implementation decisions for both alternatives are described and validated through experiments, demonstrating that our approaches provide high reliability when executing in pervasive environments. The analysis of the experiments also leads to several insights into the requirements and constraints of dynamic and volatile systems, reinforcing the importance of context-aware information and advanced fault-tolerance features in providing efficient and reliable MapReduce services on pervasive grids.

Crossref

HAL Descartes

HAL-Paris1

PER-MARE: Adaptive Deployment of MapReduce over Pervasive Grids

Author: Diaz Daniel
Flauzac Olivier
Kirsch Pinheiro Manuele
Nesmachnow Sergio
Pitthan Barcelos Patricia
Schwertner Charão Andrea
Steffenel Luiz Angelo
Stein Benhur
Publication venue: HAL CCSD
Publication date: 28/10/2013

MapReduce is a parallel programming paradigm successfully used to perform computations on massive amounts of data, and it is widely deployed on cluster, grid, and cloud infrastructures. Interestingly, while the emergence of cloud infrastructures has opened new perspectives, several enterprises hesitate to put sensitive data on the cloud and prefer to rely on internal resources. In this paper we introduce the PER-MARE initiative, which aims at proposing scalable techniques to support existing MapReduce data-intensive applications in the context of loosely coupled networks such as pervasive and desktop grids. By relying on the MapReduce programming model, PER-MARE proposes to explore the potential advantages of using free, unused resources available at enterprises as pervasive grids, alone or in a hybrid environment. This paper presents the main lines that orient the PER-MARE approach and some preliminary results.

HAL-Paris1

Querying Large Physics Data Sets Over an Information Grid

Author: Baker Nigel
Brooks Peter
Goff Jean-Marie Le
Kovacs Zsolt
McClatchey Richard
Publication date: 01/01/2001

Optimising use of the Web (WWW) for LHC data analysis is a complex problem and illustrates the challenges arising from the integration of and computation across massive amounts of information distributed worldwide. Finding the right piece of information can, at times, be extremely time-consuming, if not impossible. So-called Grids have been proposed to facilitate LHC computing and many groups have embarked on studies of data replication, data migration and networking philosophies. Other aspects such as the role of 'middleware' for Grids are emerging as requiring research. This paper positions the need for appropriate middleware that enables users to resolve physics queries across massive data sets. It identifies the role of meta-data for query resolution and the importance of Information Grids for high-energy physics analysis rather than just Computational or Data Grids. This paper identifies software that is being implemented at CERN to enable the querying of very large collaborating HEP data-sets, initially being employed for the construction of CMS detectors. Comment: 4 pages, 3 figures

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server

InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services

Author: A. Weiss
C. Vecchiola
L. Kleinrock
P. Barham
R. Buyya
X. Chu
Publication date: 01/01/2010

Cloud computing providers have set up several data centers at different geographical locations over the Internet in order to optimally serve the needs of their customers around the world. However, existing systems do not support mechanisms and policies for dynamically coordinating load distribution among different Cloud-based data centers in order to determine the optimal location for hosting application services to achieve reasonable QoS levels. Further, Cloud computing providers are unable to predict the geographic distribution of users consuming their services, hence the load coordination must happen automatically, and the distribution of services must change in response to changes in the load. To counter this problem, we advocate the creation of a federated Cloud computing environment (InterCloud) that facilitates just-in-time, opportunistic, and scalable provisioning of application services, consistently achieving QoS targets under variable workload, resource and network conditions. The overall goal is to create a computing environment that supports dynamic expansion or contraction of capabilities (VMs, services, storage, and database) for handling sudden variations in service demands. This paper presents the vision, challenges, and architectural elements of InterCloud for utility-oriented federation of Cloud computing environments. The proposed InterCloud environment supports scaling of applications across multiple vendor clouds. We have validated our approach by conducting a rigorous performance evaluation study using the CloudSim toolkit. The results demonstrate that the federated Cloud computing model has immense potential, as it offers significant performance gains with regard to response time and cost saving under dynamic workload scenarios. Comment: 20 pages, 4 figures, 3 tables, conference paper

arXiv.org e-Print Archive

CiteSeerX

Crossref

Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies

Author: Asadzadeh Parvin
Buyya Rajkumar
Kei Chun Ling
Nayar Deepa
Venugopal Srikumar
Publication date: 01/07/2004

Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries. This makes Grid application management and deployment a complex undertaking. Grid middlewares provide users with seamless computing ability and uniform access to resources in the heterogeneous Grid environment. Several software toolkits and systems have been developed all over the world, most of which are results of academic research projects. This chapter will focus on four of these middlewares: UNICORE, Globus, Legion and Gridbus. It also presents our implementation of a resource broker for UNICORE, as this functionality was not supported in it. A comparison of these systems on the basis of the architecture, implementation model and several other features is included. Comment: 19 pages, 10 figures

arXiv.org e-Print Archive

Enlighten

Speeding-up the execution of credit risk simulations using desktop grid computing: A case study

Author: Mustafee N
Taylor S J E
Publication venue: 'The Korean Brain Tumor Society, The Korean Society for Neuro-Oncology'
Publication date: 01/01/2010

This paper describes a case study undertaken at a leading European investment bank in which desktop grid computing was used to speed up the execution of Monte Carlo credit risk simulations. The credit risk simulations were modelled using commercial-off-the-shelf simulation packages (CSPs). The CSPs did not incorporate built-in support for desktop grids, and therefore the authors implemented a middleware for desktop grid computing, called WinGrid, and interfaced it with the CSP. The performance results show that WinGrid can speed up the execution of CSP-based Monte Carlo simulations. However, since WinGrid was installed on non-dedicated PCs, the speed-up achieved varied according to users’ PC usage. Finally, the paper presents some lessons learnt from this case study. It is expected that this paper will encourage simulation practitioners and CSP vendors to experiment with desktop grid computing technologies with the objective of speeding up simulation experimentation.

Brunel University Research Archive

A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

Author: Buyya Rajkumar
Ramamohanarao Kotagiri
Venugopal Srikumar
Publication date: 10/06/2005

Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

arXiv.org e-Print Archive

CiteSeerX

University of Melbourne Institutional Repository