202 research outputs found

Big Data Refinement

Author: Boiten
Crépeau
Derrick
Eerke A. Boiten
Eerke Boiten
John Derrick
Kusakabe
McIver
Ono
Pasquale
Reddy
Steve Reeves
Su
Ying
Zuiderveen Borgesius
Publication venue: 'Open Publishing Association'
Publication date: 01/06/2016

"Big data" has become a major area of research and associated funding, as well as a focus of utopian thinking. In the still growing research community, one of the favourite optimistic analogies for data processing is that of the oil refinery, extracting the essence out of the raw data. Pessimists look for their imagery to the other end of the petrol cycle, and talk about the "data exhausts" of our society. Obviously, the refinement community knows how to do "refining". This paper explores the extent to which notions of refinement and data in the formal methods community relate to the core concepts in "big data". In particular, can the data refinement paradigm be used to explain aspects of big data processing?

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Kent Academic Repository

Real-Time MapReduce Scheduling

Author: Lee Insup
Loo Boon Thau
Phan Linh T.X.
Zhang Zhuoyao
Publication venue: ScholarlyCommons
Publication date: 01/01/2010

In this paper, we explore the feasibility of scheduling mixed hard and soft real-time MapReduce applications. We first present an experimental evaluation of the popular Hadoop MapReduce middleware on the Amazon EC2 cloud. Our evaluation reveals tradeoffs between overall system throughput and execution time predictability, and highlights a number of factors affecting real-time scheduling, such as data placement, concurrent users, and master scheduling overhead. Based on our evaluation study, we present a formal model for capturing real-time MapReduce applications and the Hadoop platform. Using this model, we formulate the offline scheduling of real-time MapReduce jobs on a heterogeneous distributed Hadoop architecture as a constraint satisfaction problem (CSP) and introduce various search strategies for the formulation. We propose an enhancement of MapReduce's execution model and a range of heuristic techniques for the online scheduling. We further outline some of our future directions that apply state-of-the-art techniques in the real-time scheduling literature.
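To make the idea of scheduling as a constraint satisfaction problem concrete, here is a minimal sketch in Python. It is not the formal model from the paper: the task tuples, the node speeds, the function name `schedule`, and the simple backtracking search are all assumptions chosen for illustration. Each task is a variable, the nodes are its domain, and the constraint is that every task finishes by its deadline when the tasks on a node run back to back.

```python
def schedule(tasks, speeds):
    """Assign tasks to nodes so that every deadline is met.

    tasks:  list of (name, work, deadline) tuples (hypothetical format)
    speeds: dict mapping node name -> processing speed (work units per time unit)
    Returns {task name: (node, start, finish)} or None if no assignment exists.
    """
    # Consider tasks in order of increasing deadline.
    order = sorted(tasks, key=lambda t: t[2])
    free_at = {node: 0.0 for node in speeds}  # time at which each node is idle
    assignment = {}

    def backtrack(i):
        if i == len(order):
            return True  # every task has been placed
        name, work, deadline = order[i]
        for node, speed in speeds.items():
            start = free_at[node]
            finish = start + work / speed
            if finish <= deadline:  # deadline constraint
                assignment[name] = (node, start, finish)
                free_at[node] = finish
                if backtrack(i + 1):
                    return True
                # Undo the choice and try the next node.
                free_at[node] = start
                del assignment[name]
        return False

    return dict(assignment) if backtrack(0) else None


# Illustrative data only: three tasks on one fast and one slow node.
example_tasks = [("t1", 4, 4), ("t2", 4, 4), ("t3", 2, 5)]
example_speeds = {"fast": 2.0, "slow": 1.0}
plan = schedule(example_tasks, example_speeds)
```

A realistic formulation would also need constraints that this sketch leaves out, such as dependencies between map and reduce phases and the data-placement effects mentioned in the abstract.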

ScholarlyCommons@Penn

Pabble: parameterised Scribble

Author: Ng N
Yoshida N
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/12/2014

Many parallel and distributed message-passing programs are written in a parametric way over available resources, in particular the number of nodes and their topologies, so that a single parallel program can scale over different environments. This article presents a parameterised protocol description language, Pabble, which can guarantee safety and progress in a large class of practical, complex parameterised message-passing programs through static checking. Pabble can describe an overall interaction topology, using a concise and expressive notation, designed for a variable number of participants arranged in multiple dimensions. These parameterised protocols in turn automatically generate local protocols for type checking parameterised MPI programs for communication safety and deadlock freedom. In spite of undecidability of endpoint projection and type checking in the underlying parameterised session type theory, our method guarantees the termination of endpoint projection and type checking.

Spiral - Imperial College Digital Repository

Constraint Satisfaction Problems in Hadoop MapReduce

Author: Ntoulias Emmanouil
Ντούλιας Εμμανουήλ
Publication date: 01/01/2016

In this thesis we examine the effectiveness of using Hadoop MapReduce join algorithms to solve Constraint Satisfaction Problems (CSPs). We start by presenting the MapReduce framework and continue with a brief summary of CSPs. We take advantage of the fact that CSPs and database techniques overlap, by modeling a CSP as a database schema. We describe some of the existing join algorithms and then use the aforementioned schema as input to them. We modify and preprocess these algorithms so that they support CSP specifications expressed as joins. Finally, we use them to conduct a set of experiments and conclude that it is not effective to use the MapReduce framework for CSPs. One suggestion is to repeat the experiments in a more suitable environment, i.e. a better cluster, because the one that was used proved to be inefficient.
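The overlap between CSPs and database techniques can be illustrated with a small in-memory sketch in Python. It is not the thesis's Hadoop implementation, and the helper names (`natural_join`, `solve_csp_by_joins`, `not_equal`) are hypothetical. Each constraint is stored as a relation listing its allowed value combinations, and the solutions of the CSP are the rows of the natural join of all those relations.

```python
from itertools import product


def natural_join(left, right):
    """Join two relations (lists of dicts) on the variables they share."""
    joined = []
    for l_row, r_row in product(left, right):
        shared = l_row.keys() & r_row.keys()
        if all(l_row[v] == r_row[v] for v in shared):
            joined.append({**l_row, **r_row})
    return joined


def solve_csp_by_joins(constraint_relations):
    """Return all solutions as the natural join of every constraint relation."""
    result = [{}]  # the relation with one empty row is the identity for joins
    for relation in constraint_relations:
        result = natural_join(result, relation)
    return result


# Illustrative CSP: colour the path A - B - C with two colours so that
# neighbouring vertices differ.
colours = ["red", "green"]


def not_equal(x, y):
    """Relation holding every allowed (x, y) colour pair with x != y."""
    return [{x: a, y: b} for a in colours for b in colours if a != b]


solutions = solve_csp_by_joins([not_equal("A", "B"), not_equal("B", "C")])
```

The nested-loop join used here compares every pair of rows, which is only practical for tiny relations; distributing that join work is what join algorithms on MapReduce are meant to address.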

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens