202 research outputs found

    Big Data Refinement

    "Big data" has become a major area of research and associated funding, as well as a focus of utopian thinking. In the still growing research community, one of the favourite optimistic analogies for data processing is that of the oil refinery, extracting the essence out of the raw data. Pessimists look for their imagery to the other end of the petrol cycle, and talk about the "data exhausts" of our society. Obviously, the refinement community knows how to do "refining". This paper explores the extent to which notions of refinement and data in the formal methods community relate to the core concepts in "big data". In particular, can the data refinement paradigm can be used to explain aspects of big data processing

    Real-Time MapReduce Scheduling

    In this paper, we explore the feasibility of scheduling mixed hard and soft real-time MapReduce applications. We first present an experimental evaluation of the popular Hadoop MapReduce middleware on the Amazon EC2 cloud. Our evaluation reveals tradeoffs between overall system throughput and execution time predictability, and highlights a number of factors affecting real-time scheduling, such as data placement, concurrent users, and master scheduling overhead. Based on our evaluation study, we present a formal model for capturing real-time MapReduce applications and the Hadoop platform. Using this model, we formulate the offline scheduling of real-time MapReduce jobs on a heterogeneous distributed Hadoop architecture as a constraint satisfaction problem (CSP) and introduce various search strategies for the formulation. We propose an enhancement of MapReduce's execution model and a range of heuristic techniques for online scheduling. We further outline some of our future directions, which apply state-of-the-art techniques from the real-time scheduling literature.
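
To make the idea of offline scheduling as a CSP concrete, here is a minimal sketch. It is not the paper's formal model: the job tuple `(work, deadline)`, the per-node `speed` factor, and the function name `schedule` are all assumptions made for illustration. Every deadline is treated as hard and all jobs are released at time 0. Each job is a variable, each node a candidate value, and a plain backtracking search stands in for the search strategies the abstract mentions.

```python
def schedule(jobs, speeds):
    """Assign every job to a node so that all deadlines are met.

    jobs:   dict job name -> (work units, deadline)   (hypothetical model)
    speeds: dict node name -> work units per time unit (heterogeneous nodes)
    Returns dict job -> (node, start, finish), or None if no assignment exists.
    """
    # Consider jobs in earliest-deadline-first order; jobs placed on the
    # same node run back to back in that order.
    order = sorted(jobs, key=lambda j: jobs[j][1])
    busy_until = {node: 0.0 for node in speeds}
    assignment = {}

    def backtrack(i):
        if i == len(order):
            return True  # every job has been placed
        job = order[i]
        work, deadline = jobs[job]
        for node, speed in speeds.items():
            start = busy_until[node]
            finish = start + work / speed
            if finish <= deadline:  # deadline constraint
                assignment[job] = (node, start, finish)
                busy_until[node] = finish
                if backtrack(i + 1):
                    return True
                # Undo the tentative placement and try the next node.
                busy_until[node] = start
                del assignment[job]
        return False

    return dict(assignment) if backtrack(0) else None


jobs = {"j1": (4, 4), "j2": (4, 4), "j3": (2, 6)}
speeds = {"fast": 2.0, "slow": 1.0}
plan = schedule(jobs, speeds)
infeasible = schedule({"big": (10, 1)}, {"only": 1.0})
```

In this sketch `plan` maps each job to a node with start and finish times, and `infeasible` is `None` because the single job cannot finish by its deadline on the only node. Soft deadlines, data placement, and master scheduling overhead, which the abstract identifies as relevant factors, are deliberately left out.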

    Pabble: parameterised Scribble

    © 2014, The Author(s). Many parallel and distributed message-passing programs are written in a parametric way over available resources, in particular the number of nodes and their topologies, so that a single parallel program can scale over different environments. This article presents a parameterised protocol description language, Pabble, which can guarantee safety and progress in a large class of practical, complex parameterised message-passing programs through static checking. Pabble can describe an overall interaction topology, using a concise and expressive notation, designed for a variable number of participants arranged in multiple dimensions. These parameterised protocols in turn automatically generate local protocols for type checking parameterised MPI programs for communication safety and deadlock freedom. Despite the undecidability of endpoint projection and type checking in the underlying parameterised session type theory, our method guarantees the termination of endpoint projection and type checking.

    Constraint Satisfaction Problems in Hadoop MapReduce

    In this thesis we examine the effectiveness of using Hadoop MapReduce join algorithms to solve Constraint Satisfaction Problems (CSPs). We start by presenting the MapReduce framework and then briefly summarise CSPs. We take advantage of the overlap between CSPs and database techniques by modeling a CSP as a database schema. We describe some existing join algorithms and use that schema as their input, modifying and preprocessing the algorithms so that they support CSP specifications expressed as joins. Finally, we conduct a set of experiments and conclude that the MapReduce framework is not effective for solving CSPs. One suggestion is to repeat the experiments in a more suitable environment, i.e. a better cluster, because the one used proved to be inefficient.
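
The idea of modeling a CSP as a database schema can be illustrated with a small in-memory sketch. It is not the thesis's Hadoop implementation, and the helper names (`constraint_relation`, `natural_join`, `solve_by_joins`) are assumptions made for this example. Each constraint becomes a table of allowed tuples, and the solutions of the CSP are the rows of the natural join of all the tables. A plain nested-loop join stands in here for the MapReduce join algorithms.

```python
from itertools import product


def constraint_relation(scope, domains, predicate):
    """Materialise a constraint as a table of the tuples it allows."""
    rows = []
    for values in product(*(domains[var] for var in scope)):
        if predicate(*values):
            rows.append(dict(zip(scope, values)))
    return rows


def natural_join(left, right):
    """Nested-loop natural join of two lists of row dicts."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            # Rows combine only if they agree on every shared variable.
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


def solve_by_joins(relations):
    """Join all constraint tables; the surviving rows are the solutions."""
    result = relations[0]
    for relation in relations[1:]:
        result = natural_join(result, relation)
    return result


# Toy CSP: colour the path A - B - C with two colours so that
# adjacent vertices differ.
domains = {v: ["red", "green"] for v in "ABC"}
different = lambda x, y: x != y
relations = [
    constraint_relation(("A", "B"), domains, different),
    constraint_relation(("B", "C"), domains, different),
]
solutions = solve_by_joins(relations)
```

For this toy problem `solutions` holds the two valid colourings. Intermediate join results can grow very large on bigger problems, which may be one reason a join-based approach struggles, although the abstract itself attributes the poor results at least partly to the cluster used.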