877 research outputs found

    Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduce Framework

    While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration, we introduce a distributed approach to formal concept mining. The novelty of our method is that it uses a lightweight MapReduce runtime called Twister, which is better suited to iterative algorithms than recent distributed approaches. First, we describe the theoretical foundations underpinning our distributed formal concept analysis approach. Second, we provide a representative exemplar of how a classic centralized algorithm can be implemented in a distributed fashion using our methodology: we modify Ganter's classic algorithm, introducing a family of MR* algorithms, namely MRGanter and MRGanter+, where the prefix denotes the algorithm's lineage. To evaluate the factors that impact distributed algorithm performance, we compare our MR* algorithms with the state of the art. Experiments conducted on real datasets demonstrate that MRGanter+ is efficient, scalable, and an appealing algorithm for distributed problems. Comment: 17 pages, ICFCA 201, Formal Concept Analysis 201
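To make the distributed flavour of the MR* family concrete, here is a minimal sketch (not the authors' MRGanter code; the toy context and every name are assumptions) of computing the closure B'' of an attribute set B over a horizontally partitioned formal context: each partition derives a partial intent in a map step, and a reduce step intersects the partials.

```haskell
-- Hedged sketch: closure of an attribute set over a partitioned formal
-- context, in map/reduce style.  Not the authors' MRGanter implementation.
import qualified Data.Set as S

type Obj   = Int
type Attr  = Char
type Block = [(Obj, S.Set Attr)]  -- one horizontal partition: object -> its attributes

-- "map" phase: each worker derives a partial intent of B from its own block.
-- An empty local extent contributes the full attribute universe, the neutral
-- element of intersection.
localIntent :: S.Set Attr -> S.Set Attr -> Block -> S.Set Attr
localIntent universe b block =
  case [attrs | (_, attrs) <- block, b `S.isSubsetOf` attrs] of
    []     -> universe
    extent -> foldr1 S.intersection extent

-- "reduce" phase: intersect the partial intents to obtain the global closure B''.
closure :: S.Set Attr -> S.Set Attr -> [Block] -> S.Set Attr
closure universe b blocks =
  foldr S.intersection universe (map (localIntent universe b) blocks)

main :: IO ()
main = do
  let universe = S.fromList "abcd"
      block1   = [(1, S.fromList "abc"), (2, S.fromList "ab")]
      block2   = [(3, S.fromList "abd")]
  print (closure universe (S.fromList "a") [block1, block2])  -- fromList "ab"
```

The same pattern extends to Ganter-style enumeration, where each closure computation becomes one map/reduce round, which is why an iterative runtime such as Twister pays off.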

    Scather: programming with multi-party computation and MapReduce

    We present a prototype of a distributed computational infrastructure, an associated high-level programming language, and an underlying formal framework that allow multiple parties to leverage their own cloud-based computational resources (capable of supporting MapReduce [27] operations) in concert with multi-party computation (MPC) to execute statistical analysis algorithms that have privacy-preserving properties. Our architecture allows a data analyst unfamiliar with MPC to (1) author an analysis algorithm that is agnostic with regard to data privacy policies, (2) use an automated process to derive algorithm implementation variants that have different privacy and performance properties, and (3) compile those implementation variants so that they can be deployed on an infrastructure that allows computations to take place locally within each participant’s MapReduce cluster as well as across all the participants’ clusters using an MPC protocol. We describe implementation details of the architecture, discuss and demonstrate how the formal framework enables the exploration of tradeoffs between the efficiency and privacy properties of an analysis algorithm, and present two example applications that illustrate how such an infrastructure can be utilized in practice. This work was supported in part by NSF Grants #1430145, #1414119, #1347522, and #1012798.
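As a rough illustration of the two execution legs such an architecture has to target, the sketch below computes one joint aggregate either locally (a map/reduce-style fold over a party's own records) or across parties using additive secret sharing, a standard MPC building block. It is not Scather's language, protocol, or API; the modulus, share count, and toy data are assumptions, and randomRIO comes from the random package.

```haskell
-- Illustrative only: the same aggregate run either locally (map/reduce over
-- one party's records) or jointly via additive secret sharing.  This is not
-- Scather's protocol or API; modulus, share count and data are assumptions.
import System.Random (randomRIO)   -- from the "random" package

-- local map/reduce leg: map each record to a partial value, reduce with (+)
localSum :: [Integer] -> Integer
localSum = foldr (+) 0

q :: Integer                       -- arbitrary prime modulus for the shares
q = 2 ^ 61 - 1

-- split a local total into n additive shares mod q, so that no single
-- share reveals the total
share :: Int -> Integer -> IO [Integer]
share n secret = do
  rs <- mapM (const (randomRIO (0, q - 1))) [1 .. n - 1]
  pure (rs ++ [(secret - sum rs) `mod` q])

-- adding everyone's published share sums opens only the joint total
open :: [Integer] -> Integer
open = (`mod` q) . sum

main :: IO ()
main = do
  let partyA = [3, 5, 7]           -- toy per-party datasets
      partyB = [10, 20]
  [a1, a2] <- share 2 (localSum partyA)
  [b1, b2] <- share 2 (localSum partyB)
  -- party 1 holds (a1, b1); party 2 holds (a2, b2); each adds its shares
  print (open [a1 + b1, a2 + b2]) -- 45: the joint sum, without exchanging totals
```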

    CPL: A Core Language for Cloud Computing -- Technical Report

    Running distributed applications in the cloud involves deployment, that is, the distribution and configuration of application services and middleware infrastructure. The considerable complexity of these tasks has resulted in the emergence of declarative, JSON-based, domain-specific deployment languages for developing deployment programs. However, existing deployment programs unsafely compose artifacts written in different languages, leading to bugs that are hard to detect before run time. Furthermore, deployment languages do not provide extension points for custom implementations of existing cloud services, such as application-specific load balancing policies. To address these shortcomings, we propose CPL (Cloud Platform Language), a statically typed core language for programming both distributed applications and their deployment on a cloud platform. In CPL, application services and deployment programs interact through statically typed, extensible interfaces, and an application can trigger further deployment at run time. We provide a formal semantics of CPL and demonstrate that it enables type-safe, composable and extensible libraries of service combinators, such as load balancing and fault tolerance. Comment: Technical report accompanying the MODULARITY '16 submission
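The sketch below illustrates, under assumed names and in Haskell rather than CPL, what typed and composable service combinators can look like when a service is modelled as a typed request handler: a round-robin load balancer and a fall-back combinator wrap services through ordinary typed interfaces instead of string-level composition. It is an illustration of the idea only, not CPL's actual syntax, type system, or semantics.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
-- Hedged sketch of typed, composable service combinators; this is not CPL,
-- just the underlying idea expressed with plain Haskell types.
import Control.Exception (SomeException, try)
import Data.IORef

-- a service is a typed request handler
type Service req resp = req -> IO resp

-- round-robin load balancing as a combinator over a pool of services
roundRobin :: [Service req resp] -> IO (Service req resp)
roundRobin pool = do
  next <- newIORef 0
  pure $ \req -> do
    i <- atomicModifyIORef' next (\n -> ((n + 1) `mod` length pool, n))
    (pool !! i) req

-- simple fault tolerance: fall back to a backup service when the primary fails
orElse :: Service req resp -> Service req resp -> Service req resp
orElse primary backup req = do
  r <- try (primary req)
  case r of
    Right resp                -> pure resp
    Left (_ :: SomeException) -> backup req

main :: IO ()
main = do
  let worker name n = pure (name ++ " handled " ++ show (n :: Int))
      failing _     = ioError (userError "primary down")
  lb <- roundRobin [worker "a", worker "b"]
  mapM_ (\n -> lb n >>= putStrLn) [1, 2, 3]          -- alternates between a and b
  (failing `orElse` worker "backup") 4 >>= putStrLn  -- falls back to the backup
```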

    Big Data Refinement

    "Big data" has become a major area of research and associated funding, as well as a focus of utopian thinking. In the still growing research community, one of the favourite optimistic analogies for data processing is that of the oil refinery, extracting the essence out of the raw data. Pessimists look for their imagery to the other end of the petrol cycle, and talk about the "data exhausts" of our society. Obviously, the refinement community knows how to do "refining". This paper explores the extent to which notions of refinement and data in the formal methods community relate to the core concepts in "big data". In particular, can the data refinement paradigm can be used to explain aspects of big data processing

    A Formal, Resource Consumption-Preserving Translation of Actors to Haskell

    We present a formal translation of an actor-based language with cooperative scheduling to the functional language Haskell. The translation is proven correct with respect to a formal semantics of the source language and a high-level operational semantics of the target, i.e. a subset of Haskell. The main correctness theorem is expressed in terms of a simulation relation between the operational semantics of actor programs and their translation. This allows us to prove that resource consumption is preserved by the translation, as we establish an equivalence between the cost of the original and the Haskell-translated execution traces. Comment: Pre-proceedings paper presented at the 26th International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR 2016), Edinburgh, Scotland UK, 6-8 September 2016 (arXiv:1608.02534)
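To give a flavour of what a cooperatively scheduled actor can look like on the Haskell side, the toy sketch below keeps an actor's suspended continuations in an explicit queue and runs them one step at a time, re-enqueueing a task whenever it yields. It is only an assumed illustration; the paper's actual translation and its cost-preserving simulation relation are considerably richer.

```haskell
-- Toy model of cooperative actor scheduling: each actor owns a queue of
-- suspended computations that a scheduler runs one step at a time.
-- Not the paper's translation; all names here are assumptions.
import Data.IORef

-- a task either finishes or yields, handing back the rest of its work
data Step = Done | Yield (IO Step)

-- an actor is a mailbox of suspended computations, run cooperatively
newtype Actor = Actor (IORef [IO Step])

newActor :: IO Actor
newActor = Actor <$> newIORef []

send :: Actor -> IO Step -> IO ()
send (Actor q) task = modifyIORef q (++ [task])

-- scheduler: pick the head task, run it for one step, and re-enqueue it
-- if it yielded (cooperative, no preemption)
run :: Actor -> IO ()
run a@(Actor q) = do
  tasks <- readIORef q
  case tasks of
    []       -> pure ()
    (t : ts) -> do
      writeIORef q ts
      step <- t
      case step of
        Done    -> pure ()
        Yield k -> send a k
      run a

main :: IO ()
main = do
  a <- newActor
  send a (putStrLn "task 1, part 1" >> pure (Yield (putStrLn "task 1, part 2" >> pure Done)))
  send a (putStrLn "task 2" >> pure Done)
  run a   -- prints: task 1 part 1, task 2, task 1 part 2 (interleaved cooperatively)
```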

    A survey of large-scale reasoning on the Web of data

    As more and more data is being generated by sensor networks, social media and organizations, the Web interlinking this wealth of information becomes more complex. This is particularly true for the so-called Web of Data, in which data is semantically enriched and interlinked using ontologies. In this large and uncoordinated environment, reasoning can be used to check the consistency of the data and of associated ontologies, or to infer logical consequences which, in turn, can be used to obtain new insights from the data. However, reasoning approaches need to be scalable in order to enable reasoning over the entire Web of Data. To address this problem, several high-performance reasoning systems, which mainly implement distributed or parallel algorithms, have been proposed in the last few years. These systems differ significantly, for instance in terms of reasoning expressivity, computational properties such as completeness, or reasoning objectives. In order to provide a first complete overview of the field, this paper reports a systematic review of such scalable reasoning approaches over various ontological languages, reporting details about the methods and the conducted experiments. We highlight the shortcomings of these approaches and discuss some of the open problems related to performing scalable reasoning.
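As one concrete example of the kind of inference such systems scale up, the sketch below applies a single RDFS rule (rdfs9: if x has type C and C is a subclass of D, then x has type D) using the map/reduce-style join on the shared class term that many distributed reasoners build on. The triples and names are toy assumptions, not taken from the survey.

```haskell
-- Hedged sketch: one RDFS rule evaluated with a map/reduce-style join on the
-- class term.  The data and names are illustrative assumptions only.
import qualified Data.Map.Strict as M

type Triple = (String, String, String)   -- (subject, predicate, object)

rdfs9 :: [Triple] -> [Triple]
rdfs9 triples =
  let -- "map" phase: key every relevant triple by the class it joins on
      typed = M.fromListWith (++) [(c, [x]) | (x, "rdf:type",        c) <- triples]
      subOf = M.fromListWith (++) [(c, [d]) | (c, "rdfs:subClassOf", d) <- triples]
      -- "reduce" phase: per join key, emit the derived type triples
      derive c xs = [(x, "rdf:type", d) | d <- M.findWithDefault [] c subOf, x <- xs]
  in concat (M.elems (M.mapWithKey derive typed))

main :: IO ()
main = print (rdfs9
  [ ("alice",   "rdf:type",        "Student")
  , ("Student", "rdfs:subClassOf", "Person") ])
  -- => [("alice","rdf:type","Person")]
```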

    Proceedings of the 4th DIKU-IST Joint Workshop on the Foundations of Software
