66,102 research outputs found

    Impliance: A Next Generation Information Management Appliance

    Full text link
    ably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems?" In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements, namely the ideas of: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massive parallel processing, and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises.Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but, you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR) January 710, 2007, Asilomar, California, US

    Exploiting path parallelism in logic programming

    Get PDF
    This paper presents a novel parallel implementation of Prolog. The system is based on Multipath, a novel execution model for Prolog that implements a partial breadth-first search of the SLD-tree. The paper focusses on the type of parallelism inherent to the execution model, which is called path parallelism. This is a particular case of data parallelism that can be efficiently exploited in a SPMD architecture. A SPMD architecture oriented to the Multipath execution model is presented. A simulator of such system has been developed and used to assess the performance of path parallelism. Performance figures show that path parallelism is effective for non-deterministic programs.Peer ReviewedPostprint (published version

    A partial breadth-first execution model for prolog

    Get PDF
    MEM (Multipath Execution Model) is a novel model for the execution of Prolog programs which combines a depth-first and breadth-first exploration of the search tree. The breadth-first search allows more than one path of the SLD-tree to be explored at the same time. In this way, the computational cost of traversing the whole search tree associated to a program can be decreased because the MEM model reduces the overhead due to the execution of control instructions and also diminishes the number of unifications to be performed. This paper focuses on the description of the MEM model and its sequential implementation. Moreover, the MEM execution model can be implemented in order to exploit a new kind of parallelism, called path parallelism, which allows the parallel execution of unify operations related to simultaneously traversed pathsPeer ReviewedPostprint (published version

    Scientific Computing Meets Big Data Technology: An Astronomy Use Case

    Full text link
    Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, tools from the HPC software stack are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark -- a modern big data platform -- to parallelize many-task applications. We present Kira, a flexible and distributed astronomy image processing toolkit using Apache Spark. We then use the Kira toolkit to implement a Source Extractor application for astronomy images, called Kira SE. With Kira SE as the use case, we study the programming flexibility, dataflow richness, scheduling capacity and performance of Apache Spark running on the EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon EC2 cloud. Furthermore, we show that by leveraging software originally designed for big data infrastructure, Kira SE achieves competitive performance to the C implementation running on the NERSC Edison supercomputer. Our experience with Kira indicates that emerging Big Data platforms such as Apache Spark are a performant alternative for many-task scientific applications
    • 

    corecore