4,008 research outputs found

    Introducing Molly: Distributed Memory Parallelization with LLVM

    Get PDF
    Programming for distributed memory machines has always been a tedious task, but necessary because compilers have not been sufficiently able to optimize for such machines themselves. Molly is an extension to the LLVM compiler toolchain that is able to distribute and reorganize workload and data if the program is organized in statically determined loop control-flows. These are represented as polyhedral integer-point sets that allow program transformations applied on them. Memory distribution and layout can be declared by the programmer as needed and the necessary asynchronous MPI communication is generated automatically. The primary motivation is to run Lattice QCD simulations on IBM Blue Gene/Q supercomputers, but since the implementation is not yet completed, this paper shows the capabilities on Conway's Game of Life

    Custom Integrated Circuits

    Get PDF
    Contains reports on ten research projects.Analog Devices, Inc.IBM CorporationNational Science Foundation/Defense Advanced Research Projects Agency Grant MIP 88-14612Analog Devices Career Development Assistant ProfessorshipU.S. Navy - Office of Naval Research Contract N0014-87-K-0825AT&TDigital Equipment CorporationNational Science Foundation Grant MIP 88-5876

    Redundancy management for efficient fault recovery in NASA's distributed computing system

    Get PDF
    The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance

    The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis

    Get PDF
    Datastores today rely on distribution and replication to achieve improved performance and fault-tolerance. But correctness of many applications depends on strong consistency properties - something that can impose substantial overheads, since it requires coordinating the behavior of multiple nodes. This paper describes a new approach to achieving strong consistency in distributed systems while minimizing communication between nodes. The key insight is to allow the state of the system to be inconsistent during execution, as long as this inconsistency is bounded and does not affect transaction correctness. In contrast to previous work, our approach uses program analysis to extract semantic information about permissible levels of inconsistency and is fully automated. We then employ a novel homeostasis protocol to allow sites to operate independently, without communicating, as long as any inconsistency is governed by appropriate treaties between the nodes. We discuss mechanisms for optimizing treaties based on workload characteristics to minimize communication, as well as a prototype implementation and experiments that demonstrate the benefits of our approach on common transactional benchmarks

    Parallelization of Reconstructability Analysis Algorithms.

    Get PDF
    Bush Jones published a series of papers providing sequential algorithms that are key to reconstructability analysis. These algorithms include the determination of unbiased reconstructions and a greedy algorithm for a generalization of the reconstruction problem. The implementation of these sequential algorithms provide scientists and mathematicians with the means of utilizing reconstructability analysis in systems modeling. The algorithms, however, are so computationally intensive that the system is limited to a very small set of variables. Many papers have been written applying reconstructability analysis and maximum entropy methods to various disciplines. Reconstructability analysis has the potential of dramatically impacting the scientific community, but the sequential algorithms leave the utilization of reconstructability analysis infeasible. The author has parallelized the reconstructability analysis algorithms developed by Jones, thereby, bridging the gap between theoretical application and feasible implementation. Since the goal of parallelization of these reconstructability analysis algorithms is to make them feasible to as many researchers as possible, a specific architecture is not assumed. It is assumed that the architecture employed is a multiple data architecture. That is, the architectural design needed for the implementation of these algorithms must have memory local to each processing element (PE). The parallel algorithms developed and presented here do not address the problems of communications between processors of particular architectures. These algorithms assume a reconfigurable bus system which is a bus system whose configuration can be dynamically altered thus allowing broadcasting and long-distance communications to be completed in constant time. It is noted that processor arrays with such reconfigurable bus systems have been designed. Frequently, parallel algorithms do not address the situation in which the number of values on which to operate is larger than the number of processors. However, since the purpose of the parallelization of these reconstructability analysis algorithms is to make them feasible for large structure systems, the parallelization given does address the situation in which the number of values on which to operate is larger than the number of processors available. Therefore, implementation of the algorithms involves simply incorporating the communication protocols between processors for the particular architecture employed
    corecore