Introducing Molly: Distributed Memory Parallelization with LLVM

Author: Kruse Michael
Publication date: 01/01/2013

Programming for distributed memory machines has always been a tedious task, but a necessary one because compilers have not been able to optimize sufficiently for such machines themselves. Molly is an extension to the LLVM compiler toolchain that can distribute and reorganize workload and data if the program is organized in statically determined loop control flows. These are represented as polyhedral integer-point sets, which allow program transformations to be applied to them. Memory distribution and layout can be declared by the programmer as needed, and the necessary asynchronous MPI communication is generated automatically. The primary motivation is to run Lattice QCD simulations on IBM Blue Gene/Q supercomputers, but since the implementation is not yet complete, this paper shows the capabilities on Conway's Game of Life.
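
To make the idea of distributed data with generated communication concrete, here is a minimal sketch in plain Python. It is not Molly's output and uses no MPI: the function names, the row-block split, and the "dead" boundary are all assumptions chosen for illustration. Each block of rows is updated using only its own rows plus one halo row from each neighbour, which is the data a real implementation would have to exchange.

```python
def life_step(grid):
    """One Game of Life generation; cells outside the grid count as dead."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Sum the 3x3 neighbourhood clipped to the grid, minus the cell itself.
            n = sum(
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ) - grid[r][c]
            nxt[r][c] = 1 if n == 3 or (n == 2 and grid[r][c]) else 0
    return nxt


def distributed_step(grid, parts):
    """Hypothetical row-block decomposition: each block sees its rows plus one halo row per side."""
    rows, cols = len(grid), len(grid[0])
    dead = [0] * cols
    bounds = [rows * k // parts for k in range(parts + 1)]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        if lo == hi:
            continue  # more parts than rows: this block is empty
        top = grid[lo - 1] if lo > 0 else dead    # row "received" from the upper neighbour
        bottom = grid[hi] if hi < rows else dead  # row "received" from the lower neighbour
        local = [top] + grid[lo:hi] + [bottom]
        # Update the padded block and discard the halo rows again.
        result.extend(life_step(local)[1:-1])
    return result
```

Calling `distributed_step(grid, parts)` should give the same grid as `life_step(grid)` for any number of parts, because every interior row of a padded block has all of its neighbours available locally.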

arXiv.org e-Print Archive

HAL-CentraleSupelec

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Rennes 1

Custom Integrated Circuits

Author: Allen Jonathan
Antoniadis Dimitri A.
Armstrong Robert C.
Baltus Donald G.
Bamji Cyrus S.
Chen Curtis S.
Decker Steven J.
Devadas Srinivas
Elfadel Ibrahim M.
Frants Marina
Hakkarainen Juha M.
Horn Berthold K. P.
Keast Craig L.
Kim Songmin
Kukula James H.
Lam Kevin
Lee Hae-Seung
Leeb Steven B.
Lloyd Jennifer A.
Lumsdaine Andrew
McQuirk Ignacio S.
Nabors Keith S.
Phillips Joel R.
Poggio Tomaso
Rahmat Khalid
Reichelt Mark W.
Seidel Mark N.
Shen Amelia H.
Silveira Luis M.
Sodini Charles G.
Standley David L.
Telichevesky Ricardo
Umminger Christopher B.
Van Aelten Filip J.
White Jacob K.
Wyatt John L., Jr.
Yang Woodward
Yu Paul C.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)

Contains reports on ten research projects. Sponsors: Analog Devices, Inc.; IBM Corporation; National Science Foundation/Defense Advanced Research Projects Agency Grant MIP 88-14612; Analog Devices Career Development Assistant Professorship; U.S. Navy - Office of Naval Research Contract N0014-87-K-0825; AT&T; Digital Equipment Corporation; National Science Foundation Grant MIP 88-5876.

DSpace@MIT

Redundancy management for efficient fault recovery in NASA's distributed computing system

Author: Malek Miroslaw
Pandya Mihir
Yau Kitty

The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding computational graphs of tasks in the system architecture and for reconfiguring these tasks after a failure has occurred. The computational structures considered were the path and the complete binary tree, and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of the Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.

NASA Technical Reports Server

The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis

Author: Bender Gabriel
Ding Bailu
Foster Nate
Gehrke Johannes
Hojjat Hossein
Koch Christoph
Kot Lucja
Roy Sudip
Publication date: 19/01/2015

Datastores today rely on distribution and replication to achieve improved performance and fault tolerance. But the correctness of many applications depends on strong consistency properties - something that can impose substantial overheads, since it requires coordinating the behavior of multiple nodes. This paper describes a new approach to achieving strong consistency in distributed systems while minimizing communication between nodes. The key insight is to allow the state of the system to be inconsistent during execution, as long as this inconsistency is bounded and does not affect transaction correctness. In contrast to previous work, our approach uses program analysis to extract semantic information about permissible levels of inconsistency and is fully automated. We then employ a novel homeostasis protocol to allow sites to operate independently, without communicating, as long as any inconsistency is governed by appropriate treaties between the nodes. We discuss mechanisms for optimizing treaties based on workload characteristics to minimize communication, as well as a prototype implementation and experiments that demonstrate the benefits of our approach on common transactional benchmarks.
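
The notion of sites working independently under a treaty can be illustrated with a toy example. This is only a sketch under stated assumptions, not the paper's protocol or its program analysis: the `Site` class, the per-site `allowance`, and the even-split `renegotiate` step are hypothetical names and choices. A global stock is divided among sites; each site sells from its own allowance without communicating, and sites synchronize only when a local sale would violate the treaty.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """One replica holding a local share of a global stock counter."""
    name: str
    allowance: int  # units this site may sell without contacting other sites

    def try_sell(self, qty: int) -> bool:
        """Sell locally if the sale keeps the local allowance non-negative."""
        if qty <= self.allowance:
            self.allowance -= qty
            return True
        return False


def renegotiate(sites: List[Site]) -> None:
    """Synchronize: pool the remaining stock and split it evenly again."""
    total = sum(s.allowance for s in sites)
    share, extra = divmod(total, len(sites))
    for i, s in enumerate(sites):
        s.allowance = share + (1 if i < extra else 0)


def sell(sites: List[Site], site: Site, qty: int) -> Tuple[bool, bool]:
    """Return (sale succeeded, communication was needed)."""
    if site.try_sell(qty):
        return True, False          # treaty held, no coordination
    renegotiate(sites)              # treaty would be violated: coordinate once
    return site.try_sell(qty), True
```

With two sites holding 5 units each, a sale of 3 at one site completes without communication; a following sale of 4 at the same site triggers one renegotiation. The total sold never exceeds the initial global stock, because allowances only move between sites or decrease.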

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Parallelization of Reconstructability Analysis Algorithms.

Author: Iles Patti E
Publication venue: LSU Digital Commons
Publication date: 01/01/1995

Bush Jones published a series of papers providing sequential algorithms that are key to reconstructability analysis. These algorithms include the determination of unbiased reconstructions and a greedy algorithm for a generalization of the reconstruction problem. The implementation of these sequential algorithms provides scientists and mathematicians with the means of utilizing reconstructability analysis in systems modeling. The algorithms, however, are so computationally intensive that the system is limited to a very small set of variables. Many papers have been written applying reconstructability analysis and maximum entropy methods to various disciplines. Reconstructability analysis has the potential of dramatically impacting the scientific community, but the sequential algorithms leave the utilization of reconstructability analysis infeasible. The author has parallelized the reconstructability analysis algorithms developed by Jones, thereby bridging the gap between theoretical application and feasible implementation. Since the goal of parallelizing these reconstructability analysis algorithms is to make them feasible for as many researchers as possible, a specific architecture is not assumed. It is assumed that the architecture employed is a multiple data architecture. That is, the architectural design needed for the implementation of these algorithms must have memory local to each processing element (PE). The parallel algorithms developed and presented here do not address the problems of communication between processors of particular architectures. These algorithms assume a reconfigurable bus system, that is, a bus system whose configuration can be dynamically altered, thus allowing broadcasting and long-distance communications to be completed in constant time. It is noted that processor arrays with such reconfigurable bus systems have been designed.
Frequently, parallel algorithms do not address the situation in which the number of values on which to operate is larger than the number of processors. However, since the purpose of parallelizing these reconstructability analysis algorithms is to make them feasible for large structure systems, the parallelization given does address the situation in which the number of values on which to operate is larger than the number of processors available. Therefore, implementation of the algorithms involves simply incorporating the communication protocols between processors for the particular architecture employed.
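
The abstract does not say how values are assigned when they outnumber the processors. One common approach, shown here purely as an assumption and not as the thesis's algorithm, is a block decomposition: each PE first reduces its own contiguous block in local memory, and only the per-PE partial results are combined. The function names are hypothetical, the PEs are simulated sequentially, and the input is assumed to be non-empty.

```python
from functools import reduce
from operator import add


def block_ranges(n, p):
    """Split indices 0..n-1 into p contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def blocked_reduce(values, p, op=add):
    """Reduce a non-empty list with p simulated PEs, each handling one block locally."""
    partials = []
    for lo, hi in block_ranges(len(values), p):
        if hi > lo:  # a PE with an empty block contributes nothing
            partials.append(reduce(op, values[lo:hi]))
    # Combining the partial results is the only step that needs communication.
    return reduce(op, partials)
```

For example, `blocked_reduce(list(range(10)), 4)` splits ten values into blocks of sizes 3, 3, 2, and 2 before combining four partial sums. The operator passed as `op` should be associative for the result to match a sequential reduction.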

Louisiana State University

31st International Symposium on Theoretical Aspects of Computer Science: STACS '14, March 5th to March 8th, 2014, Lyon, France

Author: STACS <31 2014, Lyon>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/03/2014

Digitale Bibliothek Thüringen

An Algorithmic Taxonomy of Production System Machines

Author: Mills Russell C.
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/1988

This paper presents a survey of computer architectures designed to execute production systems. After a brief description of production systems and production system languages, the paper summarizes match algorithms, particularly the Rete algorithm, and outlines suggested parallelizations. Most parallel production system algorithms have as their unit of sequential computation a single production's left-hand side, activations of a single Rete node, a single activation of a Rete node, or a single comparison in a Rete node. The paper discusses a number of proposed production system machine architectures in terms of the parallel and sequential computations performed in the algorithms suggested for each machine. A taxonomy of parallel production system algorithms, describing in detail the distribution and replication of data and computations, concludes the paper.

Columbia University Academic Commons