27,840 research outputs found

    Computational Aspects of Asynchronous CA

    Full text link
    This work studies some aspects of the computational power of fully asynchronous cellular automata (ACA). We deal with some notions of simulation between ACA and Turing Machines. In particular, we characterize the updating sequences specifying which are "universal", i.e., allowing a (specific family of) ACA to simulate any TM on any input. We also consider the computational cost of such simulations

    Intrinsic universality and the computational power of self-assembly

    Full text link
    This short survey of recent work in tile self-assembly discusses the use of simulation to classify and separate the computational and expressive power of self-assembly models. The journey begins with the result that there is a single universal tile set that, with proper initialization and scaling, simulates any tile assembly system. This universal tile set exhibits something stronger than Turing universality: it captures the geometry and dynamics of any simulated system. From there we find that there is no such tile set in the noncooperative, or temperature 1, model, proving it weaker than the full tile assembly model. In the two-handed or hierarchal model, where large assemblies can bind together on one step, we encounter an infinite set, of infinite hierarchies, each with strictly increasing simulation power. Towards the end of our trip, we find one tile to rule them all: a single rotatable flipable polygonal tile that can simulate any tile assembly system. It seems this could be the beginning of a much longer journey, so directions for future work are suggested.Comment: In Proceedings MCU 2013, arXiv:1309.104

    Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems

    Full text link
    We evaluate optimized parallel sparse matrix-vector operations for several representative application areas on widespread multicore-based cluster configurations. First the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Beyond the single node, the performance of parallel sparse matrix-vector operations is often limited by communication overhead. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. Moreover we identify performance benefits of hybrid MPI/OpenMP programming due to improved load balancing even without explicit communication overlap. We compare performance results for pure MPI, the widely used "vector-like" hybrid programming strategies, and explicit overlap on a modern multicore-based cluster and a Cray XE6 system.Comment: 16 pages, 10 figure

    Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming

    Full text link
    We evaluate optimized parallel sparse matrix-vector operations for two representative application areas on widespread multicore-based cluster configurations. First the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Going beyond the single node, parallel sparse matrix-vector operations often suffer from an unfavorable communication to computation ratio. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. We compare our approach to pure MPI and the widely used "vector-like" hybrid programming strategy.Comment: 12 pages, 6 figure

    Improving the scalability of parallel N-body applications with an event driven constraint based execution model

    Full text link
    The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends like multicore, manycore, and heterogeneous system architectures are introducing further challenges and possibilities for emerging application domains such as graph applications. This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an Exascale computing execution model called ParalleX. For comparison, results using conventional execution model semantics are also presented. We find improved load balancing during runtime and automatic parallelism discovery improving efficiency using the advanced semantics for Exascale computing.Comment: 11 figure

    Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system

    Full text link
    A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to neural networks. This paper presents results from an investigation into using discrete and fuzzy dynamical system representations within the XCSF learning classifier system. In particular, asynchronous random Boolean networks are used to represent the traditional condition-action production system rules in the discrete case and asynchronous fuzzy logic networks in the continuous-valued case. It is shown possible to use self-adaptive, open-ended evolution to design an ensemble of such dynamical systems within XCSF to solve a number of well-known test problems

    Efficient Parallel Algorithm for Statistical Ion Track Simulations in Crystalline Materials

    Full text link
    We present an efficient parallel algorithm for statistical Molecular Dynamics simulations of ion tracks in solids. The method is based on the Rare Event Enhanced Domain following Molecular Dynamics (REED-MD) algorithm, which has been successfully applied to studies of, e.g., ion implantation into crystalline semiconductor wafers. We discuss the strategies for parallelizing the method, and we settle on a host-client type polling scheme in which a multiple of asynchronous processors are continuously fed to the host, which, in turn, distributes the resulting feed-back information to the clients. This real-time feed-back consists of, e.g., cumulative damage information or statistics updates necessary for the cloning in the rare event algorithm. We finally demonstrate the algorithm for radiation effects in a nuclear oxide fuel, and we show the balanced parallel approach with high parallel efficiency in multiple processor configurations.Comment: 17 pages, seven figures, four table
    • …
    corecore