21,687 research outputs found

    Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU

    Get PDF
    Monte Carlo simulations of the Ising model play an important role in the field of computational statistical physics, and they have revealed many properties of the model over the past few decades. However, the effect of frustration due to random disorder, in particular the possible spin glass phase, remains a crucial but poorly understood problem. One of the obstacles in the Monte Carlo simulation of random frustrated systems is their long relaxation time making an efficient parallel implementation on state-of-the-art computation platforms highly desirable. The Graphics Processing Unit (GPU) is such a platform that provides an opportunity to significantly enhance the computational performance and thus gain new insight into this problem. In this paper, we present optimization and tuning approaches for the CUDA implementation of the spin glass simulation on GPUs. We discuss the integration of various design alternatives, such as GPU kernel construction with minimal communication, memory tiling, and look-up tables. We present a binary data format, Compact Asynchronous Multispin Coding (CAMSC), which provides an additional 28.4%28.4\% speedup compared with the traditionally used Asynchronous Multispin Coding (AMSC). Our overall design sustains a performance of 33.5 picoseconds per spin flip attempt for simulating the three-dimensional Edwards-Anderson model with parallel tempering, which significantly improves the performance over existing GPU implementations.Comment: 15 pages, 18 figure

    Multi-core computation of transfer matrices for strip lattices in the Potts model

    Full text link
    The transfer-matrix technique is a convenient way for studying strip lattices in the Potts model since the compu- tational costs depend just on the periodic part of the lattice and not on the whole. However, even when the cost is reduced, the transfer-matrix technique is still an NP-hard problem since the time T(|V|, |E|) needed to compute the matrix grows ex- ponentially as a function of the graph width. In this work, we present a parallel transfer-matrix implementation that scales performance under multi-core architectures. The construction of the matrix is based on several repetitions of the deletion- contraction technique, allowing parallelism suitable to multi-core machines. Our experimental results show that the multi-core implementation achieves speedups of 3.7X with p = 4 processors and 5.7X with p = 8. The efficiency of the implementation lies between 60% and 95%, achieving the best balance of speedup and efficiency at p = 4 processors for actual multi-core architectures. The algorithm also takes advantage of the lattice symmetry, making the transfer matrix computation to run up to 2X faster than its non-symmetric counterpart and use up to a quarter of the original space

    An adaptive prefix-assignment technique for symmetry reduction

    Full text link
    This paper presents a technique for symmetry reduction that adaptively assigns a prefix of variables in a system of constraints so that the generated prefix-assignments are pairwise nonisomorphic under the action of the symmetry group of the system. The technique is based on McKay's canonical extension framework [J.~Algorithms 26 (1998), no.~2, 306--324]. Among key features of the technique are (i) adaptability---the prefix sequence can be user-prescribed and truncated for compatibility with the group of symmetries; (ii) parallelizability---prefix-assignments can be processed in parallel independently of each other; (iii) versatility---the method is applicable whenever the group of symmetries can be concisely represented as the automorphism group of a vertex-colored graph; and (iv) implementability---the method can be implemented relying on a canonical labeling map for vertex-colored graphs as the only nontrivial subroutine. To demonstrate the practical applicability of our technique, we have prepared an experimental open-source implementation of the technique and carry out a set of experiments that demonstrate ability to reduce symmetry on hard instances. Furthermore, we demonstrate that the implementation effectively parallelizes to compute clusters with multiple nodes via a message-passing interface.Comment: Updated manuscript submitted for revie

    Distributed Symmetry Breaking in Hypergraphs

    Full text link
    Fundamental local symmetry breaking problems such as Maximal Independent Set (MIS) and coloring have been recognized as important by the community, and studied extensively in (standard) graphs. In particular, fast (i.e., logarithmic run time) randomized algorithms are well-established for MIS and Δ+1\Delta +1-coloring in both the LOCAL and CONGEST distributed computing models. On the other hand, comparatively much less is known on the complexity of distributed symmetry breaking in {\em hypergraphs}. In particular, a key question is whether a fast (randomized) algorithm for MIS exists for hypergraphs. In this paper, we study the distributed complexity of symmetry breaking in hypergraphs by presenting distributed randomized algorithms for a variety of fundamental problems under a natural distributed computing model for hypergraphs. We first show that MIS in hypergraphs (of arbitrary dimension) can be solved in O(log⁥2n)O(\log^2 n) rounds (nn is the number of nodes of the hypergraph) in the LOCAL model. We then present a key result of this paper --- an O(Δϔpolylog(n))O(\Delta^{\epsilon}\text{polylog}(n))-round hypergraph MIS algorithm in the CONGEST model where Δ\Delta is the maximum node degree of the hypergraph and Ï”>0\epsilon > 0 is any arbitrarily small constant. To demonstrate the usefulness of hypergraph MIS, we present applications of our hypergraph algorithm to solving problems in (standard) graphs. In particular, the hypergraph MIS yields fast distributed algorithms for the {\em balanced minimal dominating set} problem (left open in Harris et al. [ICALP 2013]) and the {\em minimal connected dominating set problem}. We also present distributed algorithms for coloring, maximal matching, and maximal clique in hypergraphs.Comment: Changes from the previous version: More references adde

    Optimal Collision/Conflict-free Distance-2 Coloring in Synchronous Broadcast/Receive Tree Networks

    Get PDF
    This article is on message-passing systems where communication is (a) synchronous and (b) based on the "broadcast/receive" pair of communication operations. "Synchronous" means that time is discrete and appears as a sequence of time slots (or rounds) such that each message is received in the very same round in which it is sent. "Broadcast/receive" means that during a round a process can either broadcast a message to its neighbors or receive a message from one of them. In such a communication model, no two neighbors of the same process, nor a process and any of its neighbors, must be allowed to broadcast during the same time slot (thereby preventing message collisions in the first case, and message conflicts in the second case). From a graph theory point of view, the allocation of slots to processes is know as the distance-2 coloring problem: a color must be associated with each process (defining the time slots in which it will be allowed to broadcast) in such a way that any two processes at distance at most 2 obtain different colors, while the total number of colors is "as small as possible". The paper presents a parallel message-passing distance-2 coloring algorithm suited to trees, whose roots are dynamically defined. This algorithm, which is itself collision-free and conflict-free, uses Δ+1\Delta + 1 colors where Δ\Delta is the maximal degree of the graph (hence the algorithm is color-optimal). It does not require all processes to have different initial identities, and its time complexity is O(dΔ)O(d \Delta), where d is the depth of the tree. As far as we know, this is the first distributed distance-2 coloring algorithm designed for the broadcast/receive round-based communication model, which owns all the previous properties.Comment: 19 pages including one appendix. One Figur
    • 

    corecore