Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU
Monte Carlo simulations of the Ising model play an important role in the
field of computational statistical physics, and they have revealed many
properties of the model over the past few decades. However, the effect of
frustration due to random disorder, in particular the possible spin glass
phase, remains a crucial but poorly understood problem. One of the obstacles in
the Monte Carlo simulation of random frustrated systems is their long
relaxation time, making an efficient parallel implementation on state-of-the-art
computation platforms highly desirable. The Graphics Processing Unit (GPU) is
such a platform that provides an opportunity to significantly enhance the
computational performance and thus gain new insight into this problem. In this
paper, we present optimization and tuning approaches for the CUDA
implementation of the spin glass simulation on GPUs. We discuss the integration
of various design alternatives, such as GPU kernel construction with minimal
communication, memory tiling, and look-up tables. We present a binary data
format, Compact Asynchronous Multispin Coding (CAMSC), which provides an
additional speedup compared with the traditionally used Asynchronous
Multispin Coding (AMSC). Our overall design sustains a performance of 33.5
picoseconds per spin flip attempt for simulating the three-dimensional
Edwards-Anderson model with parallel tempering, which significantly improves
the performance over existing GPU implementations. Comment: 15 pages, 18 figures
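To illustrate the general idea behind asynchronous multispin coding mentioned in the abstract above, here is a minimal sketch. It is not the paper's CAMSC layout or its CUDA kernel: the 64-bit word width, the one-dimensional ring geometry, the bit conventions, and the names `unsatisfied` and `replica_energies` are all assumptions made for illustration. Bit k of every word belongs to an independent replica k, so a single XOR evaluates one bond in all replicas at once.

```python
W = 64                 # replicas packed per word (assumed width)
MASK = (1 << W) - 1

def unsatisfied(s_i, s_j, j_ij):
    """Bit k is 1 when the bond between sites i and j is unsatisfied in replica k.

    Assumed convention: spin bit 0 means +1, 1 means -1;
    coupling bit 0 means J = +1, 1 means J = -1.
    """
    return (s_i ^ s_j ^ j_ij) & MASK

def replica_energies(spins, bonds):
    """Energy of each replica on a ring of len(spins) sites.

    spins[i] is the packed spin word of site i; bonds[i] is the packed
    coupling word between site i and site (i + 1) mod N.
    A satisfied bond contributes -1, an unsatisfied bond +1.
    """
    n = len(spins)
    energy = [0] * W
    for i in range(n):
        u = unsatisfied(spins[i], spins[(i + 1) % n], bonds[i])
        for k in range(W):
            energy[k] += 1 if (u >> k) & 1 else -1
    return energy
```

A production code would keep the per-replica bookkeeping in bitwise form as well; the unpacking loop here is only meant to make the encoding easy to check.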
Multi-core computation of transfer matrices for strip lattices in the Potts model
The transfer-matrix technique is a convenient way to study strip lattices
in the Potts model since the computational costs depend just on the periodic
part of the lattice and not on the whole. However, even when the cost is
reduced, the transfer-matrix technique is still an NP-hard problem since the
time T(|V|, |E|) needed to compute the matrix grows exponentially as a
function of the graph width. In this work, we present a parallel
transfer-matrix implementation that scales performance under multi-core
architectures. The construction of the matrix is based on several repetitions
of the deletion-contraction technique, allowing parallelism suitable to
multi-core machines. Our experimental results show that the multi-core
implementation achieves speedups of 3.7X with p = 4 processors and 5.7X with p
= 8. The efficiency of the implementation lies between 60% and 95%, achieving
the best balance of speedup and efficiency at p = 4 processors for actual
multi-core architectures. The algorithm also takes advantage of the lattice
symmetry, making the transfer-matrix computation run up to 2X faster than
its non-symmetric counterpart and use up to a quarter of the original space.
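The speedup figures quoted in the abstract above can be related to its efficiency range with the usual definition of parallel efficiency, E = S / p. The short sketch below assumes that definition (the abstract does not state which one the authors use), and the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency E = S / p (assumed standard definition)."""
    return speedup / processors

# Speedups reported in the abstract: 3.7X at p = 4 and 5.7X at p = 8.
for s, p in [(3.7, 4), (5.7, 8)]:
    print(f"p = {p}: efficiency = {parallel_efficiency(s, p):.1%}")
```

Under this definition the two reported speedups correspond to efficiencies of roughly 92.5% and 71%, both inside the 60% to 95% range the abstract quotes.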
An adaptive prefix-assignment technique for symmetry reduction
This paper presents a technique for symmetry reduction that adaptively
assigns a prefix of variables in a system of constraints so that the generated
prefix-assignments are pairwise nonisomorphic under the action of the symmetry
group of the system. The technique is based on McKay's canonical extension
framework [J.~Algorithms 26 (1998), no.~2, 306--324]. Among key features of the
technique are (i) adaptability---the prefix sequence can be user-prescribed and
truncated for compatibility with the group of symmetries; (ii)
parallelizability---prefix-assignments can be processed in parallel
independently of each other; (iii) versatility---the method is applicable
whenever the group of symmetries can be concisely represented as the
automorphism group of a vertex-colored graph; and (iv) implementability---the
method can be implemented relying on a canonical labeling map for
vertex-colored graphs as the only nontrivial subroutine. To demonstrate the
practical applicability of our technique, we have prepared an experimental
open-source implementation of the technique and carried out a set of experiments
that demonstrate its ability to reduce symmetry on hard instances. Furthermore, we
demonstrate that the implementation effectively parallelizes to compute
clusters with multiple nodes via a message-passing interface. Comment: Updated manuscript submitted for review
Distributed Symmetry Breaking in Hypergraphs
Fundamental local symmetry breaking problems such as Maximal Independent Set
(MIS) and coloring have been recognized as important by the community, and
studied extensively in (standard) graphs. In particular, fast (i.e.,
logarithmic run time) randomized algorithms are well-established for MIS and
$(\Delta+1)$-coloring in both the LOCAL and CONGEST distributed computing
models. On the other hand, comparatively much less is known on the complexity
of distributed symmetry breaking in {\em hypergraphs}. In particular, a key
question is whether a fast (randomized) algorithm for MIS exists for
hypergraphs.
In this paper, we study the distributed complexity of symmetry breaking in
hypergraphs by presenting distributed randomized algorithms for a variety of
fundamental problems under a natural distributed computing model for
hypergraphs. We first show that MIS in hypergraphs (of arbitrary dimension) can
be solved in $O(\log^2 n)$ rounds ($n$ is the number of nodes of the
hypergraph) in the LOCAL model. We then present a key result of this paper ---
an $O(\Delta^{\epsilon}\,\mathrm{polylog}(n))$-round hypergraph MIS algorithm in
the CONGEST model where $\Delta$ is the maximum node degree of the hypergraph
and $\epsilon > 0$ is any arbitrarily small constant.
To demonstrate the usefulness of hypergraph MIS, we present applications of
our hypergraph algorithm to solving problems in (standard) graphs. In
particular, the hypergraph MIS yields fast distributed algorithms for the {\em
balanced minimal dominating set} problem (left open in Harris et al. [ICALP
2013]) and the {\em minimal connected dominating set problem}. We also present
distributed algorithms for coloring, maximal matching, and maximal clique in
hypergraphs. Comment: Changes from the previous version: More references added
Optimal Collision/Conflict-free Distance-2 Coloring in Synchronous Broadcast/Receive Tree Networks
This article is on message-passing systems where communication is (a)
synchronous and (b) based on the "broadcast/receive" pair of communication
operations. "Synchronous" means that time is discrete and appears as a sequence
of time slots (or rounds) such that each message is received in the very same
round in which it is sent. "Broadcast/receive" means that during a round a
process can either broadcast a message to its neighbors or receive a message
from one of them. In such a communication model, no two neighbors of the same
process, nor a process and any of its neighbors, must be allowed to broadcast
during the same time slot (thereby preventing message collisions in the first
case, and message conflicts in the second case). From a graph theory point of
view, the allocation of slots to processes is known as the distance-2 coloring
problem: a color must be associated with each process (defining the time slots
in which it will be allowed to broadcast) in such a way that any two processes
at distance at most 2 obtain different colors, while the total number of colors
is "as small as possible". The paper presents a parallel message-passing
distance-2 coloring algorithm suited to trees, whose roots are dynamically
defined. This algorithm, which is itself collision-free and conflict-free, uses
$\Delta + 1$ colors where $\Delta$ is the maximal degree of the graph (hence
the algorithm is color-optimal). It does not require all processes to have
different initial identities, and its time complexity is $O(d\Delta)$, where $d$
is the depth of the tree. As far as we know, this is the first distributed
distance-2 coloring algorithm designed for the broadcast/receive round-based
communication model that has all of the above properties. Comment: 19 pages including one appendix. One Figure
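As a rough illustration of the distance-2 coloring problem described in the abstract above, the following sketch colors a tree centrally and sequentially. It is not the paper's distributed broadcast/receive algorithm: it only shows why one more color than the maximal degree suffices on a tree. The adjacency-dictionary input and the name `distance2_color_tree` are assumptions made for this example.

```python
from collections import deque

def distance2_color_tree(adj, root):
    """Sequential distance-2 coloring of a tree (illustrative sketch only).

    adj maps each node to the list of its neighbors. Colors are integers
    in range(max_degree + 1). Each child receives a color different from
    its parent's, its grandparent's, and its siblings' colors.
    """
    max_degree = max(len(neighbors) for neighbors in adj.values())
    color = {root: 0}
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        forbidden = {color[v]}
        if parent[v] is not None:
            forbidden.add(color[parent[v]])
        # Colors still usable by the children of v, handed out one per child.
        free = iter([c for c in range(max_degree + 1) if c not in forbidden])
        for child in adj[v]:
            if child == parent[v]:
                continue
            parent[child] = v
            color[child] = next(free)
            queue.append(child)
    return color
```

A non-root node v has at most deg(v) - 1 children and two forbidden colors, so the remaining colors are always enough; the root has at most as many children as the maximal degree and one forbidden color.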