Parallelization of Markov chain generation and its application to the multicanonical method
We develop a simple algorithm to parallelize the generation of Markov
chains. In this algorithm, multiple Markov chains are generated in parallel and
joined together to make a longer Markov chain. The joints between the
constituent Markov chains are processed using the detailed balance condition. We apply
the parallelization algorithm to multicanonical calculations of the
two-dimensional Ising model and demonstrate accurate estimation of
multicanonical weights.
Comment: 15 pages, 5 figures, uses elsart.cl
Dynamic Algorithms for the Massively Parallel Computation Model
The Massively Parallel Computation (MPC) model has gained popularity over the last
decade and is now seen as the standard model for processing large-scale
data. One significant shortcoming of the model is that it assumes
static datasets, while in practice real-world datasets evolve continuously. To
overcome this issue, in this paper we initiate the study of dynamic algorithms
in the MPC model.
We first discuss the main requirements for a dynamic parallel model and we
show how to adapt the classic MPC model to capture them. Then we analyze the
connection between classic dynamic algorithms and dynamic algorithms in the MPC
model. Finally, we provide new efficient dynamic MPC algorithms for a variety
of fundamental graph problems, including connectivity, minimum spanning tree
and matching.
Comment: Accepted to the 31st ACM Symposium on Parallelism in Algorithms and
Architectures (SPAA 2019)
Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark
In machine learning, the parent set identification problem is to find the set
of random variables that best explains a selected variable, given the data and a
predefined scoring function. This problem is a critical component of structure
learning of Bayesian networks and of Markov blanket discovery, and thus has many
practical applications, ranging from fraud detection to clinical decision
support. In this paper, we introduce a new distributed-memory approach to the
exact parent sets assignment problem. To achieve scalability, we derive
theoretical bounds that constrain the search space when the MDL scoring function is
used, and we reorganize the underlying dynamic programming so that the
computational density is increased and fine-grained synchronization is
eliminated. We then design an efficient realization of our approach on the Apache
Spark platform. Through experimental results, we demonstrate that the method
maintains strong scalability on a 500-core standalone Spark cluster, and that it can
be used to efficiently process data sets with 70 variables, far beyond the
reach of currently available solutions.