24,140 research outputs found

    Parallelization of Markov chain generation and its application to the multicanonical method

    Full text link
    We develop a simple algorithm to parallelize generation processes of Markov chains. In this algorithm, multiple Markov chains are generated in parallel and jointed together to make a longer Markov chain. The joints between the constituent Markov chains are processed using the detailed balance. We apply the parallelization algorithm to multicanonical calculations of the two-dimensional Ising model and demonstrate accurate estimation of multicanonical weights.Comment: 15 pages, 5 figures, uses elsart.cl

    Dynamic Algorithms for the Massively Parallel Computation Model

    Get PDF
    The Massive Parallel Computing (MPC) model gained popularity during the last decade and it is now seen as the standard model for processing large scale data. One significant shortcoming of the model is that it assumes to work on static datasets while, in practice, real-world datasets evolve continuously. To overcome this issue, in this paper we initiate the study of dynamic algorithms in the MPC model. We first discuss the main requirements for a dynamic parallel model and we show how to adapt the classic MPC model to capture them. Then we analyze the connection between classic dynamic algorithms and dynamic algorithms in the MPC model. Finally, we provide new efficient dynamic MPC algorithms for a variety of fundamental graph problems, including connectivity, minimum spanning tree and matching.Comment: Accepted to the 31st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2019

    Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark

    Full text link
    In Machine Learning, the parent set identification problem is to find a set of random variables that best explain selected variable given the data and some predefined scoring function. This problem is a critical component to structure learning of Bayesian networks and Markov blankets discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed memory approach to the exact parent sets assignment problem. To achieve scalability, we derive theoretical bounds to constraint the search space when MDL scoring function is used, and we reorganize the underlying dynamic programming such that the computational density is increased and fine-grain synchronization is eliminated. We then design efficient realization of our approach in the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and it can be used to efficiently process data sets with 70 variables, far beyond the reach of the currently available solutions
    corecore