4,024 research outputs found

    Methods for Partitioning Data to Improve Parallel Execution Time for Sorting on Heterogeneous Clusters

    Get PDF
    International audienceThe aim of the paper is to introduce general techniques in order to optimize the parallel execution time of sorting on a distributed architectures with processors of various speeds. Such an application requires a partitioning step. For uniformly related processors (processors speeds are related by a constant factor), we develop a constant time technique for mastering processor load and execution time in an heterogeneous environment and also a technique to deal with unknown cost functions. For non uniformly related processors, we use a technique based on dynamic programming. Most of the time, the solutions are in O(p) (p is the number of processors), independent of the problem size n. Consequently, there is a small overhead regarding the problem we deal with but it is inherently limited by the knowing of time complexity of the portion of code following the partitioning

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

    A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

    Full text link
    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the the Henon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures, and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within less than 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7. The runtime reaches a saturation with the addition of more processors beyond these limits which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement
    • …
    corecore