2,443 research outputs found

    Parallel Sort-Based Matching for Data Distribution Management on Shared-Memory Multiprocessors

    Full text link
    In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the context of the High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed; however, many of them are either inefficient or inherently sequential. These are serious limitations since multicore processors are now ubiquitous, and DDM algorithms -- being CPU-intensive -- could benefit from additional computing power. We propose a parallel version of the Sort-Based Matching algorithm for shared-memory multiprocessors. Sort-Based Matching is one of the most efficient serial algorithms for the DDM problem, but is quite difficult to parallelize due to data dependencies. We describe the algorithm and compute its asymptotic running time; we complete the analysis by assessing its performance and scalability through extensive experiments on two commodity multicore systems based on a dual-socket Intel Xeon processor and a single-socket Intel Core i7 processor. Comment: Proceedings of the 21st ACM/IEEE International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2017). Best Paper Award at DS-RT 2017
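    To make the matching step concrete, below is a minimal serial sketch of sort-based matching restricted to one dimension (update vs. subscription intervals). The function name and data layout are illustrative assumptions; the paper's contribution is a parallel shared-memory version of this idea, which the sketch does not attempt to reproduce.

```python
# Minimal serial sketch of 1-D sort-based matching between "update" and
# "subscription" intervals; illustrative only, not the paper's parallel code.

def sort_based_matching(updates, subscriptions):
    """updates, subscriptions: lists of (lo, hi) closed intervals.
    Returns the set of (update_index, subscription_index) pairs that overlap."""
    events = []
    for i, (lo, hi) in enumerate(updates):
        events.append((lo, 0, 'U', i))   # 0 = open; opens sort before closes
        events.append((hi, 1, 'U', i))   # so touching endpoints count as overlap
    for j, (lo, hi) in enumerate(subscriptions):
        events.append((lo, 0, 'S', j))
        events.append((hi, 1, 'S', j))
    events.sort()                        # the sorting step that names the algorithm

    active_u, active_s, matches = set(), set(), set()
    for _, kind, side, idx in events:
        if kind == 0:                    # an interval opens: match against active set
            if side == 'U':
                matches.update((idx, j) for j in active_s)
                active_u.add(idx)
            else:
                matches.update((i, idx) for i in active_u)
                active_s.add(idx)
        else:                            # an interval closes
            (active_u if side == 'U' else active_s).discard(idx)
    return matches

# Example: two update regions vs. two subscription regions on one axis.
print(sort_based_matching([(0, 5), (10, 20)], [(3, 12), (30, 40)]))
# matches {(0, 0), (1, 0)} (printed order may vary)
```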

    Highly Parallel Processing of Relational Databases (Thesis)

    Get PDF

    Polyvalent Parallelizations for Hierarchical Block Matching Motion Estimation

    Get PDF
    Block matching motion estimation algorithms are widely used in video coding schemes. In this paper, we design an efficient hierarchical block matching motion estimation (HBMME) algorithm on a hypercube multiprocessor. Unlike systolic array designs, this solution is not tied down to specific values of algorithm parameters and thus offers increased flexibility. Moreover, the hypercube network can efficiently handle the non-regular data flow of the HBMME algorithm. Our techniques nearly eliminate the occurrence of “difficult” communication patterns, namely many-to-many personalized communication, by replacing them with simple shift operations. These operations have an efficient implementation on most interconnection networks, and thus our techniques can be adapted to other networks as well. With regard to the employed multiprocessor, we make no specific assumption about the amount of local memory residing in each processor. Instead, we introduce a free parameter S and assume that each processor has O(S) local memory. By doing so, we handle all the cases of modern multiprocessors, that is, fine-grained, medium-grained, and coarse-grained multiprocessors, and thus our design is quite general.
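    As a point of reference for the computation being parallelized, the sketch below shows single-level full-search block matching with the sum of absolute differences (SAD); a hierarchical scheme repeats this over a pyramid of downsampled frames, refining the motion vector at each level. The block size, search range, and toy frames are illustrative assumptions, not values from the paper.

```python
# Minimal serial sketch of full-search block matching with SAD; the
# hierarchical (HBMME) version applies this coarse-to-fine over a frame pyramid.
import numpy as np

def best_motion_vector(ref, cur, top, left, block=8, search=4):
    """Find the (dy, dx) within +/-search that minimizes SAD between the
    block of `cur` at (top, left) and the displaced block in `ref`."""
    target = cur[top:top + block, left:left + block].astype(int)
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue                      # candidate block falls outside the frame
            sad = np.abs(ref[y:y + block, x:x + block].astype(int) - target).sum()
            if sad < best_sad:
                best_sad, best = int(sad), (dy, dx)
    return best, best_sad

# Toy example: the current block is the reference block shifted by (1, 2).
ref = np.arange(64 * 64).reshape(64, 64) % 251
cur = np.roll(ref, shift=(-1, -2), axis=(0, 1))
print(best_motion_vector(ref, cur, top=16, left=16))   # -> ((1, 2), 0)
```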

    Another Look at the Cost of Cryptographic Attacks

    Get PDF
    This paper makes the case for considering the cost of cryptographic attacks as the main measure of their efficiency, instead of their time complexity. This allows, in our opinion, a more realistic assessment of the "risk" these attacks represent. This is half a position paper and half a technical paper. Cryptographic attacks described in the literature are rarely implemented. Most exist only "on paper", and their main characteristic is that their estimated time complexity is small enough to break a given security property. However, when a cryptanalyst actually considers implementing an attack, she soon realizes that there is more to the story than time complexity. For instance, Wiener has shown that breaking double-DES costs 2^(6n/5), asymptotically more than exhaustive search on n bits. We put forward the asymptotic cost of cryptographic attacks as a measure of their practicality. We discuss the shortcomings of the usual computational model and propose a simple abstract cryptographic machine on which it is easy to estimate the cost. We then study the asymptotic cost of several relevant algorithms: collision search, the three-list birthday problem (3XOR), and solving multivariate quadratic polynomial equations. We find that some smart algorithms cost much more than their time complexity suggests, while naive and simple algorithms may cost less. Some algorithms can be tuned to reduce their cost (this increases their time complexity).
    Foreword: A celebrated High Performance Computing paper entitled "Hitting the Memory Wall: Implications of the Obvious" [47] opens with these words: This brief note points out something obvious, something the authors "knew" without really understanding. With apologies to those who did understand, we offer it to those others who, like us, missed the point. We would like to do the same, but this note is not so short.
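    The double-DES example can be made concrete with a toy meet-in-the-middle attack. The sketch below uses a hypothetical 16-bit keyed permutation (not DES); its point is that the textbook time estimate of roughly 2 * 2^n trial encryptions ignores the table of 2^n entries, and it is that memory, and the machinery needed to hold and probe it, that a full cost analysis in Wiener's sense accounts for.

```python
# Toy meet-in-the-middle attack on a doubled cipher. The "cipher" is a
# hypothetical invertible 16-bit keyed mapping chosen for illustration only.
BITS = 16
MASK = (1 << BITS) - 1
MULT = 0x9E37                          # odd, hence invertible modulo 2^16
MULT_INV = pow(MULT, -1, 1 << BITS)

def toy_encrypt(key, block):
    return ((block ^ key) * MULT + key) & MASK

def toy_decrypt(key, cipher):
    return (((cipher - key) * MULT_INV) & MASK) ^ key

def double_encrypt(k1, k2, block):
    return toy_encrypt(k2, toy_encrypt(k1, block))

def meet_in_the_middle(plain, cipher):
    # Forward table: one entry per candidate k1 -- this is the 2^n memory
    # that a cost-based analysis charges for, on top of the 2 * 2^n time.
    forward = {}
    for k1 in range(1 << BITS):
        forward.setdefault(toy_encrypt(k1, plain), []).append(k1)
    # Backward pass: decrypt under every k2 and look up the meeting value.
    return [(k1, k2)
            for k2 in range(1 << BITS)
            for k1 in forward.get(toy_decrypt(k2, cipher), [])]

secret = (0x1234, 0xBEEF)
c = double_encrypt(*secret, 42)
print(secret in meet_in_the_middle(42, c))   # True (among ~2^16 candidate pairs)
```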

    Algorithm 947: Paraperm-parallel generation of random permutations with MPI

    Get PDF
    An algorithm for parallel generation of a random permutation of a large set of distinct integers is presented. This algorithm is designed for massively parallel systems with distributed memory architectures and MPI-based runtime environments. Scalability of the algorithm is analyzed in terms of its memory and communication requirements. An implementation of the algorithm in the form of a software library based on the C++ programming language and the MPI application programming interface is further provided. Finally, the experiments performed are described and their results discussed. The biggest of these experiments resulted in the generation of a random permutation of 2^41 integers in slightly more than four minutes using 131,072 CPU cores.
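    The abstract does not spell out the algorithm itself, so the sketch below shows only one standard, parallel-friendly way to generate a random permutation (not necessarily Paraperm's approach): tag every element with a random key and sort by the keys. In a distributed-memory setting the sort can be any scalable parallel sort, which is what makes this family of schemes attractive on large MPI machines, whereas a plain Fisher-Yates shuffle is inherently sequential.

```python
# Serial sketch of the sort-by-random-keys idea; in a distributed setting each
# rank would generate keys for its own slice and a parallel sort would produce
# the permuted order. Key collisions are vanishingly unlikely with random
# floats and can be broken by the element index if needed.
import random

def permutation_by_random_keys(n, seed=0):
    rng = random.Random(seed)
    keyed = sorted((rng.random(), i) for i in range(n))   # the sorting step
    return [i for _, i in keyed]

print(permutation_by_random_keys(10))   # a random permutation of 0..9
```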

    Beyond Binary Search: Parallel In-Place Construction of Implicit Search Tree Layouts.

    Get PDF
    M.S. Thesis. University of Hawaiʻi at Mānoa 2018

    Doctor of Philosophy

    Get PDF
    Solutions to Partial Differential Equations (PDEs) are often computed by discretizing the domain into a collection of computational elements referred to as a mesh. This solution is an approximation with an error that decreases as the mesh spacing decreases. However, decreasing the mesh spacing also increases the computational requirements. Adaptive mesh refinement (AMR) attempts to reduce the error while limiting the increase in computational requirements by refining the mesh locally in regions of the domain that have large error while maintaining a coarse mesh in other portions of the domain. This approach often provides a solution that is as accurate as that obtained from a much larger fixed-mesh simulation, thus saving on both computational time and memory. However, historically, these AMR operations often limit the overall scalability of the application. Adapting the mesh at runtime necessitates scalable regridding and load balancing algorithms. This dissertation analyzes the performance bottlenecks for a widely used regridding algorithm and presents two new algorithms which exhibit ideal scalability. In addition, a scalable space-filling curve generation algorithm for dynamic load balancing is also presented. The performance of these algorithms is analyzed by determining their theoretical complexity, deriving performance models, and comparing the observed performance to those performance models. The models are then used to predict performance on larger numbers of processors. This analysis demonstrates the necessity of these algorithms at larger numbers of processors. This dissertation also investigates methods to more accurately predict workloads based on measurements taken at runtime. While the methods used are not new, the application of these methods to the load balancing process is. These methods are shown to be highly accurate and able to predict the workload within 3% error. By improving the accuracy of these estimations, the load imbalance of the simulation can be reduced, thereby increasing the overall performance.
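    As an illustration of the space-filling-curve load-balancing idea (the general technique, not the dissertation's specific algorithm), the sketch below orders 2-D mesh patches by Morton (Z-order) key and then cuts the curve into contiguous pieces of roughly equal estimated workload; the patch coordinates and workloads are made-up inputs.

```python
# Illustrative sketch: space-filling-curve (Z-order) partitioning of mesh
# patches for load balancing; not the dissertation's algorithm.
def morton_key(x, y, bits=16):
    """Interleave the bits of integer coordinates x and y."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def partition_by_curve(patches, workloads, num_ranks):
    """Order patches along the Z-order curve, then cut the curve into
    num_ranks contiguous pieces with roughly equal total workload."""
    order = sorted(range(len(patches)), key=lambda i: morton_key(*patches[i]))
    target = sum(workloads) / num_ranks
    parts, current, acc = [[] for _ in range(num_ranks)], 0, 0.0
    for i in order:
        if acc >= target * (current + 1) and current < num_ranks - 1:
            current += 1                      # start the next rank's piece
        parts[current].append(i)
        acc += workloads[i]
    return parts

# Four ranks, a 4x4 grid of unit-work patches: each rank receives one
# spatially compact 2x2 quadrant of the grid.
grid = [(x, y) for x in range(4) for y in range(4)]
print([[grid[i] for i in part] for part in partition_by_curve(grid, [1.0] * 16, 4)])
```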