3,036 research outputs found

    One machine, one minute, three billion tetrahedra

    This paper presents a new scalable parallelization scheme to generate the 3D Delaunay triangulation of a given set of points. Our first contribution is an efficient serial implementation of the incremental Delaunay insertion algorithm. A simple dedicated data structure, efficient sorting of the points, and optimization of the insertion algorithm make it three times faster than reference implementations. Our second contribution is a multi-threaded version of the Delaunay kernel that can insert vertices concurrently. Moore curve coordinates are used to partition the point set, avoiding heavy synchronization overheads. Conflicts are managed by modifying the partitions with a simple rescaling of the space-filling curve. The performance of our implementation has been measured on three different processors (an Intel Core i7, an Intel Xeon Phi, and an AMD EPYC), on which we were able to compute 3 billion tetrahedra in 53 seconds. This corresponds to a generation rate of over 55 million tetrahedra per second. Finally, we show how this very efficient parallel Delaunay triangulation can be integrated into a Delaunay refinement mesh generator that takes as input the triangulated surface boundary of the volume to mesh.
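The idea of partitioning a point set along a space-filling curve can be illustrated with a minimal sketch. This is not the paper's implementation: it swaps in a Morton (Z-order) curve for the Moore curve because it is shorter to write, it assumes points lie in the unit cube [0, 1)^3, and the names `morton_key` and `partition_by_curve` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def morton_key(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three grid indices (Z-order key)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def partition_by_curve(points: Sequence[Point], n_parts: int,
                       bits: int = 10) -> List[List[int]]:
    """Sort point indices along the curve and cut them into contiguous chunks.

    Assumption: every coordinate lies in [0, 1). Each chunk could then be
    handed to one thread, since nearby keys tend to be nearby in space.
    """
    cells = 1 << bits

    def key(p: Point) -> int:
        ix, iy, iz = (min(int(c * cells), cells - 1) for c in p)
        return morton_key(ix, iy, iz, bits)

    order = sorted(range(len(points)), key=lambda i: key(points[i]))
    size = -(-len(order) // n_parts)  # ceiling division
    return [order[k * size:(k + 1) * size] for k in range(n_parts)]
```

Because each chunk is a contiguous interval of the curve, moving the cut positions along the curve changes the partition boundaries without re-sorting, which is loosely the kind of adjustment the abstract describes for handling conflicts.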

    Compiling global name-space programs for distributed execution

    Distributed memory machines do not provide hardware support for a global address space. Programmers are therefore forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. This paper examines the compiler support required to allow programmers to express their algorithms using a global name-space. A general method is presented for the analysis of a high-level source program and its translation to a set of independently executing tasks that communicate via messages. If the compiler has enough information, this translation can be carried out at compile time. Otherwise, run-time code is generated to implement the required data movement. The analysis required in both situations is described, and the performance of the generated code on the Intel iPSC/2 is presented.

    Options in Scan Processing for Shared-Disk Parallel Database Systems

    Shared-disk database systems offer a high degree of freedom in the allocation of workload compared to shared-nothing architectures. This creates great potential for load balancing but also introduces additional complexity into the process of query scheduling. This report surveys the problems and opportunities faced in scan processing in a shared-disk environment. We list the parameters to tune and the decisions to make, as well as some known solutions and commonsense considerations, in order to identify the most promising areas of future research.

    Load Balancing Algorithms for Parallel Spatial Join on HPC Platforms

    Geospatial datasets are growing in volume, complexity, and heterogeneity. Efficient execution of geospatial computations and analytics on large-scale datasets requires parallel processing. Exploiting fine-grained parallelism on large-scale compute clusters is challenging because skewed datasets must be partitioned in a load-balanced way. The workload in spatial join is data dependent and highly irregular. Moreover, wide variation in the size and density of geometries from one region of the map to another further exacerbates the load imbalance. This dissertation focuses on the spatial join operation used in Geographic Information Systems (GIS) and spatial databases, where the inputs are two layers of geospatial data and the output is a combination of the two layers according to a join predicate. This dissertation introduces a novel spatial data partitioning algorithm geared towards load balancing parallel spatial join processing. Unlike existing partitioning techniques, the proposed algorithm divides the spatial join workload instead of partitioning the individual datasets separately, which provides better load balancing. This workload partitioning algorithm has been evaluated on a high-performance computing system using real-world datasets. An intermediate output-sensitive duplication avoidance technique is proposed that decreases the external memory space required for storing spatial join candidates across the partitions. GPU acceleration is used to further reduce the spatial partitioning runtime. For dynamic load balancing in spatial join, a novel framework for fine-grained work stealing is presented. This framework is efficient and NUMA-aware. Performance improvements are demonstrated on shared and distributed memory architectures using threads and message passing. Experimental results show effective mitigation of data skew. The framework supports a variety of spatial join predicates and spatial overlay using partitioned and un-partitioned datasets.
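The contrast between partitioning each dataset separately and partitioning the join workload can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's algorithm: each geometry is reduced to one representative point, the workload of a uniform grid cell is estimated as the product of the two layers' counts in that cell, and cells are handed out greedily to the least-loaded worker. The names `estimate_workload` and `assign_cells` are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]
Point2D = Tuple[float, float]


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Map a representative point to its uniform grid cell."""
    return (int(x // cell_size), int(y // cell_size))


def estimate_workload(layer_r: Sequence[Point2D], layer_s: Sequence[Point2D],
                      cell_size: float) -> Dict[Cell, int]:
    """Estimate join cost per cell as (#R in cell) * (#S in cell).

    Assumption: candidate pairs only arise inside the same cell.
    """
    r = Counter(cell_of(x, y, cell_size) for x, y in layer_r)
    s = Counter(cell_of(x, y, cell_size) for x, y in layer_s)
    return {c: r[c] * s[c] for c in r.keys() & s.keys()}


def assign_cells(workload: Dict[Cell, int],
                 n_workers: int) -> Dict[int, List[Cell]]:
    """Greedy assignment: heaviest cell first, always to the lightest worker."""
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment: Dict[int, List[Cell]] = {w: [] for w in range(n_workers)}
    for cell, cost in sorted(workload.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        assignment[w].append(cell)
        heapq.heappush(heap, (load + cost, w))
    return assignment
```

A cell that is dense in only one layer contributes little or nothing to the estimate, so it does not distort the balance the way it would if each layer were split by object count alone.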

    Massively Parallel Algorithms for Loading Data on Modern Hardware (Massiv-Parallele Algorithmen zum Laden von Daten auf Moderner Hardware)

    While systems face an ever-growing amount of data that needs to be ingested, queried, and analysed, processors are seeing only moderate improvements in sequential processing performance. This thesis addresses the fundamental shift towards increasingly parallel processors and contributes multiple massively parallel algorithms that accelerate different stages of the ingestion pipeline, such as data parsing and sorting.