
    Parallel generalized Delaunay mesh refinement

    The modeling of physical phenomena in computational fracture mechanics, computational fluid dynamics, and other fields is based on solving systems of partial differential equations (PDEs). When PDEs are defined over geometrically complex domains, they often do not admit closed-form solutions. In such cases, they are solved approximately using discretizations of the domains into simple elements such as triangles and quadrilaterals in two dimensions (2D), and tetrahedra and hexahedra in three dimensions (3D). These discretizations are called finite element meshes. Many applications, for example real-time computer-assisted surgery or crack propagation in fracture mechanics, impose time and/or mesh size constraints that cannot be met on a single sequential machine. As a result, the development of parallel mesh generation algorithms is required.

    In this dissertation, we describe a complete solution for both sequential and parallel construction of guaranteed quality Delaunay meshes for 2D and 3D geometries. First, we generalize the existing 2D and 3D Delaunay refinement algorithms along with theoretical proofs of mesh quality in terms of element shape and mesh gradation. Existing algorithms are constrained to just one or two specific positions for the insertion of a Steiner point inside the circumscribed disk of a poorly shaped element; we derive an entire 2D or 3D region (i.e., infinitely many choices) for the selection of a Steiner point inside the circumscribed disk. Second, we develop a novel theory which extends both the 2D and the 3D Generalized Delaunay Refinement methods for the concurrent and mathematically guaranteed independent insertion of Steiner points. Previous parallel algorithms are either reactive, relying on implementation heuristics to resolve dependencies in parallel mesh generation computations, or require the solution of a very difficult geometric optimization problem (the domain decomposition problem), which is still open for general 3D geometries. Our theory addresses both of these drawbacks. Third, using our generalization of both the sequential and the parallel algorithms, we implemented prototypes of practical and efficient parallel generalized guaranteed quality Delaunay refinement codes for both 2D and 3D geometries, building on existing state-of-the-art sequential codes for traditional Delaunay refinement methods. On a heterogeneous cluster of more than 100 processors, our implementation can generate a uniform mesh with about a billion elements in less than 5 minutes. Even on a workstation with a few cores, we achieve a significant performance improvement over the corresponding state-of-the-art sequential 3D code for graded meshes.
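    The quality test that drives Delaunay refinement can be sketched in 2D: a triangle whose circumradius-to-shortest-edge ratio exceeds a bound is fixed by inserting a Steiner point in its circumdisk. Classical Ruppert-style refinement picks the circumcenter itself; the generalized method described above admits any point from a whole selection region around it. The quality bound and the picking radius below are illustrative assumptions, not the dissertation's actual constants.

```python
import math

def circumcircle(a, b, c):
    """Circumcenter and circumradius of 2D triangle abc."""
    d = 2.0 * (a[0]*(b[1]-c[1]) + b[0]*(c[1]-a[1]) + c[0]*(a[1]-b[1]))
    ux = ((a[0]**2+a[1]**2)*(b[1]-c[1]) + (b[0]**2+b[1]**2)*(c[1]-a[1])
          + (c[0]**2+c[1]**2)*(a[1]-b[1])) / d
    uy = ((a[0]**2+a[1]**2)*(c[0]-b[0]) + (b[0]**2+b[1]**2)*(a[0]-c[0])
          + (c[0]**2+c[1]**2)*(b[0]-a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)

def needs_refinement(a, b, c, bound=math.sqrt(2)):
    """Radius-edge ratio above `bound` marks a poorly shaped triangle."""
    _, r = circumcircle(a, b, c)
    shortest = min(math.dist(a, b), math.dist(b, c), math.dist(c, a))
    return r / shortest > bound

# Ruppert-style refinement inserts the circumcenter; a generalized
# picking region admits any point within, say, half the circumradius
# of the center (an illustrative radius, not the proven region).
center, r = circumcircle((0, 0), (4, 0), (2, 0.3))   # a sliver triangle
print(needs_refinement((0, 0), (4, 0), (2, 0.3)), center, r)
```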

    Scalable Parallel Delaunay Image-to-Mesh Conversion for Shared and Distributed Memory Architectures

    Mesh generation is an essential component of many engineering applications. The ability to generate meshes in parallel is critical for the scalability of the entire Finite Element Method (FEM) pipeline. However, parallel mesh generation applications belong to the broader class of adaptive and irregular problems, and are among the most complex, challenging, and labor intensive to develop and maintain. In this thesis, we summarize several years of progress on a novel framework for highly scalable and guaranteed quality mesh generation for finite element analysis in three dimensions. We studied and developed parallel mesh generation algorithms on both shared and distributed memory architectures. We present a novel two-level parallel tetrahedral mesh generation framework capable of delivering and sustaining close to 6000 concurrent work units (cores). We achieve this by leveraging concurrency at two different granularity levels, using a hybrid message passing and multi-threaded execution model suited to the hardware hierarchy of distributed memory clusters. An end-user productivity and scalability study was performed on up to 6000 cores, and indicated very good end-user productivity, with a rate of about 300 million tetrahedra per second and a weak scaling speedup of about 3600. Together, these results indicate that, compared to the best previous algorithm, performance (measured in elements per second) improved by more than 7000 times while using only about 180 times more CPUs, on geometries that are many orders of magnitude more complex.
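    A quick back-of-the-envelope check of the reported comparison; the per-CPU figure below is our own derived estimate, not a number from the thesis.

```python
# Figures reported above; the per-CPU gain is derived, not reported.
speedup_vs_previous = 7000   # overall improvement in elements/second
cpu_ratio = 180              # about 180x more CPUs than the baseline
print(f"per-CPU improvement ~ {speedup_vs_previous / cpu_ratio:.0f}x")  # ~39x
```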

    Extreme-Scale Parallel Mesh Generation: Telescopic Approach

    In this poster we present our preliminary results on the integration of multiple parallel Delaunay mesh generation methods into a coherent hierarchical framework. The goal of this project is to study our telescopic approach and to develop Delaunay-based methods that explore concurrency at all hardware layers, using abstractions at (a) the medium-grain level, for many cores within a single chip, and (b) the coarse-grain (i.e., subdomain) level, using proper error metric- and application-specific continuous decomposition methods.

    A Unified Framework for Parallel Anisotropic Mesh Adaptation

    Finite element methods are a critical component of the design and analysis procedures of many (bio-)engineering applications. Mesh adaptation is one of the most crucial of these components, since it discretizes the physics of the application at a relatively low cost to the solver. Highly scalable parallel mesh adaptation methods for High-Performance Computing (HPC) are essential to meet the ever-growing demand for higher fidelity simulations. Moreover, the continuous growth in the complexity of HPC systems requires a systematic approach to exploit their full potential. Anisotropic mesh adaptation captures features of the solution at multiple scales while minimizing the required number of elements; however, it also introduces new challenges on top of mesh generation. The increased complexity of the targeted cases also requires departing from traditional surface-constrained approaches in favor of utilizing CAD (Computer-Aided Design) kernels. Alongside these functionality requirements is the need to take advantage of ubiquitous multi-core machines. More importantly, the parallel implementation needs to handle the ever-increasing complexity of the mesh adaptation code.

    In this work, we develop a parallel mesh adaptation method that utilizes a metric-based approach for generating anisotropic meshes. Moreover, we enhance our method by interfacing with a CAD kernel, thus enabling its use on complex geometries. We evaluate our method both with fixed-resolution benchmarks and within a simulation pipeline, where the resolution of the discretization increases incrementally. With the Telescopic Approach for scalable mesh generation as a guide, we propose a node-level (multi-core) parallel method for mesh adaptation that is expected to scale up efficiently to the upcoming exascale machines. To facilitate an effective implementation, we introduce an abstract layer between the application and the runtime system that enables the use of task-based parallelism for concurrent mesh operations. Our evaluation indicates results comparable to state-of-the-art methods for fixed-resolution meshes, both in terms of performance and quality. The integration with an adaptive pipeline offers promising results for the capability of the proposed method to function as part of an adaptive simulation. Moreover, our abstract tasking layer allows the separation of different aspects of the implementation without any impact on the functionality of the method.
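    To make the metric-based idea concrete, the sketch below measures an edge in an anisotropic metric tensor M: an adaptation pass typically drives every edge toward unit length in this metric, splitting edges that are too long and collapsing edges that are too short. This is a minimal illustration under our own assumptions, not the code of the method described above.

```python
import numpy as np

def metric_edge_length(p, q, M):
    """Length of edge pq measured in the anisotropic metric M.

    M is a symmetric positive-definite tensor; for a desired spacing
    h_i along each principal direction, M = diag(1/h_i**2) gives
    length 1.0 to an edge of physical length h_i in that direction.
    """
    e = np.asarray(q, float) - np.asarray(p, float)
    return float(np.sqrt(e @ M @ e))

# A metric requesting spacing 0.1 along x and 1.0 along y and z:
M = np.diag([1 / 0.1**2, 1.0, 1.0])
print(metric_edge_length([0, 0, 0], [0.1, 0, 0], M))  # ~1.0: a "unit" edge
```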

    Parallelization of the Advancing Front Local Reconnection Mesh Generation Software Using a Pseudo-Constrained Parallel Data Refinement Method

    Preliminary results of a long-term project entailing the parallelization of an industrial strength sequential mesh generator, called Advancing Front Local Reconnection (AFLR), are presented. AFLR has been under development for the last 25 years at the NSF/ERC center at Mississippi State University. The parallel procedure presented here is called Pseudo-constrained (PsC) Parallel Data Refinement (PDR) and consists of the following steps: (i) use an octree data-decomposition scheme to divide the original geometry into subdomains (octree leaves), (ii) refine each subdomain, with the proper adjustments of its neighbors, using the given refinement code, and (iii) combine all subdomain data into a single, conforming mesh; a sketch of this pipeline follows below. Parallelism was achieved by implementing Pseudo-constrained Parallel Data Refinement AFLR (PsC.AFLR) on top of a runtime system called Parallel Runtime Environment for Multi-computer Applications (PREMA). During run time, the PsC.AFLR method exposes data decomposition information (the number of subdomains waiting to be refined) to the underlying runtime system. In turn, this system facilitates work-load balancing and guides the program’s execution towards the most efficient utilization of hardware resources. Preliminary results on the mesh refinement operation show that end-user productivity (measured in terms of elements refined per second) increases as the number of cores in use increases. When using approximately 16 cores, PsC.AFLR outperforms the serial AFLR code by about 11 times. PsC.AFLR also maintains its stability by generating meshes of comparable quality. Although it offers good end-user productivity, PsC.AFLR cannot generate meshes with the same density or quality as the serial AFLR software, due to the constraints set by the subdomain boundaries that are required to successfully execute AFLR. These constraints demonstrate that it is not ideal to use AFLR as a black box when parallelizing the software. Its source code must be modified to a non-trivial extent if one wishes to remove these constraints and maximize end-user productivity and potential scalability.
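    A minimal sketch of the three PsC.PDR steps (decompose, refine, combine), assuming a uniform octree and standing in for both the AFLR refinement call and PREMA's load balancing with placeholders; `refine_leaf` is hypothetical, not the project's code.

```python
from concurrent.futures import ProcessPoolExecutor

def decompose(bbox, depth):
    """Step (i): split the bounding box into 8**depth octree leaves."""
    leaves = [bbox]
    for _ in range(depth):
        nxt = []
        for (lo, hi) in leaves:
            mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
            for i in range(8):  # one child per octant; bit k picks the half on axis k
                clo = tuple(lo[k] if ((i >> k) & 1) == 0 else mid[k] for k in range(3))
                chi = tuple(mid[k] if ((i >> k) & 1) == 0 else hi[k] for k in range(3))
                nxt.append((clo, chi))
        leaves = nxt
    return leaves

def refine_leaf(leaf):
    """Step (ii): hypothetical stand-in for refining one subdomain with AFLR."""
    return {"leaf": leaf, "elements": []}

if __name__ == "__main__":
    leaves = decompose(((0, 0, 0), (1, 1, 1)), depth=2)   # 64 subdomains
    with ProcessPoolExecutor() as pool:                   # worker pool ~ PREMA's role
        submeshes = list(pool.map(refine_leaf, leaves))
    # Step (iii): combine subdomain data into one conforming mesh (elided).
```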

    Enabling technology for non-rigid registration during image-guided neurosurgery

    In the context of image processing, non-rigid registration is an operation that attempts to align two or more images using spatially varying transformations. Non-rigid registration finds application in medical image processing, where it accounts for the deformations in the soft tissues of the imaged organs. During image-guided neurosurgery, non-rigid registration has the potential to assist in locating critical brain structures and to improve identification of the tumor boundary. Robust non-rigid registration methods combine estimation of tissue displacement based on image intensities with spatial regularization using biomechanical models of brain deformation. In practice, the use of such registration methods during neurosurgery is complicated by a number of issues: construction of the biomechanical model used in the registration from the image data, the high computational demands of the application, and difficulties in assessing the registration results. In this dissertation we develop methods and tools that address some of these challenges, and provide components essential for the intra-operative application of a previously validated physics-based non-rigid registration method.

    First, we study the problem of image-to-mesh conversion, which is required for constructing the biomechanical model of the brain used during registration. We develop and analyze a number of methods suitable for solving this problem, and evaluate them using application-specific quantitative metrics. Second, we develop a high-performance implementation of the non-rigid registration algorithm and study the use of geographically distributed Grid resources for speculative registration computations. Using the high-performance implementation running on remote computing resources, we are able to deliver the results of registration within the time constraints of the neurosurgery. Finally, we present a method that estimates the local alignment error between two images of the same subject. We assess the utility of this method using multiple sources of ground truth to evaluate its potential to support speculative computations of non-rigid registration.
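    The combination of intensity-driven displacement estimation with model-based regularization can be sketched in one dimension: alternate a force that pulls the warped image toward the fixed one with a smoothing step that stands in for the biomechanical model. Everything below is a toy illustration under our own assumptions, not the validated method the dissertation builds on.

```python
import numpy as np

def register_1d(fixed, moving, iters=200, step=0.5, smooth=0.25):
    """Toy 1D non-rigid registration: SSD force + Laplacian smoothing."""
    u = np.zeros_like(fixed, dtype=float)        # displacement field
    x = np.arange(fixed.size, dtype=float)
    for _ in range(iters):
        warped = np.interp(x + u, x, moving)     # moving image warped by u
        grad = np.gradient(warped)               # image gradient
        u -= step * (warped - fixed) * grad      # intensity-matching force
        # Discrete Laplacian stands in for the biomechanical regularizer:
        u[1:-1] += smooth * (u[:-2] - 2 * u[1:-1] + u[2:])
    return u
```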

    Decoupling method for parallel Delaunay two-dimensional mesh generation

    Parallel mesh generation procedures that are based on geometric domain decompositions require the permanent separators to be of good quality (in terms of their angles and length) in order to maintain the mesh quality. The Medial Axis Domain Decomposition, an innovative geometric domain decomposition procedure that addresses this problem, is introduced. The Medial Axis Domain Decomposition is of high quality in terms of the formed angles, provides separators of small size, and achieves good work-load balance. It is the first decomposition method suitable for parallel meshing procedures based on geometric domain decompositions.

    The Decoupling Method for parallel Delaunay 2D mesh generation is a highly efficient and effective parallel procedure, able to generate billions of elements in a few hundred seconds on distributed memory machines. Our mathematical formulation introduces the notion of the decoupling path, which guarantees the decoupling property as well as the quality and conformity of the Delaunay submeshes. The subdomains are meshed independently, and as a result the method eliminates communication and synchronization during parallel meshing. A method for shielding small angles is introduced, so that the decoupled parallel Delaunay algorithm can be applied to domains with small angles. Moreover, we present the construction of a sizing function that incorporates an existing sizing function as well as geometric features and small angles. The decoupling procedure can be used for parallel graded Delaunay mesh generation, controlled by the sizing function.
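    The decoupling idea can be sketched as follows: pre-mesh the separator (the decoupling path) once, then mesh every subdomain that shares it independently and in parallel, with no communication; the union conforms because both neighbors reuse identical separator vertices. The mesher call below is a hypothetical placeholder, not the dissertation's code.

```python
from concurrent.futures import ProcessPoolExecutor

def mesh_subdomain(args):
    """Hypothetical stand-in for a Delaunay refiner run on one subdomain."""
    boundary, separator_vertices = args
    # A real refiner would treat separator_vertices as fixed input so
    # both neighbors produce the identical interface mesh.
    return {"boundary": boundary, "fixed": separator_vertices, "tris": []}

if __name__ == "__main__":
    separator = [(0.5, y / 10) for y in range(11)]   # shared decoupling path
    left = ([(0, 0), (0.5, 0), (0.5, 1), (0, 1)], separator)
    right = ([(0.5, 0), (1, 0), (1, 1), (0.5, 1)], separator)
    with ProcessPoolExecutor() as pool:              # no communication needed
        submeshes = list(pool.map(mesh_subdomain, [left, right]))
    # The union conforms because both sides reuse the same separator vertices.
```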

    Effective Large Scale Computing Software for Parallel Mesh Generation

    Scientists commonly turn to supercomputers or Clusters of Workstations with hundreds (even thousands) of nodes to generate meshes for large-scale simulations. Parallel mesh generation software is then used to decompose the original mesh generation problem into smaller sub-problems that can be solved (meshed) in parallel. The size of the final mesh is limited by the amount of aggregate memory of the parallel machine. Also, requesting many compute nodes on a shared computing resource may result in a long wait, far surpassing the time it takes to solve the problem.

    These two problems (i.e., insufficient memory when computing on a small number of nodes, and long waiting times when using many nodes of a shared computing resource) can be addressed by using out-of-core algorithms. These are algorithms that keep most of the dataset out-of-core (i.e., outside of memory, on disk) and load only a portion in-core (i.e., into memory) at a time.

    We explored two approaches to out-of-core computing. First, we presented the traditional approach, which is to modify the existing in-core algorithms to enable out-of-core computing. While we achieved good performance with this approach, the task is complex and labor intensive. As an alternative, we presented a runtime system designed to support out-of-core applications. It requires little modification of the existing in-core application code and still produces acceptable results. Evaluation of the runtime system showed little performance degradation while simplifying and shortening the development cycle of out-of-core applications. The overhead of using the runtime system for small problem sizes is between 12% and 41%, while the overlap of computation, communication, and disk I/O is above 50%, and as high as 61% for large problems.

    The main contribution of our work is the ability to utilize computing resources more effectively. The user has a choice of either solving larger problems, which otherwise would not be possible, or solving problems of the same size using fewer computing nodes, thus reducing the waiting time on shared clusters and supercomputers. We demonstrated that the latter can lead to substantially shorter wall-clock times.
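    The pattern such a runtime system automates can be sketched with double buffering: keep the dataset on disk in blocks and overlap the load of block i+1 with the computation on block i. The block layout and the `load`/`process` callbacks below are illustrative assumptions, not the runtime system's API.

```python
import threading

def out_of_core_run(block_paths, load, process):
    """Process disk-resident blocks while prefetching the next one."""
    nxt = {}

    def prefetch(path):
        nxt["block"] = load(path)         # disk I/O off the main thread

    current = load(block_paths[0])        # first block loads synchronously
    for i in range(len(block_paths)):
        t = None
        if i + 1 < len(block_paths):
            t = threading.Thread(target=prefetch, args=(block_paths[i + 1],))
            t.start()                     # start loading block i+1 ...
        process(current)                  # ... while computing on block i
        if t is not None:
            t.join()
            current = nxt["block"]
```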

    Real-Time High-Quality Image to Mesh Conversion for Finite Element Simulations

    Technological advances in medical imaging have enabled the acquisition of images that accurately describe biological tissues. Finite Element (FE) methods on these images provide the means to simulate biological phenomena such as brain shift registration, respiratory organ motion, blood flow pressure in vessels, etc. FE methods require that the tissue domain be discretized into simpler geometric elements, such as triangles in two dimensions, tetrahedra in three, and pentatopes in four. This discretization is called a mesh. The accuracy and speed of FE methods depend on the quality and fidelity of the mesh used to describe the biological object. Poor-quality elements introduce numerical errors and slow solver convergence. Also, analyses based on poor-fidelity meshes do not yield accurate results, especially near the surface. In this dissertation, we present the theory and the implementation of both a sequential and a parallel Delaunay meshing technique for 3D and, for the first time, 4D space-time domains. Our method provably guarantees that the mesh is a faithful representation of the multi-tissue domain in the topological and geometric sense. Moreover, we show that our method generates graded elements of bounded radius-edge and aspect ratio, which renders our technique suitable for Finite Element analysis. A notable feature of our implementation is its speed and scalability. The single-threaded performance of our 3D code is faster than that of state-of-the-art open source meshing tools. Experimental evaluation shows more than 82% weak scaling efficiency for up to 144 cores, reaching a rate of more than 14.3 million elements per second. This is the first 3D parallel Delaunay refinement method to achieve such performance on either distributed or shared-memory architectures. Lastly, this dissertation is the first to develop and examine the sequential and parallel high-quality and high-fidelity meshing of general space-time 4D multi-tissue domains.
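    The radius-edge ratio mentioned above (circumradius over shortest edge) is the standard quality measure that Delaunay refinement bounds; the sketch below computes it for a tetrahedron. This is a minimal illustration, not the dissertation's implementation.

```python
import itertools
import numpy as np

def radius_edge_ratio(tet):
    """Circumradius / shortest edge for a tetrahedron (4 x 3 vertices)."""
    p = np.asarray(tet, float)
    # Circumcenter c solves 2*(p_i - p_0) . c = |p_i|^2 - |p_0|^2, i = 1..3.
    A = 2.0 * (p[1:] - p[0])
    rhs = np.sum(p[1:]**2 - p[0]**2, axis=1)
    center = np.linalg.solve(A, rhs)
    R = np.linalg.norm(center - p[0])                 # circumradius
    shortest = min(np.linalg.norm(p[i] - p[j])
                   for i, j in itertools.combinations(range(4), 2))
    return R / shortest

# A regular tetrahedron achieves the optimal ratio sqrt(6)/4 ~ 0.612;
# refinement targets elements whose ratio exceeds a chosen bound.
reg = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
print(radius_edge_ratio(reg))
```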