1,344 research outputs found

    Applications of a hyper-graph grammar system in adaptive finite-element computations

    Get PDF
    This paper describes application of a hyper-graph grammar system for modeling a three-dimensional adaptive finite element method. The hyper-graph grammar approach allows obtaining a linear computational cost of adaptive mesh transformations and computations performed over refined meshes. The computations are done by a hyper-graph grammar driven algorithm applicable to three-dimensional problems. For the case of typical refinements performed towards a point or an edge, the algorithm yields linear computational cost with respect to the mesh nodes for its sequential execution and logarithmic cost for its parallel execution. Such hyper-graph grammar productions are the mathematical formalism used to describe the computational algorithm implementing the finite element method. Each production indicates the smallest atomic task that can be executed concurrently. The mesh transformations and computations by using the hyper-graph grammar-based approach have been tested in the GALOIS environment. We conclude the paper with some numerical results performed on a shared-memory Linux cluster node, for the case of three-dimensional computational meshes refined towards a point, an edge and a face

    Quasi-optimal elimination trees for 2D grids with singularities

    Get PDF
    We construct quasi-optimal elimination trees for 2D finite element meshes with singularities.These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal.We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O(log(Ne log(Ne)), where N e is the number of elements in the mesh.We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments

    Parallel Fast Isogeometric Solvers for Explicit Dynamics

    Get PDF
    This paper presents a parallel implementation of the fast isogeometric solvers for explicit dynamics for solving non-stationary time-dependent problems. The algorithm is described in pseudo-code. We present theoretical estimates of the computational and communication complexities for a single time step of the parallel algorithm. The computational complexity is O(p^6 N/c t_comp) and communication complexity is O(N/(c^(2/3)t_comm) where p denotes the polynomial order of B-spline basis with Cp-1 global continuity, N denotes the number of elements and c is number of processors forming a cube, t_comp refers to the execution time of a single operation, and t_comm refers to the time of sending a single datum. We compare theoretical estimates with numerical experiments performed on the LONESTAR Linux cluster from Texas Advanced Computing Center, using 1 000 processors. We apply the method to solve nonlinear flows in highly heterogeneous porous media

    Comparison of the Structure of Equation Systems and the GPU Multifrontal Solver for Finite Difference, Collocation and Finite Element Method

    Get PDF
    AbstractThe article is an in-depth comparison of numerical solvers and corresponding solution pro- cesses of the systems of algebraic equations resulting from finite difference, collocation, and finite element approximations. The paper considers recently developed isogeometric versions of the collocation and finite element methods, employing B-splines for the computations and ensuring Cp−1 continuity on the borders of elements for the B-splines of the order p. For solving the systems, we use our GPU implementation of the state-of-the-art parallel multifrontal solver, which leverages modern GPU architectures and allows to reduce the complexity. We analyze the structures of linear equation systems resulting from each of the methods and how different matrix structures lead to different multifrontal solver elimination trees. The paper also considers the flows of multifrontal solver depending on the originally employed method

    Telescopic hybrid fast solver for 3D elliptic problems with point singularities

    Get PDF
    This paper describes a telescopic solver for two dimensional h adaptive grids with point singularities. The input for the telescopic solver is an h refined two dimensional computational mesh with rectangular finite elements. The candidates for point singularities are first localized over the mesh by using a greedy algorithm. Having the candidates for point singularities, we execute either a direct solver, that performs multiple refinements towards selected point singularities and executes a parallel direct solver algorithm which has logarithmic cost with respect to refinement level. The direct solvers executed over each candidate for point singularity return local Schur complement matrices that can be merged together and submitted to iterative solver. In this paper we utilize a parallel multi-thread GALOIS solver as a direct solver. We use Incomplete LU Preconditioned Conjugated Gradients (ILUPCG) as an iterative solver. We also show that elimination of point singularities from the refined mesh reduces significantly the number of iterations to be performed by the ILUPCG iterative solver

    PCA Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes

    Full text link
    Multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional ``principal object'': a principal cubic complex. This complex is a generalization of linear and non-linear principal manifolds and includes them as a particular case. To construct such an object, we combine a method of topological grammars with the minimization of an elastic energy defined for its embedment into multidimensional data space. The whole complex is presented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs), and the grammars describe how these continua transform during the process of optimal complex construction. The simplest case of a topological grammar (``add a node'', ``bisect an edge'') is equivalent to the construction of ``principal trees'', an object useful in many practical applications. We demonstrate how it can be applied to the analysis of bacterial genomes and for visualization of cDNA microarray data using the ``metro map'' representation. The preprint is supplemented by animation: ``How the topological grammar constructs branching principal components (AnimatedBranchingPCA.gif)''.Comment: 19 pages, 8 figure

    Alternating directions parallel hybrid memory iGRM direct solver for non-stationary simulations

    Get PDF
    The three-dimensional isogeometric analysis (IGA-FEM) is a modern method for simulation. The idea is to utilize B-splines or NURBS basis functions for both computational domain descriptions and the engineering computations. Refined isogeometric analysis (rIGA) employs a mixture of patches of elements with B-spline basis functions, and C0C^0 separators between them. It enables a reduction of the computational cost of direct solvers. Both IGA and rIGA come with challenging sparse matrix structure, that is expensive to generate. In this paper, we show a hybrid parallelization method to reduce the computational cost of the integration phase using hybrid-memory parallel machines. The two-level parallelization includes the partitioning of the computational mesh into sub-domains on the first level (MPI), and loop parallelization on the second level (OpenMP). We show that hybrid parallelization of the integration reduces the contribution of this phase significantly. Thus, alternative algorithms for fast isogeometric integration are not necessary

    Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    Get PDF
    This paper derives theoretical estimates of the computational cost for isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for the Cp-1 global continuity of the isogeometric solution, both the computational cost and the communication cost of a direct solver are of order O(log(N)p2) for the one dimensional (1D) case, O(Np2) for the two dimensional (2D) case, and O(N4/3p2) for the three dimensional (3D) case, where N is the number of degrees of freedom and p is the polynomial order of the B-spline basis functions. The theoretical estimates are verified by numerical experiments performed with three parallel multi-frontal direct solvers: MUMPS, PaStiX and SuperLU, available through PETIGA toolkit built on top of PETSc. Numerical results confirm these theoretical estimates both in terms of p and N. For a given problem size, the strong efficiency rapidly decreases as the number of processors increases, becoming about 20% for 256 processors for a 3D example with 1283 unknowns and linear B-splines with C0 global continuity, and 15% for a 3D example with 643 unknowns and quartic B-splines with C3 global continuity. At the same time, one cannot arbitrarily increase the problem size, since the memory required by higher order continuity spaces is large, quickly consuming all the available memory resources even in the parallel distributed memory version. Numerical results also suggest that the use of distributed parallel machines is highly beneficial when solving higher order continuity spaces, although the number of processors that one can efficiently employ is somehow limited

    Hardware-Aware Static Optimization of Hyperdimensional Computations

    Full text link
    Binary spatter code (BSC)-based hyperdimensional computing (HDC) is a highly error-resilient approximate computational paradigm suited for error-prone, emerging hardware platforms. In BSC HDC, the basic datatype is a hypervector, a typically large binary vector, where the size of the hypervector has a significant impact on the fidelity and resource usage of the computation. Typically, the hypervector size is dynamically tuned to deliver the desired accuracy; this process is time-consuming and often produces hypervector sizes that lack accuracy guarantees and produce poor results when reused for very similar workloads. We present Heim, a hardware-aware static analysis and optimization framework for BSC HD computations. Heim analytically derives the minimum hypervector size that minimizes resource usage and meets the target accuracy requirement. Heim guarantees the optimized computation converges to the user-provided accuracy target on expectation, even in the presence of hardware error. Heim deploys a novel static analysis procedure that unifies theoretical results from the neuroscience community to systematically optimize HD computations. We evaluate Heim against dynamic tuning-based optimization on 25 benchmark data structures. Given a 99% accuracy requirement, Heim-optimized computations achieve a 99.2%-100.0% median accuracy, up to 49.5% higher than dynamic tuning-based optimization, while achieving 1.15x-7.14x reductions in hypervector size compared to HD computations that achieve comparable query accuracy and finding parametrizations 30.0x-100167.4x faster than dynamic tuning-based approaches. We also use Heim to systematically evaluate the performance benefits of using analog CAMs and multiple-bit-per-cell ReRAM over conventional hardware, while maintaining iso-accuracy -- for both emerging technologies, we find usages where the emerging hardware imparts significant benefits
    corecore