
    Study of preconditioners based on Markov Chain Monte Carlo methods

    The analysis and design of novel scalable methods and algorithms for fundamental linear algebra problems, such as solving Systems of Linear Algebraic Equations with a focus on large-scale systems, is an active subject of study. This research focuses on novel mathematical methods and scalable algorithms for computationally intensive problems, in particular Monte Carlo and hybrid methods and algorithms.

    Status and challenges of simulations with dynamical fermions

    An overview of the current state of algorithms for dynamical fermion simulations is given. In particular, some insight into the functioning of determinant splitting techniques is provided. The critical slowing down of the simulations towards the continuum limit and the role of the boundary conditions are also reviewed. Comment: 20 pages, 9 figures, plenary talk presented at the 30th International Symposium on Lattice Field Theory - Lattice 2012, June 24-29, 2012, Cairns, Australia

    Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

    In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculations using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones. Comment: 44 pages. 1 of USQCD whitepapers

    A Parallel Monte Carlo Algorithm for Solving the Scattering Problem in Plasmonic Nanoparticles

    We propose an extension of the Ulam-Neumann algorithm for solving systems of equations arising from photonic problems. This method has good parallel properties and allows acceleration techniques to be implemented.
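    To make the reference to the Ulam-Neumann (Ulam-von Neumann) scheme concrete, the following is a minimal, generic sketch of how one component of the solution of a fixed-point system x = H x + f can be estimated by averaging independent random walks, which is the property behind the method's good parallel behaviour. The matrix, termination rule and sample count are illustrative assumptions, not the scheme used in the paper.

```python
import numpy as np

def mc_solve_component(H, f, i, n_walks=100_000, seed=0):
    """Ulam-von Neumann estimator for component x_i of the solution of x = H x + f.

    From state s the walk jumps to column j with probability |H[s, j]| and
    terminates with the leftover probability 1 - sum_j |H[s, j]|; each visited
    state adds weight * f[state] to the score, the weight picking up the sign
    of the traversed entries.  Requires sum_j |H[s, j]| < 1 for every row.
    """
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    absH = np.abs(H)
    stop = 1.0 - absH.sum(axis=1)                 # termination probability per row
    assert np.all(stop > 0), "row sums of |H| must be below 1"
    total = 0.0
    for _ in range(n_walks):
        s, w, score = i, 1.0, f[i]
        while True:
            nxt = rng.choice(n + 1, p=np.append(absH[s], stop[s]))
            if nxt == n:                          # absorbed: walk ends
                break
            w *= np.sign(H[s, nxt])               # = H[s, nxt] / |H[s, nxt]|
            s = nxt
            score += w * f[s]
        total += score
    return total / n_walks

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    H = 0.3 * rng.standard_normal((6, 6)) / 6     # rows of |H| sum to well below 1
    f = rng.standard_normal(6)
    print("Monte Carlo:", mc_solve_component(H, f, 0))
    print("direct     :", np.linalg.solve(np.eye(6) - H, f)[0])
```

    Because the walks are independent, they can be distributed across processes with essentially no communication, which is the parallel property the abstract refers to.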

    Optimization of a parallel Monte Carlo method for linear algebra problems

    Many problems in science and engineering can be represented by Systems of Linear Algebraic Equations (SLAEs). Numerical methods, either direct or iterative, are used to solve such systems. Depending on their size and other characteristics, these systems can be very difficult to solve even with iterative methods, requiring long runtimes and large amounts of computational resources. In these cases a preconditioning approach should be applied. Preconditioning is a technique that transforms a SLAE into an equivalent but simpler system which requires less time and effort to solve; the matrix that performs this transformation is called the preconditioner [7]. There are preconditioners for both direct and iterative methods, but they are more commonly used with the latter. In general, a preconditioned system requires less effort to solve than the original one: when an iterative method is used, fewer iterations are needed or each iteration becomes cheaper, depending on the quality and efficiency of the preconditioner. There are different classes of preconditioners, but we focus only on those based on the SParse Approximate Inverse (SPAI) approach. These algorithms exploit the fact that an approximate inverse of the SLAE matrix can be used to approximate its solution or to reduce its complexity.

    Monte Carlo methods are probabilistic methods that use random numbers either to simulate stochastic behaviour or to estimate the solution of a problem. They are good candidates for parallelization because many independent samples are used to estimate the solution, and these samples can be computed in parallel, thereby speeding up the solution process [27]. There has been considerable research on the use of Monte Carlo methods to compute SPAI preconditioners [1] [27] [10]. In this work we present the implementation of a SPAI preconditioner based on a Monte Carlo method. The algorithm estimates the matrix inverse by sampling a random variable that approximates the Neumann series expansion: the inverse of a system matrix A is obtained by consecutive additions of the matrix powers in the series expansion of (I - A)^{-1} (a minimal illustration of this idea is sketched after this abstract). Thanks to the stochastic approach, the computational effort required to find one element of the inverse matrix is independent of the size of the matrix, which makes it possible to target systems that are prohibitively large for common deterministic approaches [27].

    A large part of this work focuses on enhancing this algorithm. First, existing errors in the implementation were fixed, enabling the algorithm to target larger systems. Then multiple optimizations were applied at different stages of the implementation, making better use of the resources and improving performance. Four optimizations with consistent improvements were performed:
    1. An inefficient implementation of the realloc function within the MPI library caused the application to run out of memory quickly. This function was replaced by malloc, together with slight modifications to estimate the size of matrix A.
    2. A coordinate format (COO) was introduced in the algorithm's core to use memory more efficiently and avoid several unnecessary memory accesses.
    3. A method for producing the intermediate matrix P was shown to yield results similar to the default one while reducing P to a single vector; since P is broadcast, this reduction in data translates into a shorter broadcast time.
    4. Four individual procedures that each accessed the entire initial matrix were merged into two, reducing the number of memory accesses.
    For each optimization, a comparison was performed to show the particular improvement achieved, and a set of matrices representing different SLAEs was used to show the consistency of these improvements.

    To provide insight into the scalability of the algorithm, two further approaches are presented:
    1. Since the original version of the algorithm was designed for a cluster of single-core machines, a hybrid MPI + OpenMP approach was proposed to target today's multi-core architectures. Surprisingly, this approach did not yield any improvement, but it exposed a scalability problem related to the random pattern used to access memory.
    2. Common MPI implementations of the broadcast operation do not take into account the different latencies of inter-node and intra-node communication [25]. We therefore implemented the broadcast in two steps: first reaching a single process in each compute node, and then using those processes to perform a local broadcast within their nodes. Results show that this method can lead to improvements when very large systems are used.

    Finally, a comparison is carried out between the optimized version of the Monte Carlo algorithm and the state-of-the-art Modified SPAI (MSPAI). Four metrics are used:
    1. The time needed to construct the preconditioner.
    2. The time needed by the solver to compute the solution of the preconditioned system.
    3. The sum of the previous two, which gives an overview of the quality and efficiency of the preconditioner.
    4. The number of cores used to construct the preconditioner, which gives an indication of the energy efficiency of the algorithm.
    The results show that the Monte Carlo algorithm can deal with both symmetric and nonsymmetric matrices, whereas MSPAI only performs well on the nonsymmetric ones. Furthermore, the Monte Carlo algorithm is always faster in the preconditioner construction and, in most cases, also in the solver phase, meaning that it produces preconditioners of equal or better quality than MSPAI. Finally, the number of cores used by the Monte Carlo approach is always equal to or smaller than that used by MSPAI.
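    As a rough illustration of the central idea above, the sketch below estimates one row of (I - A)^{-1} by sampling random walks that follow the Neumann series I + A + A^2 + ...; each row can be estimated independently, at a cost governed by the number of walks and their length rather than by the matrix dimension, which is what makes the approach attractive as a SPAI-style preconditioner. This is not the thesis implementation: it uses dense NumPy arrays rather than the COO format discussed above, and the matrix scaling, walk length and sample counts are illustrative assumptions.

```python
import numpy as np

def mc_inverse_row(A, i, n_walks=4_000, max_steps=8, seed=0):
    """Monte Carlo estimate of row i of (I - A)^{-1} via the Neumann series.

    (I - A)^{-1} = I + A + A^2 + ...  Each walk of at most max_steps jumps
    starts in row i, moves to column j with probability |A[s, j]| / sum_j |A[s, j]|,
    and scores its running importance weight into the column it currently
    occupies.  Convergence requires the Neumann series of A to converge
    (spectral radius below 1).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    absA = np.abs(A)
    row_mass = absA.sum(axis=1)
    row = np.zeros(n)
    for _ in range(n_walks):
        s, w = i, 1.0
        row[s] += w                                   # k = 0 term (identity)
        for _ in range(max_steps):
            if row_mass[s] == 0.0:
                break                                 # zero row: series truncates
            probs = absA[s] / row_mass[s]
            nxt = rng.choice(n, p=probs)
            w *= A[s, nxt] / probs[nxt]               # keeps the estimate unbiased
            s = nxt
            row[s] += w
    return row / n_walks

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 8
    A = 0.3 * rng.random((n, n)) / n                  # small norm, series converges fast
    exact = np.linalg.inv(np.eye(n) - A)
    approx = np.vstack([mc_inverse_row(A, i, seed=i) for i in range(n)])
    print("max abs error:", np.abs(approx - exact).max())
```

    In an actual preconditioner the rows would be computed in parallel and stored sparsely; the point of the sketch is only that the per-element cost depends on the number of walks and steps, not on the matrix size.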

    Probabilistic structural mechanics research for parallel processing computers

    Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods has been hampered by their computationally intensive nature: solving PSM problems requires repeated analyses of structures that are often large and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make the solution of large-scale PSM problems practical.

    One-Flavour Hybrid Monte Carlo with Wilson Fermions

    The Wilson fermion determinant can be written as the product of the determinants of two hermitian positive definite matrices. This formulation makes it possible to simulate non-degenerate quark flavors by means of the hybrid Monte Carlo algorithm. A major numerical difficulty is the occurrence of nested inversions. We construct a Uzawa iteration scheme which treats the nested system within one iterative process. Comment: 11 pages, to appear in proceedings of the workshop "Numerical Challenges in Lattice QCD", Springer Verlag
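    For readers unfamiliar with the Uzawa scheme named here: in its classical form it solves a saddle-point system by alternating an inner solve for the primal variable with a relaxation update of the constraint variable, which is what lets a nested problem be handled inside a single outer iteration. The sketch below shows that classical textbook variant on a generic saddle-point system; it is only an illustration of the idea, with an arbitrary test problem, not the lattice-QCD scheme constructed in the paper.

```python
import numpy as np

def uzawa(A, B, f, g, omega, tol=1e-10, max_iter=2000):
    """Classical Uzawa iteration for the saddle-point system
        [ A   B^T ] [u]   [f]
        [ B   0   ] [p] = [g],
    with A symmetric positive definite.  Each sweep performs one inner
    solve with A (the "nested inversion") and a relaxation step on p.
    """
    u = np.zeros(A.shape[0])
    p = np.zeros(B.shape[0])
    for k in range(1, max_iter + 1):
        u = np.linalg.solve(A, f - B.T @ p)    # inner solve
        r = B @ u - g                          # constraint residual
        p = p + omega * r
        if np.linalg.norm(r) < tol:
            break
    return u, p, k

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n, m = 12, 4
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)                # SPD block
    B = rng.standard_normal((m, n))
    f, g = rng.standard_normal(n), rng.standard_normal(m)
    S = B @ np.linalg.solve(A, B.T)            # Schur complement, used here
    lam = np.linalg.eigvalsh(S)                # only to pick a good
    omega = 2.0 / (lam.min() + lam.max())      # relaxation parameter
    u, p, iters = uzawa(A, B, f, g, omega)
    K = np.block([[A, B.T], [B, np.zeros((m, m))]])
    rhs = np.concatenate([f, g])
    print(iters, np.linalg.norm(K @ np.concatenate([u, p]) - rhs))
```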

    Towards an exact adaptive algorithm for the determinant of a rational matrix

    In this paper we propose several strategies for the exact computation of the determinant of a rational matrix. First, we use the Chinese Remainder Theorem and rational reconstruction to recover the rational determinant from its modular images. Then we show a preconditioning of the determinant which allows us to skip the rational reconstruction step and reconstruct an integer result instead. We compare these approaches with matrix preconditioning, which allows us to treat integer instead of rational matrices and thereby to apply integer determinant algorithms to the rational determinant problem. In particular, we discuss the applicability of the adaptive determinant algorithm of [9] and compare it with the integer Chinese remaindering scheme. We present an analysis of the complexity of the strategies and evaluate their experimental performance on numerous examples. This experience allows us to develop an adaptive strategy that chooses the best solution at run time, depending on matrix properties. All strategies have been implemented in the LinBox linear algebra library.
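    For readers unfamiliar with the Chinese remaindering approach mentioned above, here is a minimal sketch of the integer variant: the determinant is computed modulo enough word-size primes, with the number of primes chosen from the Hadamard bound, and then recovered by CRT and mapped into the symmetric range. This is only an illustration of the idea, not the LinBox implementation; the prime sizes, the dense modular elimination and the brute-force check are simplifications, and the rational-reconstruction and preconditioning strategies of the paper are not shown.

```python
import math
from itertools import permutations

def next_prime_above(n):
    """Smallest prime greater than n (trial division; fine for word-size primes)."""
    k = n + 1
    while True:
        if k % 2 == 0:
            k += 1
        if all(k % q for q in range(3, math.isqrt(k) + 1, 2)):
            return k
        k += 2

def det_mod(M, p):
    """Determinant of an integer matrix modulo a prime p, by Gaussian elimination."""
    A = [[x % p for x in row] for row in M]
    n = len(A)
    det = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c]), None)
        if pivot is None:
            return 0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det % p
        det = det * A[c][c] % p
        inv = pow(A[c][c], -1, p)                 # modular inverse (Python 3.8+)
        for r in range(c + 1, n):
            if A[r][c]:
                factor = A[r][c] * inv % p
                A[r] = [(a - factor * b) % p for a, b in zip(A[r], A[c])]
    return det

def crt_determinant(M):
    """Exact determinant of an integer matrix via Chinese remaindering.

    Primes are accumulated until their product exceeds twice the Hadamard
    bound on |det M|, so the CRT image determines the determinant uniquely
    once it is mapped back into the symmetric range.
    """
    hadamard = math.prod(math.isqrt(sum(x * x for x in row)) + 1 for row in M)
    det, modulus, p = 0, 1, 1 << 15
    while modulus <= 2 * hadamard:
        p = next_prime_above(p)
        r = det_mod(M, p)
        # incremental CRT: lift det from Z_modulus to Z_{modulus * p}
        t = (r - det) * pow(modulus, -1, p) % p
        det += modulus * t
        modulus *= p
    if det > modulus // 2:
        det -= modulus
    return det

if __name__ == "__main__":
    import random
    random.seed(0)
    n = 7
    M = [[random.randint(-30, 30) for _ in range(n)] for _ in range(n)]
    # brute-force Leibniz formula as an independent check (only viable for tiny n)
    brute = sum(
        (-1) ** sum(s[i] > s[j] for i in range(n) for j in range(i + 1, n))
        * math.prod(M[i][s[i]] for i in range(n))
        for s in permutations(range(n)))
    print(crt_determinant(M), brute)
```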

    SSOR preconditioning in simulations of the QCD Schr\"odinger functional

    We report on a parallelized implementation of SSOR preconditioning for O(a) improved lattice QCD with Schr\"odinger functional boundary conditions. Numerical simulations in the quenched approximation, at parameters in the light quark mass region, demonstrate that a performance gain of a factor of \sim 1.5 over even-odd preconditioning can be achieved. Comment: 15 pages, latex2e, 4 Postscript figures, uses packages elsart and epsfig
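    For context on what SSOR preconditioning does in general, independently of the lattice-QCD specifics above: with the splitting A = L + D + U, the SSOR preconditioner is M = (omega/(2-omega)) (D/omega + L) D^{-1} (D/omega + U), and applying M^{-1} costs one forward and one backward triangular sweep. The sketch below shows this generic algebraic form inside a textbook preconditioned CG solve on a simple SPD test matrix; it is an illustration of the technique only, not the locally structured implementation used for the lattice Dirac operator in the paper.

```python
import numpy as np
from scipy.linalg import solve_triangular

def ssor_preconditioner(A, omega=1.0):
    """Return a function applying M^{-1} for the SSOR preconditioner of A,
    where A = L + D + U and M = (omega/(2-omega)) (D/omega + L) D^{-1} (D/omega + U).
    Applying M^{-1} costs one forward and one backward triangular solve.
    """
    d = np.diag(A)                                   # diagonal of A as a vector
    lower = np.tril(A, -1) + np.diag(d) / omega      # D/omega + L
    upper = np.triu(A, 1) + np.diag(d) / omega       # D/omega + U
    scale = (2.0 - omega) / omega
    def apply(v):
        y = solve_triangular(lower, v, lower=True)
        return solve_triangular(upper, scale * d * y, lower=False)
    return apply

def pcg(A, b, apply_Minv, tol=1e-8, max_iter=1000):
    """Textbook preconditioned conjugate gradient for SPD A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    # 1-D Laplacian as a simple SPD test matrix (illustrative only)
    n = 100
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x_plain, it_plain = pcg(A, b, lambda v: v)
    x_ssor, it_ssor = pcg(A, b, ssor_preconditioner(A, omega=1.2))
    print("CG without preconditioner:", it_plain, "iterations")
    print("CG with SSOR (omega=1.2) :", it_ssor, "iterations")
```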