
    Finite difference methods fengshui: alignment through a mathematics of arrays

    Numerous scientific-computational domains make use of array data. The core of the numerical methods and algorithms involved is multi-dimensional array manipulation. The memory layout and access patterns of that data are crucial to the performance of array-based computations. As we move towards exascale computing, writing portable code for efficient data-parallel computations increasingly requires an abstract, productive working environment. To that end, we present the design of a framework for optimizing scientific array-based computations, building a case study for a Partial Differential Equations solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous high-level application layer from the discrete formulation of the collective array-based numerical methods and algorithms and from the final detailed low-level code. The case study lays the groundwork for achieving optimized memory layout and efficient computations while preserving a stable abstraction layer independent of underlying algorithms and changes in the architecture. Peer reviewed. Postprint (author's final draft).

    Quantum Algorithms for Scientific Computing and Approximate Optimization

    Quantum computation appears to offer significant advantages over classical computation and this has generated a tremendous interest in the field. In this thesis we study the application of quantum computers to computational problems in science and engineering, and to combinatorial optimization problems. We outline the results below. Algorithms for scientific computing require modules, i.e., building blocks, implementing elementary numerical functions that have well-controlled numerical error, are uniformly scalable and reversible, and that can be implemented efficiently. We derive quantum algorithms and circuits for computing square roots, logarithms, and arbitrary fractional powers, and derive worst-case error and cost bounds. We describe a modular approach to quantum algorithm design as a first step towards numerical standards and mathematical libraries for quantum scientific computing. A fundamental but computationally hard problem in physics is to solve the time-independent Schrödinger equation. This is accomplished by computing the eigenvalues of the corresponding Hamiltonian operator. The eigenvalues describe the different energy levels of a system. The cost of classical deterministic algorithms computing these eigenvalues grows exponentially with the number of system degrees of freedom. The number of degrees of freedom is typically proportional to the number of particles in a physical system. We show an efficient quantum algorithm for approximating a constant number of low-order eigenvalues of a Hamiltonian using a perturbation approach. We apply this algorithm to a special case of the Schrödinger equation and show that our algorithm succeeds with high probability, and has cost that scales polynomially with the number of degrees of freedom and the reciprocal of the desired accuracy. This improves and extends earlier results on quantum algorithms for estimating the ground state energy. 
We consider the simulation of quantum mechanical systems on a quantum computer. We show a novel divide and conquer approach for Hamiltonian simulation. Using the Hamiltonian structure, we can obtain faster simulation algorithms. Considering a sum of Hamiltonians we split them into groups, simulate each group separately, and combine the partial results. Simulation is customized to take advantage of the properties of each group, and hence yield refined bounds to the overall simulation cost. We illustrate our results using the electronic structure problem of quantum chemistry, where we obtain significantly improved cost estimates under mild assumptions. We turn to combinatorial optimization problems. An important open question is whether quantum computers provide advantages for the approximation of classically hard combinatorial problems. A promising recently proposed approach of Farhi et al. is the Quantum Approximate Optimization Algorithm (QAOA). We study the application of QAOA to the Maximum Cut problem, and derive analytic performance bounds for the lowest circuit-depth realization, for both general and special classes of graphs. Along the way, we develop a general procedure for analyzing the performance of QAOA for other problems, and show an example demonstrating the difficulty of obtaining similar results for greater depth. We show a generalization of QAOA and its application to wider classes of combinatorial optimization problems, in particular, problems with feasibility constraints. We introduce the Quantum Alternating Operator Ansatz, which utilizes more general unitary operators than the original QAOA proposal. Our framework facilitates low-resource implementations for many applications which may be particularly suitable for early quantum computers. We specify design criteria, and develop a set of results and tools for mapping diverse problems to explicit quantum circuits. 
We derive constructions for several important prototypical problems, including Maximum Independent Set, Graph Coloring, and the Traveling Salesman problem, and show appealing resource cost estimates for their implementations.

    Fault Tolerant Distributed Computing Framework for Scientific Algorithms

    The physical limitations of computing hardware have halted the growth of a single processor core's computing power. However, Moore's law is still maintained through the ever-increasing parallelism of computing architectures. At the same time, the demand for computational power has grown unrelentingly, forcing people to adapt the algorithms they use to these parallel architectures. One of the many downsides of parallel architectures is that, as the number of components rises, the chance that one of them fails increases.
When it comes to embarrassingly parallel data-intensive algorithms, Map-Reduce has gone a long way in ensuring users can easily utilize large amounts of distributed computing resources without the fear of losing work. However, this does not apply to the iterative, communication-intensive algorithms common in the scientific computing domain. In this work a new BSP-inspired (Bulk Synchronous Parallel) programming model is proposed, which adopts an approach similar to continuation passing for implementing parallel algorithms and facilitates the fault tolerance inherent in the BSP program structure. The distributed computing framework NEWT, which is based on the proposed model, is described and used to validate the approach. The framework retains most of the advantages that Map-Reduce provides, yet efficiently supports a larger assortment of algorithms, such as the aforementioned iterative ones.
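
The idea of combining continuation-style supersteps with barrier checkpoints can be illustrated with a minimal single-process sketch. This is not NEWT's API, which the abstract does not describe. The names `run_bsp`, `relax`, `WorkerFailure`, and `failures` are hypothetical, and the crash is simulated. Each superstep receives a copy of the last checkpointed state and returns the new state together with the next step to run, so a failed superstep can simply be retried from the checkpoint.

```python
import copy


class WorkerFailure(Exception):
    """Simulated crash of a worker during a superstep (assumption for this sketch)."""


def run_bsp(first_step, state, max_retries=3):
    """Run continuation-style supersteps, checkpointing state at every barrier."""
    step = first_step
    while step is not None:
        checkpoint = copy.deepcopy(state)  # snapshot taken at the barrier
        for attempt in range(max_retries + 1):
            try:
                # A superstep works on a private copy and returns
                # (new_state, next_step); next_step is its continuation.
                state, step = step(copy.deepcopy(checkpoint))
                break
            except WorkerFailure:
                if attempt == max_retries:
                    raise
                # Nothing to undo: the failed attempt only touched its copy.
    return state


failures = {2}  # iterations in which one crash is injected


def relax(state):
    """One iteration of neighbour averaging on a ring of values."""
    it = state["iteration"]
    if it in failures:
        failures.discard(it)
        state["values"][0] = float("nan")  # partial update before the crash
        raise WorkerFailure(f"worker lost in iteration {it}")
    v = state["values"]
    n = len(v)
    state["values"] = [(v[i - 1] + v[(i + 1) % n]) / 2 for i in range(n)]
    state["iteration"] = it + 1
    return state, (relax if state["iteration"] < 4 else None)


result = run_bsp(relax, {"values": [0.0, 4.0, 8.0, 4.0], "iteration": 0})
print(result)
```

In a real distributed setting the checkpoint would have to be written to storage that survives the failure, and the barrier would synchronize many workers. The sketch only shows why an explicit "next step" value makes the restart point well defined.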

    QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

    Previous studies have reported that common dense linear algebra operations do not achieve speedup by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to couple a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's). Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010, Atlanta, GA, USA).
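
The core idea behind a communication-avoiding QR of a tall and skinny matrix can be sketched in a few lines of NumPy. This is a single-process illustration with a flat, one-level reduction, not the paper's grid implementation. The function name `tsqr` and the block count are assumptions, and every row block is assumed to have at least as many rows as the matrix has columns. Each block stands in for one site: it factors its rows locally, and only the small triangular factors need to be exchanged.

```python
import numpy as np


def tsqr(a, n_blocks):
    """QR of a tall-skinny matrix via local QRs plus one small combining QR."""
    n = a.shape[1]
    blocks = np.array_split(a, n_blocks, axis=0)

    # Step 1: independent reduced QR on each row block (local to one site).
    local = [np.linalg.qr(b) for b in blocks]

    # Step 2: stack the small n-by-n R factors and factor them once more.
    # Only these factors would have to travel between sites.
    q2, r = np.linalg.qr(np.vstack([r_i for _, r_i in local]))

    # Step 3: apply the matching n-by-n slice of q2 to each local Q.
    q = np.vstack(
        [q_i @ q2[i * n:(i + 1) * n] for i, (q_i, _) in enumerate(local)]
    )
    return q, r


rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 5))
q, r = tsqr(a, n_blocks=4)
print(np.allclose(q @ r, a))
```

A tree-shaped reduction would replace step 2 with pairwise combinations of R factors. The flat version is used here only to keep the sketch short.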

    Fast algorithms for computing the Boltzmann collision operator

    The development of accurate and fast numerical schemes for the five-fold Boltzmann collision integral represents a challenging problem in scientific computing. For a particular class of interactions, including the so-called hard spheres model in dimension three, we are able to derive spectral methods that can be evaluated through fast algorithms. These algorithms are based on a suitable representation and approximation of the collision operator. Explicit expressions for the errors in the schemes are given and spectral accuracy is proved. Parallelization properties and adaptivity of the algorithms are also discussed. Comment: 22 pages.

    Maple+GrTensorII libraries for cosmology

    The article presents some results of using the MAPLE computer algebra platform and the GrTensorII package for calculations in theoretical and numerical cosmology. Comment: LaTeX LLNCS style, 8 pages, accepted for SYNASC 2004 - 6th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, Timisoara, Romania, September 26-30, 2004.