
    Time-parallel iterative solvers for parabolic evolution equations

    We present original time-parallel algorithms for the solution of the implicit Euler discretization of general linear parabolic evolution equations with time-dependent self-adjoint spatial operators. Motivated by the inf-sup theory of parabolic problems, we show that the standard nonsymmetric time-global system can be equivalently reformulated as an original symmetric saddle-point system that remains inf-sup stable with respect to the same natural parabolic norms. We then propose and analyse an efficient and readily implementable parallel-in-time preconditioner to be used with an inexact Uzawa method. The proposed preconditioner is non-intrusive and easy to implement in practice, and also features the key theoretical advantages of robust spectral bounds, leading to convergence rates that are independent of the number of time-steps, final time, or spatial mesh sizes, and also a theoretical parallel complexity that grows only logarithmically with respect to the number of time-steps. Numerical experiments with large-scale parallel computations show the effectiveness of the method, along with its good weak and strong scaling properties.

    Parallel algorithms for MIMD parallel computers

    This thesis mainly covers the design and analysis of asynchronous parallel algorithms that can be run on MIMD (Multiple Instruction Multiple Data) parallel computers, in particular the NEPTUNE system at Loughborough University. Initially, the fundamentals of parallel computer architectures are introduced, with different parallel architectures being described and compared. The principles of parallel programming and the design of parallel algorithms are also outlined. The main characteristics of the 4-processor MIMD NEPTUNE system are then presented, and performance indicators, i.e. the speed-up and efficiency factors, are defined for the measurement of parallelism in a given system. Both numerical and non-numerical algorithms are covered in the thesis. In the numerical solution of partial differential equations, a new parallel 9-point block iterative method is developed. Here, the blocks are organized in such a way that each process contains its own group of 9 points on the network, so that they can be run in parallel. The parallel implementations of both the 9-point and 4-point block iterative methods were programmed using natural and red-black ordering with synchronous and asynchronous approaches. The results obtained for these different implementations were compared and analysed. Next, a parallel version of the A.G.E. (Alternating Group Explicit) method is developed, in which the explicit nature of the difference equation is revealed and exploited when applied to derive the solution of both linear and non-linear 2-point boundary value problems. Two strategies have been used in the implementation of the parallel A.G.E. method, using the synchronous and asynchronous approaches. The results from these implementations were compared. The results obtained from the parallel A.G.E. method were also compared with the corresponding results obtained from the parallel versions of the Jacobi, Gauss-Seidel and S.O.R. methods. Finally, a computational complexity analysis of the parallel A.G.E. algorithms is included. In the area of non-numeric algorithms, the problems of sorting and searching were studied. The sorting methods investigated were the shell and digit sort methods. With each method, different parallel strategies and approaches were used and compared to find the best results that can be obtained on the parallel machine. For searching, the sequential search algorithm in an unordered table and the binary search algorithm were investigated and implemented in parallel, with a presentation of the results. Finally, a complexity analysis of these methods is presented. The thesis concludes with a chapter summarizing the main results.
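    The abstract mentions speed-up and efficiency factors without stating them. A minimal sketch, assuming the conventional definitions (speed-up as serial time over parallel time, efficiency as speed-up per processor); the thesis may define them differently, and the function names and timings below are illustrative only:

```python
def speedup(t_serial, t_parallel):
    """Conventional speed-up: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Conventional efficiency: speed-up divided by the number of processors."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings (seconds) on a 4-processor machine, not measured data.
s = speedup(8.0, 2.5)
e = efficiency(8.0, 2.5, 4)
```

    Under these definitions an efficiency of 1.0 would mean that every processor is fully used; values below 1.0 reflect synchronization and communication overhead.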

    Towards parallelizable sampling-based Nonlinear Model Predictive Control

    This paper proposes a new sampling-based nonlinear model predictive control (MPC) algorithm, with a bound on complexity quadratic in the prediction horizon N and linear in the number of samples. The idea of the proposed algorithm is to use the sequence of predicted inputs from the previous time step as a warm start, and to iteratively update this sequence by changing its elements one by one, starting from the last predicted input and ending with the first predicted input. This strategy, which resembles the dynamic programming principle, allows for parallelization up to a certain level and yields a suboptimal nonlinear MPC algorithm with guaranteed recursive feasibility, stability and improved cost function at every iteration, which is suitable for real-time implementation. The complexity of the algorithm per time step in the prediction horizon depends only on the horizon, the number of samples and parallel threads, and it is independent of the measured system state. Comparisons with the fmincon nonlinear optimization solver on benchmark examples indicate that as the simulation time progresses, the proposed algorithm converges rapidly to the "optimal" solution, even when using a small number of samples. Comment: 9 pages, 9 pictures, submitted to IFAC World Congress 201
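    The backward, element-by-element update described in this abstract can be illustrated roughly as follows. This is a sketch under stated assumptions, not the paper's algorithm: it omits constraints and the feasibility and stability conditions, re-simulates the whole horizon for every sample, and uses a hypothetical scalar model and quadratic cost. All function names are invented for illustration.

```python
import random


def rollout_cost(x0, inputs, step, stage_cost):
    """Simulate the model over the horizon and sum the stage costs."""
    x, total = x0, 0.0
    for u in inputs:
        total += stage_cost(x, u)
        x = step(x, u)
    # Assumption: the terminal cost is the stage cost evaluated at zero input.
    return total + stage_cost(x, 0.0)


def improve_sequence(x0, warm_start, step, stage_cost, n_samples, u_min, u_max, rng):
    """Update the input sequence one element at a time, last element first."""
    inputs = list(warm_start)
    best = rollout_cost(x0, inputs, step, stage_cost)
    for k in reversed(range(len(inputs))):
        # The samples for a fixed k are independent and could be evaluated in parallel.
        for _ in range(n_samples):
            candidate = inputs[:k] + [rng.uniform(u_min, u_max)] + inputs[k + 1:]
            cost = rollout_cost(x0, candidate, step, stage_cost)
            if cost < best:  # keep a sample only if it lowers the cost
                inputs, best = candidate, cost
    return inputs, best


# Hypothetical scalar system x+ = 0.9 x + u with a quadratic stage cost.
step = lambda x, u: 0.9 * x + u
stage_cost = lambda x, u: x * x + 0.1 * u * u
warm_start = [0.0] * 5
initial_cost = rollout_cost(2.0, warm_start, step, stage_cost)
inputs, cost = improve_sequence(
    2.0, warm_start, step, stage_cost, 20, -1.0, 1.0, random.Random(0)
)
```

    In this sketch each of the N positions tries a fixed number of samples and each trial costs one O(N) rollout, so the work is quadratic in the horizon and linear in the number of samples, in line with the bound quoted in the abstract. Because a sample is accepted only when it lowers the cost, the returned cost is never worse than that of the warm start.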

    Design of Introspective Circuits for Analysis of Cell-Level Dis-orientation in Self-Assembled Cellular Systems

    This paper discusses a novel approach to managing complexity in a large self-assembled system, by utilizing the self-assembling components themselves to address the complexity. A particular challenge is discussed – namely the question of how to deal with elements that are assembled in different orientations from each other – and a solution based on the idea of introspective circuitry is described. A methodology for using a set of cells to determine a nearby cell’s orientation is given, leading to a slow (O(n)) means of orienting a 2D region of cells. A modified algorithm is then described to allow parallel analysis of, and adaptation to, dis-oriented cells, thus allowing re-orientation of an entire 2D region of cells with better-than-linear time performance (O(sqrt(n))). The significance of this work is discussed not only in terms of managing arrays of dis-oriented cells but also, more importantly, as an example of the usefulness of local, distributed self-configuration to create and use introspective circuitry.

    Parallel computation of echelon forms

    We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay these reductions as much as possible while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using the Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs are to be made between size of blocks well suited to those fast variants or to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, block size has then to be dynamic. We propose and compare several block decompositions: tile iterative with left-looking, right-looking and Crout variants, slab and tile recursive. Experiments demonstrate that the tile recursive variant performs better and matches the performance of reference numerical software when no rank deficiency occurs. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possible to maintain a high efficiency.
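    The idea of delaying modular reductions can be illustrated on a dot product over Z/pZ. This is a minimal sketch, not the authors' implementation: it assumes entries already reduced to the range 0..p-1 and a hypothetical fixed-width accumulator of `bits` bits (53 is chosen here to mimic a double-precision mantissa). Python integers never overflow, so the width only models the constraint a compiled routine would face. The names `max_delay` and `dot_mod` are invented for illustration.

```python
def max_delay(p, bits=53):
    """Number of products, each at most (p-1)**2, that fit below 2**bits when summed."""
    return max(1, (2 ** bits - 1) // ((p - 1) ** 2))


def dot_mod(a, b, p, bits=53):
    """Dot product modulo p, reducing once per block instead of once per product."""
    block = max_delay(p, bits)
    result = 0
    for start in range(0, len(a), block):
        # Accumulate a whole block of products without any reduction.
        partial = sum(x * y for x, y in zip(a[start:start + block], b[start:start + block]))
        # A single reduction for the block keeps the running result below p.
        result = (result + partial % p) % p
    return result


# Small example over Z/7Z with entries already in the range 0..6.
value = dot_mod([1, 2, 3], [4, 5, 6], 7)
```

    With a small prime and a wide accumulator, a single reduction can cover a very long run of products, which is why the cost of reductions need not dominate the arithmetic.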