58 research outputs found

    Solution of partial differential equations on vector and parallel computers

    Get PDF
    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed

    Some fast algorithms in signal and image processing.

    Get PDF
    Kwok-po Ng.Thesis (Ph.D.)--Chinese University of Hong Kong, 1995.Includes bibliographical references (leaves 138-139).AbstractsSummaryIntroduction --- p.1Summary of the papers A-F --- p.2Paper A --- p.15Paper B --- p.36Paper C --- p.63Paper D --- p.87Paper E --- p.109Paper F --- p.12

    A preconditioned MINRES method for optimal control of wave equations and its asymptotic spectral distribution theory

    Full text link
    In this work, we propose a novel preconditioned Krylov subspace method for solving an optimal control problem of wave equations, after explicitly identifying the asymptotic spectral distribution of the involved sequence of linear coefficient matrices from the optimal control problem. Namely, we first show that the all-at-once system stemming from the wave control problem is associated to a structured coefficient matrix-sequence possessing an eigenvalue distribution. Then, based on such a spectral distribution of which the symbol is explicitly identified, we develop an ideal preconditioner and two parallel-in-time preconditioners for the saddle point system composed of two block Toeplitz matrices. For the ideal preconditioner, we show that the eigenvalues of the preconditioned matrix-sequence all belong to the set (32,12)(12,32)\left(-\frac{3}{2},-\frac{1}{2}\right)\bigcup \left(\frac{1}{2},\frac{3}{2}\right) well separated from zero, leading to mesh-independent convergence when the minimal residual method is employed. The proposed {parallel-in-time} preconditioners can be implemented efficiently using fast Fourier transforms or discrete sine transforms, and their effectiveness is theoretically shown in the sense that the eigenvalues of the preconditioned matrix-sequences are clustered around ±1\pm 1, which leads to rapid convergence. When these parallel-in-time preconditioners are not fast diagonalizable, we further propose modified versions which can be efficiently inverted. Several numerical examples are reported to verify our derived localization and spectral distribution result and to support the effectiveness of our proposed preconditioners and the related advantages with respect to the relevant literature

    A note on parallel preconditioning for the all-at-once solution of Riesz fractional diffusion equations

    Full text link
    The pp-step backwards difference formula (BDF) for solving the system of ODEs can result in a kind of all-at-once linear systems, which are solved via the parallel-in-time preconditioned Krylov subspace solvers (see McDonald, Pestana, and Wathen [SIAM J. Sci. Comput., 40(2) (2018): A1012-A1033] and Lin and Ng [arXiv:2002.01108, 17 pages]. However, these studies ignored that the pp-step BDF (p2p\geq 2) is not selfstarting, when they are exploited to solve time-dependent PDEs. In this note, we focus on the 2-step BDF which is often superior to the trapezoidal rule for solving the Riesz fractional diffusion equations, but its resultant all-at-once discretized system is a block triangular Toeplitz system with a low-rank perturbation. Meanwhile, we first give an estimation of the condition number of the all-at-once systems and then adapt the previous work to construct two block circulant (BC) preconditioners. Both the invertibility of these two BC preconditioners and the eigenvalue distributions of preconditioned matrices are discussed in details. The efficient implementation of these BC preconditioners is also presented especially for handling the computation of dense structured Jacobi matrices. Finally, numerical experiments involving both the one- and two-dimensional Riesz fractional diffusion equations are reported to support our theoretical findings.Comment: 18 pages. 2 figures. 6 Table. Tech. Rep.: Institute of Mathematics, Southwestern University of Finance and Economics. Revised-1: refine/shorten the contexts and correct some typos; Revised-2: correct some reference

    Parallel prefix operations on heterogeneous platforms

    Get PDF
    Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01[Resumo] As tarxetas gráficas, coñecidas como GPUs, aportan grandes vantaxes no rendemento computacional e na eficiencia enerxética, sendo un piar clave para a computación de altas prestacións (HPC). Sen embargo, esta tecnoloxía tamén é custosa de programar, e ten certos problemas asociados á portabilidade entre as diferentes tarxetas. Por autra banda, os algoritmos de prefixo paralelo son un conxunto de algoritmos paralelos regulares e moi empregados nas ciencias compuacionais, cuxa eficiencia é esencial en moita."3 aplicacións. Neste eiclo, aínda que as GPUs poden acelerar a computación destes algoritmos, tamén poden ser unha limitación cando non explotan axeitadamente o paralelismo da arquitectura CPU. Esta Tese presenta dúas perspectivas. Dunha parte, deséñanse novos algoritmos de prefixo paralelo para calquera paradigma de programación paralela. Pola outra banda, tamén se propón unha metodoloxÍa xeral que implementa eficientemente algoritmos de prefixo paralelos, de xeito doado e portable, sobre arquitecturas GPU CUDA, mais que se centrar nun algoritmo particular ou nun modelo concreto de tarxeta. Para isto, a metodoloxía identifica os paramétros da GPU que inflúen no rendemento e, despois, seguindo unha serie de premisas teóricas, obtéñense os valores óptimos destes parámetros dependendo do algoritmo, do tamaño do problema e da arquitectura GPU empregada. Ademais, esta Tese tamén prové unha serie de fUllciólls GPU compostas de bloques de código CUDA modulares e reutilizables, o que permite a implementación de calquera algoritmo de xeito sinxelo. Segundo o tamaño do problema, propóñense tres aproximacións. As dúas primeiras resolven problemas pequenos, medios e grandes nunha única GPU) mentras que a terceira trata con tamaños extremad8.1nente grandes, usando varias GPUs. As nosas propostas proporcionan uns resultados moi competitivos a nivel de rendemento, mellorando as propostas existentes na bibliografía para as operacións probadas: a primitiva sean, ordenación e a resolución de sistemas tridiagonais.[Resumen] Las tarjetas gráficas (GPUs) han demostrado gmndes ventajas en el rendimiento computacional y en la eficiencia energética, siendo una tecnología clave para la computación de altas prestaciones (HPC). Sin embargo, esta tecnología también es costosa de progTamar, y tiene ciertos problemas asociados a la portabilidad de sus códigos entre diferentes generaciones de tarjetas. Por otra parte, los algoritmos de prefijo paralelo son un conjunto de algoritmos regulares y muy utilizados en las ciencias computacionales, cuya eficiencia es crucial en muchas aplicaciones. Aunque las GPUs puedan acelerar la computación de estos algoritmos, también pueden ser una limitación si no explotan correctamente el paralelismo de la arquitectura CPU. Esta Tesis presenta dos perspectivas. De un lado, se han diseñado nuevos algoritmos de prefijo paralelo que pueden ser implementados en cualquier paradigma de programación paralela. Por otra parte, se propone una metodología general que implementa eficientemente algoritmos de prefijo paralelo, de forma sencilla y portable, sobre cualquier arquitectura GPU CUDA, sin centrarse en un algoritmo particular o en un modelo de tarjeta. Para ello, la metodología identifica los parámetros GPU que influyen en el rendimiento y, siguiendo un conjunto de premisas teóricas, obtiene los valores óptimos para cada algoritmo, tamaño de problema y arquitectura. Además, las funciones GPU proporcionadas están compuestas de bloques de código CUDA reutilizable y modular, lo que permite la implementación de cualquier algoritmo de prefijo paralelo sencillamente. Dependiendo del tamaño del problema, se proponen tres aproximaciones. Las dos primeras resuelven tamaños pequeños, medios y grandes, utilizando para ello una única GPU i mientras que la tercera aproximación trata con tamaños extremadamente grandes, usando varias GPUs. Nuestras propuestas proporcionan resultados muy competitivos, mejorando el rendimiento de las propuestas existentes en la bibliografía para las operaciones probadas: la primitiva sean, ordenación y la resolución de sistemas tridiagonales.[Abstract] Craphics Processing Units (CPUs) have shown remarkable advantages in computing performance and energy efficiency, representing oue of the most promising trends fúr the near-fnture of high perfonnance computing. However, these devices also bring sorne programming complexities, and many efforts are required tú provide portability between different generations. Additionally, parallel prefix algorithms are a 8et of regular and highly-used parallel algorithms, whose efficiency is crutial in roany computer sCience applications. Although GPUs can accelerate the computation of such algorithms, they can also be a limitation when they do not match correctly to the CPU architecture or do not exploit the CPU parallelism properly. This dissertation presents two different perspectives. Gn the Oile hand, new parallel prefix algorithms have been algorithmicany designed for any paranel progrannning paradigm. On the other hand, a general tuning CPU methodology is proposed to provide an easy and portable mechanism tú efficiently implement paranel prefix algorithms on any CUDA CPU architecture, rather than focusing on a particular algorithm or a CPU mode!. To accomplish this goal, the methodology identifies the GPU parameters which influence on the performance and, following a set oí performance premises, obtains the cOllvillient values oí these parameters depending on the algorithm, the problem size and the CPU architecture. Additionally, the provided CPU functions are composed of modular and reusable CUDA blocks of code, which allow the easy implementation of any paranel prefix algorithm. Depending on the size of the dataset, three different approaches are proposed. The first two approaches solve small and medium-large datasets on a single GPU; whereas the third approach deals with extremely large datasets on a Multiple-CPU environment. OUT proposals provide very competitive performance, outperforming the stateof- the-art for many parallel prefix operatiOllS, such as the sean primitive, sorting and solving tridiagonal systems

    A multi-harmonic finite element method for the micro-Doppler effect, with an application to radar sensing

    Full text link
    A finite element method in the spectral domain is proposed for solving wave scattering problems with moving boundaries or, more generally, deforming domains. First, the original problem is rewritten as an equivalent weak formulation set in a fixed domain. Next, this formulation is approximated as a simpler weak form based on asymptotic expansions when the amplitude of the movements or the deformations is small. Fourier series expansions of some geometrical quantities under the assumption that the movement is periodic, and of the solution are next introduced to obtain a coupled multi-harmonic frequency domain formulation. Standard finite element methods can then be applied to solve the resulting problem and a block diagonal preconditioner is proposed to accelerate the Krylov subspace solution of the linear system for high frequency problems. The efficiency of the resulting method is demonstrated on a radar sensing application for the automotive industry
    corecore