58 research outputs found
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Some fast algorithms in signal and image processing.
Kwok-po Ng. Thesis (Ph.D.)--Chinese University of Hong Kong, 1995. Includes bibliographical references (leaves 138-139). Contents: Abstracts; Summary; Introduction; Summary of the papers A-F; Paper A; Paper B; Paper C; Paper D; Paper E; Paper F.
A preconditioned MINRES method for optimal control of wave equations and its asymptotic spectral distribution theory
In this work, we propose a novel preconditioned Krylov subspace method for
solving an optimal control problem of wave equations, after explicitly
identifying the asymptotic spectral distribution of the involved sequence of
linear coefficient matrices from the optimal control problem. Namely, we first
show that the all-at-once system stemming from the wave control problem is
associated to a structured coefficient matrix-sequence possessing an eigenvalue
distribution. Then, based on such a spectral distribution of which the symbol
is explicitly identified, we develop an ideal preconditioner and two
parallel-in-time preconditioners for the saddle point system composed of two
block Toeplitz matrices. For the ideal preconditioner, we show that the
eigenvalues of the preconditioned matrix-sequence all belong to a set
well separated from zero, leading to
mesh-independent convergence when the minimal residual method is employed. The
proposed parallel-in-time preconditioners can be implemented efficiently
using fast Fourier transforms or discrete sine transforms, and their
effectiveness is theoretically shown in the sense that the eigenvalues of the
preconditioned matrix-sequences are clustered, which leads to
rapid convergence. When these parallel-in-time preconditioners are not fast
diagonalizable, we further propose modified versions which can be efficiently
inverted. Several numerical examples are reported to verify our derived
localization and spectral distribution result and to support the effectiveness
of our proposed preconditioners and the related advantages with respect to the
relevant literature.
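The abstract above notes that the parallel-in-time preconditioners can be applied with fast Fourier transforms. The following is a minimal sketch of the generic building block behind that kind of claim, namely solving a circulant system by diagonalizing it with a discrete Fourier transform. It is not the authors' preconditioner: the function names (`dft`, `circulant_solve`, `circulant_matvec`) and the sample data are illustrative assumptions, and a naive O(n^2) transform stands in for an FFT so that the example needs only the standard library.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here as a stand-in for an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out

def circulant_solve(c, b):
    """Solve C x = b, where C is the circulant matrix whose first column is c.

    The DFT of c gives the eigenvalues of C, so the solve reduces to an
    elementwise division in Fourier space. Assumes real data and that C is
    nonsingular (no zero eigenvalue).
    """
    eigenvalues = dft(c)
    b_hat = dft(b)
    x_hat = [bh / lam for bh, lam in zip(b_hat, eigenvalues)]
    return [v.real for v in dft(x_hat, inverse=True)]

def circulant_matvec(c, x):
    """Dense product C x with C[i][j] = c[(i - j) mod n], for checking only."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

# Hypothetical sample data: a small symmetric circulant matrix.
c = [4.0, -1.0, 0.0, -1.0]
b = [1.0, 2.0, 3.0, 4.0]
x = circulant_solve(c, b)
residual = [r - bi for r, bi in zip(circulant_matvec(c, x), b)]
```

With a true FFT in place of `dft`, each solve costs O(n log n), which is why circulant-type preconditioners are attractive inside Krylov methods such as MINRES.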
A note on parallel preconditioning for the all-at-once solution of Riesz fractional diffusion equations
The multistep backward difference formula (BDF) for solving the system of
ODEs can result in a kind of all-at-once linear system, which is solved via
parallel-in-time preconditioned Krylov subspace solvers (see McDonald,
Pestana, and Wathen [SIAM J. Sci. Comput., 40(2) (2018): A1012-A1033] and Lin
and Ng [arXiv:2002.01108, 17 pages]). However, these studies ignored that the
multistep BDF is not self-starting when it is used to solve
time-dependent PDEs. In this note, we focus on the 2-step BDF which is often
superior to the trapezoidal rule for solving the Riesz fractional diffusion
equations, but its resultant all-at-once discretized system is a block
triangular Toeplitz system with a low-rank perturbation. Meanwhile, we first
give an estimation of the condition number of the all-at-once systems and then
adapt the previous work to construct two block circulant (BC) preconditioners.
Both the invertibility of these two BC preconditioners and the eigenvalue
distributions of preconditioned matrices are discussed in detail. The
efficient implementation of these BC preconditioners is also presented
especially for handling the computation of dense structured Jacobi matrices.
Finally, numerical experiments involving both the one- and two-dimensional
Riesz fractional diffusion equations are reported to support our theoretical
findings.
Comment: 18 pages, 2 figures, 6 tables. Tech. Rep.: Institute of Mathematics,
Southwestern University of Finance and Economics. Revised-1: refine/shorten
the contexts and correct some typos; Revised-2: correct some reference
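To make the "not self-starting" remark in the abstract above concrete, here is a minimal scalar sketch, assuming the test problem y' = a*y with y(0) = y0 in place of the Riesz fractional diffusion equation. The 2-step BDF needs two starting values, so the first row of the all-at-once system uses one backward Euler step. That single row breaks the lower triangular Toeplitz pattern, which is the scalar analogue of the low-rank perturbation mentioned in the abstract. The function names and the choice of backward Euler as the starting scheme are illustrative assumptions, not details taken from the paper.

```python
import math

def bdf2_all_at_once(a, y0, h, n):
    """Build the n-by-n all-at-once system A y = f for y' = a*y, y(0) = y0.

    Each equation is scaled by the step size h. Rows 2..n use the 2-step BDF,
    (3/2) y_k - 2 y_{k-1} + (1/2) y_{k-2} = h a y_k; row 1 uses backward Euler
    (an assumed starting scheme) because BDF2 needs two starting values.
    """
    A = [[0.0] * n for _ in range(n)]
    f = [0.0] * n
    for i in range(n):
        A[i][i] = 1.5 - h * a
        if i >= 1:
            A[i][i - 1] = -2.0
        if i >= 2:
            A[i][i - 2] = 0.5
    A[0][0] = 1.0 - h * a      # backward Euler start: the only non-Toeplitz entry
    f[0] = y0                  # known y0 moved to the right-hand side
    if n > 1:
        f[1] = -0.5 * y0       # the (1/2) y0 term of the first BDF2 row
    return A, f

def forward_substitution(A, f):
    """Solve the lower triangular system A y = f sequentially in time."""
    y = []
    for i, row in enumerate(A):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((f[i] - s) / row[i])
    return y

# Hypothetical sample problem: y' = -y, y(0) = 1, integrated to t = 1.
A, f = bdf2_all_at_once(a=-1.0, y0=1.0, h=0.01, n=100)
y = forward_substitution(A, f)
error_at_t1 = abs(y[-1] - math.exp(-1.0))
```

Forward substitution is the sequential time-stepping baseline; the block circulant preconditioners discussed in the abstract are meant to let a Krylov solver treat all time steps at once instead.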
Parallel prefix operations on heterogeneous platforms
Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01
[Resumo]
Graphics cards, known as GPUs, offer major advantages in computational
performance and energy efficiency, making them a key pillar of high
performance computing (HPC). However, this technology is also costly to
program and has certain problems associated with portability across different
cards. Parallel prefix algorithms, for their part, are a set of regular
parallel algorithms that are widely used in computational science and whose
efficiency is essential in many applications. In this regard, although GPUs
can accelerate the computation of these algorithms, they can also be a
limitation when the parallelism of the GPU architecture is not properly
exploited.
This thesis presents two perspectives. On the one hand, new parallel prefix
algorithms are designed for any parallel programming paradigm. On the other
hand, a general methodology is proposed that implements parallel prefix
algorithms efficiently, easily, and portably on CUDA GPU architectures, rather
than focusing on a particular algorithm or a specific card model. To this end,
the methodology identifies the GPU parameters that influence performance and
then, following a series of theoretical premises, obtains the optimal values
of these parameters depending on the algorithm, the problem size, and the GPU
architecture used. In addition, the thesis provides a set of GPU functions
composed of modular, reusable blocks of CUDA code, which allows any algorithm
to be implemented in a simple way. Three approaches are proposed, depending on
the problem size. The first two solve small, medium, and large problems on a
single GPU, while the third handles extremely large sizes using several GPUs.
Our proposals deliver very competitive performance results, improving on the
existing proposals in the literature for the operations tested: the scan
primitive, sorting, and the solution of tridiagonal systems.
[Resumen]
Graphics cards (GPUs) have shown major advantages in computational performance
and energy efficiency, making them a key technology for high performance
computing (HPC). However, this technology is also costly to program, and it
has certain problems associated with the portability of its codes across
different generations of cards. Parallel prefix algorithms, in turn, are a set
of regular algorithms that are widely used in computational science and whose
efficiency is crucial in many applications. Although GPUs can accelerate the
computation of these algorithms, they can also be a limitation if the
parallelism of the GPU architecture is not correctly exploited.
This thesis presents two perspectives. On the one hand, new parallel prefix
algorithms have been designed that can be implemented in any parallel
programming paradigm. On the other hand, a general methodology is proposed
that implements parallel prefix algorithms efficiently, simply, and portably
on any CUDA GPU architecture, without focusing on a particular algorithm or
card model. To this end, the methodology identifies the GPU parameters that
influence performance and, following a set of theoretical premises, obtains
the optimal values for each algorithm, problem size, and architecture. In
addition, the GPU functions provided are composed of reusable, modular blocks
of CUDA code, which allows any parallel prefix algorithm to be implemented
simply. Three approaches are proposed, depending on the problem size. The
first two solve small, medium, and large sizes using a single GPU, while the
third approach handles extremely large sizes using several GPUs.
Our proposals deliver very competitive results, improving on the performance
of the existing proposals in the literature for the operations tested: the
scan primitive, sorting, and the solution of tridiagonal systems.
[Abstract]
Graphics Processing Units (GPUs) have shown remarkable advantages in computing
performance and energy efficiency, representing one of the most promising
trends for the near future of high performance computing. However, these
devices also bring some programming complexities, and considerable effort is
required to provide portability between different generations. Additionally,
parallel prefix algorithms are a set of regular and highly used parallel
algorithms, whose efficiency is crucial in many computer science applications.
Although GPUs can accelerate the computation of such algorithms, they can also
be a limitation when the algorithms do not match the GPU architecture
correctly or do not exploit the GPU parallelism properly.
This dissertation presents two different perspectives. On the one hand, new
parallel prefix algorithms have been designed for any parallel programming
paradigm. On the other hand, a general GPU tuning methodology is proposed to
provide an easy and portable mechanism to efficiently implement parallel
prefix algorithms on any CUDA GPU architecture, rather than focusing on a
particular algorithm or a GPU model. To accomplish this goal, the methodology
identifies the GPU parameters that influence performance and, following a set
of performance premises, obtains convenient values of these parameters
depending on the algorithm, the problem size and the GPU architecture.
Additionally, the provided GPU functions are composed of modular and reusable
CUDA blocks of code, which allow the easy implementation of any parallel
prefix algorithm. Depending on the size of the dataset, three different
approaches are proposed. The first two approaches solve small and medium-large
datasets on a single GPU, whereas the third approach deals with extremely
large datasets in a multiple-GPU environment.
Our proposals provide very competitive performance, outperforming the
state-of-the-art for many parallel prefix operations, such as the scan
primitive, sorting and solving tridiagonal systems.
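For readers unfamiliar with the scan primitive named in the abstract above, the following is a minimal sketch of a work-efficient exclusive scan in the up-sweep/down-sweep style commonly attributed to Blelloch. It is a sequential Python simulation, not the thesis's CUDA code: the function name and defaults are illustrative assumptions. Within each pass of the two `while` loops, the iterations of the inner `for` loop touch disjoint array positions, which is what a GPU implementation would execute in parallel.

```python
from operator import add

def blelloch_exclusive_scan(values, op=add, identity=0):
    """Exclusive prefix scan using an up-sweep (reduce) and a down-sweep phase.

    `op` must be associative and `identity` must be its identity element.
    The input is padded with `identity` up to the next power of two.
    """
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [identity] * (size - n)

    # Up-sweep: build partial reductions in place, doubling the stride each pass.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            tree[i] = op(tree[i - stride], tree[i])
        stride *= 2

    # Down-sweep: push prefixes back down the implicit tree.
    tree[size - 1] = identity
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left_sum = tree[i - stride]
            tree[i - stride] = tree[i]          # left child inherits the parent's prefix
            tree[i] = op(tree[i], left_sum)     # right child: parent's prefix, then left sum
        stride //= 2

    return tree[:n]

prefix_sums = blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])
# prefix_sums == [0, 3, 4, 11, 11, 15, 16, 22]
```

Because only associativity is required, the same routine works for other operators, for example string concatenation with `identity=""`. Sorting and tridiagonal solvers, the other operations listed in the abstract, can be built on top of scan-like passes of this kind.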
A multi-harmonic finite element method for the micro-Doppler effect, with an application to radar sensing
A finite element method in the spectral domain is proposed for solving wave scattering problems with moving boundaries or, more generally, deforming domains. First, the original problem is rewritten as an equivalent weak formulation set in a fixed domain. Next, this formulation is approximated by a simpler weak form based on asymptotic expansions, valid when the amplitude of the movements or deformations is small. Assuming the movement is periodic, Fourier series expansions of some geometrical quantities and of the solution are then introduced to obtain a coupled multi-harmonic frequency-domain formulation. Standard finite element methods can then be applied to solve the resulting problem, and a block diagonal preconditioner is proposed to accelerate the Krylov subspace solution of the linear system for high-frequency problems. The efficiency of the resulting method is demonstrated on a radar sensing application for the automotive industry.