The symmetric-Toeplitz linear system problem in parallel
Many algorithms exist that exploit the special structure of Toeplitz matrices for solving linear systems. Nevertheless, these algorithms are difficult to parallelize because of their low computational cost and the strong dependencies among the operations involved, which produce a high communication cost. The parallel algorithm presented in this paper is based on transforming the Toeplitz matrix into another structured matrix, called Cauchy-like. The particular properties of Cauchy-like matrices are exploited to obtain two levels of parallelism that make it possible to greatly reduce the execution time. The experimental results were obtained on a cluster of PCs. Supported by Spanish MCYT and FEDER under Grant TIC 2003-08238-C02-02.
Alonso-Jordá, P.; Vidal Maciá, A.M. (2005). The symmetric-Toeplitz linear system problem in parallel. Computational Science -- ICCS 2005, Pt. 1, Proceedings. 3514:220-228. https://doi.org/10.1007/11428831_28
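The Toeplitz-to-Cauchy-like transformation at the heart of the abstract above can be illustrated numerically. A minimal sketch (Python with NumPy; the matrix size and random entries are arbitrary illustrative choices, not taken from the paper): the DST-I matrix diagonalises the symmetric tridiagonal shift, so conjugating a symmetric Toeplitz matrix with it yields a matrix whose displacement against a diagonal matrix has rank at most 4, which is exactly the Cauchy-like structure.

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
t = rng.standard_normal(n)

# Symmetric Toeplitz matrix: T[i, j] = t[|i - j|]
idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
T = t[idx]

# DST-I matrix S: orthogonal and symmetric; it diagonalises the
# tridiagonal shift Y = Z + Z^T with eigenvalues 2*cos(k*pi/(n+1))
k = np.arange(1, n + 1)
S = np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(k, k) / (n + 1))
d = 2.0 * np.cos(np.pi * k / (n + 1))

C = S @ T @ S              # the Cauchy-like matrix
D = np.diag(d)
R = D @ C - C @ D          # displacement of C, equal to S (Y T - T Y) S

# Y T - T Y is nonzero only in the first/last rows and columns of T,
# so the displacement rank of C is at most 4.
print(np.linalg.matrix_rank(R, tol=1e-8))
```

A displacement rank this low is what lets Cauchy-like solvers run in O(n^2) operations and, unlike the Toeplitz form, pivot for stability without destroying the structure.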
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Parallel computing for image processing problems.
by Kin-wai Mak. Thesis (M.Phil.), Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 52-54).
Chapter 1: Introduction to Parallel Computing
1.1 Parallel Computer Models
1.2 Forms of Parallelism
1.3 Performance Evaluation
1.3.1 Finding Machine Parameters
1.3.2 Amdahl's Law
1.3.3 Gustafson's Law
1.3.4 Scalability Analysis
Chapter 2: Introduction to Image Processing
2.1 Image Restoration Problem
2.1.1 Toeplitz Least Squares Problems
2.1.2 The Need for Regularization
2.1.3 Guide Star Image
Chapter 3: Toeplitz Solvers
3.1 Introduction
3.2 Parallel Implementation
3.2.1 Overview of MasPar
3.2.2 Design Methodology
3.2.3 Implementation Details
3.2.4 Application to Ground Based Astronomy
3.2.5 Performance Analysis
3.2.6 The Graphical Interface
Bibliography
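The two scalability laws named in Chapter 1 of this outline are one-liners, sketched below (Python; the 95% parallel fraction is an arbitrary illustrative value, not taken from the thesis):

```python
# Amdahl's law: speedup at fixed problem size, where f is the
# parallelisable fraction of the work and p the number of processors.
def amdahl(f, p):
    return 1.0 / ((1.0 - f) + f / p)

# Gustafson's law: scaled speedup when the problem size grows with p.
def gustafson(f, p):
    return (1.0 - f) + f * p

f = 0.95  # illustrative: 95% of the work parallelises
for p in (4, 16, 64):
    print(p, round(amdahl(f, p), 2), round(gustafson(f, p), 2))
```

The contrast between the two is the point of a scalability analysis: under Amdahl the serial fraction caps the speedup at 1/(1-f) no matter how many processors are added, while under Gustafson the achievable speedup keeps growing with p.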
Mapping Signal Processing Algorithms on Parallel Architectures
Electrical Engineering
Performance of a parallel code for the Euler equations on hypercube computers
The performance of hypercube computers was evaluated on a computational fluid dynamics problem, and the parallel-environment issues that must be addressed were considered, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two-dimensional steady Euler equations describing flow around an airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and making the code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2 and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
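The execution-time model mentioned in the abstract is a function of machine and algorithm parameters. The sketch below shows the general shape such a model can take for a grid-based code on a hypercube (Python; the functional form and coefficients are hypothetical stand-ins, not the values fitted in the study): per-step compute scales with the grid points per node, halo exchange with the subdomain perimeter, and global reductions with the hypercube dimension log2(p).

```python
import math

# Hypothetical per-step execution-time model for a 2-D grid code on a
# hypercube. a, b, c are illustrative machine coefficients (seconds),
# n the number of grid points, p the number of processors.
def step_time(n, p, a=1e-6, b=5e-6, c=1e-4):
    compute = a * n / p                  # local floating-point work
    halo = b * math.sqrt(n / p)          # nearest-neighbour exchange
    reduce_ = c * math.log2(p) if p > 1 else 0.0  # global reduction
    return compute + halo + reduce_

n = 256 * 256
for p in (1, 16, 512):
    print(p, step_time(n, p))
```

A model of this shape reproduces the qualitative behaviour the abstract describes: run time first drops with p, then flattens and eventually rises as the log2(p) communication term dominates, which is what makes extrapolation to future, larger hypercubes possible.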
Research summary, January 1989 - June 1990
The Research Institute for Advanced Computer Science (RIACS) was established at NASA ARC in June of 1983. RIACS is privately operated by the Universities Space Research Association (USRA), a consortium of 62 universities with graduate programs in the aerospace sciences, under a Cooperative Agreement with NASA. RIACS serves as the representative of the USRA universities at ARC. This document reports our activities and accomplishments for the period 1 Jan. 1989 - 30 Jun. 1990. The following topics are covered: learning systems, networked systems, and parallel systems.
Parallel computation techniques for virtual acoustics and physical modelling synthesis
The numerical simulation of large-scale virtual acoustics and physical modelling
synthesis is a computationally expensive process. Time stepping methods, such as
finite difference time domain, can be used to simulate wave behaviour in models of
three-dimensional room acoustics and virtual instruments. In the absence of any form
of simplifying assumptions, and at high audio sample rates, this can lead to simulations
that require many hours of computation on a standard Central Processing Unit
(CPU). In recent years the video game industry has driven the development of Graphics
Processing Units (GPUs) that are now capable of multi-teraflop performance using
highly parallel architectures. Whilst these devices are primarily designed for graphics
calculations, they can also be used for general purpose computing. This thesis explores
the use of such hardware to accelerate simulations of three-dimensional acoustic wave
propagation, and embedded systems that create physical models for the synthesis of
sound.
Test case simulations of virtual acoustics are used to compare the performance of
workstation CPUs to that of Nvidia’s Tesla GPU hardware. Using representative multicore
CPU benchmarks, such simulations can be accelerated in the order of 5X for
single precision and 3X for double precision floating-point arithmetic. Optimisation
strategies are examined for maximising GPU performance when using single devices,
as well as for multiple device codes that can compute simulations using billions of grid
points. This allows the simulation of room models of several thousand cubic metres
at audio rates such as 44.1 kHz, all within a usable time scale. The performance of
alternative finite difference schemes is explored, as well as strategies for the efficient
implementation of boundary conditions.
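The finite difference time domain updates discussed above reduce, in one dimension, to a three-point leapfrog stencil. A minimal sketch (Python with NumPy; the grid size, wave speed and excitation are arbitrary illustrative choices, with the Courant number held at the 1-D stability limit):

```python
import numpy as np

fs = 44100.0      # audio sample rate (Hz)
c = 344.0         # speed of sound (m/s)
dt = 1.0 / fs
dx = c * dt       # Courant number c*dt/dx = 1, the 1-D stability limit
lam2 = (c * dt / dx) ** 2

n = 256
u_prev = np.zeros(n)
u = np.zeros(n)
u[n // 2] = 1.0   # impulse excitation in the middle of the domain

# Second-order leapfrog update with fixed (Dirichlet) boundaries:
# u_next[i] = 2u[i] - u_prev[i] + lam^2 * (u[i+1] - 2u[i] + u[i-1])
for _ in range(100):
    u_next = np.zeros(n)
    u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                    + lam2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
    u_prev, u = u, u_next
```

Every interior point updates independently from the previous two time levels, which is precisely the data parallelism a GPU exploits; in three dimensions the same scheme reads a seven-point stencil per grid point, and memory bandwidth rather than arithmetic becomes the limit.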
Creating physical models of acoustic instruments requires embedded systems that
often rely on sparse linear algebra operations. The performance efficiency of various
sparse matrix storage formats is detailed in terms of the fundamental operations that
are required to compute complex models, with an optimised storage system achieving
substantial performance gains over more generalised formats. An integrated instrument
model of the timpani drum is used to demonstrate the performance gains that are
possible using the optimisation strategies developed throughout this thesis.
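The sparse storage question raised in this abstract comes down to how the matrix-vector product at the heart of each time step reads memory. A hand-rolled sketch of the compressed sparse row (CSR) format (Python with NumPy; the tridiagonal operator is a generic stand-in, not the thesis's timpani model):

```python
import numpy as np

# Build a small tridiagonal operator directly in CSR form: one flat
# array of nonzeros, their column indices, and per-row start offsets.
n = 6
data, indices, indptr = [], [], [0]
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            data.append(-2.0 if j == i else 1.0)
            indices.append(j)
    indptr.append(len(data))
data = np.array(data)
indices = np.array(indices)
indptr = np.array(indptr)

def csr_matvec(data, indices, indptr, x):
    """y = A @ x using only the three CSR arrays."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[lo:hi], x[indices[lo:hi]])
    return y

print(csr_matvec(data, indices, indptr, np.ones(n)))
```

CSR pays for its generality by storing an explicit column index for every nonzero; a format tuned to one operator, such as a banded matrix with constant diagonals, can drop those indices entirely, which is the general kind of saving an optimised storage system targets.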