Application of constrained optimisation techniques in electrical impedance tomography
A constrained optimisation technique is described for the reconstruction of temporal resistivity images. The approach solves the inverse problem by optimising a cost function under constraints in the form of normalised boundary potentials.
Mathematical models have been developed for two different data collection methods for the chosen criterion. Both of these models express the reconstructed image in terms of one-dimensional (1-D) Lagrange multiplier functions, so the reconstruction problem becomes one of estimating these 1-D functions from the normalised boundary potentials. These models are based on a cost criterion that minimises the variance between the reconstructed resistivity distribution and the true resistivity distribution.
The methods presented in this research extend algorithms previously developed for X-ray systems. Computational efficiency is enhanced by exploiting the structure of the associated system matrices. This structure was preserved in the Electrical Impedance Tomography (EIT) implementations by applying a weighting, which accounts for the non-linear current distribution, during the backprojection of the Lagrange multiplier functions.
In order to obtain the best possible reconstruction it is important to consider the effects of noise in the boundary data. This is achieved by using a fast algorithm which matches the statistics of the error in the approximate inverse of the associated system matrix with the statistics of the noise error in the boundary data. This yields the optimum solution with the available boundary data. Novel approaches have been developed to produce the Lagrange multiplier functions.
Two alternative methods are given for the design of VLSI implementations of hardware accelerators to improve computational efficiency. These accelerators are designed to implement parallel geometries and are modelled using a verification description language to assess their performance capabilities.
Singular value computations on the AP1000 array computer
The increasing popularity of singular value decomposition algorithms, used as a tool in many areas of science and engineering, demands the rapid development of fast and reliable implementations. These implementations are no longer confined to single-processor environments, since more and more parallel computers are available on the market. As a result, software must often be re-implemented efficiently on these new parallel architectures.
In this thesis we show, using the example of a singular value decomposition algorithm, how this change of working environment can be accomplished with non-trivial gains in performance. We present several optimisation techniques and their impact on the algorithm's performance at all levels of the parallel memory hierarchy (register, cache, main memory and external processor memory). The central principle in all of the optimisations presented herein is to increase the number of columns (column-segments) held in each level of the memory hierarchy and thereby increase the data reuse factors. In the optimisations for the parallel memory hierarchy, the techniques used are rectangular processor configuration, partitioning, and four-column rotation.
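The kernel these optimisations accelerate, orthogonalising pairs of columns (or column-segments) with plane rotations as in one-sided Jacobi SVD, can be sketched serially as follows. This is a minimal illustration only; the function and variable names are assumptions, not taken from the thesis, and the parallel mapping, partitioning, and register-blocking techniques described above are not shown.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi SVD sketch: sweep over all column pairs,
    applying a plane rotation that makes each pair orthogonal.
    On convergence the singular values are the column norms."""
    U = A.astype(float).copy()
    n = U.shape[1]
    for _ in range(max_sweeps):
        off = 0.0  # largest normalised off-diagonal inner product this sweep
        for i in range(n - 1):
            for j in range(i + 1, n):
                a = U[:, i] @ U[:, i]
                b = U[:, j] @ U[:, j]
                c = U[:, i] @ U[:, j]
                denom = np.sqrt(a * b)
                off = max(off, abs(c) / denom)
                if abs(c) < tol * denom:
                    continue  # pair already (numerically) orthogonal
                # Rotation angle annihilating the (i, j) inner product
                zeta = (b - a) / (2.0 * c)
                t = np.sign(zeta) / (abs(zeta) + np.hypot(1.0, zeta))
                cs = 1.0 / np.hypot(1.0, t)
                sn = cs * t
                Ui = U[:, i].copy()
                U[:, i] = cs * Ui - sn * U[:, j]
                U[:, j] = sn * Ui + cs * U[:, j]
        if off < tol:
            break
    # Singular values, sorted in descending order
    return np.sort(np.linalg.norm(U, axis=0))[::-1]
```

Because each rotation touches only two columns, grouping more columns per memory level (the "column-segments" above) lets consecutive rotations reuse data already held in registers or cache, which is the common thread of the thesis's optimisations.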
In the rectangular processor configuration technique, the data are mapped onto a rectangular network of processors instead of a linear one. This improves communication and cache performance such that, on average, we reduced the execution time by a factor of 2 and, in the case of long column-segments, by a factor of 5.
The partitioning technique involves rearranging the data and the order of computations in the cells, which increases the cache hit ratio for large matrices. For relatively modest improvements in cache performance of 2 to 5%, we achieved a significant reduction in execution times of 10 to 20%.
The four-column rotation technique improves performance through better register reuse. When a large number of columns is stored per processor, this technique gave a 2 to 10% improvement in execution time over the 'classic' two-column rotation.
Apart from the optimisations at the memory hierarchy levels, several floating-point optimisations of the algorithm itself are presented, which can be applied on any architecture. The main ideas behind these optimisations are reducing the number of floating-point instructions executed and balancing the floating-point operations. This was accomplished by reshaping the relevant parts of the code to use the AP1000's processor architecture (SPARC) to its full potential.
After combining all of the optimisations, we achieved a sustained 60% reduction in execution time, which corresponds to a 2.5-fold speedup. In cases where long columns of the input matrix were used, we achieved a nearly 5-fold reduction in execution time, without adversely affecting the accuracy of the singular values and while maintaining the quadratic convergence of the algorithm.
The algorithm was implemented on Fujitsu's AP1000 array multiprocessor, but all of the optimisations described can easily be applied to any MIMD architecture with a mesh or hypercube topology, and all but one can also be applied to register-cache uniprocessors.
Despite many changes in the structure of the algorithm, we found that convergence was not adversely affected and that the accuracy of the orthogonalisation was no worse than for the uniprocessor implementation of the noted SVD algorithm.
The application of parallel computer technology to the dynamic analysis of suspension bridges
This research is concerned with the application of distributed computer technology to the solution of non-linear structural dynamic problems, in particular the onset of aerodynamic instabilities in long-span suspension bridge structures, such as flutter, a catastrophic aeroelastic phenomenon.
The thesis is set out in two distinct parts:
Part I presents the theoretical background of the main forms of aerodynamic instability and describes in detail the main solution techniques used to solve the flutter problem. The previously written analysis package ANSUSP, which was developed specifically to predict numerically the onset of flutter instability, is presented. The various solution techniques employed to predict the onset of flutter for the Severn Bridge are discussed. All of the results presented in Part I were obtained using a 486DX2 66 MHz serial personal computer.
Part II examines the main solution techniques in detail and applies them to a large distributed supercomputer, which allows the solution to be achieved considerably faster than is possible on the serial computer system. The results in Part II are expressed as Performance Indices (PI), which give the ratio of the time taken to perform a specific calculation using a serial algorithm to the time taken by a parallel algorithm running on the same computer system.
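As a minimal illustration of the metric (the function name and the timing values below are assumptions for the example, not figures from the thesis), the Performance Index is simply the serial-to-parallel time ratio, i.e. the speedup:

```python
def performance_index(t_serial, t_parallel):
    """Ratio of serial to parallel execution time for the same
    calculation on the same computer system (the speedup)."""
    return t_serial / t_parallel

# Hypothetical timings: 120 s serially vs 15 s in parallel
pi = performance_index(120.0, 15.0)  # PI of 8.0
```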