Singular value computations on the AP1000 array computer
Singular value decomposition algorithms are used as a tool in many areas of science and
engineering, and their increasing popularity demands the rapid development of fast and
reliable implementations. These implementations are no longer confined to single-processor
environments, since more and more parallel computers are reaching the market. As a
result, software must often be re-implemented efficiently on these new parallel
architectures.
In this thesis we show, using a singular value decomposition algorithm as an example,
how this task of changing working environments can be accomplished with non-trivial
gains in performance. We present several optimisation techniques and their impact on
the algorithm's performance at all parallel memory hierarchy levels (register, cache, main
memory and external processor memory). The central principle in all of the optimisations
presented here is to increase the number of columns (column-segments) held
in each level of the memory hierarchy, and thereby increase the data-reuse factors.
The techniques used to optimise for the parallel memory hierarchy are rectangular
processor configuration, partitioning, and four-column rotation.
The rectangular processor configuration technique maps the data onto a rectangular
network of processors instead of a linear one. This improves communication and cache
performance such that, on average, we reduced the execution time by a factor of 2 and,
in the case of long column-segments, by a factor of 5.
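As an illustrative sketch (our own construction, not code from the thesis), the rectangular mapping can be pictured as cutting each column into row-segments and scattering the pieces over a processor mesh; the linear configuration is then the special case with a single row of processors:

```python
def map_segments(n_cols, grid_rows, grid_cols):
    """Assign (segment, column) pieces of a matrix to a grid_rows x grid_cols
    processor mesh.  Each column is split into grid_rows row-segments;
    segment r of column c is placed on processor (r, c % grid_cols).
    With grid_rows == 1 this degenerates to the linear configuration,
    where each processor holds full-length columns; a rectangular grid
    shortens the column-segments each cell must hold and communicate.
    """
    return {(r, c): (r, c % grid_cols)
            for c in range(n_cols)
            for r in range(grid_rows)}

# Example: 8 columns on a 2 x 4 mesh -> each processor holds 2 short
# segments, versus a 1 x 8 linear array holding 1 full-length column each.
placement = map_segments(8, 2, 4)
```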
The partitioning technique rearranges the data and the order of computations in the
cells, which increases the cache hit ratio for large matrices. From relatively modest
improvements in cache performance of 2 to 5%, we achieved a significant reduction
in execution times of 10 to 20%.
The four-column rotation technique improves performance through better register
reuse. When a large number of columns is stored per processor, this technique gave a 2 to 10% improvement in execution time over the 'classic' two-column rotation.
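As a hedged sketch of the underlying kernel (our own minimal construction, not the thesis code), a one-sided Jacobi SVD orthogonalises columns pairwise with plane rotations; the thesis's four-column variant processes two pairs together so column data stays in registers longer, which this two-column baseline does not replicate:

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Singular values of A via classic two-column (Hestenes) rotations."""
    A = A.astype(float).copy()
    n = A.shape[1]
    for _ in range(max_sweeps):
        converged = True
        for i in range(n - 1):
            for j in range(i + 1, n):
                ai, aj = A[:, i], A[:, j]
                alpha, beta, gamma = ai @ ai, aj @ aj, ai @ aj
                if abs(gamma) > tol * np.sqrt(alpha * beta):
                    converged = False
                    # plane rotation that zeroes the inner product ai . aj
                    zeta = (beta - alpha) / (2.0 * gamma)
                    t = (1.0 if zeta >= 0 else -1.0) / (
                        abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                    c = 1.0 / np.sqrt(1.0 + t * t)
                    s = c * t
                    A[:, i], A[:, j] = c * ai - s * aj, s * ai + c * aj
        if converged:
            break
    # once all columns are mutually orthogonal, their norms are the
    # singular values of the original matrix
    return np.sort(np.linalg.norm(A, axis=0))[::-1]
```

Each rotation touches exactly two columns; grouping four columns per step reuses each loaded column-segment in more rotations before it leaves the registers.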
Apart from the optimisations at the memory hierarchy levels, several floating-point
optimisations of the algorithm itself are presented, which can be applied on any
architecture. The main ideas behind these optimisations are reducing the number of
floating-point instructions executed per unit of time and balancing the floating-point
operations. This was accomplished by reshaping the relevant parts of the code to use the
AP1000's processor architecture (SPARC) to its full potential.
After combining all of the optimisations, we achieved a sustained 60% reduction in
execution time, which corresponds to a 2.5-fold speedup. In cases where long
columns of the input matrix were used, we achieved a nearly 5-fold reduction in execution
time, without adversely affecting the accuracy of the singular values and while maintaining
the quadratic convergence of the algorithm.
The algorithm was implemented on Fujitsu's AP1000 Array Multiprocessor, but
all of the optimisations described can easily be applied to any MIMD architecture with a
mesh or hypercube topology, and all but one can also be applied to register-cache
uniprocessors. Despite many changes to the structure of the algorithm, we found that
convergence was not adversely affected and the accuracy of the orthogonalisation was no
worse than for the uniprocessor implementation of the original SVD algorithm.
Algorithmic strategies for applicable real quantifier elimination
One of the most important algorithms for real quantifier elimination is quantifier elimination by virtual substitution, introduced by Weispfenning in 1988. In this thesis we present numerous algorithmic approaches for optimizing this quantifier elimination algorithm. The optimization goals are the actual running time of the implementation of the algorithm and the size of the output formula. Strategies for reaching these goals include simplification of first-order formulas, reduction of the size of the computed elimination set, and condensing, a new replacement for virtual substitution. Local quantifier elimination computes formulas that are equivalent to the input formula only near a given point; we exploit this restriction to further optimize quantifier elimination by virtual substitution. Finally, we discuss how to solve a large class of scheduling problems by real quantifier elimination. To optimize our algorithm for solving scheduling problems, we make use of the special form of the input formula and of additional information given by the description of the scheduling problem.
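As a hedged illustration (a toy example of ours, not taken from the thesis), virtual substitution eliminates an existential quantifier over a linear formula by substituting finitely many symbolic test points drawn from the atoms' bounds:

```latex
% Eliminate x from a linear existential formula.
\exists x\,\bigl(2x - a \ge 0 \;\wedge\; x - b \le 0\bigr)
% Elimination set from the lower bound 2x - a \ge 0:
% test points \{a/2\} together with -\infty.
% Substituting x := a/2:     (0 \ge 0) \wedge (a/2 - b \le 0)
% Substituting x := -\infty: the atom 2x - a \ge 0 is false, branch drops.
\;\Longleftrightarrow\; a \le 2b
```

The size of the elimination set drives the size of the output formula, which is why the thesis's reduction and condensing strategies target it.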
First International Conference on Ada (R) Programming Language Applications for the NASA Space Station, volume 2
Topics discussed include: reusability; mission critical issues; run time; expert systems; language issues; life cycle issues; software tools; and computers for Ada