3 research outputs found

    Singular value computations on the AP1000 array computer

    No full text
    The increasing popularity of singular value decomposition algorithms, used as a tool in many areas of science and engineering, demands rapid development of fast and reliable implementations. Those implementations are no longer bound to the single-processor environment, since more and more parallel computers are available on the market, and software often needs to be re-implemented efficiently on these new parallel architectures. In this thesis we show, using the example of a singular value decomposition algorithm, how this change of working environment can be accomplished with non-trivial gains in performance. We present several optimisation techniques and their impact on the algorithm's performance at all levels of the parallel memory hierarchy (register, cache, main memory and external processor memory). The central principle in all of the optimisations presented here is to increase the number of columns (column-segments) held at each level of the memory hierarchy and thereby increase the data-reuse factors. The techniques used in the optimisations for the parallel memory hierarchy are rectangular processor configuration, partitioning, and four-column rotation. In the rectangular processor configuration technique the data are mapped onto a rectangular network of processors instead of a linear one. This improves communication and cache performance such that, on average, we reduced the execution time by a factor of 2 and, in the case of long column-segments, by a factor of 5. The partitioning technique rearranges data and the order of computations in the cells, which increases the cache hit ratio for large matrices. For relatively modest improvements in cache performance of 2 to 5%, we achieved a significant reduction in execution time of 10 to 20%. The four-column rotation technique improves performance through better register reuse. For large numbers of columns stored per processor, it gave a 2 to 10% improvement in execution time over the 'classic' two-column rotation. Apart from the optimisations at the memory hierarchy levels, several floating-point optimisations of the algorithm itself are presented, which can be applied on any architecture. The main ideas behind these optimisations are the reduction of the number of floating-point instructions executed in a unit of time and the balancing of the floating-point operations. This was accomplished by reshaping the relevant parts of the code to use the AP1000 processor architecture (SPARC) to its full potential. After combining all of the optimisations, we achieved a sustained 60% reduction in execution time, which corresponds to a 2.5-fold speed-up. In the cases where long columns of the input matrix were used, we achieved a nearly 5-fold reduction in execution time without adversely affecting the accuracy of the singular values, while maintaining the quadratic convergence of the algorithm. The algorithm was implemented on Fujitsu's AP1000 Array Multiprocessor, but all of the optimisations described can easily be applied to any MIMD architecture with a mesh or hypercube topology, and all but one can also be applied to register-cache uniprocessors. Despite many changes to the structure of the algorithm, we found that convergence was not adversely affected and that the accuracy of the orthogonalisation was no worse than for the uniprocessor implementation of the noted SVD algorithm.
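
    The abstract does not name the underlying method, but the emphasis on pairwise column rotations and orthogonalisation suggests a Hestenes-style one-sided Jacobi SVD. The sketch below is a serial NumPy illustration under that assumption, not the thesis code: it shows only the 'classic' two-column rotation that the thesis distributes over the AP1000 mesh and generalises to four columns for better register reuse.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Hestenes one-sided Jacobi SVD (serial sketch, illustrative only).

    Columns of A are orthogonalised by pairwise plane rotations; the
    singular values are the final column norms.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):            # classic two-column rotation
                app = A[:, p] @ A[:, p]
                aqq = A[:, q] @ A[:, q]
                apq = A[:, p] @ A[:, q]
                if abs(apq) <= tol * np.sqrt(app * aqq):
                    continue                     # columns already orthogonal
                converged = False
                # Jacobi rotation that zeroes the (p, q) inner product
                tau = (aqq - app) / (2.0 * apq)
                t = np.sign(tau) / (abs(tau) + np.sqrt(1.0 + tau * tau)) if tau else 1.0
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                R = np.array([[c, s], [-s, c]])
                A[:, [p, q]] = A[:, [p, q]] @ R  # rotate the column pair
                V[:, [p, q]] = V[:, [p, q]] @ R  # accumulate right singular vectors
        if converged:
            break
    sigma = np.linalg.norm(A, axis=0)            # singular values (unsorted)
    U = A / np.where(sigma > 0, sigma, 1.0)      # left singular vectors
    return U, sigma, V
```

    For a small test matrix A, `(U * s) @ V.T` reconstructs A to rounding error, which is a convenient cross-check against `numpy.linalg.svd`.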

    Algorithmic strategies for applicable real quantifier elimination

    Get PDF
    One of the most important algorithms for real quantifier elimination is quantifier elimination by virtual substitution, introduced by Weispfenning in 1988. In this thesis we present numerous algorithmic approaches for optimizing this quantifier elimination algorithm. The optimization goals are the actual running time of the implementation of the algorithm and the size of the output formula. Strategies for reaching these goals include the simplification of first-order formulas, the reduction of the size of the computed elimination set, and condensing, a new replacement for the virtual substitution. Local quantifier elimination computes formulas that are equivalent to the input formula only near a given point; we exploit this restriction to further optimize the quantifier elimination by virtual substitution. Finally, we discuss how to solve a large class of scheduling problems by real quantifier elimination. To optimize our algorithm for solving scheduling problems, we make use of the special form of the input formula and of additional information given by the description of the scheduling problem.
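
    As a point of reference, the block below gives a minimal worked example of quantifier elimination by virtual substitution for a formula that is linear in the quantified variable; the formula and the elimination set are our illustration and are not taken from the thesis.

```latex
% Illustration only (not from the thesis): eliminate x from a formula
% that is linear in x.  The only lower bound on x is a, so a suitable
% elimination set is E = {a, -infinity}; substituting each test term
% virtually and taking the disjunction gives a quantifier-free equivalent.
\[
  \exists x\,\bigl(x \ge a \;\wedge\; 2x \le b\bigr)
  \;\Longleftrightarrow\;
  \underbrace{\bigl(a \ge a \wedge 2a \le b\bigr)}_{x := a}
  \;\vee\;
  \underbrace{\text{false}}_{x := -\infty}
  \;\Longleftrightarrow\;
  2a \le b .
\]
```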

    First International Conference on Ada (R) Programming Language Applications for the NASA Space Station, volume 2

    Get PDF
    Topics discussed include: reusability; mission-critical issues; run time; expert systems; language issues; life-cycle issues; software tools; and computers for Ada.