16 research outputs found

    Application of Modern Fortran to Spacecraft Trajectory Design and Optimization

    Get PDF
    In this paper, applications of the modern Fortran programming language to the field of spacecraft trajectory optimization and design are examined. Modern object-oriented Fortran has many advantages for scientific programming, although many legacy Fortran aerospace codes have not been upgraded to use the newer standards (or have been rewritten in other languages perceived to be more modern). NASA's Copernicus spacecraft trajectory optimization program, originally a combination of Fortran 77 and Fortran 95, has attempted to keep up with modern standards and makes significant use of the new language features. Various algorithms and methods are presented from trajectory tools such as Copernicus, as well as modern Fortran open source libraries and other projects

    Extracting UML Class Diagrams from Object-Oriented Fortran: ForUML

    Get PDF
    Many scientists who implement computational science and engineering software have adopted the object-oriented (OO) Fortran paradigm. One of the challenges faced by OO Fortran developers is the inability to obtain high level software design descriptions of existing applications. Knowledge of the overall software design is not only valuable in the absence of documentation, it can also serve to assist developers with accomplishing different tasks during the software development process, especially maintenance and refactoring. The software engineering community commonly uses reverse engineering techniques to deal with this challenge. A number of reverse engineering-based tools have been proposed, but few of them can be applied to OO Fortran applications. In this paper, we propose a software tool to extract unified modeling language (UML) class diagrams from Fortran code. The UML class diagram facilitates the developers' ability to examine the entities and their relationships in the software system. The extracted diagrams enhance software maintenance and evolution. The experiments carried out to evaluate the proposed tool show its accuracy and a few of the limitations

    Proceedings of the 7th International Conference on PGAS Programming Models

    Get PDF

    Parallel computing 2011, ParCo 2011: book of abstracts

    Get PDF
    This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgiu

    Using machine learning to improve dense and sparse matrix multiplication kernels

    Get PDF
    This work is comprised of two different projects in numerical linear algebra. The first project is about using machine learning to speed up dense matrix-matrix multiplication computations on a shared-memory computer architecture. We found that found basic loop-based matrix-matrix multiplication algorithms tied to a decision tree algorithm selector were competitive to using Intel\u27s Math Kernel Library for the same computation. The second project is a preliminary report about re-implementing an encoding format for spare matrix-vector multiplication called Compressed Spare eXtended (CSX). The goal for the second project is to use machine learning to aid in encoding matrix substructures in the CSX format without using exhaustive search and a Just-In-Time compiler

    Real-time tomographic reconstruction

    Get PDF
    With tomography it is possible to reconstruct the interior of an object without destroying. It is an important technique for many applications in, e.g., science, industry, and medicine. The runtime of conventional reconstruction algorithms is typically much longer than the time it takes to perform the tomographic experiment, and this prohibits the real-time reconstruction and visualization of the imaged object. The research in this dissertation introduces various techniques such as new parallelization schemes, data partitioning methods, and a quasi-3D reconstruction framework, that significantly reduce the time it takes to run conventional tomographic reconstruction algorithms without affecting image quality. The resulting methods and software implementations put reconstruction times in the same ballpark as the time it takes to do a tomographic scan, so that we can speak of real-time tomographic reconstruction.NWONumber theory, Algebra and Geometr

    Parallel Asynchronous Matrix Multiplication for a Distributed Pipelined Neural Network

    Get PDF
    Machine learning is an approach to devise algorithms that compute an output without a given rule set but based on a self-learning concept. This approach is of great importance for several fields of applications in science and industry where traditional programming methods are not sufficient. In neural networks, a popular subclass of machine learning algorithms, commonly previous experience is used to train the network and produce good outputs for newly introduced inputs. By increasing the size of the network more complex problems can be solved which again rely on a huge amount of training data. Increasing the complexity also leads to higher computational demand and storage requirements and to the need for parallelization. Several parallelization approaches of neural networks have already been considered. Most approaches use special purpose hardware whilst other work focuses on using standard hardware. Often these approaches target the problem by parallelizing the training data. In this work a new parallelization method named poadSGD is proposed for the parallelization of fully-connected, largescale feedforward networks on a compute cluster with standard hardware. poadSGD is based on the stochastic gradient descent algorithm. A block-wise distribution of the network's layers to groups of processes and a pipelining scheme for batches of the training samples are used. The network is updated asynchronously without interrupting ongoing computations of subsequent batches. For this task a one-sided communication scheme is used. A main algorithmic part of the batch-wise pipelined version consists of matrix multiplications which occur for a special distributed setup, where each matrix is held by a different process group. GASPI, a parallel programming model from the field of "Partitioned Global Address Spaces" (PGAS) models is introduced and compared to other models from this class. As it mainly relies on one-sided and asynchronous communication it is a perfect candidate for the asynchronous update task in the poadSGD algorithm. Therefore, the matrix multiplication is also implemented based GASPI. In order to efficiently handle upcoming synchronizations within the process groups and achieve a good workload distribution, a two-dimensional block-cyclic data distribution is applied for the matrices. Based on this distribution, the multiplication algorithm is computed by diagonally iterating over the sub blocks of the resulting matrix and computing the sub blocks in subgroups of the processes. The sub blocks are computed by sharing the workload between the process groups and communicating mostly in pairs or in subgroups. The communication in pairs is set up to be overlapped by other ongoing computations. The implementations provide a special challenge, since the asynchronous communication routines must be handled with care as to which processor is working at what point in time with which data in order to prevent an unintentional dual use of data. The theoretical analysis shows the matrix multiplication to be superior to a naive implementation when the dimension of the sub blocks of the matrices exceeds 382. The performance achieved in the test runs did not withstand the expectations the theoretical analysis predicted. The algorithm is executed on up to 512 cores and for matrices up to a size of 131,072 x 131,072. The implementation using the GASPI API was found not be straightforward but to provide a good potential for overlapping communication with computations whenever the data dependencies of an application allow for it. The matrix multiplication was successfully implemented and can be used within an implementation of the poadSGD method that is yet to come. The poadSGD method seems to be very promising, especially as nowadays, with the larger amount of data and the increased complexity of the applications, the approaches to parallelization of neural networks are increasingly of interest

    Das unstetige Galerkinverfahren für Strömungen mit freier Oberfläche und im Grundwasserbereich in geophysikalischen Anwendungen

    Get PDF
    Free surface flows and subsurface flows appear in a broad range of geophysical applications and in many environmental settings situations arise which even require the coupling of free surface and subsurface flows. Many of these application scenarios are characterized by large domain sizes and long simulation times. Hence, they need considerable amounts of computational work to achieve accurate solutions and the use of efficient algorithms and high performance computing resources to obtain results within a reasonable time frame is mandatory. Discontinuous Galerkin methods are a class of numerical methods for solving differential equations that share characteristics with methods from the finite volume and finite element frameworks. They feature high approximation orders, offer a large degree of flexibility, and are well-suited for parallel computing. This thesis consists of eight articles and an extended summary that describe the application of discontinuous Galerkin methods to mathematical models including free surface and subsurface flow scenarios with a strong focus on computational aspects. It covers discretization and implementation aspects, the parallelization of the method, and discrete stability analysis of the coupled model.Für viele geophysikalische Anwendungen spielen Strömungen mit freier Oberfläche und im Grundwasserbereich oder sogar die Kopplung dieser beiden eine zentrale Rolle. Oftmals charakteristisch für diese Anwendungsszenarien sind große Rechengebiete und lange Simulationszeiten. Folglich ist das Berechnen akkurater Lösungen mit beträchtlichem Rechenaufwand verbunden und der Einsatz effizienter Lösungsverfahren sowie von Techniken des Hochleistungsrechnens obligatorisch, um Ergebnisse innerhalb eines annehmbaren Zeitrahmens zu erhalten. Unstetige Galerkinverfahren stellen eine Gruppe numerischer Verfahren zum Lösen von Differentialgleichungen dar, und kombinieren Eigenschaften von Methoden der Finiten Volumen- und Finiten Elementeverfahren. Sie ermöglichen hohe Approximationsordnungen, bieten einen hohen Grad an Flexibilität und sind für paralleles Rechnen gut geeignet. Diese Dissertation besteht aus acht Artikeln und einer erweiterten Zusammenfassung, in diesen die Anwendung unstetiger Galerkinverfahren auf mathematische Modelle inklusive solcher für Strömungen mit freier Oberfläche und im Grundwasserbereich beschrieben wird. Die behandelten Themen umfassen Diskretisierungs- und Implementierungsaspekte, die Parallelisierung der Methode sowie eine diskrete Stabilitätsanalyse des gekoppelten Modells

    Supporting general data structures and execution models in runtime environments

    Get PDF
    Para aprovechar las plataformas paralelas, se necesitan herramientas de programación para poder representar apropiadamente los algoritmos paralelos. Además, los entornos paralelos requieren sistemas en tiempo de ejecución que ofrezcan diferentes paradigmas de computación. Existen diferentes áreas a estudiar con el fin de construir un sistema en tiempo de ejecución completo para un entorno paralelo. Esta Tesis aborda dos problemas comunes: el soporte unificado de datos densos y dispersos, y la integración de paralelismo orientado a mapeo de datos y paralelismo orientado a flujo de datos. Esta Tesis propone una solución que desacopla la representación, partición y reparto de datos, del algoritmo y de la estrategia de diseño paralelo para integrar manejo para datos densos y dispersos. Además, se presenta un nuevo modelo de programación basado en el paradigma de flujo de datos, donde diferentes actividades pueden ser arbitrariamente enlazadas para formar redes genéricas pero estructuradas que representan el cómputo globalDepartamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos

    Cray X1 Evaluation Status Report

    Full text link
    corecore