159 research outputs found

    Adaptive Parallelism for Coupled, Multithreaded Message-Passing Programs

    Get PDF
    Hybrid parallel programming models that combine message passing (MP) and shared- memory multithreading (MT) are becoming more popular, especially with applications requiring higher degrees of parallelism and scalability. Consequently, coupled parallel programs, those built via the integration of independently developed and optimized software libraries linked into a single application, increasingly comprise message-passing libraries with differing preferred degrees of threading, resulting in thread-level heterogeneity. Retroactively matching threading levels between independently developed and maintained libraries is difficult, and the challenge is exacerbated because contemporary middleware services provide only static scheduling policies over entire program executions, necessitating suboptimal, over-subscribed or under-subscribed, configurations. In coupled applications, a poorly configured component can lead to overall poor application performance, suboptimal resource utilization, and increased time-to-solution. So it is critical that each library executes in a manner consistent with its design and tuning for a particular system architecture and workload. Therefore, there is a need for techniques that address dynamic, conflicting configurations in coupled multithreaded message-passing (MT-MP) programs. Our thesis is that we can achieve significant performance improvements over static under-subscribed approaches through reconfigurable execution environments that consider compute phase parallelization strategies along with both hardware and software characteristics. In this work, we present new ways to structure, execute, and analyze coupled MT- MP programs. Our study begins with an examination of contemporary approaches used to accommodate thread-level heterogeneity in coupled MT-MP programs. Here we identify potential inefficiencies in how these programs are structured and executed in the high-performance computing domain. We then present and evaluate a novel approach for accommodating thread-level heterogeneity. Our approach enables full utilization of all available compute resources throughout an application’s execution by providing programmable facilities with modest overheads to dynamically reconfigure runtime environments for compute phases with differing threading factors and affinities. Our performance results show that for a majority of the tested scientific workloads our approach and corresponding open-source reference implementation render speedups greater than 50 % over the static under-subscribed baseline. Motivated by our examination of reconfigurable execution environments and their memory overhead, we also study the memory attribution problem: the inability to predict or evaluate during runtime where the available memory is used across the software stack comprising the application, reusable software libraries, and supporting runtime infrastructure. Specifically, dynamic adaptation requires runtime intervention, which by its nature introduces additional runtime and memory overhead. To better understand the latter, we propose and evaluate a new way to quantify component-level memory usage from unmodified binaries dynamically linked to a message-passing communication library. Our experimental results show that our approach and corresponding implementation accurately measure memory resource usage as a function of time, scale, communication workload, and software or hardware system architecture, clearly distinguishing between application and communication library usage at a per-process level

    A debugging engine for parallel and distributed programs

    Get PDF
    Dissertação apresentada para a obtenção do Grau de Doutor em Informática pela Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.In the last decade a considerable amount of research work has focused on distributed debugging, one of the crucial fields in the parallel software development cycle. The productivity of the software development process strongly depends on the adequate definition of what debugging tools should be provided, and what debugging methodologies and functionalities should these tools support. The work described in this dissertation was initiated in 1995, in the context of two research projects, the SEPP (Software Engineering for Parallel Processing) and HPCTI (High-Performance Computing Tools for Industry), both sponsored by the European Union in the Copernicus programme, which aimed at the design and implementation of an integrated parallel software development environment. In the context of these projects, two independent toolsets have been developed, the GRADE and EDPEPPS parallel software development environments. Our contribution to these projects was in the debugging support. We have designed a debugging engine and developed a prototype, which was integrated the both toolsets (it was the only tool developed in the context of the SEPP and HPCTI projects which achieved such a result). Even after the closing of those research projects, further research work on distributed debugger has been carried on, which conducted to the re-design and re-implementation of the debugging engine. This dissertation describes the debugging engine according to its most up-to-date design and implementation stages. It also reposts some of the experimentalworkmade with both the initial and the current implementations, and how it contributed to validate the design and implementations of the debugging engine

    A generic framework for process execution and secure multi-party transaction authorization

    Get PDF
    Process execution engines are not only an integral part of workflow and business process management systems but are increasingly used to build process-driven applications. In other words, they are potentially used in all kinds of software across all application domains. However, contemporary process engines and workflow systems are unsuitable for use in such diverse application scenarios for several reasons. The main shortcomings can be observed in the areas of interoperability, versatility, and programmability. Therefore, this thesis makes a step away from domain specific, monolithic workflow engines towards generic and versatile process runtime frameworks, which enable integration of process technology into all kinds of software. To achieve this, the idea and corresponding architecture of a generic and embeddable process virtual machine (ePVM), which supports defining process flows along the theoretical foundation of communicating extended finite state machines, are presented. The architecture focuses on the core process functionality such as control flow and state management, monitoring, persistence, and communication, while using JavaScript as a process definition language. This approach leads to a very generic yet easily programmable process framework. A fully functional prototype implementation of the proposed framework is provided along with multiple example applications. Despite the fact that business processes are increasingly automated and controlled by information systems, humans are still involved, directly or indirectly, in many of them. Thus, for process flows involving sensitive transactions, a highly secure authorization scheme supporting asynchronous multi-party transaction authorization must be available within process management systems. Therefore, along with the ePVM framework, this thesis presents a novel approach for secure remote multi-party transaction authentication - the zone trusted information channel (ZTIC). The ZTIC approach uniquely combines multiple desirable properties such as the highest level of security, ease-of-use, mobility, remote administration, and smooth integration with existing infrastructures into one device and method. Extensively evaluating both, the ePVM framework and the ZTIC, this thesis shows that ePVM in combination with the ZTIC approach represents a unique and very powerful framework for building workflow systems and process-driven applications including support for secure multi-party transaction authorization

    An Autonomic Cross-Platform Operating Environment for On-Demand Internet Computing

    Get PDF
    The Internet has evolved into a global and ubiquitous communication medium interconnecting powerful application servers, diverse desktop computers and mobile notebooks. Along with recent developments in computer technology, such as the convergence of computing and communication devices, the way how people use computers and the Internet has changed people´s working habits and has led to new application scenarios. On the one hand, pervasive computing, ubiquitous computing and nomadic computing become more and more important since different computing devices like PDAs and notebooks may be used concurrently and alternately, e.g. while the user is on the move. On the other hand, the ubiquitous availability and pervasive interconnection of computing systems have fostered various trends towards the dynamic utilization and spontaneous collaboration of available remote computing resources, which are addressed by approaches like utility computing, grid computing, cloud computing and public computing. From a general point of view, the common objective of this development is the use of Internet applications on demand, i.e. applications that are not installed in advance by a platform administrator but are dynamically deployed and run as they are requested by the application user. The heterogeneous and unmanaged nature of the Internet represents a major challenge for the on demand use of custom Internet applications across heterogeneous hardware platforms, operating systems and network environments. Promising remedies are autonomic computing systems that are supposed to maintain themselves without particular user or application intervention. In this thesis, an Autonomic Cross-Platform Operating Environment (ACOE) is presented that supports On Demand Internet Computing (ODIC), such as dynamic application composition and ad hoc execution migration. The approach is based on an integration middleware called crossware that does not replace existing middleware but operates as a self-managing mediator between diverse application requirements and heterogeneous platform configurations. A Java implementation of the Crossware Development Kit (XDK) is presented, followed by the description of the On Demand Internet Computing System (ODIX). The feasibility of the approach is shown by the implementation of an Internet Application Workbench, an Internet Application Factory and an Internet Peer Federation. They illustrate the use of ODIX to support local, remote and distributed ODIC, respectively. Finally, the suitability of the approach is discussed with respect to the support of ODIC

    Wind-US Users Guide Version 3.0

    Get PDF
    Wind-US is a computational platform which may be used to numerically solve various sets of equations governing physical phenomena. Currently, the code supports the solution of the Euler and Navier-Stokes equations of fluid mechanics, along with supporting equation sets governing turbulent and chemically reacting flows. Wind-US is a product of the NPARC Alliance, a partnership between the NASA Glenn Research Center (GRC) and the Arnold Engineering Development Complex (AEDC) dedicated to the establishment of a national, applications-oriented flow simulation capability. The Boeing Company has also been closely associated with the Alliance since its inception, and represents the interests of the NPARC User's Association. The "Wind-US User's Guide" describes the operation and use of Wind-US, including: a basic tutorial; the physical and numerical models that are used; the boundary conditions; monitoring convergence; the files that are read and/or written; parallel execution; and a complete list of input keywords and test options. For current information about Wind-US and the NPARC Alliance, please see the Wind-US home page at http://www.grc.nasa.gov/WWW/winddocs/ and the NPARC Alliance home page at http://www.grc.nasa.gov/WWW/wind/. This manual describes the operation and use of Wind-US, a computational platform which may be used to numerically solve various sets of equations governing physical phenomena. Wind-US represents a merger of the capabilities of four CFD codes - NASTD (a structured grid flow solver developed at McDonnell Douglas, now part of Boeing), NPARC (the original NPARC Alliance structured grid flow solver), NXAIR (an AEDC structured grid code used primarily for store separation analysis), and ICAT (an unstructured grid flow solver developed at the Rockwell Science Center and Boeing)

    Advanced Concepts for Automatic Differentiation based on Operator Overloading

    Get PDF
    Mit Hilfe der Technik des Automatischen Differenzierens (AD) lassen sich für Funktionen, die als Programmquellcode gegeben sind, Ableitungsinformationen rechentechnisch effizient und mit geringem Aufwand für den Nutzer bereitstellen. Eine Variante der Implementierung von AD basiert auf der Überladung von Operatoren und Funktionen, die von vielen modernen Programmiersprachen ermöglicht wird. Durch Ausnutzung des Konzepts der Überladung wird eine interne Funktions-Repräsentation (Tape) generiert, die anschließend für die Ableitungsberechnung herangezogen wird. In der Dissertation werden neue Techniken erarbeitet, die eine effizientere Tape-Erstellung und die parallele Tape-Auswertung ermöglichen. Anhand von Laufzeituntersuchungen für numerische Beispiele werden die Möglichkeiten der neuen Techniken verdeutlicht.Using the technique of Automatic Differentiation (AD), derivative information can be computed efficiently for any function that is given as source code in a supported programming languages. One basic implementation strategy is based on the concept of operator overloading that is available for many programming languages. Due the overloading of operators, an internal representation of the function can be generated at runtime. This so-called tape can then be used for computing derivatives. In the thesis, new techniques are introduced that allow a more efficient tape creation and the parallel evaluation of tapes. Advantages of the new techniques are demonstrated by means of runtime analyses for numerical examples

    HPCCP/CAS Workshop Proceedings 1998

    Get PDF
    This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey

    Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

    Get PDF
    International audienceThis paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost to compute gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. Performance of the generated derivative programs in forward and reverse mode is better than sequential, although our reverse mode often scales worse than the input programs

    Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

    Get PDF
    International audienceThis paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost to compute gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. Performance of the generated derivative programs in forward and reverse mode is better than sequential, although our reverse mode often scales worse than the input programs
    • …
    corecore