13 research outputs found

    Automating Topology Aware Mapping for Supercomputers

    Get PDF
    Petascale machines with hundreds of thousands of cores are being built. These machines have varying interconnect topologies and large network diameters. Computation is cheap and communication on the network is becoming the bottleneck for scaling of parallel applications. Network contention, specifically, is becoming an increasingly important factor affecting overall performance. The broad goal of this dissertation is performance optimization of parallel applications through reduction of network contention. Most parallel applications have a certain communication topology. Mapping of tasks in a parallel application based on their communication graph, to the physical processors on a machine can potentially lead to performance improvements. Mapping of the communication graph for an application on to the interconnect topology of a machine while trying to localize communication is the research problem under consideration. The farther different messages travel on the network, greater is the chance of resource sharing between messages. This can create contention on the network for networks commonly used today. Evaluative studies in this dissertation show that on IBM Blue Gene and Cray XT machines, message latencies can be severely affected under contention. Realizing this fact, application developers have started paying attention to the mapping of tasks to physical processors to minimize contention. Placement of communicating tasks on nearby physical processors can minimize the distance traveled by messages and reduce the chances of contention. Performance improvements through topology aware placement for applications such as NAMD and OpenAtom are used to motivate this work. Building on these ideas, the dissertation proposes algorithms and techniques for automatic mapping of parallel applications to relieve the application developers of this burden. The effect of contention on message latencies is studied in depth to guide the design of mapping algorithms. The hop-bytes metric is proposed for the evaluation of mapping algorithms as a better metric than the previously used maximum dilation metric. The main focus of this dissertation is on developing topology aware mapping algorithms for parallel applications with regular and irregular communication patterns. The automatic mapping framework is a suite of such algorithms with capabilities to choose the best mapping for a problem with a given communication graph. The dissertation also briefly discusses completely distributed mapping techniques which will be imperative for machines of the future.published or submitted for publicationnot peer reviewe

    Atomic detail visualization of photosynthetic membranes with GPU-accelerated ray tracing

    Get PDF
    The cellular process responsible for providing energy for most life on Earth, namely, photosynthetic light-harvesting, requires the cooperation of hundreds of proteins across an organelle, involving length and time scales spanning several orders of magnitude over quantum and classical regimes. Simulation and visualization of this fundamental energy conversion process pose many unique methodological and computational challenges. We present, in two accompanying movies, light-harvesting in the photosynthetic apparatus found in purple bacteria, the so-called chromatophore. The movies are the culmination of three decades of modeling efforts, featuring the collaboration of theoretical, experimental, and computational scientists. We describe the techniques that were used to build, simulate, analyze, and visualize the structures shown in the movies, and we highlight cases where scientific needs spurred the development of new parallel algorithms that efficiently harness GPU accelerators and petascale computers

    Optimization of communication intensive applications on HPC networks

    Get PDF
    Communication is a necessary but overhead inducing component of parallel programming. Its impact on application design and performance is due to several related aspects of a parallel job execution: network topology, routing protocol, suitability of algorithm being used to the network, job placement, etc. This thesis is aimed at developing an understanding of how communication plays out on networks of high performance computing systems and exploring methods that can be used to improve communication performance of large scale applications. Broadly speaking, three topics have been studied in detail in this thesis. The first of these topics is task mapping and job placement on practical installations of torus and dragonfly networks. Next, use of supervised learning algorithms for conducting diagnostic studies of how communication evolves on networks is explored. Finally, efficacy of packet-level simulations for prediction-based studies of communication performance on different networks using different network parameters is analyzed. The primary contribution of this thesis is development of scalable diagnostic and prediction methods that can assist in the process of network designing, adapting applications to future systems, and optimizing execution of applications on existing systems. These meth- ods include a supervised learning approach, a functional modeling tool (called Damselfly), and a PDES-based packet level simulator (called TraceR), all of which are described in this thesis

    PICS - a Performance-analysis-based Introspective Control System to steer parallel applications

    Get PDF
    Parallel programming has always been difficult due to the complexity of hardware and the diversity of applications. Although significant progress has been achieved over the years, attaining high parallel efficiency on large supercomputers for various applications is still quite challenging. As we go beyond the current scale of computers to those with peak capacities of an ExaFLOP/s, it is clear that an introspective and adaptive runtime system (RTS) will be critical to reduce programmers' tuning efforts by automatically handling the complexities of applications and machines. This is the motivation for my research on a Performance-analysis-based Introspective Control System - PICS. PICS intelligently steers parallel applications and runtime system configurations to achieve desired goals by utilizing expert knowledge to analyze performance data and adaptively reconfiguring applications. This thesis designs a holistic introspective control system for automatic performance tuning that combines the real-time performance analysis and performance steering to effectively automate the optimization. A few techniques are explored to make the parallel runtime system and applications more adaptive and controllable. Control points are defined for applications to interact with PICS. Decision tree based automatic performance analysis is implemented to significantly reduce the search space of multiple configurations. Parallel evaluation and sampling techniques are exploited to reduce the overhead of the system and to improve its scalability. In addition, the result of automatic performance analysis can be visualized to help developers manually tune their applications. The utility of PICS is demonstrated with several benchmarks and real- world applications

    Software for Exascale Computing - SPPEXA 2016-2019

    Get PDF
    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's sumpercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest

    Reducing communication in sparse solvers

    Get PDF
    Sparse matrix operations dominate the cost of many scientific applications. In parallel, the performance and scalability of these operations is limited by irregular point-to-point communication. Multiple methods are investigated throughout this dissertation for reducing the cost associated with communication throughout sparse matrix operations. Algorithmic changes reduce communication requirements, but also affect accuracy of the operation, leading to reduced convergence of scientific codes. We investigate a method of systematically removing relatively small non-zeros throughout an algebraic multigrid hierarchy, yielding significant reductions to the cost of sparse matrix-vector multiplication that outweigh affects of reduced accuracy of the multiplication. Therefore, the reduction in per-iteration communication costs outweigh the cost of extra solver iterations. As a result, sparsification yields improvement of both the performance and scalability of algebraic multigrid. Alterations to the parallel implementation of MPI communication also yield reduced costs with no effect on accuracy. We investigate methods of agglomerating messages on-node before injecting into the network, reducing the amount of costly inter-node communication. This node-aware communication yields improvements to both performance and scalability of matrix operations, particularly in strong scaling studies. Furthermore, we show an improvement in the cost of algebraic multigrid as a result of reduced communication costs in sparse matrix operations. Finally, performance models can be used to analyze the costs of matrix operations, indicating the source of dominant communication costs, such as initializing messages or transporting bytes of data. We investigate methods of improving traditional performance models of irregular point-to-point communication through the addition of node-awareness, queue search costs, and network contention penalties

    Toward biologically realistic computational membrane protein structure prediction and design

    Get PDF
    Membrane proteins function as gates and checkpoints that control the transit of molecules and information across the lipid bilayer. Understanding their structures will provide mechanistic insights in how to keep cells healthy and defend against disease. However, experimental difficulties have slowed the progress of structure determination. Previous work has demonstrated the promise of computational modeling for elucidating membrane protein structures. A remaining challenge is to model proteins coupled with the heterogeneous cell membrane environment. In the first half of this dissertation, I detail the development, testing and integration of a biologically realistic implicit lipid bilayer model in Rosetta. First, I describe the initial iteration of the implicit model that captures the anisotropic structure, shape of water-filled pores, and nanoscale dimensions of membranes with different lipid compositions. Second, I explain my approach to energy function benchmarking and optimization given the challenge of sparse and low-quality experimental data. Third, I outline the second generation that incorporates a new electrostatics and pH model. All of these developments have advanced the accuracy of Rosetta membrane protein structure prediction and design. In the second half of this dissertation, I investigate three challenging biological and engineering applications involving membrane proteins. In the first application, I examine mutation-induced stability changes in the integral membrane zinc metalloprotease ZMPSTE24: a protein with a large voluminous chamber that is not captured by current implicit models. In the second application, I model interactions between the SERCA2a calcium pump and the regulatory transmembrane protein phospholamban: a key membrane protein-protein interaction implicated in the heart’s response to adrenaline. Finally, I explore the challenge of membrane protein design to engineer a self-assembling transmembrane protein pore for nanotechnology applications. These applications highlight the next steps required to improve computational membrane protein modeling tools. Taken together, my work in both methods development and applications has advanced our understanding and ability to model and design membrane protein structures
    corecore