91 research outputs found

    General Purpose Flow Visualization at the Exascale

    Exascale computing, i.e., supercomputers that can perform 10^18 mathematical operations per second, provides a significant opportunity for advancing the computational sciences. That said, these machines can be difficult to use efficiently due to their massive parallelism, their reliance on accelerators, and the diversity of those accelerators. All areas of the computational science stack need to be reconsidered to address these problems. With this dissertation, we consider flow visualization, which is critical for analyzing vector field data from simulations. We specifically consider flow visualization techniques that use particle advection, i.e., tracing particle trajectories, which presents performance and implementation challenges. The dissertation makes four primary contributions. First, it synthesizes previous work on particle advection performance and introduces a high-level analytical cost model. Second, it proposes an approach for performance portability across accelerators. Third, it studies the speedups that can be expected from accelerators, including the effects of factors such as duration, particle count, and data set. Finally, it proposes an exascale-capable particle advection system that addresses diversity along many dimensions, including accelerator type, parallelism approach, analysis use case, and underlying vector field.
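
    The core operation in particle advection is numerical integration of trajectories through a vector field. As a minimal sketch (not the dissertation's system), the following Python snippet traces one particle with a fourth-order Runge-Kutta integrator through a hypothetical analytic 2D field; a real exascale workflow would instead sample a distributed, simulation-produced field and trace many particles on accelerators.

```python
import numpy as np

def velocity(p):
    """Hypothetical steady 2D field: solid-body rotation about the origin."""
    x, y = p
    return np.array([-y, x])

def advect(seed, dt=0.01, steps=628):
    """Trace one particle trajectory with fourth-order Runge-Kutta (RK4)."""
    p = np.asarray(seed, dtype=float)
    path = [p]
    for _ in range(steps):
        k1 = velocity(p)
        k2 = velocity(p + 0.5 * dt * k1)
        k3 = velocity(p + 0.5 * dt * k2)
        k4 = velocity(p + dt * k3)
        p = p + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        path.append(p)
    return np.array(path)

# Each seed advances independently, so many trajectories parallelize well;
# total cost grows with step count and particle count, which is roughly the
# shape an analytical cost model for advection has to capture.
trajectory = advect(seed=(1.0, 0.0))
print(trajectory[-1])  # after t ~= 2*pi, the particle returns near (1, 0)
```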

    19th SC@RUG 2022 proceedings 2021-2022

    Computational Modelling of Concrete and Concrete Structures

    Computational Modelling of Concrete and Concrete Structures contains the contributions to the EURO-C 2022 conference (Vienna, Austria, 23-26 May 2022). The papers review and discuss research advancements and assess the applicability and robustness of methods and models for the analysis and design of concrete, fibre-reinforced and prestressed concrete structures, as well as masonry structures. Recent developments include methods of machine learning, novel discretisation methods, probabilistic models, and consideration of a growing number of micro-structural aspects in multi-scale and multi-physics settings. In addition, trends towards the material scale, with new fibres and 3D-printable concretes, and towards life-cycle-oriented models for the ageing and durability of existing and new concrete infrastructure are clearly visible. Overall, the computational robustness of numerical predictions and mathematical rigour have further increased, accompanied by careful model validation based on corresponding experimental programmes. The book will serve as an important reference for both academics and professionals, stimulating new research directions in the field of computational modelling of concrete and its application to the analysis of concrete structures. EURO-C 2022 is the eighth edition of the EURO-C conference series, after Innsbruck 1994, Bad Gastein 1998, St. Johann im Pongau 2003, Mayrhofen 2006, Schladming 2010, St. Anton am Arlberg 2014, and Bad Hofgastein 2018. The overarching focus of the conferences is on computational methods and numerical models for the analysis of concrete and concrete structures.

    Performance engineering of data-intensive applications

    Data-intensive programs deal with big chunks of data and often exhibit compute-intensive characteristics. Among the various HPC application domains, big data analytics, machine learning, and the more recent deep-learning models are well-known data-intensive applications. An efficient design of such applications demands extensive knowledge of the target hardware and software, particularly the memory/cache hierarchy and the data communication among threads/processes. This requirement makes code development an arduous task, as inappropriate data structures and algorithm design may result in superfluous runtime, not to mention hardware incompatibilities when porting the code to other platforms. In this dissertation, we introduce a set of tools and methods for the performance engineering of parallel data-intensive programs. We start with performance profiling to gain insight into thread communication and relevant code optimizations. Then, narrowing our scope to deep-learning applications, we introduce our tools for enhancing the performance portability and scalability of convolutional neural networks (ConvNets) in the inference and training phases. Our first contribution is a novel performance-profiling method that unveils potential communication bottlenecks caused by data-access patterns and thread interactions. Our findings show that data shared between a pair of threads should be reused within reasonably short intervals to preserve data locality, yet existing profilers neglect this and mainly report communication volume. We propose new hardware-independent metrics to characterize thread communication and provide suggestions for applying appropriate optimizations to a specific code region. Our experiments show that applying the relevant optimizations improves performance in the Rodinia benchmarks by up to 56%. As our next contribution, we developed a framework for the automatic generation of efficient and performance-portable convolution kernels, including Winograd convolutions, for various GPU platforms, employing a synergy of meta-programming, symbolic execution, and auto-tuning. The results demonstrate that kernels generated through this automated optimization pipeline achieve runtimes close to those of vendor deep-learning libraries, and the minimal programming effort required confirms the performance portability of our approach. Furthermore, our symbolic execution method exploits repetitive patterns in Winograd convolutions, enabling us to reduce the number of arithmetic operations by up to 62% without compromising numerical stability. Lastly, we investigate methods to scale the performance of ConvNets in the training and inference phases. Our specialized training platform, equipped with a novel topology-aware network-pruning algorithm, enables rapid training, neural architecture search, and network compression. Thus, AI model training can easily be scaled to a multitude of compute nodes, leading to faster model design with lower operating costs. Furthermore, the network compression component scales a ConvNet model down by removing redundant layers, preparing the model for leaner deployment. Altogether, this work demonstrates the necessity and the benefit of performance engineering and parallel programming methods in accelerating emerging data-intensive workloads. With the help of the proposed tools and techniques, we pinpoint data communication bottlenecks and achieve performance portability and scalability in data-intensive applications.
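
    To make the Winograd arithmetic reduction concrete, here is a small illustrative sketch using the standard F(2,3) minimal-filtering algorithm (the textbook Lavin-Gray formulation, not the dissertation's generated kernels): it produces two outputs of a 3-tap convolution with 4 elementwise multiplications instead of the 6 a direct computation needs, and the generated GPU kernels generalize this idea to 2D tiles.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 convolution outputs."""
    U = G @ g      # filter transform (reusable across input tiles)
    V = BT @ d     # input-tile transform
    M = U * V      # 4 elementwise multiplications (direct method needs 6)
    return AT @ M  # inverse transform back to 2 outputs

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -0.5])
print(winograd_f23(d, g))                         # -> [1. 2.]
print([np.dot(d[i:i + 3], g) for i in range(2)])  # direct check: [1.0, 2.0]
```

    The multiplication savings grow with tile size, but larger tiles also amplify rounding error, which is why transform choices matter for numerical stability.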

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f