
    Systematically Exploring High-Performance Representations of Vector Fields Through Compile-Time Composition

    We present a novel benchmark suite for implementations of vector fields in high-performance computing environments to aid developers in quantifying and ranking their performance. We decompose the design space of such benchmarks into access patterns and storage backends, the latter of which can be further decomposed into components with different functional and non-functional properties. Through compile-time meta-programming, we generate a large number of benchmarks with minimal effort and ensure the extensibility of our suite. Our empirical analysis, based on real-world applications in high-energy physics, demonstrates the feasibility of our approach on CPU and GPU platforms, and highlights that our suite is able to evaluate performance-critical design choices. Finally, we propose that our work towards composing vector fields from elementary components is not only useful for the purposes of benchmarking, but that it naturally gives rise to a novel library for implementing such fields in domain applications.
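
    A minimal sketch of the compile-time composition idea, using hypothetical component names (an array_of_structs storage backend, a random_lookup access pattern, and a benchmark template) rather than the suite's actual API: each benchmark is generated by instantiating a template over one access pattern and one storage backend, so adding a new component automatically yields new benchmark combinations.

```cpp
// Illustrative sketch only: hypothetical names, not the suite's real API.
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Storage backend: owns the field data and exposes indexed access.
struct array_of_structs {
    struct vec3 { float x, y, z; };
    explicit array_of_structs(std::size_t n) : data(n) {}
    vec3& at(std::size_t i) { return data[i]; }
    std::vector<vec3> data;
};

// Access pattern: decides in which order field elements are touched.
struct random_lookup {
    template <class Backend>
    static float run(Backend& backend, std::size_t queries) {
        std::mt19937 rng(42);
        std::uniform_int_distribution<std::size_t> index(0, backend.data.size() - 1);
        float acc = 0.0f;
        for (std::size_t q = 0; q < queries; ++q)
            acc += backend.at(index(rng)).x;  // touch one component per query
        return acc;
    }
};

// A benchmark is the compile-time composition of one pattern with one backend.
template <class AccessPattern, class StorageBackend>
struct benchmark {
    static float run(std::size_t elements, std::size_t queries) {
        StorageBackend storage(elements);
        return AccessPattern::run(storage, queries);
    }
};

int main() {
    // Instantiating the template generates one composed benchmark; a suite
    // would enumerate many pattern/backend pairs in the same way.
    float checksum = benchmark<random_lookup, array_of_structs>::run(1 << 20, 1 << 16);
    std::cout << "checksum: " << checksum << "\n";
    return 0;
}
```

    Enumerating pattern/backend pairs at compile time in this fashion is what allows a large number of benchmarks to be generated with minimal effort.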

    Software performance of the ATLAS track reconstruction for LHC run 3

    Charged particle reconstruction in the presence of many simultaneous proton–proton (pp) collisions in the LHC is a challenging task for the ATLAS experiment’s reconstruction software due to the combinatorial complexity. This paper describes the major changes made to adapt the software to reconstruct high-activity collisions with an average of 50 or more simultaneous pp interactions per bunch crossing (pile-up) promptly using the available computing resources. The performance of the key components of the track reconstruction chain and its dependence on pile-up are evaluated, and the improvement achieved compared to the previous software version is quantified. For events with an average of 60 pp collisions per bunch crossing, the updated track reconstruction is twice as fast as the previous version, without significant reduction in reconstruction efficiency and while reducing the rate of combinatorial fake tracks by more than a factor of two.

    Modelling Performance Loss due to Thread Imbalance in Stochastic Variable-Length SIMT Workloads

    When designing algorithms for single-instruction multiple-thread (SIMT) devices such as general-purpose graphics processing units (GPGPUs), thread imbalance is an important performance consideration. Thread imbalance can emerge in iterative applications where workloads are of variable length, because threads processing larger amounts of work will cause threads with less work to idle. This form of thread imbalance influences the design space of algorithms, particularly in terms of processing granularity, but we lack models to quantify its impact on application performance. In this paper, we present a statistical model for quantifying the performance loss due to thread imbalance for iterative SIMT applications with stochastic, variable-length workloads. Our model is designed to operate with minimal knowledge of the implementation details of the algorithm, relying solely on an understanding of the probability distribution of the lengths of the workloads. We validate our model against a synthetic benchmark based on a Monte Carlo simulation of matrix exponentiation and show that it achieves nearly perfect accuracy. Compared to empirical data extracted from real hardware, our model maintains a high degree of accuracy, predicting mean performance loss within a margin of 2%.
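
    As a rough illustration of the phenomenon being modelled (not the paper's actual model), the sketch below estimates by Monte Carlo the fraction of lane-cycles a 32-thread warp wastes when workload lengths are assumed to be drawn i.i.d. from a geometric distribution: the warp stays occupied for as long as its longest workload runs, while lanes with shorter workloads idle.

```cpp
// Illustrative sketch only: Monte Carlo estimate of warp-level imbalance loss.
#include <algorithm>
#include <iostream>
#include <random>

int main() {
    constexpr int warp_size = 32;      // threads executing in lock-step
    constexpr int trials = 100000;     // number of simulated warps
    std::mt19937 rng(1);
    std::geometric_distribution<int> extra_iterations(0.1);  // assumed workload-length distribution

    double useful = 0.0, occupied = 0.0;
    for (int t = 0; t < trials; ++t) {
        long sum = 0, longest = 0;
        for (int lane = 0; lane < warp_size; ++lane) {
            long length = extra_iterations(rng) + 1;  // at least one iteration per lane
            sum += length;
            longest = std::max(longest, length);
        }
        useful += static_cast<double>(sum);                    // lane-cycles doing work
        occupied += static_cast<double>(longest) * warp_size;  // lane-cycles until the warp finishes
    }
    std::cout << "estimated imbalance loss: " << 1.0 - useful / occupied << "\n";
    return 0;
}
```

    Because such an estimate depends only on the distribution of workload lengths (through the expected warp maximum relative to the expected mean), it requires no knowledge of the algorithm's implementation details, which is the setting the paper's model targets.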