7 research outputs found

    Parallelizing and Optimizing LHCb-Kalman for Intel Xeon Phi KNL Processors

    No full text
    Real time data processing is an important component of particle physics experiments with large computing resource requirements. As the Large Hadron Collider (LHC) at CERN is preparing for its next upgrade the LHCb experiment is upgrading its detector for a 30x increase in data throughput. In preparation for this upgrade the experiment is considering a number of architectural improvements encompassing both its software and hardware infrastructure. One of the hardware platforms under consideration is the Intel Xeon-Phi Knights Landing processor. Thanks to its on-package high-bandwidth memory and many-core architecture it offers an interesting alternative to more traditional server systems. We present a scalable, multi-threaded and NUMA-aware Kalman filter proto-application for particle track fitting expressed in terms of generic parallel patterns using the GrPPI interface. We show how code maintainability and readability improves, while maintaining comparable levels of performance to the baseline implementation. This is achieved by keeping the parallel algorithms in the underlying framework generic, but topology aware through the use of the Portable Hardware Locality (hwloc) library, which allows us to target different architectures with the same program. We measure the performance of our topology-aware GrPPI Kalman filter implementation on the Intel Xeon-Phi Knights Landing platform and conclude on the feasibility of integrating such high-level parallelization libraries in complex software frameworks such as LHCb's Gaudi framework

    Relaxing the one definition rule in interpreted C++

    No full text
    Most implementations of the C++ programming language generate binary executable code. However, interpreted execution of C++ sources has its own use cases as the Cling interpreter from CERN’s ROOT project has shown. Some limitations are derived from the ODR (One Definition Rule) that rules out multiple definitions of entities within a single translation unit (TU). ODR is there to ensure uniform view of a given C++ entity across translation units. Ensuring uniform view of C++ entities helps when producing ABI compatible binaries. Interpreting C++ presumes a single evergrowing translation unit that define away some of the ODR use-cases. Therefore, it may well be desirable to relax the ODR and, consequently, to support the ability of developers to override any existing definition for a given declaration. This approach is especially well-suited for iterative prototyping. In this paper, we extend Cling, a Clang/LLVM-based C++ interpreter, to enable redefinitions of C++ entities at the prompt. To achieve this, top-level declarations are nested into inline namespaces and the translation unit lookup table is adjusted to invalidate previous definitions that would otherwise result in ambiguities. Formally, this technique refactors the code to an equivalent that does not violate the ODR, as each definition is nested in a different namespace. Furthermore, any previous definition that has been shadowed is still accessible by means of its fully-qualified name. A prototype implementation of the presented technique has been integrated into the Cling C++ interpreter, showing that our technique is feasible and usable

    Embedding semantics of the single-producer/single-consumer lock-free queue into a race detection tool

    No full text
    The rapid progress of multi-/many-core architectures has caused data-intensive parallel applications not yet be fully suited for getting the maximum performance. The advent of parallel programming frameworks offering structured pat- terns has alleviated developers' burden adapting such applications to parallel platforms. For example, the use of synchronization mechanisms in multithreaded applications is essential on shared-cache multi-core architectures. How- ever, ensuring an appropriate use of their interfaces can be challenging, since different memory models plus instruction reordering at compiler/processor levels may inuence the occurrence of data races. The benefits of race detectors are formidable in this sense, nevertheless if lock-free data structures with no high-level atomics are used, they may emit false positives. In this paper, we extend the ThreadSani- tizer race detection tool in order to support semantics of the general Single-Producer/Single-Consumer (SPSC) lock- free parallel queue and to detect benign data races where it was correctly used. To perform our analysis, we leverage the FastFlow SPSC bounded lock-free queue implementa- tion to test our extensions over a set of ÎĽ-benchmarks and real applications on a dual-socket Intel Xeon CPU E5-2695 platform. We demonstrate that this approach can reduce, on average, 30% the number of data race warning messages
    corecore