7 research outputs found
Parallelizing and Optimizing LHCb-Kalman for Intel Xeon Phi KNL Processors
Real-time data processing is an important component of particle physics experiments with large computing resource requirements. As the Large Hadron Collider (LHC) at CERN prepares for its next upgrade, the LHCb experiment is upgrading its detector for a 30x increase in data throughput. In preparation for this upgrade, the experiment is considering a number of architectural improvements encompassing both its software and hardware infrastructure. One of the hardware platforms under consideration is the Intel Xeon Phi Knights Landing processor. Thanks to its on-package high-bandwidth memory and many-core architecture, it offers an interesting alternative to more traditional server systems. We present a scalable, multi-threaded and NUMA-aware Kalman filter proto-application for particle track fitting, expressed in terms of generic parallel patterns using the GrPPI interface. We show how code maintainability and readability improve while maintaining levels of performance comparable to the baseline implementation. This is achieved by keeping the parallel algorithms in the underlying framework generic but topology-aware through the use of the Portable Hardware Locality (hwloc) library, which allows us to target different architectures with the same program. We measure the performance of our topology-aware GrPPI Kalman filter implementation on the Intel Xeon Phi Knights Landing platform and conclude on the feasibility of integrating such high-level parallelization libraries in complex software frameworks such as LHCb's Gaudi framework.
Relaxing the one definition rule in interpreted C++
Most implementations of the C++ programming language generate binary executable code. However, interpreted execution of C++ sources has its own use cases, as the Cling interpreter from CERN’s ROOT project has shown. Some limitations are derived from the ODR (One Definition Rule), which rules out multiple definitions of entities within a single translation unit (TU). The ODR is there to ensure a uniform view of a given C++ entity across translation units. Ensuring a uniform view of C++ entities helps when producing ABI-compatible binaries. Interpreting C++ presumes a single ever-growing translation unit, which defines away some of the ODR use cases. Therefore, it may well be desirable to relax the ODR and, consequently, to support the ability of developers to override any existing definition for a given declaration. This approach is especially well-suited for iterative prototyping. In this paper, we extend Cling, a Clang/LLVM-based C++ interpreter, to enable redefinitions of C++ entities at the prompt. To achieve this, top-level declarations are nested into inline namespaces and the translation unit lookup table is adjusted to invalidate previous definitions that would otherwise result in ambiguities. Formally, this technique refactors the code to an equivalent that does not violate the ODR, as each definition is nested in a different namespace. Furthermore, any previous definition that has been shadowed is still accessible by means of its fully-qualified name. A prototype implementation of the presented technique has been integrated into the Cling C++ interpreter, showing that our technique is feasible and usable.
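The namespace-nesting idea described in this abstract can be illustrated in plain C++. The sketch below is not Cling's actual implementation: the namespace names `prompt_input_1` and `prompt_input_2` and the function `answer` are hypothetical. Standard C++ has no way to invalidate an entry in the lookup table, so that step is emulated here by leaving the older namespace non-inline.

```cpp
// First definition, as if entered at the prompt earlier.
// Assumption: after being shadowed, it lives in a namespace that
// unqualified lookup no longer searches (emulated by omitting `inline`).
namespace prompt_input_1 {
int answer() { return 1; }
}

// Redefinition entered later. The inline namespace makes its members
// visible in the enclosing (global) scope, so `answer()` finds this one.
inline namespace prompt_input_2 {
int answer() { return 42; }
}

// Each definition sits in its own namespace, so the ODR is not violated.
int latest() { return answer(); }                     // resolves to prompt_input_2::answer
int shadowed() { return prompt_input_1::answer(); }   // old definition, fully qualified
```

With these definitions, the unqualified call `answer()` reaches the most recent definition, while `prompt_input_1::answer()` still reaches the shadowed one through its fully-qualified name.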
Embedding semantics of the single-producer/single-consumer lock-free queue into a race detection tool
The rapid progress of multi-/many-core architectures has left data-intensive parallel applications not yet fully suited to achieving maximum performance. The advent of parallel programming frameworks offering structured patterns has alleviated developers' burden of adapting such applications to parallel platforms. For example, the use of synchronization mechanisms in multithreaded applications is essential on shared-cache multi-core architectures. However, ensuring an appropriate use of their interfaces can be challenging, since different memory models plus instruction reordering at compiler/processor levels may influence the occurrence of data races. The benefits of race detectors are formidable in this sense; nevertheless, if lock-free data structures with no high-level atomics are used, they may emit false positives. In this paper, we extend the ThreadSanitizer race detection tool in order to support semantics of the general Single-Producer/Single-Consumer (SPSC) lock-free parallel queue and to detect benign data races where it was correctly used. To perform our analysis, we leverage the FastFlow SPSC bounded lock-free queue implementation to test our extensions over a set of μ-benchmarks and real applications on a dual-socket Intel Xeon CPU E5-2695 platform. We demonstrate that this approach can reduce the number of data race warning messages by 30% on average.
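For readers unfamiliar with the data structure, the following is a minimal sketch of a bounded SPSC queue. It is not the FastFlow implementation: the class name `SpscQueue` and its interface are hypothetical, and it uses C++11 `std::atomic` indices with acquire/release ordering. That is the kind of high-level synchronization a race detector can follow, in contrast to the queues without high-level atomics that the abstract identifies as a source of false positives.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Ring buffer for exactly one producer thread and one consumer thread.
// One slot is kept empty to tell "full" from "empty", so the queue
// holds at most Capacity - 1 elements.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    // Called only by the producer. Returns false if the queue is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        slots_[tail] = value;
        // Release: the slot write becomes visible before the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Called only by the consumer. Returns std::nullopt if the queue is empty.
    std::optional<T> pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return std::nullopt;  // empty
        }
        T value = slots_[head];
        // Release: the slot read completes before the slot is handed back.
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read (consumer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to write (producer-owned)
};
```

The sketch requires C++17 for `std::optional`. Only the producer writes `tail_` and only the consumer writes `head_`, which is why no locks or compare-and-swap loops are needed.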