SCIPHI Score-P and Cube Extensions for Intel Phi
The Knights Landing processor offers unique features with regard to memory hierarchy and vectorization capabilities. To improve tool support in these two areas, we present extensions to the Score-P measurement infrastructure and the Cube report explorer. With the Knights Landing edition, Intel introduced a new memory architecture utilizing two types of memory, MCDRAM and DDR4 SDRAM. To assist the user in deciding where to place data structures, we introduce an MCDRAM candidate metric to the Cube report explorer. In addition, we track all MCDRAM allocations through the hbwmalloc interface, providing memory metrics such as leaked memory or the high-water mark on a per-region basis, as already known for the ubiquitous malloc/free. A Score-P metric plugin that records memory statistics via numastat on a per-process level enables a timeline analysis using the Vampir toolset. To get the best performance out of the Knights Landing processor, its large vector processing units need to be utilized effectively. The ratio between computation and data access and the vector processing unit (VPU) intensity are introduced as metrics to identify vectorization candidates on a per-region basis. The Portable Hardware Locality (hwloc) library (Broquedis et al., "hwloc: a generic framework for managing hardware affinities in HPC applications", 2010 [2]) allows us to visualize the distribution of the KNL-specific performance metrics within the Cube report explorer, taking the hardware topology consisting of processor tiles and cores into account.
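As a minimal sketch of the hbwmalloc interface whose allocations the extension tracks (assuming the memkind library is installed; error handling kept deliberately short):

    #include <stdio.h>
    #include <stdlib.h>
    #include <hbwmalloc.h>  /* memkind's high-bandwidth memory API */

    int main(void)
    {
        /* hbw_check_available() returns 0 when MCDRAM-backed
           high-bandwidth memory can be allocated. */
        if (hbw_check_available() != 0) {
            fprintf(stderr, "No high-bandwidth memory available\n");
            return EXIT_FAILURE;
        }

        /* Allocations through hbw_malloc land in MCDRAM and would be
           picked up by the hbwmalloc tracking described above. */
        double* buf = hbw_malloc(1024 * sizeof(double));
        if (buf == NULL)
            return EXIT_FAILURE;

        buf[0] = 42.0;   /* touch the memory */
        hbw_free(buf);   /* a missing hbw_free would show up as leaked memory */
        return EXIT_SUCCESS;
    }

Pairing each hbw_malloc with a matching hbw_free is exactly what a leaked-memory metric checks; the high-water mark corresponds to the largest amount of MCDRAM live at any one time within a region.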
Profiling Hybrid HMPP Applications with Score-P on Heterogeneous Hardware
In heterogeneous environments with multi-core systems and accelerators, programming and optimizing large parallel applications turns into a time-intensive and hardware-dependent challenge. To assist application developers in this process, a number of tools and high-level compilers have been developed. Directive-based programming models such as HMPP and OpenACC provide abstractions over low-level GPU programming models such as CUDA or OpenCL. The compilers developed by CAPS automatically transform the pragma-annotated application code into low-level code, thereby allowing parallelization and optimization for a given accelerator hardware. To analyze the performance of parallel applications, multiple partners in Germany and the US jointly develop the community measurement infrastructure Score-P. Score-P gathers performance execution profiles, which can be presented and analyzed within the CUBE result browser, and collects detailed event traces to be processed by post-mortem analysis tools such as Scalasca and Vampir. In this paper we present the integration and combined use of Score-P and the CAPS compilers as one approach to efficiently parallelize and optimize codes. Specifically, we describe the PHMPP profiling interface, its implementation in Score-P, and the presentation of preliminary results in CUBE.
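As an illustration of the directive-based style discussed above, here is a minimal OpenACC kernel in C; the HMPP codelet syntax differs, but the principle of annotating a loop and letting the compiler generate the accelerator code is the same (loop body and names are illustrative only):

    #include <stdlib.h>

    /* A pragma-annotated loop: the compiler generates the low-level
       CUDA/OpenCL code, handles the data transfers, and launches the
       kernel on the accelerator. */
    void vector_add(const float* a, const float* b, float* c, int n)
    {
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

In a Score-P-instrumented build, the accelerator activity behind such a directive is the kind of event the PHMPP profiling interface described in the paper exposes to the measurement system.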
A Picture Is Worth a Thousand Numbers—Enhancing Cube’s Analysis Capabilities with Plugins
In the last couple of years, supercomputers have become increasingly large and more and more complex. Performance analysis tools need to adapt to this system complexity in order to be used effectively at large scale. Thus, we introduced a plugin infrastructure in Cube 4, the performance report explorer for Score-P and Scalasca, which allows extending Cube's analysis features without modifying the source code of the GUI. In this paper we describe the Cube plugin infrastructure and show how it makes Cube a more versatile and powerful tool. We present different plugins provided by JSC that extend and enhance the CubeGUI's analysis capabilities. These add new types of system-tree visualizations, help create reasonable filter files for Score-P, and visualize simple OTF2 trace files. We also present a plugin which provides a high-level overview of the efficiency of the application and its kernels. We further discuss context-free plugins, which are used to integrate command-line Cube algebra utilities, like cube_diff and similar commands, into the GUI.
SCIPHI - Score-P and Cube Extensions for Intel Xeon Phi
The KNL processor offers unique features concerning memory hierarchy and vectorization capabilities. To improve tool support within these two areas, we present extensions to the Score-P measurement system and the Cube report explorer. KNL introduced a new memory architecture utilizing MCDRAM and DDR. To help the user decide where to place data structures, we record an MCDRAM candidate metric. In addition, we track all MCDRAM allocations through the hbwmalloc API, providing memory metrics like leaked memory or the high-water mark on a per-region basis. For timeline analysis, per-process memory statistics are recorded via numastat. KNL's large vector processing unit needs to be utilized effectively. The metrics compute-to-data-access ratio and VPU intensity are introduced to identify vectorization candidates on a per-region basis. Taking the hardware structure into account, the distribution of the KNL-specific metrics is visualized in the Cube report explorer.
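A sketch of how the two derived metrics could be computed from raw counter values; the event names follow Intel's published KNL formulas (packed vs. scalar SIMD uops, memory uops) but should be treated as assumptions here, not as the paper's exact definitions:

    /* Derived KNL vectorization metrics from raw hardware counters.
       Counter names are assumptions based on Intel's KNL documentation:
         packed_simd ~ UOPS_RETIRED.PACKED_SIMD
         scalar_simd ~ UOPS_RETIRED.SCALAR_SIMD
         mem_uops    ~ MEM_UOPS_RETIRED loads + stores */

    /* Fraction of SIMD work done in packed (vector) form. */
    double vpu_intensity(double packed_simd, double scalar_simd)
    {
        double all_simd = packed_simd + scalar_simd;
        return all_simd > 0.0 ? packed_simd / all_simd : 0.0;
    }

    /* Computational work per data access; low values hint at
       memory-bound regions, high values at compute-bound ones. */
    double compute_to_data_access(double packed_simd, double scalar_simd,
                                  double mem_uops)
    {
        return mem_uops > 0.0 ? (packed_simd + scalar_simd) / mem_uops : 0.0;
    }

Regions with a high compute-to-data-access ratio but low VPU intensity are the prime vectorization candidates the abstract refers to.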
Hands-on Practical Hybrid Parallel Application Performance Engineering
This tutorial presents state-of-the-art performance tools for leading-edge HPC systems, founded on the community-developed Score-P instrumentation and measurement infrastructure, and demonstrates how they can be used for performance engineering of scientific applications based on standard MPI, OpenMP, hybrid combinations of both, and the increasingly common use of accelerators. Parallel performance tools from VI-HPS.org are introduced and featured in hands-on exercises with Score-P, Scalasca, Vampir, and TAU. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, tuning, and visualization. Emphasis is placed on how the tools are used together to identify performance problems and investigate optimization alternatives. Using an AWS instance of E4S with all of the necessary tools, participants will conduct exercises with support for a remote desktop session for GUI tools. This will prepare participants to locate and diagnose performance bottlenecks in their own parallel programs.
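A minimal hybrid MPI+OpenMP program of the kind used in such exercises; building it through the Score-P compiler wrapper (e.g. scorep mpicc -fopenmp hello.c) yields an instrumented binary. The program below is illustrative, not taken from the tutorial material:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char** argv)
    {
        int rank = 0;
        MPI_Init(&argc, &argv);               /* recorded by Score-P's MPI wrappers */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel                      /* recorded via OpenMP instrumentation */
        printf("rank %d, thread %d\n", rank, omp_get_thread_num());

        MPI_Finalize();
        return 0;
    }

Running the instrumented binary produces a profile.cubex report inside a scorep-* experiment directory by default; setting SCOREP_ENABLE_TRACING=true additionally writes an OTF2 trace for Vampir or Scalasca.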
Hands-on Practical Hybrid Parallel Application Performance Engineering
This tutorial presents state-of-the-art performance tools for leading HPC systems, founded on the community-developed Score-P instrumentation and measurement infrastructure, and demonstrates how they can be used for performance engineering of scientific applications based on standard MPI, OpenMP, hybrid combinations of both, and the increasingly common use of accelerators. Parallel performance tools from the Virtual Institute – High Productivity Supercomputing (VI-HPS) are introduced and featured in hands-on exercises with Score-P, Scalasca, Vampir, and TAU. These platform-agnostic tools are installed and supported on many of the HPC systems coordinated via PRACE, ECP, XSEDE/ACCESS, and others. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timers and hardware counters), data storage, analysis, tuning, and visualization. Emphasis is placed on how the tools are used in combination to identify performance problems and investigate optimization alternatives. Participants will use their notebook computers for guided exercises on contemporary CPU+GPU HPC systems, which will prepare them to locate and diagnose performance bottlenecks in their own parallel programs.
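Since the measurements combine timers with hardware counters, here is a minimal sketch of reading one such counter directly through PAPI, the counter layer these tools can use (error handling trimmed for brevity; the measured loop is arbitrary):

    #include <papi.h>
    #include <stdio.h>

    int main(void)
    {
        long long counts[1];
        int eventset = PAPI_NULL;

        /* Initialize the library and build an event set with one counter. */
        PAPI_library_init(PAPI_VER_CURRENT);
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_TOT_INS);  /* total instructions retired */

        PAPI_start(eventset);
        volatile double x = 0.0;
        for (int i = 0; i < 1000000; ++i)        /* work to be measured */
            x += i * 0.5;
        PAPI_stop(eventset, counts);

        printf("instructions: %lld\n", counts[0]);
        return 0;
    }

In Score-P the same counters are requested without any code changes, via the SCOREP_METRIC_PAPI environment variable.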
Natural killer cell profiles in recurrent pregnancy loss: increased expression and positive associations with TACTILE and LILRB1
PROBLEM: NK cells are important for healthy pregnancy, and aberrant phenotypes or effector functions have been associated with recurrent pregnancy loss (RPL). We compared expression of a broad panel of NK cell receptors, including immune checkpoint receptors, and investigated their clinical association with RPL, as this might improve patient stratification and prediction of RPL. METHOD OF STUDY: Peripheral blood mononuclear cells were isolated from fifty-two women with RPL and from twenty-two women with an uncomplicated pregnancy for flow cytometric analysis, and plasma was used to determine anti-CMV IgG antibodies. RESULTS: Between RPL and controls, we observed no difference in frequencies of T, NKT, or NK cells, in CD56dimCD16+ or CD56brightCD16- NK cell subsets, or in the expression of KIRs, NKG2A, NKG2C, NKG2D, NKp30, NKp44, NKp46, or DNAM1. NK cells from women with RPL had a higher expression of LILRB1 and TACTILE, and this was associated with the number of losses. The immune checkpoint receptors PD1, TIM3, and LAG3 were not expressed on peripheral blood NK cells. In RPL patients, there was a large variation in NKG2C expression, and higher levels could be explained by CMV seropositivity. CONCLUSIONS: Our study identified LILRB1 and TACTILE as NK cell receptors associated with RPL. Moreover, we provide first support for a potential role of CMV in RPL via its impact on the NK cell compartment. Our study could thereby guide future studies to confirm the clinical association of LILRB1, TACTILE, and NKG2C with RPL in a larger cohort and to explore their functional relevance in reproductive success.
Scalasca Trace Tools: Toolset for scalable performance analysis of large-scale parallel applications (v2.6.1)
Scalasca is a software tool that supports the performance optimization of parallel programs by measuring and analyzing their runtime behaviour. The analysis identifies potential performance bottlenecks – in particular those concerning communication and synchronization – and offers guidance in exploring their causes. Scalasca supports the performance optimization of simulation codes on a wide range of current HPC platforms. Its powerful analysis and intuitive result presentation guide the developer through the tuning process. Scalasca mainly targets scientific and engineering applications based on the programming interfaces MPI and OpenMP, including hybrid applications based on a combination of the two. The tool has been specifically designed for use on large-scale systems, but is also well suited for small- and medium-scale HPC platforms. The software is available for free download under the New BSD open-source license.
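One of the communication inefficiencies Scalasca's trace analysis detects is the classic Late Sender pattern, sketched below: a receive posted long before the matching send leaves the receiver in a wait state (the imbalance here is induced artificially with a sleep):

    #include <mpi.h>
    #include <unistd.h>

    int main(int argc, char** argv)
    {
        int rank, token = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            sleep(2);                             /* the sender is late... */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* ...so this receive sits in a wait state that the trace
               analysis attributes as Late Sender time. */
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

Run with at least two ranks; in the resulting analysis report the waiting time shows up on rank 1's MPI_Recv, pointing the developer at the sender-side imbalance.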