5,389 research outputs found

    Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

    Full text link
    Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms including an Intel Xeon E5 CPU, a Nvidia Geforce GTX1070 GPU and an Xeon Phi 7210 processor codenamed Knights Landing (KNL) in the domain of parallel graph processing. We show that the KNL gains encouraging performance when processing graphs, so that it can become a promising solution to accelerating multi-threaded graph applications. We further characterize the impact of KNL architectural enhancements on the performance of a state-of-the art graph framework.We have four key observations: 1 Different graph applications require distinctive numbers of threads to reach the peak performance. For the same application, various datasets need even different numbers of threads to achieve the best performance. 2 Only a few graph applications benefit from the high bandwidth MCDRAM, while others favor the low latency DDR4 DRAM. 3 Vector processing units executing AVX512 SIMD instructions on KNLs are underutilized when running the state-of-the-art graph framework. 4 The sub-NUMA cache clustering mode offering the lowest local memory access latency hurts the performance of graph benchmarks that are lack of NUMA awareness. At last, We suggest future works including system auto-tuning tools and graph framework optimizations to fully exploit the potential of KNL for parallel graph processing.Comment: published as L. Jiang, L. Chen and J. Qiu, "Performance Characterization of Multi-threaded Graph Processing Applications on Many-Integrated-Core Architecture," 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Belfast, United Kingdom, 2018, pp. 199-20

    A Phase Field Model for Continuous Clustering on Vector Fields

    Get PDF
    A new method for the simplification of flow fields is presented. It is based on continuous clustering. A well-known physical clustering model, the Cahn Hilliard model, which describes phase separation, is modified to reflect the properties of the data to be visualized. Clusters are defined implicitly as connected components of the positivity set of a density function. An evolution equation for this function is obtained as a suitable gradient flow of an underlying anisotropic energy functional. Here, time serves as the scale parameter. The evolution is characterized by a successive coarsening of patterns-the actual clustering-during which the underlying simulation data specifies preferable pattern boundaries. We introduce specific physical quantities in the simulation to control the shape, orientation and distribution of the clusters as a function of the underlying flow field. In addition, the model is expanded, involving elastic effects. In the early stages of the evolution shear layer type representation of the flow field can thereby be generated, whereas, for later stages, the distribution of clusters can be influenced. Furthermore, we incorporate upwind ideas to give the clusters an oriented drop-shaped appearance. Here, we discuss the applicability of this new type of approach mainly for flow fields, where the cluster energy penalizes cross streamline boundaries. However, the method also carries provisions for other fields as well. The clusters can be displayed directly as a flow texture. Alternatively, the clusters can be visualized by iconic representations, which are positioned by using a skeletonization algorithm.

    A Parallel Adaptive P3M code with Hierarchical Particle Reordering

    Full text link
    We discuss the design and implementation of HYDRA_OMP a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important optimization being hierarchical reordering of particles within chaining cells, which greatly improves data locality thereby removing the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future.Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communication
    corecore