
    Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study

    Analytic, first-principles performance modeling of distributed-memory applications is difficult due to a wide spectrum of random disturbances caused by the application and the system. These disturbances (commonly called "noise") destroy the assumptions of regularity that one usually employs when constructing simple analytic models. Despite numerous efforts to quantify, categorize, and reduce such effects, a comprehensive quantitative understanding of their performance impact is not available, especially for long delays that have global consequences for the parallel application. In this work, we investigate various traces collected from synthetic benchmarks that mimic real applications on simulated and real message-passing systems in order to pinpoint the mechanisms behind delay propagation. We analyze how the propagation speed of idle waves emanating from injected delays depends on the execution and communication properties of the application, and study how such delays decay under increased noise levels and how they interact with each other. We also show how fine-grained noise can make a system immune to the adverse effects of propagating idle waves. Our results contribute to a better understanding of the collective phenomena that manifest themselves in distributed-memory parallel applications. Comment: 10 pages, 9 figures; title change

    Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

    The availability of a large number of processing nodes in a parallel and distributed computing environment enables sophisticated real-time processing over high-speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared-nothing cluster. We propose a framework, based on a fixed or predefined communication pattern, to distribute the join processing load over the shared-nothing cluster. We consider various overheads that arise while scaling over a large number of nodes, and propose solution methodologies to cope with these issues. We implement the algorithm over a cluster using a message-passing system, and present experimental results showing the effectiveness of the join processing algorithm. Comment: 11 page

    A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines

    Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale well with large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, which are the bottommost ones within the layered architecture. Both of these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those proposals from the literature. We present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Beyond improving scalability and performance, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.

    LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses

    System monitoring is an established tool to measure the utilization and health of HPC systems. Usually, system monitoring infrastructures make no connection to job information and do not utilize hardware performance monitoring (HPM) data. To increase the efficient use of HPC systems, automatic and continuous performance monitoring of jobs is an essential component. It can help to identify pathological cases, provide instant performance feedback to the users, offer initial data to judge the optimization potential of applications, and help to build a statistical foundation about application-specific system usage. The LIKWID monitoring stack is a modular framework built on top of the LIKWID tools library. It aims at enabling job-specific performance monitoring using HPM data, system metrics, and application-level data for small to medium-sized commodity clusters. Moreover, it is designed to integrate into existing monitoring infrastructures to speed up the change from pure system monitoring to job-aware monitoring. Comment: 4 pages, 4 figures. Accepted for HPCMASPA 2017, the Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications, held in conjunction with IEEE Cluster 2017, Honolulu, HI, September 5, 201

    Cluster algebras arising from cluster tubes

    We study the cluster algebras arising from cluster tubes with rank bigger than $1$. Cluster tubes are $2$-Calabi-Yau triangulated categories which contain no cluster tilting objects, but maximal rigid objects. Fix a certain maximal rigid object $T$ in the cluster tube $\mathcal{C}_n$ of rank $n$. For any indecomposable rigid object $M$ in $\mathcal{C}_n$, we define an analogue $X_M$ of Caldero-Chapoton's formula (or Palu's cluster character formula) by using the geometric information of $M$. We show that $X_M, X_{M'}$ satisfy the mutation formula when $M, M'$ form an exchange pair, and that $X_{?}: M\mapsto X_M$ gives a bijection from the set of indecomposable rigid objects in $\mathcal{C}_n$ to the set of cluster variables of the cluster algebra of type $C_{n-1}$, which induces a bijection between the set of basic maximal rigid objects in $\mathcal{C}_n$ and the set of clusters. This strengthens a surprising result proved recently by Buan-Marsh-Vatne that the combinatorics of maximal rigid objects in the cluster tube $\mathcal{C}_n$ encode the combinatorics of the cluster algebra of type $B_{n-1}$, since the combinatorics of cluster algebras of type $B_{n-1}$ or of type $C_{n-1}$ are the same by a result of Fomin and Zelevinsky. As a consequence, we give a categorification of cluster algebras of type $C$. Comment: 21 pages, title changed, rewrite the proof of the main theorem in Section 3, add Section 5, final version to appear in Jour. London Math. So

    Kang-Redner Anomaly in Cluster-Cluster Aggregation

    The large time, small mass, asymptotic behavior of the average mass distribution $\pb$ is studied in a $d$-dimensional system of diffusing aggregating particles for $1\leq d \leq 2$. By means of both a renormalization group computation and a direct re-summation of leading terms in the small reaction-rate expansion of the average mass distribution, it is shown that $\pb \sim \frac{1}{t^d} (\frac{m^{1/d}}{\sqrt{t}})^{e_{KR}}$ for $m \ll t^{d/2}$, where $e_{KR}=\epsilon +O(\epsilon ^2)$ and $\epsilon =2-d$. In two dimensions, it is shown that $\pb \sim \frac{\ln(m) \ln(t)}{t^2}$ for $m \ll t/ \ln(t)$. Numerical simulations in two dimensions supporting the analytical results are also presented. Comment: 11 pages, 6 figures, Revtex

    Globular Cluster Formation in the Virgo Cluster

    Metal-poor globular clusters (MPGCs) are a unique probe of the early universe, in particular the reionization era. Systems of globular clusters in galaxy clusters are particularly interesting, as it is in the progenitors of galaxy clusters that the earliest reionizing sources first formed. Although the exact physical origin of globular clusters is still debated, it is generally admitted that globular clusters form in early, rare dark matter peaks (Moore et al. 2006; Boley et al. 2009). We provide a fully numerical analysis of the Virgo cluster globular cluster system by identifying the present-day globular cluster system with exactly such early, rare dark matter peaks. A popular hypothesis is that the observed truncation of blue metal-poor globular cluster formation is due to reionization (Spitler et al. 2012; Boley et al. 2009; Brodie & Strader 2006); adopting this view, constraining the formation epoch of MPGCs provides a complementary constraint on the epoch of reionization. By analyzing both the line-of-sight velocity dispersion and the surface density distribution of the present-day distribution, we are able to constrain the redshift and mass of the dark matter peaks. We find and quantify a dependence of these quantities on the chosen line of sight, whose strength varies with redshift, and, coupled with star formation efficiency arguments, find a best-fitting formation mass and redshift of $\simeq 5 \times 10^8\,\mathrm{M}_\odot$ and $z\simeq 9$. We predict $\simeq 300$ intracluster MPGCs in the Virgo cluster. Our results confirm the techniques pioneered by Moore et al. (2006) when applied to the Virgo cluster and extend and refine the analytic results of Spitler et al. (2012) numerically. Comment: 13 Pages, 13 Figures, submitted to MNRA

    Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale

    The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method, used in numerical simulations of fluids in astrophysics and computational fluid dynamics, among many other fields. SPH simulations with detailed physics are computationally demanding. The parallelization of SPH codes is not trivial due to the absence of a structured grid. Additionally, the performance of SPH codes can, in general, be adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. This work presents insights into the current performance and functionalities of three SPH codes: SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. To gain such insights, a rotating square patch test was implemented as a common test simulation for the three SPH codes and analyzed on two modern HPC systems. Furthermore, to stress the differences with the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an additional test case, the Evrard collapse, has also been carried out. This work extrapolates the common basic SPH features in the three codes for the purpose of consolidating them into a pure-SPH, Exascale-ready, optimized mini-app. Moreover, the outcome of this work serves as direct feedback to the parent codes, to improve their performance and overall scalability. Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1

    A runtime heuristic to selectively replicate tasks for application-specific reliability targets

    In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler, or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated. This work was supported by FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402 and in part by the European Union (FEDER funds) under contract TIN2015-65316-P. Peer Reviewed. Postprint (author's final draft