Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
Analytic, first-principles performance modeling of distributed-memory
applications is difficult due to a wide spectrum of random disturbances caused
by the application and the system. These disturbances (commonly called "noise")
destroy the assumptions of regularity that one usually employs when
constructing simple analytic models. Despite numerous efforts to quantify,
categorize, and reduce such effects, a comprehensive quantitative understanding
of their performance impact is not available, especially for long delays that
have global consequences for the parallel application. In this work, we
investigate various traces collected from synthetic benchmarks that mimic real
applications on simulated and real message-passing systems in order to pinpoint
the mechanisms behind delay propagation. We analyze the dependence of the
propagation speed of idle waves emanating from injected delays with respect to
the execution and communication properties of the application, study how such
delays decay under increased noise levels, and how they interact with each
other. We also show how fine-grained noise can make a system immune against the
adverse effects of propagating idle waves. Our results contribute to a better
understanding of the collective phenomena that manifest themselves in
distributed-memory parallel applications.
Comment: 10 pages, 9 figures; title change
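As a rough illustration of how such an idle wave travels (a toy model with made-up
parameters, not the benchmarks or traces used in the paper), the following sketch
assumes a bulk-synchronous code with blocking nearest-neighbor exchange on a ring:
a rank cannot start an iteration before it and both neighbors have finished the
previous one, so a one-off delay injected on a single rank shows up as idle time
on its neighbors in the next iteration and moves outward by one rank per iteration.

    # Toy model of idle-wave propagation in a bulk-synchronous nearest-neighbor code.
    # Illustrative sketch only: rank count, work time, and the injected delay are
    # arbitrary assumptions, not parameters from the paper.

    P = 16            # number of MPI-like ranks (assumption)
    ITER = 12         # iterations to simulate
    WORK = 1.0        # nominal compute time per iteration
    DELAY = 4.0       # one-off delay injected on one rank at iteration 0

    finish = [0.0] * P                # finish time of the previous iteration per rank

    for k in range(ITER):
        new_finish = [0.0] * P
        idle = [0.0] * P
        for r in range(P):
            # A rank may start iteration k only after it and both ring neighbors
            # have finished iteration k-1 (blocking nearest-neighbor exchange).
            ready = max(finish[r], finish[(r - 1) % P], finish[(r + 1) % P])
            idle[r] = ready - finish[r]          # time spent waiting for neighbors
            extra = DELAY if (k == 0 and r == P // 2) else 0.0
            new_finish[r] = ready + WORK + extra
        print(f"iter {k:2d} idle:", " ".join(f"{x:4.1f}" for x in idle))
        finish = new_finish

In this noiseless toy model the wave travels undamped; the decay studied in the
paper only appears once background noise is added.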
Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster
The availability of a large number of processing nodes in a parallel and
distributed computing environment enables sophisticated real-time processing
over high-speed data streams, as required by many emerging applications.
Sliding-window stream joins are among the most important operators in a stream
processing system. In this paper, we consider the issue of parallelizing a
sliding-window stream join operator over a shared-nothing cluster. We propose a
framework, based on a fixed or predefined communication pattern, to distribute
the join processing load over the shared-nothing cluster. We consider various
overheads that arise while scaling over a large number of nodes, and propose
solution methodologies to cope with them. We implement the algorithm over a
cluster using a message-passing system, and present experimental results
showing the effectiveness of the join processing algorithm.
Comment: 11 pages
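As a minimal sketch of the partitioning idea behind such a framework (illustrative
only; the node count, window length, and routing scheme below are assumptions, not
the paper's algorithm), tuples from both streams can be hash-partitioned on the
join key so that matching tuples meet on the same node and are joined against that
node's sliding windows:

    import collections

    NUM_NODES = 4                      # assumed cluster size for the sketch
    WINDOW = 10.0                      # sliding-window length in seconds (assumption)

    # Per-node state: one window (deque of (timestamp, key, payload)) per stream.
    windows = [{"R": collections.deque(), "S": collections.deque()}
               for _ in range(NUM_NODES)]

    def route(key):
        """Hash-partition on the join key so R- and S-tuples with equal keys
        land on the same node (content-sensitive routing)."""
        return hash(key) % NUM_NODES

    def expire(window, now):
        # Drop tuples that have fallen out of the sliding window.
        while window and now - window[0][0] > WINDOW:
            window.popleft()

    def process(stream, ts, key, payload):
        node = route(key)
        own = windows[node][stream]
        other = windows[node]["S" if stream == "R" else "R"]
        expire(own, ts); expire(other, ts)
        matches = [(payload, p) for (t, k, p) in other if k == key]
        own.append((ts, key, payload))
        return node, matches

    # Usage: interleave tuples from the two streams R and S.
    print(process("R", 1.0, "user42", "r1"))   # no match yet
    print(process("S", 2.0, "user42", "s1"))   # joins with r1 on the same node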
A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines
Common implementations of core memory-allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale well to large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to scalable memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Beyond improving scalability and performance, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
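For background on the metadata such an allocator manipulates (a generic buddy-system
sketch, not the paper's non-blocking implementation), memory is viewed as a binary
tree of blocks whose sizes halve at each level, and allocation/release reduce to
simple index arithmetic on that tree; the paper's contribution, only hinted at in
the comments below, is to update this per-block state with atomic compare-and-swap
operations instead of spin-locks:

    # Generic buddy-system index arithmetic (illustrative sketch; block sizes and
    # tree depth are assumptions).

    MIN_BLOCK = 4096        # smallest allocatable block in bytes (assumption)
    LEVELS = 10             # tree depth => largest block = MIN_BLOCK * 2**(LEVELS-1)

    def level_for(size):
        """Smallest-block level that fits the request (level 0 = largest blocks)."""
        lvl, block = LEVELS - 1, MIN_BLOCK
        while block < size and lvl > 0:
            block *= 2
            lvl -= 1
        return lvl

    def buddy(index):
        """Sibling block that can be coalesced with `index` on release."""
        return index ^ 1

    def parent(index):
        """Block one level up, obtained by coalescing index with its buddy."""
        return index // 2

    # With 1-based heap-style numbering, level L spans indices 2**L .. 2**(L+1)-1.
    # A non-blocking allocator would claim a free index with an atomic
    # compare-and-swap on the block's status word and retry on conflict,
    # rather than taking a lock around the whole operation.
    idx = 2 ** level_for(6000)          # first candidate index at the chosen level
    print(level_for(6000), idx, buddy(idx), parent(idx))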
LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses
System monitoring is an established tool to measure the utilization and
health of HPC systems. Usually, system monitoring infrastructures make no
connection to job information and do not utilize hardware performance
monitoring (HPM) data. To increase the efficient use of HPC systems, automatic
and continuous performance monitoring of jobs is an essential component. It can
help to identify pathological cases, provide instant performance feedback to
the users, offer initial data to judge the optimization potential of
applications, and help to build a statistical foundation about
application-specific system usage. The LIKWID monitoring stack is a modular
framework built on top of the LIKWID tools library. It aims at enabling
job-specific performance monitoring using HPM data, system metrics, and
application-level data for small to medium-sized commodity clusters. Moreover,
it is designed to integrate into existing monitoring infrastructures to speed
up the change from pure system monitoring to job-aware monitoring.
Comment: 4 pages, 4 figures. Accepted for HPCMASPA 2017, the Workshop on
Monitoring and Analysis for High Performance Computing Systems Plus
Applications, held in conjunction with IEEE Cluster 2017, Honolulu, HI,
September 5, 2017
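To make the idea of job-aware monitoring concrete, here is a minimal collector
sketch (the metric source, job-ID handling, and output format are assumptions for
illustration, not the LIKWID monitoring stack's actual interfaces); the essential
point is that every sample is tagged with the job ID so system-level data can
later be aggregated per job:

    import json, socket, time

    def read_cpu_busy_fraction(interval=0.5):
        """Rough CPU utilization from /proc/stat (Linux); stands in for the
        hardware-performance-monitoring data a real stack would collect."""
        def snapshot():
            with open("/proc/stat") as f:
                fields = [int(x) for x in f.readline().split()[1:]]
            return fields[3], sum(fields)          # (idle ticks, total ticks)
        i0, t0 = snapshot(); time.sleep(interval); i1, t1 = snapshot()
        return 1.0 - (i1 - i0) / max(1, (t1 - t0))

    def collect_sample(job_id):
        """One job-tagged monitoring sample; in a real deployment this would be
        pushed to a time-series backend instead of printed."""
        return {
            "host": socket.gethostname(),
            "job_id": job_id,                 # ties the sample to the batch job
            "timestamp": time.time(),
            "cpu_busy": read_cpu_busy_fraction(),
        }

    if __name__ == "__main__":
        # The job ID would normally come from the batch system's environment.
        print(json.dumps(collect_sample(job_id="12345"), indent=2))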
Cluster algebras arising from cluster tubes
We study the cluster algebras arising from cluster tubes with rank bigger
than . Cluster tubes are Calabi-Yau triangulated categories which contain no
cluster-tilting objects, but do contain maximal rigid objects. Fix a certain
maximal rigid object in the cluster tube of rank . For any indecomposable
rigid object in , we define an analogue of the Caldero-Chapoton formula (or
Palu's cluster character formula) by using the geometric information of .
We show that these characters satisfy the mutation formula when the objects
form an exchange pair, and that the construction gives a bijection from the
set of indecomposable rigid objects in to the set of cluster variables of
the cluster algebra of type , which induces a bijection between the set of
basic maximal rigid objects in and the set of clusters. This strengthens a
surprising result proved recently by Buan-Marsh-Vatne that the combinatorics
of maximal rigid objects in the cluster tube encode the combinatorics of the
cluster algebra of type , since the combinatorics of cluster algebras of
type or of type are the same by a result of Fomin and Zelevinsky. As a
consequence, we give a categorification of cluster algebras of type .
Comment: 21 pages; title changed, proof of the main theorem in Section 3
rewritten, Section 5 added; final version to appear in J. London Math. Soc.
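For orientation (not taken from the paper), the classical Caldero-Chapoton
character that this construction generalizes assigns to a module M with dimension
vector d = (d_i) over the path algebra of an acyclic quiver the Laurent polynomial

\[
X_M \;=\; \frac{1}{\prod_i x_i^{d_i}}
\sum_{\underline{e}} \chi\bigl(\mathrm{Gr}_{\underline{e}}(M)\bigr)
\prod_i x_i^{\,\sum_{j \to i} e_j \;+\; \sum_{i \to j} (d_j - e_j)},
\]

where $\mathrm{Gr}_{\underline{e}}(M)$ is the quiver Grassmannian of submodules of
$M$ with dimension vector $\underline{e}$ and $\chi$ denotes the Euler
characteristic. The paper's analogue replaces this module-theoretic input with
geometric information attached to rigid objects in the cluster tube, where no
cluster-tilting object is available.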
Kang-Redner Anomaly in Cluster-Cluster Aggregation
The large-time, small-mass asymptotic behavior of the average mass
distribution $\pb$ is studied in a -dimensional system of diffusing
aggregating particles for . By means of both a renormalization group
computation as well as a direct re-summation of leading terms in the
small reaction-rate expansion of the average mass distribution, it is shown
that $\pb \sim \frac{1}{t^d}\left(\frac{m^{1/d}}{\sqrt{t}}\right)^{e_{KR}}$
for , where and . In two dimensions, it is shown that
$\pb \sim \frac{\ln(m)\,\ln(t)}{t^2}$ for . Numerical simulations in two
dimensions supporting the analytical results are also presented.
Comment: 11 pages, 6 figures, RevTeX
Globular Cluster Formation in the Virgo Cluster
Metal-poor globular clusters (MPGCs) are a unique probe of the early
universe, in particular the reionization era. Systems of globular clusters in
galaxy clusters are particularly interesting as it is in the progenitors of
galaxy clusters that the earliest reionizing sources first formed. Although the
exact physical origin of globular clusters is still debated, it is generally
accepted that globular clusters form in early, rare dark matter peaks (Moore et
al. 2006; Boley et al. 2009). We provide a fully numerical analysis of the
Virgo cluster globular cluster system by identifying the present-day globular
cluster system with exactly such early, rare dark matter peaks. A popular
hypothesis is that the observed truncation of blue, metal-poor globular
cluster formation is due to reionization (Spitler et al. 2012; Boley et al.
2009; Brodie & Strader 2006); adopting this view, constraining the formation
epoch of MPGCs provides a complementary constraint on the epoch of
reionization. By analyzing both the line-of-sight velocity dispersion and the
surface density distribution of the present-day distribution, we are able to
constrain the redshift and mass of the dark matter peaks. We find and quantify
a dependence of these quantities on the chosen line of sight, whose strength
varies with redshift, and, coupled with star formation efficiency arguments,
find a best-fitting formation mass and redshift of and . We predict
intracluster MPGCs in the Virgo cluster. Our results confirm the techniques
pioneered by Moore et al. (2006) when applied to the Virgo cluster and extend
and refine the analytic results of Spitler et al. (2012) numerically.
Comment: 13 pages, 13 figures, submitted to MNRAS
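For concreteness, the line-of-sight velocity dispersion referred to above is simply
the spread of velocities projected onto a chosen viewing direction; a minimal
sketch with mock particle velocities (illustrative values only, not the paper's
data) is:

    import numpy as np

    rng = np.random.default_rng(0)
    velocities = rng.normal(scale=500.0, size=(1000, 3))   # mock 3D velocities, km/s

    def los_velocity_dispersion(velocities, line_of_sight):
        """Standard deviation of velocities projected onto the unit line-of-sight
        vector; varying `line_of_sight` probes the orientation dependence the
        abstract refers to."""
        n = np.asarray(line_of_sight, dtype=float)
        n /= np.linalg.norm(n)
        v_los = velocities @ n
        return v_los.std()

    print(los_velocity_dispersion(velocities, [0, 0, 1]))   # z-axis viewing direction
    print(los_velocity_dispersion(velocities, [1, 1, 0]))   # a different sight line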
Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale
The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian
method, used in numerical simulations of fluids in astrophysics and
computational fluid dynamics, among many other fields. SPH simulations with
detailed physics represent computationally-demanding calculations. The
parallelization of SPH codes is not trivial due to the absence of a structured
grid. Additionally, the performance of the SPH codes can be, in general,
adversely impacted by several factors, such as multiple time-stepping,
long-range interactions, and/or boundary conditions. This work presents
insights into the current performance and functionalities of three SPH codes:
SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an
interdisciplinary co-design project, SPH-EXA, for the development of an
Exascale-ready SPH mini-app. To gain such insights, a rotating square patch
test was implemented as a common test simulation for the three SPH codes and
analyzed on two modern HPC systems. Furthermore, to stress the differences with
the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an
additional test case, the Evrard collapse, has also been carried out. This work
distills the common basic SPH features of the three codes for the purpose
of consolidating them into a pure-SPH, Exascale-ready, optimized mini-app.
Moreover, the outcome of this effort serves as direct feedback to the parent
codes, helping to improve their performance and overall scalability.
Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on
Cluster Computing proceedings for WRAp1
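To illustrate the basic SPH operation that all three codes share (a generic
textbook sketch with a standard cubic-spline kernel, not code taken from SPHYNX,
ChaNGa, or SPH-flow), particle densities are obtained by summing neighbor masses
weighted by a smoothing kernel:

    import numpy as np

    def cubic_spline_W(r, h):
        """Standard 3D cubic-spline smoothing kernel with support 2h."""
        sigma = 1.0 / (np.pi * h**3)
        q = r / h
        w = np.where(q < 1.0, 1.0 - 1.5*q**2 + 0.75*q**3,
            np.where(q < 2.0, 0.25*(2.0 - q)**3, 0.0))
        return sigma * w

    def sph_density(positions, masses, h):
        """Brute-force density summation rho_i = sum_j m_j W(|r_i - r_j|, h).
        Real codes use neighbor search (trees/cells) instead of the O(N^2) loop."""
        diff = positions[:, None, :] - positions[None, :, :]
        r = np.linalg.norm(diff, axis=-1)
        return (masses[None, :] * cubic_spline_W(r, h)).sum(axis=1)

    # Tiny usage example with random particles in a unit box.
    rng = np.random.default_rng(1)
    pos = rng.random((200, 3))
    m = np.full(200, 1.0 / 200)
    print(sph_density(pos, m, h=0.1)[:5])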
A runtime heuristic to selectively replicate tasks for application-specific reliability targets
In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler, or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, the results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
This work was supported by an FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-Blanc 2 project (www.montblanc-project.eu), grant agreement no. 610402, and in part by the European Union (FEDER funds) under contract TIN2015-65316-P.
Peer reviewed. Postprint (author's final draft).
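The general shape of such a selective-replication decision can be sketched as
follows (a simple greedy heuristic with made-up failure probabilities, not the
actual App_FIT selection logic): tasks are replicated in decreasing order of
failure probability until the estimated application-level reliability reaches
the target.

    import math

    def select_tasks_to_replicate(task_fail_probs, target_reliability):
        """Greedy sketch: replicate the most failure-prone tasks first until the
        estimated probability that every task (or its replica) succeeds reaches
        the target. Assumes independent failures and that a replicated task fails
        only if both copies fail (p -> p**2)."""
        probs = dict(task_fail_probs)
        replicated = set()

        def app_reliability():
            return math.prod(1.0 - p for p in probs.values())

        for task in sorted(probs, key=probs.get, reverse=True):
            if app_reliability() >= target_reliability:
                break
            probs[task] = probs[task] ** 2      # model the effect of replication
            replicated.add(task)
        return replicated, app_reliability()

    # Usage with made-up per-task failure probabilities.
    tasks = {f"t{i}": p for i, p in enumerate([1e-3, 5e-4, 2e-3, 1e-4, 8e-4])}
    chosen, achieved = select_tasks_to_replicate(tasks, target_reliability=0.999)
    print(sorted(chosen), round(achieved, 6))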
