232 research outputs found
Dynamic Loop Scheduling Using MPI Passive-Target Remote Memory Access
Scientific applications often contain large computationally-intensive
parallel loops. Loop scheduling techniques aim to achieve load balanced
executions of such applications. For distributed-memory systems, existing
dynamic loop scheduling (DLS) libraries are typically MPI-based, and employ a
master-worker execution model to assign variably-sized chunks of loop
iterations. The master-worker execution model may adversely impact performance
due to the master-level contention. This work proposes a distributed
chunk-calculation approach that does not require the master-worker execution
scheme. Moreover, it considers the novel features in the latest MPI standards,
such as passive-target remote memory access, shared-memory window creation, and
atomic read-modify-write operations. To evaluate the proposed approach, five
well-known DLS techniques, two applications, and two heterogeneous hardware
setups have been considered. The DLS techniques implemented using the proposed
approach outperformed their counterparts implemented using the traditional
master-worker execution model
Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach
Computationally-intensive loops are the primary source of parallelism in
scientific applications. Such loops are often irregular and a balanced
execution of their loop iterations is critical for achieving high performance.
However, several factors may lead to an imbalanced load execution, such as
problem characteristics, algorithmic, and systemic variations. Dynamic loop
self-scheduling (DLS) techniques are devised to mitigate these factors, and
consequently, improve application performance. On distributed-memory systems,
DLS techniques can be implemented using a hierarchical master-worker execution
model and are, therefore, called hierarchical DLS techniques. These techniques
self-schedule loop iterations at two levels of hardware parallelism: across and
within compute nodes. Hybrid programming approaches that combine the message
passing interface (MPI) with open multi-processing (OpenMP) dominate the
implementation of hierarchical DLS techniques. The MPI-3 standard includes the
feature of sharing memory regions among MPI processes. This feature introduced
the MPI+MPI approach that simplifies the implementation of parallel scientific
applications. The present work designs and implements hierarchical DLS
techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques
are considered in the evaluation proposed herein. The results indicate certain
performance advantages of the proposed approach compared to the hybrid
MPI+OpenMP approach
Assessing Data Usefulness for Failure Analysis in Anonymized System Logs
System logs are a valuable source of information for the analysis and
understanding of systems behavior for the purpose of improving their
performance. Such logs contain various types of information, including
sensitive information. Information deemed sensitive can either directly be
extracted from system log entries by correlation of several log entries, or can
be inferred from the combination of the (non-sensitive) information contained
within system logs with other logs and/or additional datasets. The analysis of
system logs containing sensitive information compromises data privacy.
Therefore, various anonymization techniques, such as generalization and
suppression have been employed, over the years, by data and computing centers
to protect the privacy of their users, their data, and the system as a whole.
Privacy-preserving data resulting from anonymization via generalization and
suppression may lead to significantly decreased data usefulness, thus,
hindering the intended analysis for understanding the system behavior.
Maintaining a balance between data usefulness and privacy preservation,
therefore, remains an open and important challenge. Irreversible encoding of
system logs using collision-resistant hashing algorithms, such as SHAKE-128, is
a novel approach previously introduced by the authors to mitigate data privacy
concerns. The present work describes a study of the applicability of the
encoding approach from earlier work on the system logs of a production high
performance computing system. Moreover, a metric is introduced to assess the
data usefulness of the anonymized system logs to detect and identify the
failures encountered in the system.Comment: 11 pages, 3 figures, submitted to 17th IEEE International Symposium
on Parallel and Distributed Computin
rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks
Scientific applications often contain large and computationally intensive
parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a
balanced load execution of such applications on high performance computing
(HPC) systems. Large HPC systems are vulnerable to processors or node failures
and perturbations in the availability of resources. Most self-scheduling
approaches do not consider fault-tolerant scheduling or depend on failure or
perturbation detection and react by rescheduling failed tasks. In this work, a
robust dynamic load balancing (rDLB) approach is proposed for the robust self
scheduling of independent tasks. The proposed approach is proactive and does
not depend on failure or perturbation detection. The theoretical analysis of
the proposed approach shows that it is linearly scalable and its cost decrease
quadratically by increasing the system size. rDLB is integrated into an MPI DLS
library to evaluate its performance experimentally with two computationally
intensive scientific applications. Results show that rDLB enables the tolerance
of up to (P minus one) processor failures, where P is the number of processors
executing an application. In the presence of perturbations, rDLB boosted the
robustness of DLS techniques up to 30 times and decreased application execution
time up to 7 times compared to their counterparts without rDLB
Efficient Generation of Parallel Spin-images Using Dynamic Loop Scheduling
High performance computing (HPC) systems underwent a significant increase in
their processing capabilities. Modern HPC systems combine large numbers of
homogeneous and heterogeneous computing resources. Scalability is, therefore,
an essential aspect of scientific applications to efficiently exploit the
massive parallelism of modern HPC systems. This work introduces an efficient
version of the parallel spin-image algorithm (PSIA), called EPSIA. The PSIA is
a parallel version of the spin-image algorithm (SIA). The (P)SIA is used in
various domains, such as 3D object recognition, categorization, and 3D face
recognition. EPSIA refers to the extended version of the PSIA that integrates
various well-known dynamic loop scheduling (DLS) techniques. The present work:
(1) Proposes EPSIA, a novel flexible version of PSIA; (2) Showcases the
benefits of applying DLS techniques for optimizing the performance of the PSIA;
(3) Assesses the performance of the proposed EPSIA by conducting several
scalability experiments. The performance results are promising and show that
using well-known DLS techniques, the performance of the EPSIA outperforms the
performance of the PSIA by a factor of 1.2 and 2 for homogeneous and
heterogeneous computing resources, respectively
Performance Reproduction and Prediction of Selected Dynamic Loop Scheduling Experiments
Scientific applications are complex, large, and often exhibit irregular and
stochastic behavior. The use of efficient loop scheduling techniques in
computationally-intensive applications is crucial for improving their
performance on high-performance computing (HPC) platforms. A number of dynamic
loop scheduling (DLS) techniques have been proposed between the late 1980s and
early 2000s, and efficiently used in scientific applications. In most cases,
the computing systems on which they have been tested and validated are no
longer available. This work is concerned with the minimization of the sources
of uncertainty in the implementation of DLS techniques to avoid unnecessary
influences on the performance of scientific applications. Therefore, it is
important to ensure that the DLS techniques employed in scientific applications
today adhere to their original design goals and specifications. The goal of
this work is to attain and increase the trust in the implementation of DLS
techniques in present studies. To achieve this goal, the performance of a
selection of scheduling experiments from the 1992 original work that introduced
factoring is reproduced and predicted via both, simulative and native
experimentation. The experiments show that the simulation reproduces the
performance achieved on the past computing platform and accurately predicts the
performance achieved on the present computing platform. The performance
reproduction and prediction confirm that the present implementation of the DLS
techniques considered both, in simulation and natively, adheres to their
original description. The results confirm the hypothesis that reproducing
experiments of identical scheduling scenarios on past and modern hardware leads
to an entirely different behavior from expected
The Power of Localization for Efficiently Learning Linear Separators with Noise
We introduce a new approach for designing computationally efficient learning
algorithms that are tolerant to noise, and demonstrate its effectiveness by
designing algorithms with improved noise tolerance guarantees for learning
linear separators.
We consider both the malicious noise model and the adversarial label noise
model. For malicious noise, where the adversary can corrupt both the label and
the features, we provide a polynomial-time algorithm for learning linear
separators in under isotropic log-concave distributions that can
tolerate a nearly information-theoretically optimal noise rate of . For the adversarial label noise model, where the
distribution over the feature vectors is unchanged, and the overall probability
of a noisy label is constrained to be at most , we also give a
polynomial-time algorithm for learning linear separators in under
isotropic log-concave distributions that can handle a noise rate of .
We show that, in the active learning model, our algorithms achieve a label
complexity whose dependence on the error parameter is
polylogarithmic. This provides the first polynomial-time active learning
algorithm for learning linear separators in the presence of malicious noise or
adversarial label noise.Comment: Contains improved label complexity analysis communicated to us by
Steve Hannek
THE MAIN RESULTS OF ARCHAEOLOGICAL RESEARCH AT REÈCA-ROMULA (1869-2019) AND CONSIDERATIONS ON THE GEOPHYSICAL APPROACH â PART II
Book Chapter in INSIGHTS OF GEOSCIENCES FOR NATURAL HAZARDS AND CULTURAL HERITAGE, Editor: Florina CHITE
- âŠ