Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters
Performance analysis tools allow application developers to identify and characterize the inefficiencies that degrade the performance of their codes, and so guide application optimizations. Given the growing interest of the High Performance Computing (HPC) community in energy efficiency, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applied on different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.
The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially funded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group for supporting the smooth integration and testing of our setup within Extrae and Paraver.
Peer Reviewed. Postprint (published version)
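To make the idea of correlating performance and power figures concrete, the following is a minimal sketch of how energy-to-solution might be derived from a sampled power trace. It is not the Extrae or Paraver interface; the function names and the sample trace are hypothetical, and it assumes power is sampled in watts at known timestamps in seconds.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (W) over time (s) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def energy_delay_product(timestamps_s, power_w):
    """Energy-delay product: energy-to-solution times time-to-solution."""
    runtime = timestamps_s[-1] - timestamps_s[0]
    return energy_joules(timestamps_s, power_w) * runtime


# Hypothetical trace: a 4 s run sampled once per second.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [100.0, 120.0, 120.0, 110.0, 100.0]
print(energy_joules(t, p))         # 450.0 (joules)
print(energy_delay_product(t, p))  # 1800.0 (joule-seconds)
```

Computing both figures from the same trace is what allows a run on a high-end cluster and a run on a low-power cluster to be compared on time and energy at once.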
Workflow Partitioning and Deployment on the Cloud using Orchestra
Orchestrating service-oriented workflows is typically based on a design model
that routes both data and control through a single point - the centralised
workflow engine. This causes scalability problems that include the unnecessary
consumption of the network bandwidth, high latency in transmitting data between
the services, and performance bottlenecks. These problems are highly prominent
when orchestrating workflows that are composed from services dispersed across
distant geographical locations. This paper presents a novel workflow
partitioning approach, which attempts to improve the scalability of
orchestrating large-scale workflows. It permits the workflow computation to be
moved towards the services providing the data in order to garner optimal
performance results. This is achieved by decomposing the workflow into smaller
sub workflows for parallel execution, and determining the most appropriate
network locations to which these sub workflows are transmitted and subsequently
executed. This paper demonstrates the efficiency of our approach using a set of
experimental workflows that are orchestrated over Amazon EC2 and across several
geographic network regions.
Comment: To appear in Proceedings of the IEEE/ACM 7th International Conference
on Utility and Cloud Computing (UCC 2014)
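The decomposition step described in this abstract can be illustrated with a minimal sketch. It is not the Orchestra algorithm: it assumes the simplest possible placement rule, grouping each task with the network region of the service it invokes, and the task names and region labels are hypothetical.

```python
from collections import defaultdict


def partition_by_region(tasks, edges):
    """Group tasks into sub-workflows keyed by the region of their service.

    tasks: dict mapping task name -> region of the service it invokes
    edges: list of (producer, consumer) data dependencies
    Returns (partitions, cross_region_edges).
    """
    partitions = defaultdict(list)
    for task, region in tasks.items():
        partitions[region].append(task)
    # Only these edges still move data between regions after partitioning.
    cross = [(a, b) for a, b in edges if tasks[a] != tasks[b]]
    return dict(partitions), cross


# Hypothetical four-task pipeline spread over two regions.
tasks = {"fetch": "us-east", "filter": "us-east",
         "join": "eu-west", "report": "eu-west"}
edges = [("fetch", "filter"), ("filter", "join"), ("join", "report")]

parts, cross = partition_by_region(tasks, edges)
print(parts)  # one sub-workflow per region
print(cross)  # data transfers that still cross regions
```

With a centralised engine, every edge would route its data through the engine's location; under this grouping only the cross-region edges leave a region.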
A Convex Formulation for Spectral Shrunk Clustering
Spectral clustering is a fundamental technique in the field of data mining
and information processing. Most existing spectral clustering algorithms
integrate dimensionality reduction into the clustering process assisted by
manifold learning in the original space. However, the manifold in
reduced-dimensional subspace is likely to exhibit altered properties in
contrast with the original space. Thus, applying manifold information obtained
from the original space to the clustering process in a low-dimensional subspace
is prone to inferior performance. Aiming to address this issue, we propose a
novel convex algorithm that mines the manifold structure in the low-dimensional
subspace. In addition, our unified learning process makes the manifold learning
particularly tailored for the clustering. Compared with other related methods,
the proposed algorithm yields a more structured clustering result. To
validate the efficacy of the proposed algorithm, we perform extensive
experiments on several benchmark datasets in comparison with some
state-of-the-art clustering approaches. The experimental results demonstrate
that the proposed algorithm has quite promising clustering performance.
Comment: AAAI201
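For background, the baseline that this abstract builds on can be sketched as follows. This is standard two-way spectral partitioning on the unnormalised graph Laplacian, not the convex algorithm proposed in the paper. The sketch assumes a symmetric, connected affinity matrix and uses plain power iteration instead of a library eigensolver so that it stays self-contained.

```python
def spectral_bipartition(weights, iterations=200):
    """Split a graph in two using the sign of the Fiedler vector.

    weights: symmetric affinity matrix (list of lists) of a connected graph.
    Returns a list of 0/1 cluster labels, one per node.
    """
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two nodes")
    degree = [sum(row) for row in weights]
    # 2 * max degree bounds the largest eigenvalue of L = D - W, so
    # (shift*I - L) is positive semidefinite and reverses the eigenvalue order.
    shift = 2.0 * max(degree)

    def apply_shifted(v):
        # Computes (shift*I - L) v without forming the matrix explicitly.
        return [
            (shift - degree[i]) * v[i]
            + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]

    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]     # project out the constant eigenvector
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        v = apply_shifted(v)          # power-iteration step
    return [1 if x > 0 else 0 for x in v]


# Two triangles (nodes 0-2 and 3-5) joined by a single bridge edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
print(spectral_bipartition(W))
```

Here the affinities are computed in the original space and then used to cluster in the spectral embedding, which is the mismatch the abstract argues against.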
A Global Optimisation Toolbox for Massively Parallel Engineering Optimisation
A software platform for global optimisation, called PaGMO, has been developed
within the Advanced Concepts Team (ACT) at the European Space Agency, and was
recently released as an open-source project. PaGMO is built to tackle
high-dimensional global optimisation problems, and it has been successfully
used to find solutions to real-life engineering problems, among them the
preliminary design of interplanetary spacecraft trajectories - both chemical
(including multiple flybys and deep-space maneuvers) and low-thrust (limited,
at the moment, to single phase trajectories), the inverse design of
nano-structured radiators and the design of non-reactive controllers for
planetary rovers. Featuring an arsenal of global and local optimisation
algorithms (including genetic algorithms, differential evolution, simulated
annealing, particle swarm optimisation, compass search, improved harmony
search, and various interfaces to libraries for local optimisation such as
SNOPT, IPOPT, GSL and NLopt), PaGMO is at its core a C++ library which employs
an object-oriented architecture providing a clean and easily-extensible
optimisation framework. Adoption of multi-threaded programming ensures the
efficient exploitation of modern multi-core architectures and allows for a
straightforward implementation of the island model paradigm, in which multiple
populations of candidate solutions asynchronously exchange information in order
to speed up and improve the optimisation process. In addition to the C++
interface, PaGMO's capabilities are exposed to the high-level language Python,
so that it is possible to easily use PaGMO in an interactive session and take
advantage of the numerous scientific Python libraries available.
Comment: To be presented at 'ICATT 2010: International Conference on
Astrodynamics Tools and Techniques'
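The island model mentioned in this abstract can be illustrated with a small sketch. It does not use the PaGMO API and, unlike PaGMO, it runs sequentially with synchronous migration. The mutation-only evolution step, the ring topology, the sphere test function, and all parameter values are assumptions chosen for brevity.

```python
import random


def sphere(x):
    """Toy objective to minimise: sum of squares, optimum at the origin."""
    return sum(xi * xi for xi in x)


def evolve(population, fitness, rng, sigma=0.3):
    """One generation: mutate each individual, keep the child only if better."""
    next_pop = []
    for ind in population:
        child = [xi + rng.gauss(0.0, sigma) for xi in ind]
        next_pop.append(child if fitness(child) < fitness(ind) else ind)
    return next_pop


def migrate(islands, fitness):
    """Ring topology: each island's best replaces the worst on the next island."""
    bests = [min(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        incoming = bests[(i - 1) % len(islands)]
        worst = max(range(len(pop)), key=lambda k: fitness(pop[k]))
        if fitness(incoming) < fitness(pop[worst]):
            pop[worst] = list(incoming)


def island_model(n_islands=4, pop_size=8, dim=3, generations=50,
                 migrate_every=10, seed=0):
    """Return (best individual, best fitness recorded after each generation)."""
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                for _ in range(pop_size)] for _ in range(n_islands)]

    def global_best():
        return min((ind for pop in islands for ind in pop), key=sphere)

    history = [sphere(global_best())]
    for gen in range(1, generations + 1):
        islands = [evolve(pop, sphere, rng) for pop in islands]
        if gen % migrate_every == 0:
            migrate(islands, sphere)
        history.append(sphere(global_best()))
    return global_best(), history


best, history = island_model()
print(history[0], history[-1])  # best fitness before and after the run
```

Because both the evolution step and the migration step only ever replace an individual with a better one, the recorded best fitness never gets worse from one generation to the next.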
Idle Period Propagation in Message-Passing Applications
Idle periods on different processes of Message Passing applications are
unavoidable. While the origin of idle periods on a single process is well
understood as the effect of random system and architectural delays, it is
unclear how these idle periods propagate from one process to another. It is
important to understand idle period propagation in Message Passing applications
as it allows application developers to design communication patterns avoiding
idle period propagation and the consequent performance degradation in their
applications. To understand idle period propagation, we introduce a methodology
to trace idle periods when a process is waiting for data from a remote delayed
process in MPI applications. We apply this technique in an MPI application that
solves the heat equation to study idle period propagation on three different
systems. We confirm that idle periods move between processes in the form of
waves and that there are different stages in idle period propagation. Our
methodology enables us to identify a self-synchronization phenomenon that
occurs on two systems where some processes run slower than the other processes.
Comment: 18th International Conference on High Performance Computing and
Communications, IEEE, 201
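The wave-like propagation reported in this abstract can be reproduced in a toy model. The sketch below is not the paper's tracing methodology. It assumes a one-dimensional chain of processes that each compute for a fixed time and then block until both nearest neighbours have finished the previous iteration, as a halo exchange in a heat-equation stencil would. A single delay is injected on one process in the first iteration, and all names and parameter values are hypothetical.

```python
def simulate_idle_wave(n_procs, n_iters, compute=1.0, delayed_proc=0, delay=5.0):
    """Return idle[k][p]: time process p spends waiting before iteration k."""
    finish = [0.0] * n_procs  # finish time of the previous iteration
    idle = []
    for k in range(n_iters):
        # A process can start once it and its neighbours finished iteration k-1.
        ready = [max(finish[max(p - 1, 0):min(p + 2, n_procs)])
                 for p in range(n_procs)]
        idle.append([ready[p] - finish[p] for p in range(n_procs)])
        finish = [
            ready[p] + compute
            + (delay if (k == 0 and p == delayed_proc) else 0.0)
            for p in range(n_procs)
        ]
    return idle


# Seven processes; process 3 is delayed by 5 time units in iteration 0.
for k, row in enumerate(simulate_idle_wave(7, 5, delayed_proc=3)):
    print(k, row)
```

In this model the idle period moves outward by one process per iteration in both directions and vanishes when it reaches the ends of the chain.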