164,332 research outputs found
Development of an oceanographic application in HPC
High Performance Computing (HPC) is used for running advanced application programs
efficiently, reliably, and quickly.
In earlier decades, performance analysis of HPC applications was evaluated based on
speed, scalability of threads, memory hierarchy. Now, it is essential to consider the
energy or the power consumed by the system while executing an application.
In fact, the High Power Consumption (HPC) is one of biggest problems for the High
Performance Computing (HPC) community and one of the major obstacles for exascale
systems design.
The new generations of HPC systems intend to achieve exaflop performances and will
demand even more energy to processing and cooling. Nowadays, the growth of HPC
systems is limited by energy issues
Recently, many research centers have focused the attention on doing an automatic tuning
of HPC applications which require a wide study of HPC applications in terms of power
efficiency.
In this context, this paper aims to propose the study of an oceanographic application,
named OceanVar, that implements Domain Decomposition based 4D Variational model
(DD-4DVar), one of the most commonly used HPC applications, going to evaluate not
only the classic aspects of performance but also aspects related to power efficiency in
different case of studies.
These work were realized at Bsc (Barcelona Supercomputing Center), Spain within the
Mont-Blanc project, performing the test first on HCA server with Intel technology and then on a mini-cluster Thunder with ARM technology.
In this work of thesis it was initially explained the concept of assimilation date, the
context in which it is developed, and a brief description of the mathematical model
4DVAR.
After this problem’s close examination, it was performed a porting from Matlab
description of the problem of data-assimilation to its sequential version in C language.
Secondly, after identifying the most onerous computational kernels in order of time, it
has been developed a parallel version of the application with a parallel multiprocessor
programming style, using the MPI (Message Passing Interface) protocol.
The experiments results, in terms of performance, have shown that, in the case of
running on HCA server, an Intel architecture, values of efficiency of the two most
onerous functions obtained, growing the number of process, are approximately equal to
80%.
In the case of running on ARM architecture, specifically on Thunder mini-cluster,
instead, the trend obtained is labeled as "SuperLinear Speedup" and, in our case, it can
be explained by a more efficient use of resources (cache memory access) compared with
the sequential case.
In the second part of this paper was presented an analysis of the some issues of this
application that has impact in the energy efficiency.
After a brief discussion about the energy consumption characteristics of the Thunder
chip in technological landscape, through the use of a power consumption detector, the
Yokogawa Power Meter, values of energy consumption of mini-cluster Thunder were
evaluated in order to determine an overview on the power-to-solution of this application
to use as the basic standard for successive analysis with other parallel styles.
Finally, a comprehensive performance evaluation, targeted to estimate the goodness of
MPI parallelization, is conducted using a suitable performance tool named Paraver,
developed by BSC.
Paraver is such a performance analysis and visualisation tool which can be used to
analyse MPI, threaded or mixed mode programmes and represents the key to perform a parallel profiling and to optimise the code for High Performance Computing.
A set of graphical representation of these statistics make it easy for a developer to
identify performance problems. Some of the problems that can be easily identified are
load imbalanced decompositions, excessive communication overheads and poor average
floating operations per second achieved.
Paraver can also report statistics based on hardware counters, which are provided by the
underlying hardware.
This project aimed to use Paraver configuration files to allow certain metrics to be
analysed for this application.
To explain in some way the performance trend obtained in the case of analysis on the
mini-cluster Thunder, the tracks were extracted from various case of studies and the
results achieved is what expected, that is a drastic drop of cache misses by the case ppn
(process per node) = 1 to case ppn = 16.
This in some way explains a more efficient use of cluster resources with an increase of
the number of processes
LUNES: Agent-based Simulation of P2P Systems (Extended Version)
We present LUNES, an agent-based Large Unstructured NEtwork Simulator, which
allows to simulate complex networks composed of a high number of nodes. LUNES
is modular, since it splits the three phases of network topology creation,
protocol simulation and performance evaluation. This permits to easily
integrate external software tools into the main software architecture. The
simulation of the interaction protocols among network nodes is performed via a
simulation middleware that supports both the sequential and the
parallel/distributed simulation approaches. In the latter case, a specific
mechanism for the communication overhead-reduction is used; this guarantees
high levels of performance and scalability. To demonstrate the efficiency of
LUNES, we test the simulator with gossip protocols executed on top of networks
(representing peer-to-peer overlays), generated with different topologies.
Results demonstrate the effectiveness of the proposed approach.Comment: Proceedings of the International Workshop on Modeling and Simulation
of Peer-to-Peer Architectures and Systems (MOSPAS 2011). As part of the 2011
International Conference on High Performance Computing and Simulation (HPCS
2011
Efficient Parallel Statistical Model Checking of Biochemical Networks
We consider the problem of verifying stochastic models of biochemical
networks against behavioral properties expressed in temporal logic terms. Exact
probabilistic verification approaches such as, for example, CSL/PCTL model
checking, are undermined by a huge computational demand which rule them out for
most real case studies. Less demanding approaches, such as statistical model
checking, estimate the likelihood that a property is satisfied by sampling
executions out of the stochastic model. We propose a methodology for
efficiently estimating the likelihood that a LTL property P holds of a
stochastic model of a biochemical network. As with other statistical
verification techniques, the methodology we propose uses a stochastic
simulation algorithm for generating execution samples, however there are three
key aspects that improve the efficiency: first, the sample generation is driven
by on-the-fly verification of P which results in optimal overall simulation
time. Second, the confidence interval estimation for the probability of P to
hold is based on an efficient variant of the Wilson method which ensures a
faster convergence. Third, the whole methodology is designed according to a
parallel fashion and a prototype software tool has been implemented that
performs the sampling/verification process in parallel over an HPC
architecture
Formal and Informal Methods for Multi-Core Design Space Exploration
We propose a tool-supported methodology for design-space exploration for
embedded systems. It provides means to define high-level models of applications
and multi-processor architectures and evaluate the performance of different
deployment (mapping, scheduling) strategies while taking uncertainty into
account. We argue that this extension of the scope of formal verification is
important for the viability of the domain.Comment: In Proceedings QAPL 2014, arXiv:1406.156
Synthesis and Stochastic Assessment of Cost-Optimal Schedules
We present a novel approach to synthesize good schedules for a class
of scheduling problems that is slightly more general than the
scheduling problem FJm,a|gpr,r_j,d_j|early/tardy. The idea is to prime
the schedule synthesizer with stochastic information more meaningful
than performance factors with the objective to minimize the expected
cost caused by storage or delay. The priming information is
obtained by stochastic simulation of the system environment. The generated
schedules are assessed again by simulation. The approach is
demonstrated by means of a non-trivial scheduling problem from
lacquer production. The experimental results show that our approach
achieves in all considered scenarios better results than the
extended processing times approach
- …