Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential
Emerging computer architectures will feature drastically decreased flops/byte
(ratio of peak processing rate to memory bandwidth) as highlighted by recent
studies on Exascale architectural trends. Further, flops are getting cheaper
while the energy cost of data movement is increasingly dominant. The
understanding and characterization of data locality properties of computations
is critical in order to guide efforts to enhance data locality. Reuse distance
analysis of memory address traces is a valuable tool to perform data locality
characterization of programs. A single reuse distance analysis can be used to
estimate the number of cache misses in a fully associative LRU cache of any
size, thereby providing estimates on the minimum bandwidth requirements at
different levels of the memory hierarchy to avoid being bandwidth bound.
However, such an analysis only holds for the particular execution order that
produced the trace. It cannot estimate potential improvement in data locality
through dependence preserving transformations that change the execution
schedule of the operations in the computation. In this article, we develop a
novel dynamic analysis approach to characterize the inherent locality
properties of a computation and thereby assess the potential for data locality
enhancement via dependence preserving transformations. The execution trace of a
code is analyzed to extract a computational directed acyclic graph (CDAG) of
the data dependences. The CDAG is then partitioned into convex subsets, and the
convex partitioning is used to reorder the operations in the execution trace to
enhance data locality. The approach enables us to go beyond reuse distance
analysis of a single specific order of execution of the operations of a
computation in characterization of its data locality properties. It can serve a
valuable role in identifying promising code regions for manual transformation,
as well as assessing the effectiveness of compiler transformations for data
locality enhancement. We demonstrate the effectiveness of the approach using a
number of benchmarks, including case studies where the potential shown by the
analysis is exploited to achieve lower data movement costs and better
performance.
Comment: Transactions on Architecture and Code Optimization (2014
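The claim that a single reuse distance analysis yields miss estimates for a fully associative LRU cache of any size can be illustrated with a short sketch. It is not the article's tool: the function names `reuse_distances` and `lru_misses` are hypothetical, the trace is a toy list of addresses, and the naive list-based LRU stack is chosen for clarity rather than speed.

```python
def reuse_distances(trace):
    """Return the LRU stack distance of each access (None for a first touch).

    The distance is the number of distinct addresses accessed since the
    previous access to the same address.
    """
    stack = []  # hypothetical LRU stack; most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)  # cold (compulsory) miss
        stack.append(addr)
    return distances


def lru_misses(distances, cache_size):
    """Count misses in a fully associative LRU cache of cache_size addresses.

    An access hits only if its reuse distance is smaller than the cache size.
    """
    return sum(1 for d in distances if d is None or d >= cache_size)


if __name__ == "__main__":
    trace = list("abcab")  # toy address trace, an assumption for illustration
    dists = reuse_distances(trace)
    for size in (1, 2, 3):
        print(size, lru_misses(dists, size))
```

The distances are computed once and then reused for every cache size, which is the property the abstract relies on. As the abstract notes, the result holds only for the execution order that produced the trace.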
Parallel and Distributed Simulation from Many Cores to the Public Cloud (Extended Version)
In this tutorial paper, we first review some basic simulation concepts
and then introduce parallel and distributed simulation techniques in view
of current and emerging challenges. In particular, recent years have seen
a wide diffusion of many-core architectures, and we can
expect this trend to continue. On the other hand, the success of cloud
computing is strongly promoting the everything-as-a-service paradigm. Is
parallel and distributed simulation ready for these new challenges? The current
approaches present many limitations in terms of usability and adaptivity: there
is a strong need for new evaluation metrics and for revising the currently
implemented mechanisms. In the last part of the paper, we propose a new
approach based on multi-agent systems for the simulation of complex systems. It
is possible to implement advanced techniques such as the migration of simulated
entities in order to build mechanisms that are both adaptive and very easy to
use. Adaptive mechanisms are able to significantly reduce the communication
cost in parallel/distributed architectures, to implement load-balancing
techniques and to cope with execution environments that are both variable and
dynamic. Finally, such mechanisms will be used to build simulations on top of
unreliable cloud services.
Comment: Tutorial paper published in the Proceedings of the International
Conference on High Performance Computing and Simulation (HPCS 2011). Istanbul
(Turkey), IEEE, July 2011. ISBN 978-1-61284-382-
Policy-based techniques for self-managing parallel applications
This paper presents an empirical investigation of policy-based self-management techniques for parallel applications executing in loosely-coupled environments. The dynamic and heterogeneous nature of these environments is discussed and the special considerations for parallel applications are identified. An adaptive strategy for the run-time deployment of tasks of parallel applications is presented. The strategy is based on embedding numerous policies which are informed by contextual and environmental inputs. The policies govern various aspects of behaviour, enhancing flexibility so that the goals of efficiency and performance are achieved despite high levels of environmental variability. A prototype self-managing parallel application is used as a vehicle to explore the feasibility and benefits of the strategy. In particular, several aspects of stability are investigated. The implementation and behaviour of three policies are discussed and sample results examined.