5 research outputs found
Automated and accurate cache behavior analysis for codes with irregular access patterns
This is the peer reviewed version of the following article: Andrade, D., Arenaz, M., Fraguela, B. B., Touriño, J. and Doallo, R. (2007), Automated and accurate cache behavior analysis for codes with irregular access patterns. Concurrency Computat.: Pract. Exper., 19: 2407-2423. doi:10.1002/cpe.1173, which has been published in final form at https://doi.org/10.1002/cpe.1173. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
[Abstract] The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help in predicting and understanding its behavior are required. Analytical modeling is the ideal base for such tools if its traditional limitations in accuracy and scope of application can be overcome. While there has been extensive research on the modeling of codes with regular access patterns, less attention has been paid to codes with irregular patterns due to the increased difficulty in analyzing them. Nevertheless, many important applications exhibit this kind of pattern, and their lack of locality makes them more cache‐demanding, which makes their study more relevant. The focus of this paper is the automation of the Probabilistic Miss Equations (PME) model, an analytical model of the cache behavior that provides fast and accurate predictions for codes with irregular access patterns. The information requirements of the PME model are defined and its integration in the XARK compiler, a research compiler oriented to automatic kernel recognition in scientific codes, is described. We show how to exploit the powerful information‐gathering capabilities provided by this compiler to allow the automated modeling of loop‐oriented scientific codes. Experimental results that validate the correctness of the automated PME model are also presented.
Ministerio de Educación y Ciencia; TIN2004-07797-C02
Xunta de Galicia; PGIDIT03TIC10502PR
Xunta de Galicia; PGIDT05PXIC10504P
Iterative Compilation and Performance Prediction for Numerical Applications
Institute for Computing Systems Architecture
As the current rate of improvement in processor performance far exceeds the rate
of memory performance, memory latency is the dominant overhead in many
performance-critical applications. In many cases, automatic compiler-based
approaches to improving memory performance are limited and programmers
frequently resort to manual optimisation techniques. However, this process is tedious
and time-consuming. Furthermore, a diverse range of rapidly evolving hardware
makes the optimisation process even more complex. It is often hard to predict the
potential benefits from different optimisations and there are no simple criteria for stopping
optimisation, i.e. for deciding when optimal memory performance has been achieved or
sufficiently approached.
This thesis presents a platform-independent optimisation approach for numerical
applications based on iterative feedback-directed program restructuring, using a new,
reasonably fast and accurate performance prediction technique for guiding
optimisations. New strategies for searching the optimisation space, by means of
profiling to find the best possible program variant, have been developed. These
strategies have been evaluated using a range of kernels and programs on different
platforms and operating systems. A significant performance improvement has been
achieved using the new approaches when compared to state-of-the-art native static
and platform-specific feedback-directed compilers.
Analyzing data locality in numeric applications
In this article, we introduce SPLAT (Static and Profiled Data Locality Analysis Tool). The tool's purpose is to provide a fast study of memory behavior without the necessity of a costly memory simulator. SPLAT consists of a static locality analysis enhanced by simple profiling data. Its overhead is low because it performs most of the analysis at compile time, and because the required profiling support is just a basic-block-execution count. Many commercial compilers support this profiling option. Compared with simulation techniques, SPLAT's estimation technique is highly accurate for numeric codes.
Peer Reviewed