4 research outputs found

Performance Estimation for Task Graphs Combining Sequential Path Profiling and Control Dependence Regions

Author: A. Tumeo
C. Pilato
F. Ferrandi
M. Lattuada
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

The speed-up estimation of parallelized code is crucial to efficiently compare different parallelization techniques or task graph transformations. Unfortunately, during the parallelization of a specification, the information that can be extracted by profiling the corresponding sequential code (e.g. the most executed paths) is usually not properly taken into account. In particular, correlating sequential path profiling with the corresponding parallelized code can help identify code hot spots, opening new possibilities for automatic parallelization. For this reason, starting from a well-known profiling technique, Efficient Path Profiling, we propose a methodology that estimates the speed-up of a parallelized specification using only the corresponding hierarchical task graph representation and the information coming from the dynamic profiling of the initial sequential specification. Experimental results show that the proposed solution outperforms existing approaches.

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Accurate, efficient, and adaptive calling context profiling

Author: Arnold M.
Bernat A. R.
Cormen T. H.
Feller P. T.
Grcevski N.
Harold W. Cain
Hazelwood K.
Hirzel M.
Jong-Deok Choi
Mauricio J. Serrano
Xiaotong Zhuang
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Profile based optimization techniques for large scale applications

Author: 安江俊明
Publication venue: [publisher unknown]
Publication date: 01/03/2007

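Degree system: new ; Ministry of Education report number: Otsu No. 2095 ; Degree type: Doctor of Engineering ; Date conferred: 2007/3/24 ; Waseda University diploma number: Shin 454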

Waseda University Repository

Performance analysis for parallel programs from multicore to petascale

Author: Tallent Nathan Russell
Publication date: 01/01/2010

Cutting-edge science and engineering applications require petascale computing. Petascale computing platforms are characterized by both extreme parallelism (systems of hundreds of thousands to millions of cores) and hybrid parallelism (nodes with multicore chips). Consequently, to effectively use petascale resources, applications must exploit concurrency at both the node and system level, a difficult problem. The challenge of developing scalable petascale applications is only partially aided by existing languages and compilers. As a result, manual performance tuning is often necessary to identify and resolve poor parallel and serial efficiency. Our thesis is that it is possible to achieve unique, accurate, and actionable insight into the performance of fully optimized parallel programs by measuring them with asynchronous-sampling-based call path profiles; attributing the resulting binary-level measurements to source code structure; analyzing measurements on-the-fly and postmortem to highlight performance inefficiencies; and presenting the resulting context-sensitive metrics in three complementary views. To support this thesis, we have developed several techniques for identifying performance problems in fully optimized serial, multithreaded and petascale programs. First, we describe how to attribute very precise (instruction-level) measurements to source-level static and dynamic contexts in fully optimized applications, all for an average run-time overhead of a few percent. We then generalize this work with the development of logical call path profiling and apply it to work-stealing-based applications. Second, we describe techniques for pinpointing and quantifying parallel inefficiencies such as parallel idleness, parallel overhead and lock contention in multithreaded executions. Third, we show how to diagnose scalability bottlenecks in petascale applications by scaling our measurement, analysis and presentation tools to support large-scale executions. Finally, we provide a coherent framework for these techniques by sketching a unique and comprehensive performance analysis methodology. This work forms the basis of Rice University's HPCTOOLKIT performance tools.

DSpace at Rice University