The Backstroke framework for source level reverse computation applied to parallel discrete event simulation
This report introduces Backstroke, a new open-source framework for the automatic generation of reverse code for functions written in C++. Backstroke enables reverse computation for optimistic parallel discrete event simulations. It is built on the ROSE open-source compiler infrastructure and handles complex C++ features, including pointers and pointer types, arrays, function and method calls, class types, inheritance, polymorphism, virtual functions, abstract classes, templated classes, and containers. Backstroke also introduces new program inversion techniques based on advanced compiler analysis tools built into ROSE. We explore and illustrate some of the complex language and semantic issues that arise in generating correct reverse code for C++ functions.
Tool Support for Inspecting the Code Quality of HPC Applications
The nature of HPC application development encourages ad hoc design and implementation, rather than the formal requirements analysis and design specification that are typical in software engineering. However, we cannot simply expect HPC developers to adopt formal software engineering processes wholesale, even though there is a need to improve software structure and quality to ensure future maintainability. Therefore, we propose tools that HPC developers can use at their discretion to obtain feedback on the structure and quality of their codes. This feedback would take the form of code quality metrics and analyses, presented where necessary in intuitive, interactive visualizations. This paper summarizes our implementation of such a tool, which we apply to a standard HPC benchmark as a proof of concept.
An Extensible Open-Source Compiler Infrastructure for Testing
Testing forms a critical part of the development process for large-scale software, and there is a growing need for automated tools that can read, represent, analyze, and transform an application's source code to help carry out testing tasks. However, the support required to compile applications written in common general-purpose languages is generally inaccessible to the testing research community. In this paper, we report on ROSE, an extensible, open-source compiler infrastructure currently in development at Lawrence Livermore National Laboratory. ROSE specifically targets developers who wish to build source-based tools that implement customized analyses and optimizations for large-scale C, C++, and Fortran90 scientific computing applications (on the order of a million lines of code or more). Much of this infrastructure can also be used to address problems in testing, and ROSE is designed to be broadly accessible to those without a formal compiler background. This paper details the interactions between testing applications and the ways in which compiler technology can aid in understanding those applications. We emphasize the aspects of ROSE, such as support for the general analysis of whole programs, that are particularly well suited to the testing research community and the scale of the problems that community solves.
Analyzing and Visualizing Whole Program Architectures
This paper describes our work to develop new tool support for analyzing and visualizing the architecture of complete large-scale programs (millions of lines of code or more). Our approach consists of (i) creating a compact, accurate representation of a whole C or C++ program, (ii) analyzing the program in this representation, and (iii) visualizing the analysis results with respect to the program's architecture. We have implemented our approach by extending and combining a compiler infrastructure and a program visualization tool, and we believe our work will be of broad interest to those engaged in a variety of program understanding and transformation tasks. We have added new whole-program analysis support to ROSE [15, 14], a source-to-source C/C++ compiler infrastructure for creating customized analysis and transformation tools. Our whole-program work does not rely on procedure summaries; rather, we preserve all of the information present in the source while keeping our representation compact. In our representation, a million-line application fits in well under 1 GB of memory. Because whole-program analyses can generate large amounts of data, we believe that abstracting and visualizing analysis results at the architecture level is critical to reducing the cognitive burden on the consumer of the analysis results. Therefore, we have extended Vizz3D [19], an interactive program visualization tool, with an appropriate metaphor and layout algorithm for representing a program's architecture. Our implementation provides developers with an intuitive, interactive way to view analysis results, such as those produced by ROSE, in the context of the program's architecture. The remainder of this paper summarizes our approach to whole-program analysis (Section 2) and provides an example of how we visualize the analysis results (Section 3).
Autotuning Algorithmic Choice for Input Sensitivity
Empirical autotuning is increasingly being used in many domains to achieve optimized performance in a variety of different execution environments. A daunting challenge faced by such autotuners is input sensitivity, where the best autotuned configuration may vary with different input sets. In this paper, we propose a two-level solution that first clusters inputs to find input sets that are similar in input feature space; then uses an evolutionary autotuner to build an optimized program for each of these clusters; and finally builds an adaptive, overhead-aware classifier that assigns each input to a specific input-optimized program. Our approach addresses the complex trade-off between using expensive features, which accurately characterize an input, and cheaper features, which can be computed with less overhead. Experimental results show that adapting to different inputs can yield up to a 3x speedup over using a single configuration for all inputs.
Size-Selective Personal Air Sampling: A New Approach Using Porous Foams
Simultaneous sampling of three dust fractions (inhalable, thoracic, and respirable) has been achieved using porous polyurethane foams, which serve as both selecting and sampling media. Particle penetration was measured in laboratory tests. Foam geometries were predicted using a semi-empirical model. Prototype samplers were constructed based on the IOM and GSP inhalable personal samplers. Weighing and chemical analysis procedures were checked for the foams.
A fast sparse block circulant matrix vector product
In the context of computed tomography (CT), iterative image reconstruction techniques are gaining attention because high-quality images are becoming computationally feasible. They involve solving large systems of equations, whose cost is dominated by the sparse matrix-vector product (SpMV). Our work considers the case in which the sparse matrices are block circulant, which arises when taking advantage of the rotational symmetry in the tomographic system. Besides the straightforward storage savings, we exploit the circulant structure to rewrite the poorly performing SpMVs as a high-performance product between sparse and dense matrices. This paper describes the implementations developed for multi-core CPUs and GPUs and presents experimental results with typical CT matrices. The presented approach is up to ten times faster than an approach that does not exploit the circulant structure.
Romero Alcalde, E.; Tomás Domínguez, AE.; Soriano Asensi, A.; Blanquer Espert, I. (2014). A fast sparse block circulant matrix vector product. In Euro-Par 2014 Parallel Processing. Springer. 548-559. doi:10.1007/978-3-319-09873-9_46
POET: Parameterized Optimization for Empirical Tuning
The excessive complexity of both machine architectures and applications has made it difficult for compilers to statically model and predict application behavior. This observation motivates the recent interest in performance tuning using empirical techniques. We present a new embedded scripting language, POET (Parameterized Optimization for Empirical Tuning), for parameterizing complex code transformations so that they can be empirically tuned. The POET language aims to significantly improve the generality, flexibility, and efficiency of existing empirical tuning systems. We have used the language to parameterize and empirically tune three loop optimizations (interchange, blocking, and unrolling) for two linear algebra kernels. We show experimentally that the time required to tune these optimizations using POET, which does not require any program analysis, is significantly shorter than when using a full compiler-based source-code optimizer that performs sophisticated program analysis and optimizations.
Support for Whole-Program Analysis and the Verification of the One-Definition Rule in C++
We present a compact and accurate representation of a whole-program abstract syntax tree and use it to detect a specific security vulnerability in C++ programs known as a One-Definition Rule (ODR) violation. The ODR states that types and functions appearing in multiple compilation units must be defined identically. However, no current compiler can enforce the ODR, because doing so requires the ability to see the full application source at once; where the ODR is violated, the program is incorrect. Moreover, a lack of ODR enforcement makes a program vulnerable to the so-called VPTR exploit, in which an object's virtual function table is replaced by malicious code. Our representation of the whole program preserves all features of the source for analysis and transformation, and permits a million-line application to fit entirely in the memory of a workstation with 1 GB of RAM.
Architectural Visualization of C/C++ Source Code for Program Comprehension
Structural and behavioral visualization of large-scale legacy systems to aid program comprehension is still a major challenge. The challenge is even greater when applications are implemented in flexible and expressive languages such as C and C++. In this paper, we consider visualization of static and dynamic aspects of large-scale scientific C/C++ applications. For our investigation, we reuse and integrate specialized analysis and visualization tools. Furthermore, we present a novel layout algorithm that permits a compressive architectural view of a large-scale software system. Our layout is unique in that it allows traditional program visualizations, i.e., graph structures, to be seen in relation to the application's file structure.