6 research outputs found

    Minimizing register usage penalty at procedure calls


    Optimizations In Compiler: Vectorization, Reordering, Register Allocation And Verification Of Explicitly Parallel Programs

    Compiler optimizations form a very important part of compiler development, as they make a major difference between an average and a great compiler. A compiler has various modules, which open opportunities for optimization in various spheres. In this thesis, a comparative study of vectorization is done, exposing the strengths and weaknesses of various contemporary compilers. Additionally, a study on the impact of vectorization on tiled code is performed. Different strategies for loop nest optimization are explored. An algorithm for statement reordering in loops to enhance performance has been developed. An Integer Linear Program formulation is done to improve loop parallelism, which makes use of loop unrolling and explicitly parallel directives. Finally, an attempt at optimal loop distribution is made. Following the loop nest optimization chapter, an explanation of interprocedural register allocation (IPRA) for ARM32 and AArch64 is given. Additionally, a brief description of the problems in implementing IPRA for those architectures is presented. We conclude the chapter with the performance results of IPRA on those platforms. In the last chapter, a description of VoPiL, a static OpenMP verifier in LLVM, is presented. A brief description of the analysis and the results is included.

    Minimizing register usage penalty at procedure calls


    Speeding Up Thread-local Storage Access In Dynamic Libraries

    As multi-core processors become the norm rather than the exception, multi-threaded programming is expected to expand from its current niches to more widespread use, in software components that have not traditionally been concerned about exploiting concurrency. Accessing thread-local storage (TLS) from within dynamic libraries has traditionally required calling a function to obtain the thread-local address of the variable. Such function calls are several times slower than typical addressing code that is used in executables. While instructions used in executables can assume thread-local variables are at a constant offset within the thread Static TLS block, dynamic libraries loaded during program execution may not even assume that their thread-local variables are in Static TLS blocks. Since libraries are most commonly loaded as dependencies of executables or other libraries, before a program starts running, the most common TLS case is that of constant offsets. This paper proposes an access model that enables dynamic libraries to take advantage of this fact, without giving up the ability to be loaded during program execution. This new model was implemented and tested on GNU/Linux systems, initially on the Fujitsu FR-V architecture, and later on IA32 and AMD64/EM64T, such that performance could be compared with that of the existing models. Experimental results revealed the new model consistently exceeds the old model in terms of performance, particularly in the most common case, where the speedup is often well over 2x, bringing it nearly to the same performance of access models used in plain executables.
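The two access paths described in this abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's implementation: the abstract does not spell out the mechanism, so the per-variable descriptor holding a resolver chosen at load time is an assumption about how such a model might work, and every name here (`Thread`, `Descriptor`, `make_descriptor`, `tls_read`, `tls_write`) is hypothetical. Real TLS access happens in machine code emitted by the compiler and fixed up by the dynamic loader, not in a high-level language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Thread:
    # Static TLS block: laid out once, before the program starts running.
    static_tls: List[int]
    # Dynamic TLS blocks: created lazily, one per late-loaded module.
    dynamic_tls: Dict[int, List[int]] = field(default_factory=dict)
    # Counts slow-path lookups (illustration only).
    lookups: int = 0


@dataclass
class Descriptor:
    # Hypothetical descriptor: a resolver picked at load time plus its data.
    resolve: Callable[["Descriptor", Thread], Tuple[List[int], int]]
    module_id: int
    offset: int
    module_size: int


def resolve_static(desc: Descriptor, thread: Thread) -> Tuple[List[int], int]:
    # Fast path: the variable sits at a constant offset in the static block.
    return thread.static_tls, desc.offset


def resolve_dynamic(desc: Descriptor, thread: Thread) -> Tuple[List[int], int]:
    # Slow path: find (or lazily create) this module's block for this thread.
    thread.lookups += 1
    block = thread.dynamic_tls.setdefault(desc.module_id, [0] * desc.module_size)
    return block, desc.offset


def make_descriptor(module_id: int, offset: int, module_size: int,
                    loaded_at_startup: bool, static_base: int = 0) -> Descriptor:
    # The "loader" decides once which path every later access will take.
    if loaded_at_startup:
        return Descriptor(resolve_static, module_id, static_base + offset, module_size)
    return Descriptor(resolve_dynamic, module_id, offset, module_size)


def tls_read(desc: Descriptor, thread: Thread) -> int:
    block, index = desc.resolve(desc, thread)
    return block[index]


def tls_write(desc: Descriptor, thread: Thread, value: int) -> None:
    block, index = desc.resolve(desc, thread)
    block[index] = value


if __name__ == "__main__":
    t1 = Thread(static_tls=[0] * 8)
    early = make_descriptor(1, 2, 4, loaded_at_startup=True, static_base=4)
    late = make_descriptor(2, 1, 4, loaded_at_startup=False)
    tls_write(early, t1, 11)
    tls_write(late, t1, 33)
    print(tls_read(early, t1), tls_read(late, t1), t1.lookups)
```

In this model the calling code is identical for both kinds of library. Only the resolver stored in the descriptor differs, which is one way a library could keep the constant-offset fast path in the common case while remaining loadable during program execution.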