2 research outputs found

    Assisting Static Compiler Vectorization with a Speculative Dynamic Vectorizer in an HW/SW Codesigned Environment

    Get PDF
    Compiler-based static vectorization is used widely to extract data-level parallelism from computation-intensive applications. Static vectorization is very effective in vectorizing traditional array-based applications. However, compilers' inability to do accurate interprocedural pointer disambiguation and interprocedural array dependence analysis severely limits vectorization opportunities. HW/SW codesigned processors provide an excellent opportunity to optimize the applications at runtime. The availability of dynamic application behavior at runtime helps in capturing vectorization opportunities generally missed by the compilers. This article proposes to complement the static vectorization with a speculative dynamic vectorizer in an HW/SW codesigned processor. We present a speculative dynamic vectorization algorithm that speculatively reorders ambiguous memory references to uncover vectorization opportunities. The speculative reordering of memory instructions avoids the need for accurate interprocedural pointer disambiguation and interprocedural array dependence analysis. The hardware checks for any memory dependence violation due to speculative vectorization and takes corrective action in case of violation. Our experiments show that the combined (static + dynamic) vectorization approach provides a 2× performance benefit compared to the static GCC vectorization alone, for SPECFP2006. Furthermore, the speculative dynamic vectorizer is able to vectorize 48% of the loops that ICC failed to vectorize due to conservative dependence analysis in the TSVC benchmark suite. Moreover, the dynamic vectorization scheme is as effective in vectorization of pointer-based applications as for the array-based ones, whereas compilers lose significant vectorization opportunities in pointer-based applications. Furthermore, we show that speculation is not only a luxury but also a necessity for runtime vectorization.Peer ReviewedPostprint (author's final draft

    SIMD Vectorization of Straight Line FFT Code

    No full text
    Abstract. This paper presents compiler technology that targets general purpose microprocessors augmented with SIMD execution units for exploiting data level parallelism. FFT kernels are accelerated by automatically vectorizing blocks of straight line code for processors featuring two-way short vector SIMD extensions like AMD’s 3DNow! and Intel’s SSE 2. Additionally, a special compiler backend is introduced which is able to (i) utilize particular code properties, (ii) generate optimized address computation, and (iii) apply specialized register allocation and instruction scheduling. Experiments show that automatic SIMD vectorization can achieve performance that is comparable to the optimal hand-generated code for FFT kernels. The newly developed methods have been integrated into the codelet generator of Fftw and successfully vectorized complicated code like real-to-halfcomplex non-power-of-two FFT kernels. The floatingpoint performance of Fftw’s scalar version has been more than doubled, resulting in the fastest FFT implementation to date.
    corecore