4 research outputs found

    Impact of Tile-Size Selection for Skewed Tiling

    A Compiler Framework for Tiling Imperfectly-Nested Loops

    This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops such that cache locality is improved. We develop a new memory cost model to analyze data reuse in terms of both the cache and the TLB, based on which we compute the tile size with or without array duplication. We determine whether to duplicate arrays for tiling by comparing the respective exploited reuse factors. We identify compatible loops in order to improve the profitability of tiling. The preliminary results with several benchmark programs show that the transformed programs run faster by 9% to 282%.

    1 Introduction
    This paper considers loop tiling [17] as a technique to improve data locality. Previous tiling techniques are generally limited to perfectly-nested loops. Unfortunately, many important loops in reality are imperfectly nested. In our recent work [12], we define a class of imperfectly-nested loops, and present a set of algorithms to tile such loops with odd-even ..

    Cache based optimization of stencil computations : an algorithmic approach

    We are witnessing a fundamental paradigm shift in computer design. Memory is becoming increasingly hierarchical. Clock frequency is no longer crucial for performance. The on-chip core count is doubling rapidly. The quest for performance is growing. These facts have led to complex computer systems that place high demands on scientific computing problems to achieve high performance. Stencil computation is a frequent and important kernel that is affected by this complexity. Its importance stems from the wide variety of scientific and engineering applications that use it. The stencil kernel is a nearest-neighbor computation with low arithmetic intensity; thus it usually achieves only a tiny fraction of the peak performance when executed on modern computer systems. Fast on-chip memory modules were introduced as the hardware approach to alleviate the problem. There are mainly three approaches to address the problem: cache-aware, cache-oblivious, and automatic loop transformation approaches. In this thesis, comprehensive cache-aware and cache-oblivious algorithms to optimize stencil computations on structured rectangular 2D and 3D grids are presented. Our algorithms observe the challenges for high performance in the previous approaches, devise solutions for them, and carefully balance the solution building blocks against each other. Many-core systems put the scalability of memory access at stake, which has led to hierarchical main memory systems. This adds another locality challenge for performance. We tailor our frameworks to meet the new performance challenge on these architectures. Experiments are performed to evaluate the performance of our frameworks on synthetic as well as real-world problems.

    High-performance and hardware-aware computing: proceedings of the first International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'08)

    The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach.