49 research outputs found

    A Study of Energy and Locality Effects using Space-filling Curves

    Full text link
    The cost of energy is becoming an increasingly important driver for the operating cost of HPC systems, adding yet another facet to the challenge of producing efficient code. In this paper, we investigate the energy implications of trading computation for locality using Hilbert and Morton space-filling curves with dense matrix-matrix multiplication. The advantage of these curves is that they exhibit an inherent tiling effect without requiring specific architecture tuning. By accessing the matrices in the order determined by the space-filling curves, we can trade computation for locality. The index computation overhead of the Morton curve is found to be balanced against its locality and energy efficiency, while the overhead of the Hilbert curve outweighs its improvements on our test system.Comment: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW

    Sixteen space-filling curves and traversals for d-dimensional cubes and simplices

    Get PDF
    This article describes sixteen different ways to traverse d-dimensional space recursively in a way that is well-defined for any number of dimensions. Each of these traversals has distinct properties that may be beneficial for certain applications. Some of the traversals are novel, some have been known in principle but had not been described adequately for any number of dimensions, some of the traversals have been known. This article is the first to present them all in a consistent notation system. Furthermore, with this article, tools are provided to enumerate points in a regular grid in the order in which they are visited by each traversal. In particular, we cover: five discontinuous traversals based on subdividing cubes into 2^d subcubes: Z-traversal (Morton indexing), U-traversal, Gray-code traversal, Double-Gray-code traversal, and Inside-out traversal; two discontinuous traversals based on subdividing simplices into 2^d subsimplices: the Hill-Z traversal and the Maehara-reflected traversal; five continuous traversals based on subdividing cubes into 2^d subcubes: the Base-camp Hilbert curve, the Harmonious Hilbert curve, the Alfa Hilbert curve, the Beta Hilbert curve, and the Butz-Hilbert curve; four continuous traversals based on subdividing cubes into 3^d subcubes: the Peano curve, the Coil curve, the Half-coil curve, and the Meurthe curve. All of these traversals are self-similar in the sense that the traversal in each of the subcubes or subsimplices of a cube or simplex, on any level of recursive subdivision, can be obtained by scaling, translating, rotating, reflecting and/or reversing the traversal of the complete unit cube or simplex.Comment: 28 pages, 12 figures. v2: fixed a confusing typo on page 12, line

    08081 Abstracts Collection -- Data Structures

    Get PDF
    From February 17th to 22nd 2008, the Dagstuhl Seminar 08081 ``Data Structures\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. It brought together 49 researchers from four continents to discuss recent developments concerning data structures in terms of research but also in terms of new technologies that impact how data can be stored, updated, and retrieved. During the seminar a fair number of participants presented their current research. There was discussion of ongoing work, and in addition an open problem session was held. This paper first describes the seminar topics and goals in general, then gives the minutes of the open problem session, and concludes with abstracts of the presentations given during the seminar. Where appropriate and available, links to extended abstracts or full papers are provided

    Cache-effiziente Block-Matrix-Löser für die Partition of Unity Methode

    Get PDF
    Die Partition of Unity Methode findet Anwendung in gitterlosen Diskretisierungsverfahren zum Lösen elliptischer partieller Differentialgleichungen. Die bei der Diskretisierung entstehenden Gleichungssysteme besitzen eine Blockstruktur, die sich mittels der Multilevel Partition of Unity Methode asymptotisch optimal lösen lassen. Ein alternatives Verfahren zum Lösen dieser Gleichungssysteme stellen die vorkonditionierten Krylow- Unterraumverfahren dar. In dieser Arbeit wird ein auf der ILU-Zerlegung basierenders CG-Verfahren für Block-Matrizen implementiert, das auf Cache-effizienten Algorithmen basiert. Der Ausgangspunkt stellt die Bibliothek TifaMMy dar. Die in den letzten Jahren entwickelte Bibliothek basiert auf inhärent Cache-effiziente Algorithmen für dicht- und dünnbesetzte Matrizen. Dabei wird eine neue Datenstruktur für Blockmatrizen (BCRS) und die nötigen Algorithmen implementiert. Die Leistung des Block-Matrix-Löser wird mit der Multilevel Partition of Unity Methode verglichen

    The Peano software---parallel, automaton-based, dynamically adaptive grid traversals

    Get PDF
    We discuss the design decisions, design alternatives, and rationale behind the third generation of Peano, a framework for dynamically adaptive Cartesian meshes derived from spacetrees. Peano ties the mesh traversal to the mesh storage and supports only one element-wise traversal order resulting from space-filling curves. The user is not free to choose a traversal order herself. The traversal can exploit regular grid subregions and shared memory as well as distributed memory systems with almost no modifications to a serial application code. We formalize the software design by means of two interacting automata—one automaton for the multiscale grid traversal and one for the application-specific algorithmic steps. This yields a callback-based programming paradigm. We further sketch the supported application types and the two data storage schemes realized before we detail high-performance computing aspects and lessons learned. Special emphasis is put on observations regarding the used programming idioms and algorithmic concepts. This transforms our report from a “one way to implement things” code description into a generic discussion and summary of some alternatives, rationale, and design decisions to be made for any tree-based adaptive mesh refinement software
    corecore