431 research outputs found

    WMTrace: a lightweight memory allocation tracker and analysis framework

    The diverging gap between processor and memory performance has been a well-discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures: increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrades performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an on-going challenge in the design of parallel algorithms which scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. This tool is able to dynamically link to pre-existing application binaries requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces to be performed allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per-function over time. An in-depth analysis is provided for an unstructured mesh benchmark which reveals significant memory allocation imbalance across its participating processes
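
The post-execution analysis described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical trace format of `(timestamp, kind, address, size, function)` tuples, which is not WMTrace's actual format, and shows how a high-water mark and the per-function breakdown at that peak might be derived from allocation and free events.

```python
from collections import defaultdict


def high_water_mark(events):
    """Return (peak_bytes, peak_time, per_function_bytes_at_peak).

    `events` is an iterable of (timestamp, kind, address, size, function)
    tuples, where kind is "alloc" or "free". This layout is an assumption
    made for illustration only.
    """
    live = {}                        # address -> (size, allocating function)
    current = 0                      # bytes currently allocated
    peak = 0                         # largest value of `current` seen so far
    peak_time = None
    per_function = defaultdict(int)  # live bytes attributed to each function
    peak_per_function = {}

    for timestamp, kind, address, size, function in sorted(events, key=lambda e: e[0]):
        if kind == "alloc":
            live[address] = (size, function)
            current += size
            per_function[function] += size
        elif kind == "free" and address in live:
            # The size of a free is looked up from the matching allocation.
            freed_size, owner = live.pop(address)
            current -= freed_size
            per_function[owner] -= freed_size
        if current > peak:
            peak = current
            peak_time = timestamp
            peak_per_function = {f: s for f, s in per_function.items() if s > 0}

    return peak, peak_time, peak_per_function


trace = [
    (0, "alloc", 0x10, 100, "read_mesh"),
    (1, "alloc", 0x20, 50, "solve"),
    (2, "free", 0x10, 0, "read_mesh"),
    (3, "alloc", 0x30, 120, "solve"),
]
print(high_water_mark(trace))
```

Running the same function on the trace of each process and comparing the peaks would be one simple way to expose the kind of allocation imbalance the abstract mentions.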

    Parallel Multiscale Contact Dynamics for Rigid Non-spherical Bodies

    The simulation of large numbers of rigid bodies of non-analytical shapes or vastly varying sizes which collide with each other is computationally challenging. The fundamental problem is the identification of all contact points between all particles at every time step. In the Discrete Element Method (DEM), this is particularly difficult for particles of arbitrary geometry that exhibit sharp features (e.g. rock granulates). While most codes avoid non-spherical or non-analytical shapes due to the computational complexity, we introduce an iterative-based contact detection method for triangulated geometries. The new method is an improvement over a naive brute force approach which checks all possible geometric constellations of contact and thus exhibits a lot of execution branching. Our iterative approach has limited branching and high floating point operations per processed byte. It thus is suitable for modern Single Instruction Multiple Data (SIMD) CPU hardware. As only the naive brute force approach is robust and always yields a correct solution, we propose a hybrid solution that combines the best of the two worlds to produce fast and robust contacts. In terms of the DEM workflow, we furthermore propose a multilevel tree-based data structure strategy that holds all particles in the domain on multiple scales in grids. Grids reduce the total computational complexity of the simulation. The data structure is combined with the DEM phases to form a single touch tree-based traversal that identifies both contact points between particle pairs and introduces concurrency to the system during particle comparisons in one multiscale grid sweep. Finally, a reluctant adaptivity variant is introduced which enables us to realise an improved time stepping scheme with larger time steps than standard adaptivity while we still minimise the grid administration overhead. 
Four different parallelisation strategies that exploit multicore architectures are discussed for the triad of methodological ingredients. Each parallelisation scheme exhibits unique behaviour depending on the grid and particle geometry at hand. The fusion of them into a task-based parallelisation workflow yields promising speedups. Our work shows that new computer architecture can push the boundary of DEM computability but this is only possible if the right data structures and algorithms are chosen
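
To make the claim that grids reduce the cost of contact detection concrete, here is a minimal single-level sketch. It is not the multiscale tree or the triangle-based contact detection of the paper: it assumes each particle is wrapped in a bounding sphere and that `cell_size` is at least twice the largest radius, so overlapping spheres always fall into the same or adjacent cells. The name `candidate_pairs` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def candidate_pairs(centres, radii, cell_size):
    """Return index pairs (i, j), i < j, whose bounding spheres overlap.

    Assumes cell_size >= 2 * max(radii); otherwise overlapping spheres
    could sit more than one cell apart and be missed.
    """
    # Bin every particle into the grid cell that contains its centre.
    grid = defaultdict(list)
    for index, centre in enumerate(centres):
        key = tuple(math.floor(x / cell_size) for x in centre)
        grid[key].append(index)

    pairs = set()
    for key, members in grid.items():
        # Only the 27 cells around `key` can hold overlapping partners.
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in grid.get(neighbour, ()):
                    if i < j and math.dist(centres[i], centres[j]) <= radii[i] + radii[j]:
                        pairs.add((i, j))
    return pairs


centres = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (5.0, 5.0, 5.0)]
radii = [0.5, 0.5, 0.5]
print(candidate_pairs(centres, radii, cell_size=1.0))
```

The pairs returned by such a broad phase would then be handed to an exact, and far more expensive, geometric contact check.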

    3DRepo4Unity: Dynamic Loading of Version Controlled 3D Assets into the Unity Game Engine

    In recent years, Unity has become a popular platform for the development of a broad range of visualization and VR applications. This is due to its ease of use, cross-platform compatibility and accessibility to independent developers. Despite such applications being cross-platform, their assets are generally bundled with executables, or streamed at runtime in a highly optimised, proprietary format. In this paper, we present a novel system for dynamically populating a Unity environment at runtime using open Web3D standards. Our system generates dynamic resources at runtime from a remote 3D Repo repository. This enables us to build a viewer which can easily visualize X3D-based revisions from a version controlled database in the cloud without any compile-time knowledge of the assets. We motivate the work and introduce the high-level architecture of our solution. We describe our new dynamic transcoding library with an emphasis on scalability and 3D rendering. We then perform a comparative evaluation between 3drepo.io, a state of the art X3DOM based renderer, and the new 3DRepo4Unity library on web browser platforms. Finally, we present a number of different applications that demonstrate the practicality of our chosen approach. By building on previous Web3D functionality and standards, our hope is to stimulate further discussion around and research into web formats that would enable incremental loading on other platforms

    Survey of texture mapping techniques for representing and rendering volumetric mesostructure

    Representation and rendering of volumetric mesostructure using texture mapping can potentially allow the display of highly detailed, animated surfaces at a low performance cost. Given the need for consistently more detailed and dynamic worlds rendered in real-time, volumetric texture mapping now becomes an area of great importance. In this survey, we review the developments of algorithms and techniques for representing volumetric mesostructure as texture-mapped detail. Our goal is to provide researchers with an overview of novel contributions to volumetric texture mapping as a starting point for further research and developers with a comparative review of techniques, giving insight into which methods would be fitting for particular tasks. We start by defining the scope of our domain and provide background information regarding mesostructure and volumetric texture mapping. Existing techniques are assessed in terms of content representation and storage as well as quality and performance of parameterization and rendering. Finally, we provide insights to the field and opportunities for research directions in terms of real-time volumetric texture-mapped surfaces under deformation

    View space linking, solid node compression and binary space partitioning for visibility determination in 3D walk-throughs

    Today's 3D games consumers are expecting more and more quality in their games. To enable high quality graphics at interactive rates, games programmers employ a technique known as hidden surface removal (HSR) or polygon culling. HSR is not just applicable to games; it may also be applied to any application that requires quality and interactive rates, including medical, military and building applications. One such commonly used technique for HSR is the binary space partition (BSP) tree, which is used for 3D ‘walk-throughs’, otherwise known as 3D static environments or first person shooters. Recent developments in 3D accelerated hardware technology do not mean that HSR is becoming redundant; in fact, HSR is becoming increasingly important to the graphics pipeline. The well established potentially visible sets (PVS) BSP tree algorithm is used as a platform for exploring three enhanced algorithms: View Space Linking, Solid Node Compression and hardware accelerated occlusion are shown to reduce the number of nodes that are traversed in a BSP tree, improving tree traversal efficiency. These algorithms are proven (in some cases) to improve overall efficiency
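
As background for the BSP traversal this abstract builds on, the sketch below shows a plain front-to-back walk of a hand-built BSP tree relative to a viewer position. It does not implement the thesis's enhanced algorithms or a PVS; `BSPNode` and `front_to_back` are hypothetical names, and each node is assumed to store a splitting plane as `dot(normal, p) = offset`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BSPNode:
    normal: Tuple[float, float, float]  # normal of the splitting plane
    offset: float                       # plane equation: dot(normal, p) = offset
    label: str                          # stands in for the polygons on the plane
    front: Optional["BSPNode"] = None   # subtree on the positive side
    back: Optional["BSPNode"] = None    # subtree on the negative side


def front_to_back(node: Optional[BSPNode], eye, visited: Optional[List[str]] = None) -> List[str]:
    """Visit nodes nearest-first as seen from `eye`."""
    if visited is None:
        visited = []
    if node is None:
        return visited
    # Signed distance of the eye from the splitting plane picks the near side.
    side = sum(n * e for n, e in zip(node.normal, eye)) - node.offset
    near, far = (node.front, node.back) if side >= 0 else (node.back, node.front)
    front_to_back(near, eye, visited)
    visited.append(node.label)
    front_to_back(far, eye, visited)
    return visited


tree = BSPNode(
    normal=(1.0, 0.0, 0.0), offset=0.0, label="root",
    front=BSPNode(normal=(1.0, 0.0, 0.0), offset=2.0, label="right"),
    back=BSPNode(normal=(1.0, 0.0, 0.0), offset=-2.0, label="left"),
)
print(front_to_back(tree, eye=(1.0, 0.0, 0.0)))
```

Every node is visited here; culling schemes such as those named in the abstract aim to skip subtrees that cannot be seen, which is where the reduction in traversed nodes would come from.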

    The Peano software: parallel, automaton-based, dynamically adaptive grid traversals

    We discuss the design decisions, design alternatives, and rationale behind the third generation of Peano, a framework for dynamically adaptive Cartesian meshes derived from spacetrees. Peano ties the mesh traversal to the mesh storage and supports only one element-wise traversal order resulting from space-filling curves. The user is not free to choose a traversal order herself. The traversal can exploit regular grid subregions and shared memory as well as distributed memory systems with almost no modifications to a serial application code. We formalize the software design by means of two interacting automata—one automaton for the multiscale grid traversal and one for the application-specific algorithmic steps. This yields a callback-based programming paradigm. We further sketch the supported application types and the two data storage schemes realized before we detail high-performance computing aspects and lessons learned. Special emphasis is put on observations regarding the used programming idioms and algorithmic concepts. This transforms our report from a “one way to implement things” code description into a generic discussion and summary of some alternatives, rationale, and design decisions to be made for any tree-based adaptive mesh refinement software

    Sparse octree algorithms for scalable dense volumetric tracking and mapping

    This thesis is concerned with the problem of Simultaneous Localisation and Mapping (SLAM), the task of localising an agent within an unknown environment and at the same time building a representation of it. In particular, we tackle the fundamental scalability limitations of dense volumetric SLAM systems. We do so by proposing a highly efficient hierarchical data-structure based on octrees together with a set of algorithms to support the most compute-intensive operations in typical volumetric reconstruction pipelines. We employ our hierarchical representation in a novel dense pipeline based on occupancy probabilities. Crucially, the complete space representation encoded by the octree enables us to demonstrate a fully integrated system in which tracking, mapping and occupancy queries can be performed seamlessly on a single coherent representation. While achieving accuracy either on par with or better than the current state-of-the-art, we demonstrate run-time performance of at least an order of magnitude better than currently available hierarchical data-structures. Finally, we introduce a novel multi-scale reconstruction system that exploits our octree hierarchy. By adaptively selecting the appropriate scale to match the effective sensor resolution in both integration and rendering, we demonstrate better reconstruction results and tracking accuracy compared to single-resolution grids. Furthermore, we achieve much higher computational performance by propagating information up and down the tree in a lazy fashion, which allows us to reduce the computational load when updating distant surfaces. We have released our software as an open-source library, named supereight, which is freely available for the benefit of the wider community. One of the main advantages of our library is its flexibility. 
By carefully providing a set of algorithmic abstractions, supereight enables SLAM practitioners to freely experiment with different map representations with no intervention on the back-end library code and crucially, preserving performance. Our work has been adopted by robotics researchers in both academia and industry. Open Access
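
The idea of a sparse octree holding occupancy probabilities can be sketched as follows. This is not supereight's API or data layout: `SparseOctree`, its methods and the per-voxel log-odds update are assumptions chosen for illustration. Nodes are allocated only along the path to a voxel that has actually been observed, which is what keeps the structure sparse.

```python
import math


class Node:
    __slots__ = ("children", "log_odds")

    def __init__(self):
        self.children = None  # list of 8 child slots, allocated lazily
        self.log_odds = 0.0   # 0.0 corresponds to probability 0.5 (unknown)


class SparseOctree:
    def __init__(self, depth):
        self.depth = depth    # the volume is 2**depth voxels per side
        self.root = Node()
        self.allocated = 1    # number of nodes currently in memory

    def _leaf(self, x, y, z, create):
        node = self.root
        for level in range(self.depth - 1, -1, -1):
            # One bit of each coordinate selects the child octant at this level.
            index = ((x >> level) & 1) | (((y >> level) & 1) << 1) | (((z >> level) & 1) << 2)
            if node.children is None:
                if not create:
                    return None
                node.children = [None] * 8
            if node.children[index] is None:
                if not create:
                    return None
                node.children[index] = Node()
                self.allocated += 1
            node = node.children[index]
        return node

    def update(self, x, y, z, p_hit):
        """Fuse one measurement with occupancy probability p_hit (0 < p_hit < 1)."""
        leaf = self._leaf(x, y, z, create=True)
        leaf.log_odds += math.log(p_hit / (1.0 - p_hit))

    def probability(self, x, y, z):
        """Occupancy probability of a voxel; unobserved space reads as 0.5."""
        leaf = self._leaf(x, y, z, create=False)
        if leaf is None:
            return 0.5
        return 1.0 / (1.0 + math.exp(-leaf.log_odds))


tree = SparseOctree(depth=4)
tree.update(3, 5, 7, p_hit=0.7)
print(tree.probability(3, 5, 7), tree.probability(0, 0, 0), tree.allocated)
```

Storing log-odds turns the fusion of repeated measurements into a simple addition, and a query in unobserved space costs only a partial descent of the tree.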