370 research outputs found

    Automated cache optimisations of stencil computations for partial differential equations

    This thesis focuses on numerical methods that solve partial differential equations. Our focal point is the finite difference method, which solves partial differential equations by approximating derivatives with explicit finite differences. These partial differential equation solvers consist of stencil computations on structured grids. Stencils for real-world applications are often characterised by many memory accesses and non-trivial arithmetic expressions, which lead to high computational costs compared to the simple stencils used in much prior proof-of-concept work. In addition, the loop nests that express stencils on structured grids are often complicated. This work is motivated by a specific domain of stencil computations where one of the challenges is operations that are not aligned to the structured grid ("off-the-grid" operations). These operations update neighbouring grid points through scatter and gather operations via non-affine memory accesses, such as A[B[i]]. In addition to this challenge, these practical stencils often include many computation fields (requiring multiple grid copies to be stored), complex data dependencies and imperfect loop nests. In this work, we aim to increase the performance of stencil kernel execution. We study automated cache-memory-dependent optimisations for stencil computations. This work consists of two core parts with their respective contributions. The first part of our work tries to reduce the data movement in stencil computations of practical interest. Data movement is a dominant factor affecting the performance of high-performance computing applications. It has long been a target of optimisations due to its impact on execution time and energy consumption. This thesis tries to relieve this cost by applying temporal blocking optimisations, also known as time-tiling, to stencil computations. Temporal blocking is a well-known technique to enhance data reuse in stencil computations.
However, it is rarely used in practical applications and appears mostly in theoretical examples that demonstrate its efficacy. Applying temporal blocking to scientific simulations is more complex. More specifically, in this work, we focus on the application context of seismic and medical imaging. In this area, we often encounter scatter and gather operations due to signal sources and receivers at arbitrary locations in the computational domain. These operations make the application of temporal blocking challenging. We present an approach to overcome this challenge and successfully apply temporal blocking. In the second part of our work, we extend the first part into an automated approach targeting a wide range of simulations modelled with partial differential equations. Since temporal blocking is error-prone, tedious to apply by hand and difficult to grasp both theoretically and practically, we are motivated to automate its application and automatically generate code that benefits from it. We discuss algorithmic approaches and present a generalised compiler pipeline to automate the application of temporal blocking. These passes are implemented in the Devito compiler. They are used to accelerate the computation of stencil kernels in areas such as seismic and medical imaging, computational fluid dynamics and machine learning. Devito (www.devitoproject.org) is a Python package to implement optimised stencil computation (e.g., finite differences, image processing, machine learning) from high-level symbolic problem definitions. Devito builds on SymPy (www.sympy.org) and employs automated code generation and just-in-time compilation to execute optimised computational kernels on several computer platforms, including CPUs, GPUs, and clusters thereof. We show how we automate temporal blocking code generation without user intervention and often achieve better time-to-solution.
We enable domain-specific optimisation through compiler passes and offer temporal blocking gains from a high-level symbolic abstraction. These automated optimisations benefit various computational kernels for solving real-world application problems.
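The combination of a regular stencil update with a non-affine scatter of the form A[B[i]], as described in the abstract above, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `diffuse_with_sources`, the 1D diffusion stencil, and the choice of grid-aligned source indices (the thesis deals with genuinely off-the-grid sources, which would additionally need interpolation). The sketch does not implement temporal blocking and does not use Devito; it only shows the two kinds of memory access that make time-tiling harder.

```python
def diffuse_with_sources(u, steps, alpha, src_idx, src_val):
    """Explicit 1D finite-difference diffusion with sparse source injection.

    Hypothetical helper for illustration only; not taken from the thesis.
    """
    u = list(u)
    n = len(u)
    for _ in range(steps):
        new = u[:]  # boundary values are left unchanged
        # Affine stencil update: each interior point reads its two neighbours.
        for i in range(1, n - 1):
            new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        # Non-affine scatter: the target index is read from another array,
        # i.e. an access of the form A[B[k]].
        for k in range(len(src_idx)):
            new[src_idx[k]] += src_val[k]
        u = new
    return u


if __name__ == "__main__":
    print(diffuse_with_sources([0.0] * 5, 2, 0.25, [2], [1.0]))
```

Because the scatter writes to locations known only at run time, a compiler cannot reason about it with purely affine dependence analysis, which is one reason such kernels resist straightforward time-tiling.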

    Building bulk geometry from the tensor Radon transform

    Using the tensor Radon transform and related numerical methods, we study how bulk geometries can be explicitly reconstructed from boundary entanglement entropies in the specific case of AdS₃/CFT₂. We find that, given the boundary entanglement entropies of a 2d CFT, this framework provides a quantitative measure that detects whether the bulk dual is geometric in the perturbative (near AdS) limit. In the case where a well-defined bulk geometry exists, we explicitly reconstruct the unique bulk metric tensor once a gauge choice is made. We then examine the emergent bulk geometries for static and dynamical scenarios in holography and in many-body systems. Apart from the physics results, our work demonstrates that numerical methods are feasible and effective in the study of bulk reconstruction in AdS/CFT

    Towards Predictive Rendering in Virtual Reality

    The pursuit of predictive images, i.e., images representing radiometrically correct renditions of reality, has been a longstanding problem in computer graphics. The exactness of such images is extremely important for Virtual Reality applications like Virtual Prototyping, where users need to make decisions impacting large investments based on the simulated images. Unfortunately, generation of predictive imagery is still an unsolved problem for several reasons, especially if real-time restrictions apply. First, existing scenes used for rendering are not modeled accurately enough to create predictive images. Second, even with huge computational efforts, existing rendering algorithms are not able to produce radiometrically correct images. Third, current display devices need to convert rendered images into some low-dimensional color space, which prohibits display of radiometrically correct images. Overcoming these limitations is the focus of current state-of-the-art research. This thesis also contributes to this task. First, it briefly introduces the necessary background and identifies the steps required for real-time predictive image generation. Then, existing techniques targeting these steps are presented and their limitations are pointed out. To solve some of the remaining problems, novel techniques are proposed. They cover various steps in the predictive image generation process, ranging from accurate scene modeling through efficient data representation to high-quality, real-time rendering. A special focus of this thesis lies on real-time generation of predictive images using bidirectional texture functions (BTFs), i.e., very accurate representations for spatially varying surface materials.
The techniques proposed by this thesis enable efficient handling of BTFs by compressing the huge amount of data contained in this material representation, applying them to geometric surfaces using texture and BTF synthesis techniques, and rendering BTF-covered objects in real time. Further approaches proposed in this thesis target inclusion of real-time global illumination effects or more efficient rendering using novel level-of-detail representations for geometric objects. Finally, this thesis assesses the rendering quality achievable with BTF materials, indicating a significant increase in realism but also confirming that problems remain to be solved before truly predictive image generation is achieved

    Reconstruction of neuronal activity and connectivity patterns in the zebrafish olfactory bulb

    In the olfactory bulb (OB), odors evoke distributed patterns of activity across glomeruli that are reorganized by networks of interneurons (INs). This reorganization results in multiple computations including a decorrelation of activity patterns across the output neurons, the mitral cells (MCs). To understand the mechanistic basis of these computations it is essential to analyze the relationship between function and structure of the underlying circuit. I combined in vivo two-photon calcium imaging with dense circuit reconstruction from complete serial block-face electron microscopy (SBEM) stacks of the larval zebrafish OB (4.5 dpf) with a voxel size of 9 x 9 x 25 nm. To address bottlenecks in the workflow of SBEM, I developed a novel embedding and staining procedure that effectively reduces surface charging in SBEM and enables the acquisition of SBEM stacks with at least a ten-fold increase in both signal-to-noise ratio and acquisition speed. I set up a high-throughput neuron reconstruction pipeline with >30 professional tracers that is available to the scientific community (ariadne-service.com). To ensure efficient and accurate circuit reconstruction, I developed PyKNOSSOS, a Python software for skeleton tracing and synapse annotation, and CORE, a skeleton consolidation procedure that combines redundant reconstruction with targeted expert input. Using these procedures I reconstructed all neurons (>1000) in the larval OB. Unlike in the adult OB, INs were rare and appeared to represent specific subtypes, indicating that different sub-circuits develop sequentially. MCs were uniglomerular whereas inter-glomerular projections of INs were complex and biased towards groups of glomeruli that receive input from common types of sensory neurons. Hence, the IN network in the OB exhibits a topological organization that is governed by glomerular identity. Calcium imaging revealed that the larval OB circuitry already decorrelates activity patterns evoked by similar odors.
The comparison of inter-glomerular connectivity to the functional interactions between glomeruli indicates that pattern decorrelation depends on specific, non-random inter-glomerular IN projections. Hence, the topology of IN networks in the OB appears to be an important determinant of circuit function

    PLVS: A SLAM System with Points, Lines, Volumetric Mapping, and 3D Incremental Segmentation

    This document presents PLVS: a real-time system that leverages sparse SLAM, volumetric mapping, and 3D unsupervised incremental segmentation. PLVS stands for Points, Lines, Volumetric mapping, and Segmentation. It supports RGB-D and Stereo cameras, which may be optionally equipped with IMUs. The SLAM module is keyframe-based, and extracts and tracks sparse points and line segments as features. Volumetric mapping runs in parallel with respect to the SLAM front-end and generates a 3D reconstruction of the explored environment by fusing point clouds backprojected from keyframes. Different volumetric mapping methods are supported and integrated in PLVS. We use a novel reprojection error to bundle-adjust line segments. This error exploits available depth information to stabilize the position estimates of line segment endpoints. An incremental and geometric-based segmentation method is implemented and integrated for RGB-D cameras in the PLVS framework. We present qualitative and quantitative evaluations of the PLVS framework on some publicly available datasets. The appendix details the adopted stereo line triangulation method and provides a derivation of the Jacobians we used for line error terms. The software is available as open-source

    The Challenges of Non-linear Parameters and Variables in Automatic Loop Parallelisation

    With the rise of manycore processors, parallelism is becoming a mainstream necessity. Unfortunately, parallel programming is inherently more difficult than sequential programming; therefore, techniques for automatic parallelisation will become indispensable. We aim at extending the well-known polyhedron model, which promises this automation, beyond some of its current restrictions. Up to now, loop bounds and array subscripts in the modelled codes must be expressions linear in both the variables and the parameters. We lift this restriction and allow certain polynomial expressions instead of linear ones. With our extensions, we are able to handle more programs in all phases of the parallelisation process (dependence analysis, transformation of the program model, code generation). We extend Banerjee's classical dependence analysis to handle one non-linear parameter p, i.e., for input programs with non-linear array accesses like A[p*i], we are able to determine precisely the solutions of the system of conflict equalities depending on the residue class of p. We make contributions to three transformations desirable in automatic parallelisation. First, we show that schedules with non-linear parameters like theta(i)=floor(i/n) can be computed using a generalised Simplex algorithm, which we have developed. In addition, such schedules can be expressed easily as a quantifier elimination problem, but this approach turns out to be computationally less efficient with the available implementation. As a second transformation, we study parametric tiling, which is used to adapt a parallelised program to the number of available processors at run time. Third, we present a localisation technique to exploit scratchpad memories on architectures on which data caching has to be handled by software. We transform a given code such that it keeps values which are reused in successive iterations of a sequential loop in the scratchpad.
An access to a value written in an earlier iteration is served from the scratchpad to accelerate the access. In general, this transformation introduces non-linear loop bounds in the transformed model. Finally, we present an algorithm for generating code for arbitrary semi-algebraic iteration sets, i.e., for iteration sets described by polynomial inequalities in the variables and parameters. This is a vast generalisation of existing polyhedral code generation techniques. Although our algorithm is less efficient than polyhedral code generators, this paves the way for a code generator that can handle arbitrary parametric tilings and other transformations which introduce non-linear parameters (like non-linear schedules and the localisation we present) or even non-linear variables
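The schedule theta(i)=floor(i/n) mentioned in the abstract above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the helper names `schedule` and `group_by_time_step` are hypothetical, and the sketch simply evaluates a given schedule rather than computing one with the generalised Simplex algorithm the abstract refers to. Whether the iterations grouped into one time step may actually run in parallel depends on the program's dependences, which are not modelled here.

```python
def schedule(i, n):
    """Evaluate theta(i) = floor(i / n).

    The expression is non-linear in the polyhedral sense because n is a
    symbolic parameter rather than a compile-time constant.
    """
    return i // n


def group_by_time_step(num_iters, n):
    """Group iterations 0..num_iters-1 by their scheduled time step."""
    steps = {}
    for i in range(num_iters):
        steps.setdefault(schedule(i, n), []).append(i)
    # Return the groups in increasing time-step order.
    return [steps[t] for t in sorted(steps)]


if __name__ == "__main__":
    print(group_by_time_step(7, 3))
```

With n = 3, the seven iterations fall into three time steps, the last of which is only partially filled.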

    Quantum Information Approaches to Holographic Dualities

    The AdS/CFT correspondence, a remarkable duality between certain gravitational theories in anti-de Sitter (AdS) spacetime and quantum field theories with conformal symmetry (CFT), has had a profound effect on the development of theoretical physics in the past two decades. Recently, many connections of AdS/CFT to quantum information theory have been found, in particular by providing gravitationally dual descriptions of various entanglement measures. Understanding these manifestations of AdS/CFT (or, more generally, of the conjectured holographic principle encompassing it) requires the combination of tools from both high-energy theory and quantum information physics. In this cumulative thesis, the convergence between these two fields is approached from two fronts: first, by calculations within the dual gravitational theory, and second, using a tensor network ansatz to describe the quantum states suspected to possess such a gravitational description. In the first approach, using the gravitational side of AdS/CFT, entanglement entropies of complicated 2+1-dimensional excited CFTs are computed, thus showing how the holographic approach provides access to systems previously out of reach of practical methods, while introducing new numerical methods that this approach necessitates. The second approach is given by tensor networks, a highly successful ansatz for computing properties of one- and two-dimensional quantum systems. Efficiently computable classes of tensor networks are tested in their ability to represent simple holographic systems, successfully reproducing both hyperbolic geometrical features as well as critical boundary states. In addition, the general properties of tensor networks on regular hyperbolic tessellations are considered, leading to new connections to models not previously considered in the context of holography.
This interplay of different approaches to quantum information holography showcases the richness of this new field and suggests that a wide range of physical phenomena is accessible via the new tools now at our disposal