19 research outputs found

    Vectorised SIMD Implementations of Morphology Algorithms

    We explore vectorised implementations, exploiting single instruction multiple data (SIMD) CPU instructions on commonly used architectures, of three efficient algorithms for morphological dilation and erosion. We discuss issues specific to SIMD implementation and describe how they guide algorithm choice. We compare our implementations to a commonly used open-source, SIMD-accelerated machine vision library and find that orders-of-magnitude speed-ups can be achieved for erosions using two-dimensional structuring elements.
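For readers unfamiliar with the operation, the following is a minimal sketch of grey-scale erosion with a flat rectangular structuring element. The abstract does not name its three algorithms, so this is not one of them: it is an illustrative assumption showing only the well-known decomposition of a rectangle into a row pass followed by a column pass. The function names `erode_1d` and `erode_2d` are hypothetical, and the code is plain Python rather than SIMD code.

```python
def erode_1d(row, k):
    """Sliding-window minimum of odd width k; windows are clipped at the borders."""
    r = k // 2
    n = len(row)
    return [min(row[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def erode_2d(image, kh, kw):
    """Erosion by a flat kh x kw rectangle: a row pass, then a column pass."""
    rows = [erode_1d(row, kw) for row in image]
    # Transpose, erode each column as if it were a row, then transpose back.
    cols = [erode_1d(list(col), kh) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


image = [
    [5, 5, 5, 5],
    [5, 1, 5, 5],
    [5, 5, 5, 5],
    [5, 5, 5, 9],
]
eroded = erode_2d(image, 3, 3)
for line in eroded:
    print(line)
```

Each output pixel depends only on a fixed neighbourhood of the input, with no data-dependent branching. This is the property that makes such filters natural candidates for vector instructions, where one instruction takes the minimum of many pixel pairs at once.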

    Speeding up the Köhler's method of contrast thresholding

    Köhler's method is a useful multi-thresholding technique based on boundary contrast. However, the direct algorithm has too high a complexity, O(N^2), i.e. quadratic in the number of pixels N, to process images at a sufficient speed for practical applications. In this paper, a new algorithm to speed up Köhler's method is introduced with a complexity of O(NM), where M is the number of grey levels. The proposed algorithm is designed for parallelisation and vector processing, which are available in current processors, using OpenMP (Open Multi-Processing) and SIMD (Single Instruction, Multiple Data) instructions. A fast implementation allows a gain factor of 405 on an image of 18 million pixels and real-time video processing (gain factor of 96). Comment: IEEE Copyright. Proceedings of the IEEE International Conference on Image Processing ICIP 201

    Parallel Multiscale Contact Dynamics for Rigid Non-spherical Bodies

    The simulation of large numbers of rigid bodies of non-analytical shapes or vastly varying sizes which collide with each other is computationally challenging. The fundamental problem is the identification of all contact points between all particles at every time step. In the Discrete Element Method (DEM), this is particularly difficult for particles of arbitrary geometry that exhibit sharp features (e.g. rock granulates). While most codes avoid non-spherical or non-analytical shapes due to the computational complexity, we introduce an iterative contact detection method for triangulated geometries. The new method is an improvement over a naive brute-force approach, which checks all possible geometric constellations of contact and thus exhibits a lot of execution branching. Our iterative approach has limited branching and a high number of floating point operations per processed byte. It is thus suitable for modern Single Instruction Multiple Data (SIMD) CPU hardware. As only the naive brute-force approach is robust and always yields a correct solution, we propose a hybrid solution that combines the best of the two worlds to produce fast and robust contacts. In terms of the DEM workflow, we furthermore propose a multilevel tree-based data structure strategy that holds all particles in the domain on multiple scales in grids. Grids reduce the total computational complexity of the simulation. The data structure is combined with the DEM phases to form a single-touch tree-based traversal that both identifies contact points between particle pairs and introduces concurrency to the system during particle comparisons in one multiscale grid sweep. Finally, a reluctant adaptivity variant is introduced which enables us to realise an improved time stepping scheme with larger time steps than standard adaptivity while still minimising the grid administration overhead.
Four different parallelisation strategies that exploit multicore architectures are discussed for the triad of methodological ingredients. Each parallelisation scheme exhibits unique behaviour depending on the grid and particle geometry at hand. Fusing them into a task-based parallelisation workflow yields promising speedups. Our work shows that new computer architectures can push the boundary of DEM computability, but only if the right data structures and algorithms are chosen.

    Development and application of real-time and interactive software for complex system

    Soft materials have attracted considerable interest in recent years for predicting the characteristics of phase separation and self-assembly in nanoscale structures. A popular method for simulating the dynamic behaviour of particles (e.g. particle tracking) and for considering the effects of simulation parameters is cell dynamic simulation (CDS). This is a cellular computerisation technique that can be used to investigate different aspects of the morphological topographies of soft material systems. Acquiring quantitative data from particles is a critical requirement for better understanding and characterising their dynamic behaviour. To achieve this objective, particle tracking methods that consider quantitative data and focus on different properties and components of particles are essential. Despite the availability of various types of particle tracking used in experimental work, there is no method available to consider uniform computational data. To achieve accurate and efficient computational results for the cell dynamic simulation method and particle tracking, two factors are essential: computing time-scale and simulation system size. Consequently, finding suitable computing algorithms and resources, such as a sequential algorithm, for implementing a complex technique and achieving precise results is critical and rather expensive. Therefore, it is highly desirable to consider a parallel algorithm and programming model to solve time-consuming and massive computational processing issues. Hence, the gap between experimental and computational work needs to be filled, and the time cost of expensive calculations reduced, in order to investigate a uniform computational technique for particle tracking with significant enhancements in speed and execution times.
The work presented in this thesis details a new particle tracking method for integrating diblock copolymers in the form of spheres with a shear flow, and a novel GPU-based parallel acceleration approach to cell dynamic simulation (CDS). In addition, parallel models and architectures (CPUs and GPUs) were evaluated using a mixture of the OpenMP application program interface and the CUDA programming model. Finally, this study presents the performance enhancements achieved with GPU-CUDA: approximately 2 times faster than the multi-threaded implementation and 13 to 14 times quicker than optimised sequential processing for the CDS computations.

    Elasto-plastic deformations within a material point framework on modern GPU architectures

    Plastic strain localization is an important process on Earth. It strongly influences the mechanical behaviour of natural processes, such as fault mechanics, earthquakes or orogeny. At a smaller scale, a landslide is a fantastic example of elasto-plastic deformations. Such behaviour spans from pre-failure mechanisms to post-failure propagation of the unstable material. To fully resolve the landslide mechanics, the selected numerical methods should be able to efficiently address a wide range of deformation magnitudes. Accurate and performant numerical modelling requires substantial computational resources. Mesh-free numerical methods such as the material point method (MPM) or smoothed-particle hydrodynamics (SPH) are particularly computationally expensive when compared with mesh-based methods, such as the finite element method (FEM) or the finite difference method (FDM). Still, mesh-free methods are particularly well suited to numerical problems involving large elasto-plastic deformations. However, the computational efficiency of these methods should first be improved in order to tackle complex three-dimensional problems such as landslides. As such, this research work attempts to alleviate the computational cost of the material point method by using the most recent graphics processing unit (GPU) architectures available. GPUs are many-core processors originally designed to refresh screen pixels (e.g., for computer games) independently. This allows GPUs to deliver massive parallelism when compared to central processing units (CPUs). To do so, this research work first investigates code prototyping in a high-level language, e.g., MATLAB. This makes it possible to implement vectorized algorithms and to benchmark numerical results of two-dimensional analyses against analytical solutions and/or experimental results in an affordable amount of time.
Afterwards, a low-level language such as CUDA C is used to efficiently implement a GPU-based solver, ep2-3De v1.0, that can resolve three-dimensional problems in a decent amount of time. This part takes advantage of the massive parallelism of modern GPU architectures. In addition, a first attempt at multi-GPU parallel computing is performed to further increase performance and to address the on-chip memory limitation. Finally, this GPU-based solver is used to investigate three-dimensional granular collapses and is compared with experimental evidence obtained in the laboratory. This research work demonstrates that the material point method is well suited to resolving small to large elasto-plastic deformations. Moreover, the computational efficiency of the method can be dramatically increased using modern GPU architectures. These allow fast, performant and accurate three-dimensional modelling of landslides, provided that the on-chip memory limitation is alleviated with an appropriate parallel strategy.

    Manycore Algorithms for Genetic Linkage Analysis

    Exact algorithms to perform linkage analysis scale exponentially with the size of the input. Beyond a critical point, the amount of work that needs to be done exceeds both available time and memory. In these circumstances, we are forced to either abbreviate the input in some manner or else use an approximation. Approximate methods, like Markov chain Monte Carlo (MCMC), though they make the problem tractable, can take an immense amount of time to converge. The problem of high convergence time is compounded by software which is single-threaded and, even as computer processors are manufactured with increasing numbers of physical processing cores, is not designed to take advantage of the available processing power. In this thesis, we describe our program SwiftLink, which embodies our work adapting existing Gibbs samplers to modern computer processor architectures. The processor architectures we target are multicore processors, which currently feature between 4 and 8 processor cores, and computer graphics cards (GPUs), which already feature hundreds of processor cores. We implemented parallel versions of the meiosis sampler, which mixes well with tightly linked markers but suffers from irreducibility issues, and the locus sampler, which is guaranteed to be irreducible but mixes slowly with tightly linked markers. We evaluate SwiftLink’s performance on real-world datasets of large consanguineous families. We demonstrate that using four processor cores for a single analysis is 3–3.2x faster than the single-threaded implementation of SwiftLink. With respect to the existing MCMC-based programs, it achieves a 6.6–8.7x speedup compared to Morgan and a 66.4–72.3x speedup compared to Simwalk. Utilising both a multicore processor and a GPU is 7–7.9x faster than the single-threaded implementation, a 17.6–19x speedup compared to Morgan and a 145.5–192.3x speedup compared to Simwalk.

    A Parametric Model for the Analysis and Quantification of Foveal Shapes

    Recent advances in OCT have enabled a detailed in-vivo examination of the human retina for clinical routine and experimental eye research. One of the structures inside the retina of immense scientific interest is the fovea, a small retinal pit located in the central region with extraordinary visual resolution. To date, only a few investigations have captured foveal morphology in a large subject group through a detailed analysis employing mathematical models. In this work, we develop a parametric model function to describe the shape of the human fovea. Starting with a detailed discussion of the history and present state of fovea research, we define the requirements for a suitable model and derive a function which can represent a broad range of foveal shapes. The model is one-dimensional in its basic form and can only account for the shape of one particular section through a fovea. Therefore, we apply a radial fitting scheme in different directions which can capture a fovea in its full three-dimensional appearance. Highly relevant foveal characteristics, derived from the model, provide valuable descriptions to quantify the fovea and allow for a detailed analysis of different foveal shapes. To put the theoretical model into practice, we develop a numerical scheme to compute model parameters from retinal OCT scans and to reconstruct the shape of an entire fovea. For the sake of scientific reproducibility, this section includes implementation details, examples and a discussion of performance considerations. Finally, we present several studies which employed the fovea model successfully. A first feasibility study verifies that the parametric model is suitable for foveal shapes occurring in a large set of healthy human eyes. In a follow-up investigation, we analyse foveal characteristics occurring in healthy humans in detail. This analysis is concerned with different aspects including, e.g., an investigation of the fovea's asymmetry, a gender comparison, a left versus right eye correlation and the computation of subjects with extreme foveal shapes. Furthermore, we show how the model was used to support investigations unrelated to the direct quantification of the fovea itself. In these investigations we employed the model to compute anatomically correct regions of interest in an analysis of the OCB and to calculate an average fovea for an optical simulation of light rays. We conclude with currently unpublished data showing the fovea modelling of hunting birds, which have unusual, funnel-like foveal shapes.

    Simulating 3D Radiation Transport, a modern approach to discretisation and an exploration of probabilistic methods

    Light, or electromagnetic radiation in general, is a profound and invaluable resource to investigate our physical world. For centuries, it was the only and it still is the main source of information to study the Universe beyond our planet. With high-resolution spectroscopic imaging, we can identify numerous atoms and molecules, and can trace their physical and chemical environments in unprecedented detail. Furthermore, radiation plays an essential role in several physical and chemical processes, ranging from radiative pressure, heating, and cooling, to chemical photo-ionisation and photo-dissociation reactions. As a result, almost all astrophysical simulations require a radiative transfer model. Unfortunately, accurate radiative transfer is very computationally expensive. Therefore, in this thesis, we aim to improve the performance of radiative transfer solvers, with a particular emphasis on line radiative transfer. First, we review the classical work on accelerated lambda iterations and acceleration of convergence, and we propose a simple but effective improvement to the ubiquitously used Ng-acceleration scheme. Next, we present the radiative transfer library, Magritte: a formal solver with a ray-tracer that can handle structured and unstructured meshes as well as smoothed-particle data. To mitigate the computational cost, it is optimised to efficiently utilise multi-node and multi-core parallelism as well as GPU offloading. Furthermore, we demonstrate a heuristic algorithm that can reduce typical input models for radiative transfer by an order of magnitude, without significant loss of accuracy. This strongly suggests the existence of more efficient representations for radiative transfer models. To investigate this, we present a probabilistic numerical method for radiative transfer that naturally allows for uncertainty quantification, providing us with a mathematical framework to study the trade-off between computational speed and accuracy. 
Although we cannot yet construct optimal representations for radiative transfer problems, we point out several ways in which this method can lead to more rigorous optimisation.

    Lattice-Boltzmann simulations of cerebral blood flow

    Computational haemodynamics play a central role in the understanding of blood behaviour in the cerebral vasculature, increasing our knowledge of the onset of vascular diseases and their progression, improving diagnosis and ultimately providing better patient prognosis. Computer simulations hold the potential of accurately characterising the motion of blood and its interaction with the vessel wall, providing the capability to assess surgical treatments with no danger to the patient. These aspects contribute considerably to a better understanding of blood circulation processes as well as to augmenting pre-treatment planning. Existing software environments for treatment planning consist of several stages, each requiring significant user interaction and processing time, which significantly limits their use in clinical scenarios. The aim of this PhD is to provide clinicians and researchers with a tool to aid in the understanding of human cerebral haemodynamics. This tool employs a high performance fluid solver based on the lattice-Boltzmann method (coined HemeLB), high performance distributed computing and grid computing, and various advanced software applications useful for efficiently setting up and running patient-specific simulations. A graphical tool is used to segment the vasculature from patient-specific CT or MR data and configure boundary conditions with ease, creating models of the vasculature in real time. Blood flow visualisation is done in real time using in situ rendering techniques implemented within the parallel fluid solver and aided by steering capabilities; these programming strategies allow the clinician to interactively display the simulation results on a local workstation. A separate software application is used to numerically compare simulation results carried out at different spatial resolutions, providing a strategy to approach numerical validation.
The developed software and supporting computational infrastructure were used to study various patient-specific intracranial aneurysms with the collaborating interventionalists at the National Hospital for Neurology and Neuroscience (London), using three-dimensional rotational angiography data to define the patient-specific vasculature. Blood flow motion was depicted in detail by the visualisation capabilities, clearly showing vortex fluid flow features and stress distribution at the inner surface of the aneurysms and their surrounding vasculature. These investigations permitted the clinicians to rapidly assess the risk associated with the growth and rupture of each aneurysm. The ultimate goal of this work is to aid clinical practice with an efficient, easy-to-use toolkit for real-time decision support.