Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices, where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Hydrolink 2015/3. SPH (Smoothed Particle Hydrodynamics) in Hydraulics
Topic: SPH (Smoothed Particle Hydrodynamics) in Hydraulics
Efficient algebraic multigrid preconditioners on clusters of GPUs
Many scientific applications require the solution of large and sparse linear systems of equations using Krylov subspace methods; in this case, the choice of an effective preconditioner may be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners, because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, poses the problem of implementing these methods efficiently on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers using sparse approximate inverses in the solve phase associated with the local blocks. The choice of approximate inverses instead of sparse matrix factorizations is driven by the large amount of parallelism exposed by the matrix-vector product as compared to the solution of large triangular systems on GPUs. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability, on a test case concerning groundwater modelling, provided by the Jülich Supercomputing Centre within the Horizon 2020 Project EoCoE.
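The smoother choice described above trades triangular solves for matrix-vector products. A minimal NumPy sketch of one damped block-Jacobi sweep illustrates the structure; the exact block inverses, damping factor, and test matrix below are illustrative stand-ins, not the MLD2P4 implementation (which applies a *sparse approximate* inverse per block):

```python
import numpy as np

def block_jacobi_sweep(A, b, x, block_starts, omega=0.8):
    """One damped block-Jacobi sweep: x <- x + omega * M^{-1} (b - A x),
    where M^{-1} acts block-wise via a precomputed inverse of each
    diagonal block. Applying it is a matrix-vector product per block,
    which maps to GPUs far better than a triangular solve."""
    r = b - A @ x                    # one global residual (Jacobi-style)
    x_new = x.copy()
    for i in range(len(block_starts) - 1):
        s, e = block_starts[i], block_starts[i + 1]
        # MLD2P4 would use a sparse approximate inverse here; for the
        # sketch we invert the small dense block exactly.
        Minv = np.linalg.inv(A[s:e, s:e])
        x_new[s:e] += omega * (Minv @ r[s:e])
    return x_new

# Usage: a small SPD, diagonally dominant system; a few sweeps
# drive the residual down by several orders of magnitude.
rng = np.random.default_rng(0)
n = 8
B = rng.random((n, n))
A = 0.1 * (B + B.T)                  # small symmetric off-diagonal part
np.fill_diagonal(A, float(n))        # strong diagonal -> SPD, convergent
b = rng.random(n)
x = np.zeros(n)
for _ in range(20):
    x = block_jacobi_sweep(A, b, x, block_starts=[0, 4, 8])
print(np.linalg.norm(b - A @ x) < 1e-3)
```

In a real AMG application phase this sweep runs on each level's system as pre- and post-smoothing around the coarse-grid correction.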
Parallel computing 2011, ParCo 2011: book of abstracts
This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgium.
High-performance simulation technologies for water-related natural hazards
PhD Thesis. Water-related natural hazards, such as flash floods, landslides and debris flows, usually happen
in chains. In order to better understand the underlying physical processes and more reliably
quantify the associated risk, it is essential to develop a physically-based multi-hazard modelling
system to simulate these hazards at a catchment scale. An effective multi-hazard modelling
system may be developed by solving a set of depth-averaged dynamic equations incorporating
adaptive basal resistance terms. High-performance computing achieved through implementation
on modern graphic processing units (GPUs) can be used to accelerate the model to support
efficient large-scale simulations. This thesis presents the key simulation technologies for developing
such a novel high-performance water-related natural hazards modelling system.
A new well-balanced smoothed particle hydrodynamic (SPH) model is first presented for
solving the shallow water equations (SWEs) in the context of flood inundation modelling. The
performance of the SPH model is compared with an alternative flood inundation model based
on a finite volume (FV) method in order to select a better numerical method for the current
study. The FV model performs favourably for practical applications and therefore is adopted
to develop the proposed multi-hazard model. In order to more accurately describe the rainfall-runoff
and overland flow process that often initiates a hazard chain, a first-order FV Godunov-type
model is developed to solve the SWEs, implemented with novel source term discretisation
schemes. The new model overcomes the limitations of the current prevailing numerical
schemes such as inaccurate calculations of bed slope or friction source terms and provides
much improved numerical accuracy, efficiency and stability for simulating overland flows and
surface flooding. To support large-scale simulation of flow-like landslides or debris flows, a
new formulation of depth-averaged governing equations is derived on the Cartesian coordinate
system. The new governing equations take into account the effects of non-hydrostatic pressure
and centrifugal force, which may become significant over terrains with steep and curved
topography. These equations are compatible with various basal resistance terms, effectively leading to a unified mathematical framework for describing different types of water-related natural
hazards including surface flooding, flow-like landslides and debris flows. The new depth-averaged
governing equations are then solved using an FV Godunov-type framework based on
the second-order accurate scheme. A flexible and GPU-based software framework is further
designed to provide much improved computational efficiency for large-scale simulations and
ease the future implementation of new functionalities. This provides an effective codebase
for the proposed multi-hazard modelling system and its potential is confirmed by successfully
applying it to simulate flow-like landslides and dam-break floods.
Funded by Newcastle University and China Scholarship Council, Henry Lester Trust and Great Britain China Education Trust.
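The first-order FV Godunov-type solution of the SWEs described in the abstract can be sketched in one dimension. The HLL flux, grid, and dam-break test below are a minimal illustration of the scheme's structure, not the thesis code (which is two-dimensional, includes the novel source term discretisation, and runs on GPUs):

```python
import numpy as np

g = 9.81  # gravitational acceleration (m/s^2)

def hll_flux(hL, qL, hR, qR):
    """HLL approximate Riemann flux for the 1D shallow water equations
    with conserved variables (h, q = h*u) over a flat bed."""
    uL = qL / hL if hL > 0 else 0.0
    uR = qR / hR if hR > 0 else 0.0
    cL, cR = np.sqrt(g * hL), np.sqrt(g * hR)
    sL = min(uL - cL, uR - cR)          # left wave speed estimate
    sR = max(uL + cL, uR + cR)          # right wave speed estimate
    FL = np.array([qL, qL * uL + 0.5 * g * hL**2])
    FR = np.array([qR, qR * uR + 0.5 * g * hR**2])
    if sL >= 0:
        return FL
    if sR <= 0:
        return FR
    UL, UR = np.array([hL, qL]), np.array([hR, qR])
    return (sR * FL - sL * FR + sL * sR * (UR - UL)) / (sR - sL)

def step(h, q, dx, dt):
    """One first-order Godunov update with reflective wall boundaries."""
    H = np.concatenate(([h[0]], h, [h[-1]]))    # ghost cells: mirror h,
    Q = np.concatenate(([-q[0]], q, [-q[-1]]))  # negate q at the walls
    F = np.array([hll_flux(H[i], Q[i], H[i + 1], Q[i + 1])
                  for i in range(len(H) - 1)])
    h_new = h - dt / dx * (F[1:, 0] - F[:-1, 0])
    q_new = q - dt / dx * (F[1:, 1] - F[:-1, 1])
    return h_new, q_new

# Usage: idealised dam break over a flat bed; the conservative update
# preserves total water volume to machine precision.
n, dx = 100, 0.1
h = np.where(np.arange(n) < n // 2, 2.0, 1.0)
q = np.zeros(n)
vol0 = h.sum() * dx
for _ in range(50):
    h, q = step(h, q, dx, dt=0.01)      # CFL number about 0.5 here
print(abs(h.sum() * dx - vol0) < 1e-8)
```

The thesis extends this pattern with well-balanced bed slope and friction source terms, second-order reconstruction, and a GPU kernel per flux/update stage.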
Development of a modular agricultural robotic sprayer
Precision Agriculture (PA) increases farm productivity, reduces pollution, and minimizes input costs. However, the wide adoption of existing PA technologies for complex field operations, such as spraying, is slow due to high acquisition costs, low adaptability, and slow operating speed. In this study, we designed, built, optimized, and tested a Modular Agrochemical Precision Sprayer (MAPS), a robotic sprayer with an intelligent machine vision system (MVS). Our work focused on identifying and spraying the targeted plants with low cost, high speed, and high accuracy in a remote, dynamic, and rugged environment. We first researched and benchmarked combinations of one-stage convolutional neural network (CNN) architectures with embedded or mobile hardware systems. Our analysis revealed that TensorRT-optimized SSD-MobilenetV1 on an NVIDIA Jetson Nano provided sufficient plant detection performance with low cost and power consumption. We also developed an algorithm to determine the maximum operating velocity of a chosen CNN and hardware configuration through modeling and simulation. Based on these results, we developed a CNN-based MVS for real-time plant detection and velocity estimation. We implemented Robot Operating System (ROS) to integrate each module for easy expansion. We also developed a robust dynamic targeting algorithm to synchronize the spray operation with the robot motion, which significantly increases productivity. The research proved to be successful. We built a MAPS with three independent vision and spray modules. In the lab test, the sprayer recognized and hit all targets with only a 2% incorrect spray rate. In the field test with an unstructured crop layout, such as a broadcast-seeded soybean field, the MAPS also successfully sprayed all targets with only a 7% incorrect spray rate.
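The dynamic targeting idea, synchronizing valve triggering with robot motion, reduces to simple timing arithmetic. A toy sketch; the function, parameter names, and numbers are invented for illustration and are not from the MAPS code:

```python
def trigger_delay(d_cam_to_nozzle, plant_offset, speed, proc_latency):
    """Seconds to wait before opening the spray valve so the nozzle
    passes over a plant detected `plant_offset` m ahead of the camera
    line, with the nozzle mounted `d_cam_to_nozzle` m behind the camera,
    the robot moving at `speed` m/s, and `proc_latency` s already spent
    in the vision pipeline. All names here are illustrative."""
    travel_time = (d_cam_to_nozzle + plant_offset) / speed
    delay = travel_time - proc_latency
    if delay < 0:
        # The detection arrived too late: the nozzle already passed the
        # plant. This bound is what caps the maximum operating velocity.
        raise ValueError("robot too fast for this pipeline latency")
    return delay

# Usage: nozzle 0.4 m behind the camera, plant seen 0.1 m ahead,
# robot at 1 m/s, 0.1 s of detection latency.
print(round(trigger_delay(0.4, 0.1, 1.0, 0.1), 3))
```

The same inequality (`travel_time >= proc_latency`) gives the maximum operating velocity for a chosen CNN and hardware configuration: `speed <= d_cam_to_nozzle / proc_latency` in the worst case of a plant detected at the camera line.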
Classification of the difficulty in accelerating problems using GPUs
Scientists continually require additional processing power, as this enables them to compute larger problem sizes, use more complex models and algorithms, and solve problems previously thought computationally impractical. General-purpose computation on graphics processing units (GPGPU) can help in this regard, as there is great potential in using graphics processors to accelerate many scientific models and algorithms. However, some problems are considerably harder to accelerate than others, and it may be challenging for those new to GPGPU to ascertain the difficulty of accelerating a particular problem or seek appropriate optimisation guidance. Through what was learned in the acceleration of a hydrological uncertainty ensemble model, large numbers of k-difference string comparisons, and a radix sort, problem attributes have been identified that can assist in the evaluation of the difficulty in accelerating a problem using GPUs. The identified attributes are inherent parallelism, branch divergence, problem size, required computational parallelism, memory access pattern regularity, data transfer overhead, and thread cooperation. Using these attributes as difficulty indicators, an initial problem difficulty classification framework has been created that aids in GPU acceleration difficulty evaluation. This framework further facilitates directed guidance on suggested optimisations and required knowledge based on problem classification, which has been demonstrated for the aforementioned accelerated problems. It is anticipated that this framework, or a derivative thereof, will prove to be a useful resource for new or novice GPGPU developers in the evaluation of potential problems for GPU acceleration.
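The seven attributes listed above lend themselves to a simple scoring scheme. A toy sketch; the rating scale, weights, and thresholds are invented for illustration and are not the framework's actual classification rules:

```python
# The seven difficulty attributes identified in the abstract, each rated
# here from 0 (favourable for GPU acceleration) to 3 (unfavourable).
ATTRIBUTES = [
    "inherent parallelism",
    "branch divergence",
    "problem size",
    "required computational parallelism",
    "memory access pattern regularity",
    "data transfer overhead",
    "thread cooperation",
]

def classify_difficulty(ratings):
    """Map per-attribute ratings to a coarse difficulty class.
    Thresholds are illustrative, not from the published framework."""
    missing = set(ATTRIBUTES) - set(ratings)
    if missing:
        raise ValueError(f"unrated attributes: {sorted(missing)}")
    score = sum(ratings[a] for a in ATTRIBUTES)   # total in 0..21
    if score <= 5:
        return "straightforward"
    if score <= 12:
        return "moderate"
    return "hard"

# Usage: a radix sort is highly parallel but needs heavy thread
# cooperation and has scattered memory accesses.
radix_sort = {a: 0 for a in ATTRIBUTES}
radix_sort["thread cooperation"] = 3
radix_sort["memory access pattern regularity"] = 2
print(classify_difficulty(radix_sort))
```

The value of such a scheme is less the final label than the per-attribute breakdown, which points a novice developer at the specific optimisation topics (e.g. shared-memory cooperation, coalesced access) worth studying first.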
Using GPU acceleration and a novel artificial neural networks approach for ultra-fast fluorescence lifetime imaging microscopy analysis
Fluorescence lifetime imaging microscopy (FLIM), which is capable of visualizing local molecular and physiological parameters in living cells, plays a significant role in the biological sciences, chemistry, and medical research. In order to unveil dynamic cellular processes, it is necessary to develop high-speed FLIM technology. Thanks to the development of highly parallel time-to-digital converter (TDC) arrays, especially when integrated with single-photon avalanche diodes (SPADs), the acquisition rate of high-resolution fluorescence lifetime imaging has been dramatically improved.
At the same time, these technological advances and advanced data acquisition systems have generated massive data volumes, which significantly increase the difficulty of FLIM analysis. Traditional FLIM systems rely on time-consuming iterative algorithms to retrieve the FLIM parameters. Therefore, lifetime analysis has become a bottleneck for high-speed FLIM applications, let alone real-time or video-rate FLIM systems. Although some simple algorithms have been proposed, most of them are only able to resolve a simple FLIM decay model. Moreover, existing FLIM systems based on CPU processing do not make use of available parallel acceleration.
In order to tackle these problems, my study focused on introducing state-of-the-art general-purpose graphics processing units (GPUs) to FLIM analysis, and on building a data processing system based on both CPU and GPUs. With a large number of parallel cores, GPUs are able to significantly speed up lifetime analysis compared to CPU-only processing. In addition to porting the existing algorithms to GPU computing, I have developed a new high-speed and GPU-friendly algorithm based on an artificial neural network (ANN). The proposed GPU-ANN-FLIM method has dramatically improved the efficiency of FLIM analysis: it is at least 1000-fold faster than some traditional algorithms, meaning that it has great potential to fuel current revolutions in high-speed, high-resolution FLIM applications.
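The bottleneck described above comes from per-pixel iterative curve fitting. As a point of contrast, noniterative estimators reduce lifetime recovery to batched array arithmetic that maps directly onto a GPU. The sketch below uses the classical centre-of-mass estimator on noise-free mono-exponential decays; it is an illustration of this noniterative, batch-parallel style, not the ANN method of the thesis, and all numbers are invented:

```python
import numpy as np

# Noise-free mono-exponential decays I(t) = exp(-t / tau) sampled on a
# time grid much longer than tau, so the centre-of-mass estimator
# tau_hat = sum(t * I) / sum(I) is nearly exact.
dt, n_bins = 0.05, 2000                 # ns per bin; 100 ns window
t = (np.arange(n_bins) + 0.5) * dt      # bin-centre times
taus = np.array([0.5, 1.0, 2.0, 3.5])   # ground-truth lifetimes (ns)
decays = np.exp(-t[None, :] / taus[:, None])   # shape (n_decays, n_bins)

# One vectorised pass over the whole batch: no per-pixel loop, so the
# same expression runs unchanged on a GPU array library.
tau_hat = (decays @ t) / decays.sum(axis=1)
print(np.max(np.abs(tau_hat - taus)) < 0.05)
```

An ANN-based estimator generalises this idea: the mapping from histogram to lifetime parameters is learned rather than closed-form, but inference is still a few batched matrix products, which is why it is GPU-friendly where iterative fitting is not.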
Computational methods and software for the design of inertial microfluidic flow sculpting devices
The ability to sculpt inertially flowing fluid via bluff body obstacles has enormous promise for applications in bioengineering, chemistry, and manufacturing within microfluidic devices. However, the computational difficulty inherent to full-scale 3-dimensional fluid flow simulations makes designing and optimizing such systems tedious, costly, and generally tasked to computational experts with access to high-performance resources. The goal of this work is to construct efficient models for the design of inertial microfluidic flow sculpting devices, and implement these models in freely available, user-friendly software for the broader microfluidics community. Two software packages were developed to accomplish this: uFlow and FlowSculpt. uFlow solves the forward problem in flow sculpting, that of predicting the net deformation from an arbitrary sequence of obstacles (pillars), and includes estimations of transverse mass diffusion and of particles formed by optical lithography. FlowSculpt solves the more difficult inverse problem in flow sculpting, which is to design a flow sculpting device that produces a target flow shape. Each piece of software uses efficient, experimentally validated forward models developed within this work, which are also applied within deep learning techniques to explore other routes to solving the inverse problem. The models are highly modular, capable of incorporating new microfluidic components and flow physics into the design process. It is anticipated that the microfluidics community will integrate the tools developed here into their own research, and bring new designs, components, and applications to the inertial flow sculpting platform.
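The efficiency of the forward model comes from treating each pillar's effect as a precomputed deformation map on the channel cross-section, so a pillar sequence reduces to composing maps instead of re-running a 3D flow simulation. A 1D toy sketch of that composition idea; the displacement map itself is invented for illustration and is not uFlow's precomputed advection data:

```python
import numpy as np

# Toy forward model: each pillar induces a lateral displacement map on
# cross-sectional positions y in [0, 1]; the net deformation of a pillar
# sequence is the composition of the per-pillar maps.
y = np.linspace(0.0, 1.0, 201)

def pillar_map(offset):
    """Hypothetical displacement map for a pillar at lateral `offset`:
    fluid is pushed away from the pillar, decaying with distance.
    Purely illustrative; real maps come from precomputed flow solves."""
    push = 0.08 * np.sign(y - offset) * np.exp(-((y - offset) / 0.15) ** 2)
    return np.clip(y + push, 0.0, 1.0)

def net_deformation(offsets):
    """Compose the per-pillar maps by interpolating each one at the
    positions produced by the previous pillars."""
    pos = y.copy()
    for off in offsets:
        pos = np.interp(pos, y, pillar_map(off))
    return pos

# Usage: track where every starting position ends up after three pillars.
final = net_deformation([0.3, 0.6, 0.4])
print(final.shape == y.shape and 0.0 <= final.min() <= final.max() <= 1.0)
```

Because each composed map is cheap to evaluate, an inverse-design search (as in FlowSculpt) can score thousands of candidate pillar sequences per second against a target shape.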