
    Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale

    The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method used in numerical simulations of fluids in astrophysics and computational fluid dynamics, among many other fields. SPH simulations with detailed physics are computationally demanding. Parallelizing SPH codes is not trivial because there is no structured grid. Additionally, the performance of SPH codes can, in general, be adversely affected by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. This work presents insights into the current performance and functionalities of three SPH codes: SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. To gain such insights, a rotating square patch test was implemented as a common test simulation for the three SPH codes and analyzed on two modern HPC systems. Furthermore, to stress the differences with the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an additional test case, the Evrard collapse, has also been carried out. This work extrapolates the common basic SPH features in the three codes for the purpose of consolidating them into a pure-SPH, Exascale-ready, optimized mini-app. Moreover, the outcome of this work serves as direct feedback to the parent codes, to improve their performance and overall scalability. Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1

    A Parallel Solver for Graph Laplacians

    Problems from graph drawing, spectral clustering, network flow, and graph partitioning can all be expressed in terms of graph Laplacian matrices. There are a variety of practical approaches to solving these problems in serial. However, as problem sizes increase and single-core speeds stagnate, parallelism is essential to solve such problems quickly. We present an unsmoothed aggregation multigrid method for solving graph Laplacians in a distributed-memory setting. We introduce new parallel aggregation and low-degree elimination algorithms targeted specifically at irregular-degree graphs. These algorithms are expressed in terms of sparse matrix-vector products using generalized sum and product operations. This formulation is amenable to linear algebra using arbitrary distributions and allows us to operate on a 2D sparse matrix distribution, which is necessary for parallel scalability. Our solver outperforms the natural parallel extension of the current state of the art in an algorithmic comparison. We demonstrate scalability to 576 processes and graphs with up to 1.7 billion edges. Comment: PASC '18, Code: https://github.com/ligmg/ligm
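The phrase "sparse matrix-vector products using generalized sum and product operations" can be illustrated with a minimal serial sketch. This is not the paper's solver: the function names `semiring_spmv` and `laplacian`, and the dict-of-rows storage, are assumptions made for illustration. The same product routine computes an ordinary Laplacian-vector product with (+, ×) and one shortest-path relaxation step with (min, +).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical storage: row index -> {column index -> value}
SparseMatrix = Dict[int, Dict[int, float]]


def laplacian(n: int, edges: List[Tuple[int, int]]) -> SparseMatrix:
    """Build the Laplacian L = D - A of an unweighted, undirected graph."""
    rows: SparseMatrix = {i: {} for i in range(n)}
    for u, v in edges:
        rows[u][u] = rows[u].get(u, 0.0) + 1.0  # degree on the diagonal
        rows[v][v] = rows[v].get(v, 0.0) + 1.0
        rows[u][v] = rows[u].get(v, 0.0) - 1.0  # -1 for each edge
        rows[v][u] = rows[v].get(u, 0.0) - 1.0
    return rows


def semiring_spmv(
    matrix: SparseMatrix,
    x: List[float],
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
    zero: float,
) -> List[float]:
    """Compute y_i = add over j of mul(a_ij, x_j), starting from `zero`."""
    y = []
    for i in range(len(x)):
        acc = zero
        for j, a_ij in matrix.get(i, {}).items():
            acc = add(acc, mul(a_ij, x[j]))
        y.append(acc)
    return y


# Ordinary (+, *) semiring: the constant vector lies in the Laplacian's null space.
L = laplacian(3, [(0, 1), (1, 2)])
Lx = semiring_spmv(L, [1.0, 1.0, 1.0], lambda a, b: a + b, lambda a, b: a * b, 0.0)

# (min, +) semiring on edge weights: one relaxation step of shortest paths from node 0.
W: SparseMatrix = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
inf = float("inf")
dist = [0.0, inf, inf]
step = semiring_spmv(W, dist, min, lambda a, b: a + b, inf)
relaxed = [min(d, s) for d, s in zip(dist, step)]
```

Because the loop body only depends on `add`, `mul`, and `zero`, the same kernel can in principle be reused for different graph algorithms, which is the appeal of the generalized formulation.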

    Pseudorehearsal in actor-critic agents with neural network function approximation

    Catastrophic forgetting has a significant negative impact in reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent on a pole-balancing task and compared different pseudorehearsal approaches. We found that pseudorehearsal can assist learning and decrease forgetting.


    Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling

    Long-term situation prediction plays a crucial role in the development of intelligent vehicles. A major challenge still to overcome is the prediction of complex downtown scenarios with multiple road users, e.g., pedestrians, bikes, and motor vehicles, interacting with each other. This contribution tackles this challenge by combining a Bayesian filtering technique for environment representation with machine learning as a long-term predictor. More specifically, a dynamic occupancy grid map is used as input to a deep convolutional neural network. This yields the advantage of using spatially distributed velocity estimates from a single time step for prediction, rather than a raw data sequence, alleviating common problems in dealing with input time series from multiple sensors. Furthermore, convolutional neural networks have the inherent characteristic of using context information, enabling the implicit modeling of road user interaction. Pixel-wise balancing is applied in the loss function to counteract the extreme imbalance between static and dynamic cells. One of the major advantages is the unsupervised learning character due to fully automatic label generation. The presented algorithm is trained and evaluated on multiple hours of recorded sensor data and compared to Monte-Carlo simulation.

    A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics

    Mesoscopic simulations of hydrocarbon flow in source shales are challenging, in part due to the heterogeneous shale pores with sizes ranging from a few nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid and fluid-solid interactions in nano- to micro-scale shale pores, which are physically and chemically sophisticated, must be captured. To address those challenges, we present a GPU-accelerated package for simulation of flow in nano- to micro-pore networks with a many-body dissipative particle dynamics (mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the code offloads all intensive workloads to GPUs. Other advancements, such as smart particle packing and a no-slip boundary condition in complex pore geometries, are also implemented for the construction and simulation of realistic shale pores from 3D nanometer-resolution stack images. Our code is validated for accuracy and compared against the CPU counterpart for speedup. In our benchmark tests, the code delivers nearly perfect strong scaling and weak scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device NVLink can boost performance over PCIe by a remarkable 40%. Lastly, we demonstrate, through a flow simulation in realistic shale pores, that the CPU counterpart requires 840 Power9 cores to rival the performance delivered by our package with four V100 GPUs on ORNL's Summit architecture. This simulation package enables quick-turnaround and high-throughput mesoscopic numerical simulations for investigating complex flow phenomena in nano- to micro-porous rocks with realistic pore geometries.

    Detecting Lesion Bounding Ellipses With Gaussian Proposal Networks

    Lesions characterized by computed tomography (CT) scans are arguably often elliptical objects. However, current lesion detection systems are predominantly adopted from the popular Region Proposal Networks (RPNs), which only propose bounding boxes without fully leveraging the elliptical geometry of lesions. In this paper, we present Gaussian Proposal Networks (GPNs), a novel extension to RPNs, to detect lesion bounding ellipses. Instead of directly regressing the rotation angle of the ellipse, as is common practice, GPN represents bounding ellipses as 2D Gaussian distributions on the image plane and minimizes the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground truth Gaussian for object localization. We show that the KL divergence loss approximately incarnates the regression loss in the RPN framework when the rotation angle is 0. Experiments on the DeepLesion dataset show that GPN significantly outperforms RPN for lesion bounding ellipse detection thanks to lower localization error. GPN is open sourced at https://github.com/baidu-research/GP
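As a rough illustration of the idea in this abstract, the sketch below maps an ellipse to a 2D Gaussian and evaluates the closed-form KL divergence between two such Gaussians. It is not taken from the GPN code: the function names are hypothetical, and the mapping (center as mean, semi-axes `a` and `b` as standard deviations along the rotated axes) is an assumption; the paper may use a different scaling.

```python
import math


def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Assumed mapping: mean = center, covariance = R diag(a^2, b^2) R^T."""
    c, s = math.cos(theta), math.sin(theta)
    sxx = a * a * c * c + b * b * s * s
    syy = a * a * s * s + b * b * c * c
    sxy = (a * a - b * b) * c * s
    return (cx, cy), ((sxx, sxy), (sxy, syy))


def kl_gaussian_2d(p, q):
    """Closed-form KL(p || q) for two 2D Gaussians given as (mean, covariance)."""
    (mp, sp), (mq, sq) = p, q
    det_p = sp[0][0] * sp[1][1] - sp[0][1] * sp[1][0]
    det_q = sq[0][0] * sq[1][1] - sq[0][1] * sq[1][0]
    # Inverse of the 2x2 covariance of q
    inv_q = (
        (sq[1][1] / det_q, -sq[0][1] / det_q),
        (-sq[1][0] / det_q, sq[0][0] / det_q),
    )
    # trace(inv(Sq) @ Sp)
    trace = sum(inv_q[i][j] * sp[j][i] for i in range(2) for j in range(2))
    # Mahalanobis term (mq - mp)^T inv(Sq) (mq - mp)
    dx, dy = mq[0] - mp[0], mq[1] - mp[1]
    maha = dx * (inv_q[0][0] * dx + inv_q[0][1] * dy) + dy * (
        inv_q[1][0] * dx + inv_q[1][1] * dy
    )
    return 0.5 * (trace + maha - 2.0 + math.log(det_q / det_p))


proposal = ellipse_to_gaussian(0.0, 0.0, 3.0, 2.0, 0.0)
target = ellipse_to_gaussian(3.0, 0.0, 3.0, 2.0, 0.0)
loss_same = kl_gaussian_2d(proposal, proposal)
loss_shifted = kl_gaussian_2d(proposal, target)
```

With both angles at 0 and equal axes, the divergence reduces to half the squared center offset normalized by the axis length, which resembles the normalized offset terms of box regression. The rotation angle never appears as a separate regression target; it only enters through the covariance.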