Search CORE

11,533 research outputs found

OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection

Author: AB Khlopotine
JL Bentley
M McKenney
MT Goodrich
MT Goodrich
Publication venue: e-Publications@Marquette
Publication date: 01/01/2019
Field of study

Line segment intersection is one of the elementary operations in computational geometry. Complex problems in Geographic Information Systems (GIS) like finding map overlays or spatial joins using polygonal data require solving segment intersections. Plane sweep paradigm is used for finding geometric intersection in an efficient manner. However, it is difficult to parallelize due to its in-order processing of spatial events. We present a new fine-grained parallel algorithm for geometric intersection and its CPU and GPU implementation using OpenMP and OpenACC. To the best of our knowledge, this is the first work demonstrating an effective parallelization of plane sweep on GPUs. We chose compiler directive based approach for implementation because of its simplicity to parallelize sequential code. Using Nvidia Tesla P100 GPU, our implementation achieves around 40X speedup for line segment intersection problem on 40K and 80K data sets compared to sequential CGAL library

epublications@Marquette

Crossref

Recent update of the RPLUS2D/3D codes

Author: Tsai Y.-L. Peter
Publication venue
Publication date
Field of study

The development of the RPLUS2D/3D codes is summarized. These codes utilize LU algorithms to solve chemical non-equilibrium flows in a body-fitted coordinate system. The motivation behind the development of these codes is the need to numerically predict chemical non-equilibrium flows for the National AeroSpace Plane Program. Recent improvements include vectorization method, blocking algorithms for geometric flexibility, out-of-core storage for large-size problems, and an LU-SW/UP combination for CPU-time efficiency and solution quality

NASA Technical Reports Server

Efficient multicore-aware parallelization strategies for iterative stencil computations

Author: Bergen
Christen
Datta
Datta
Frigo
Hager
Kowarschik
Treibig
Wellein
Wittmann
Zeiser
Publication venue: 'Elsevier BV'
Publication date: 10/04/2010
Field of study

Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel implementations for cache-based multicore architectures. Temporal cache blocking is a known advanced optimization technique, which can reduce the pressure on the memory bus significantly. We apply and refine this optimization for a recently presented temporal blocking strategy designed to explicitly utilize multicore characteristics. Especially for the case of Gauss-Seidel smoothers we show that simultaneous multi-threading (SMT) can yield substantial performance improvements for our optimized algorithm.Comment: 15 pages, 10 figure

arXiv.org e-Print Archive

Crossref

A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

Author: Novaković Vedran
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 27/09/2014
Field of study

We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of GPU's memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on GPU's shared address space, but applicable also to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single GPU setting needs a CPU for the controlling purposes only, while utilizing GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card.Comment: Accepted for publication in SIAM Journal on Scientific Computin

arXiv.org e-Print Archive

CiteSeerX

Optimization of patch antennas via multithreaded simulated annealing based design exploration

Author: Richie James
Ababei Cristinel
Publication venue: e-Publications@Marquette
Publication date: 01/10/2017
Field of study

In this paper, we present a new software framework for the optimization of the design of microstrip patch antennas. The proposed simulation and optimization framework implements a simulated annealing algorithm to perform design space exploration in order to identify the optimal patch antenna design. During each iteration of the optimization loop, we employ the popular MEEP simulation tool to evaluate explored design solutions. To speed up the design space exploration, the software framework is developed to run multiple MEEP simulations concurrently. This is achieved using multithreading to implement a manager-workers execution strategy. The number of worker threads is the same as the number of cores of the computer that is utilized. Thus, the computational runtime of the proposed software framework enables effective design space exploration. Simulations demonstrate the effectiveness of the proposed software framework

epublications@Marquette

Saint Louis University Libraries Digital Collections

Low-level Vision by Consensus in a Spatial Hierarchy of Regions

Author: Chakrabarti Ayan
Gortler Steven J.
Xiong Ying
Zickler Todd
Publication venue
Publication date: 14/04/2015
Field of study

We introduce a multi-scale framework for low-level vision, where the goal is estimating physical scene values from image data---such as depth from stereo image pairs. The framework uses a dense, overlapping set of image regions at multiple scales and a "local model," such as a slanted-plane model for stereo disparity, that is expected to be valid piecewise across the visual field. Estimation is cast as optimization over a dichotomous mixture of variables, simultaneously determining which regions are inliers with respect to the local model (binary variables) and the correct co-ordinates in the local model space for each inlying region (continuous variables). When the regions are organized into a multi-scale hierarchy, optimization can occur in an efficient and parallel architecture, where distributed computational units iteratively perform calculations and share information through sparse connections between parents and children. The framework performs well on a standard benchmark for binocular stereo, and it produces a distributional scene representation that is appropriate for combining with higher-level reasoning and other low-level cues.Comment: Accepted to CVPR 2015. Project page: http://www.ttic.edu/chakrabarti/consensus

arXiv.org e-Print Archive

Crossref

FISH: A 3D parallel MHD code for astrophysical applications

Author: Kaeppeli R.
Liebendoerfer M.
Pen U. -L.
Scheidegger S.
Whitehouse S. C.
Publication venue: 'IOP Publishing'
Publication date: 15/10/2009
Field of study

FISH is a fast and simple ideal magneto-hydrodynamics code that scales to ~10 000 processes for a Cartesian computational domain of ~1000^3 cells. The simplicity of FISH has been achieved by the rigorous application of the operator splitting technique, while second order accuracy is maintained by the symmetric ordering of the operators. Between directional sweeps, the three-dimensional data is rotated in memory so that the sweep is always performed in a cache-efficient way along the direction of contiguous memory. Hence, the code only requires a one-dimensional description of the conservation equations to be solved. This approach also enable an elegant novel parallelisation of the code that is based on persistent communications with MPI for cubic domain decomposition on machines with distributed memory. This scheme is then combined with an additional OpenMP parallelisation of different sweeps that can take advantage of clusters of shared memory. We document the detailed implementation of a second order TVD advection scheme based on flux reconstruction. The magnetic fields are evolved by a constrained transport scheme. We show that the subtraction of a simple estimate of the hydrostatic gradient from the total gradients can significantly reduce the dissipation of the advection scheme in simulations of gravitationally bound hydrostatic objects. Through its simplicity and efficiency, FISH is as well-suited for hydrodynamics classes as for large-scale astrophysical simulations on high-performance computer clusters. In preparation for the release of a public version, we demonstrate the performance of FISH in a suite of astrophysically orientated test cases.Comment: 27 pages, 11 figure

arXiv.org e-Print Archive

edoc