347 research outputs found
OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection
Line segment intersection is one of the elementary operations in computational geometry. Complex problems in Geographic Information Systems (GIS) like finding map overlays or spatial joins using polygonal data require solving segment intersections. Plane sweep paradigm is used for finding geometric intersection in an efficient manner. However, it is difficult to parallelize due to its in-order processing of spatial events. We present a new fine-grained parallel algorithm for geometric intersection and its CPU and GPU implementation using OpenMP and OpenACC. To the best of our knowledge, this is the first work demonstrating an effective parallelization of plane sweep on GPUs.
We chose compiler directive based approach for implementation because of its simplicity to parallelize sequential code. Using Nvidia Tesla P100 GPU, our implementation achieves around 40X speedup for line segment intersection problem on 40K and 80K data sets compared to sequential CGAL library
Efficient multicore-aware parallelization strategies for iterative stencil computations
Stencil computations consume a major part of runtime in many scientific
simulation codes. As prototypes for this class of algorithms we consider the
iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient
parallel implementations for cache-based multicore architectures. Temporal
cache blocking is a known advanced optimization technique, which can reduce the
pressure on the memory bus significantly. We apply and refine this optimization
for a recently presented temporal blocking strategy designed to explicitly
utilize multicore characteristics. Especially for the case of Gauss-Seidel
smoothers we show that simultaneous multi-threading (SMT) can yield substantial
performance improvements for our optimized algorithm.Comment: 15 pages, 10 figure
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
The importance of stencil-based algorithms in computational science has
focused attention on optimized parallel implementations for multilevel
cache-based processors. Temporal blocking schemes leverage the large bandwidth
and low latency of caches to accelerate stencil updates and approach
theoretical peak performance. A key ingredient is the reduction of data traffic
across slow data paths, especially the main memory interface. In this work we
combine the ideas of multi-core wavefront temporal blocking and diamond tiling
to arrive at stencil update schemes that show large reductions in memory
pressure compared to existing approaches. The resulting schemes show
performance advantages in bandwidth-starved situations, which are exacerbated
by the high bytes per lattice update case of variable coefficients. Our thread
groups concept provides a controllable trade-off between concurrency and memory
usage, shifting the pressure between the memory interface and the CPU. We
present performance results on a contemporary Intel processor
Algorithms for solving inverse geophysical problems on parallel computing systems
For solving inverse gravimetry problems, efficient stable parallel algorithms based on iterative gradient methods are proposed. For solving systems of linear algebraic equations with block-tridiagonal matrices arising in geoelectrics problems, a parallel matrix sweep algorithm, a square root method, and a conjugate gradient method with preconditioner are proposed. The algorithms are implemented numerically on a parallel computing system of the Institute of Mathematics and Mechanics (PCS-IMM), NVIDIA graphics processors, and an Intel multi-core CPU with some new computing technologies. The parallel algorithms are incorporated into a system of remote computations entitled "Specialized Web-Portal for Solving Geophysical Problems on Multiprocessor Computers." Some problems with "quasi-model" and real data are solved. © 2013 Pleiades Publishing, Ltd
An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method
In this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in or external gradient-based optimization procedures. For speed, a model order reduction technique is used and the gradient computation is achieved by perturbation with geometry deformation, processed on the level of the individual mesh nodes. To maximize performance, the framework is targeted to multicore CPU architectures and its extended version can also use multiple GPUs. To illustrate the accuracy and high efficiency of the framework, we provide examples of simulations of a dielectric resonator antenna and full-wave design by optimization of two diplexers involving tens of unknowns, and show that the design can be completed within the duration of a few simulations using industry-standard FEM solvers. The accuracy of the design is confirmed by measurements
Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models
Keyphrase Generation (KPG) is a longstanding task in NLP with widespread
applications. The advent of sequence-to-sequence (seq2seq) pre-trained language
models (PLMs) has ushered in a transformative era for KPG, yielding promising
performance improvements. However, many design decisions remain unexplored and
are often made arbitrarily. This paper undertakes a systematic analysis of the
influence of model selection and decoding strategies on PLM-based KPG. We begin
by elucidating why seq2seq PLMs are apt for KPG, anchored by an
attention-driven hypothesis. We then establish that conventional wisdom for
selecting seq2seq PLMs lacks depth: (1) merely increasing model size or
performing task-specific adaptation is not parameter-efficient; (2) although
combining in-domain pre-training with task adaptation benefits KPG, it does
partially hinder generalization. Regarding decoding, we demonstrate that while
greedy search achieves strong F1 scores, it lags in recall compared with
sampling-based methods. Based on these insights, we propose DeSel, a
likelihood-based decode-select algorithm for seq2seq PLMs. DeSel improves
greedy search by an average of 4.7% semantic F1 across five datasets. Our
collective findings pave the way for deeper future investigations into
PLM-based KPG.Comment: EMNLP 2023 camera read
A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials
We introduce PVSC-DTM (Parallel Vectorized Stencil Code for Dirac and
Topological Materials), a library and code generator based on a domain-specific
language tailored to implement the specific stencil-like algorithms that can
describe Dirac and topological materials such as graphene and topological
insulators in a matrix-free way. The generated hybrid-parallel (MPI+OpenMP)
code is fully vectorized using Single Instruction Multiple Data (SIMD)
extensions. It is significantly faster than matrix-based approaches on the node
level and performs in accordance with the roofline model. We demonstrate the
chip-level performance and distributed-memory scalability of basic building
blocks such as sparse matrix-(multiple-) vector multiplication on modern
multicore CPUs. As an application example, we use the PVSC-DTM scheme to (i)
explore the scattering of a Dirac wave on an array of gate-defined quantum
dots, to (ii) calculate a bunch of interior eigenvalues for strong topological
insulators, and to (iii) discuss the photoemission spectra of a disordered Weyl
semimetal.Comment: 16 pages, 2 tables, 11 figure
Acceleration of Computational Geometry Algorithms for High Performance Computing Based Geo-Spatial Big Data Analysis
Geo-Spatial computing and data analysis is the branch of computer science that deals with real world location-based data. Computational geometry algorithms are algorithms that process geometry/shapes and is one of the pillars of geo-spatial computing. Real world map and location-based data can be huge in size and the data structures used to process them extremely big leading to huge computational costs. Furthermore, Geo-Spatial datasets are growing on all V’s (Volume, Variety, Value, etc.) and are becoming larger and more complex to process in-turn demanding more computational resources. High Performance Computing is a way to breakdown the problem in ways that it can run in parallel on big computers with massive processing power and hence reduce the computing time delivering the same results but much faster.This dissertation explores different techniques to accelerate the processing of computational geometry algorithms and geo-spatial computing like using Many-core Graphics Processing Units (GPU), Multi-core Central Processing Units (CPU), Multi-node setup with Message Passing Interface (MPI), Cache optimizations, Memory and Communication optimizations, load balancing, Algorithmic Modifications, Directive based parallelization with OpenMP or OpenACC and Vectorization with compiler intrinsic (AVX). This dissertation has applied at least one of the mentioned techniques to the following problems. Novel method to parallelize plane sweep based geometric intersection for GPU with directives is presented. Parallelization of plane sweep based Voronoi construction, parallelization of Segment tree construction, Segment tree queries and Segment tree-based operations has been presented. Spatial autocorrelation, computation of getis-ord hotspots are also presented. Acceleration performance and speedup results are presented in each corresponding chapter
- …