High Performance Algorithms for Counting Collisions and Pairwise Interactions
The problem of counting collisions or interactions is common in areas such as
computer graphics and scientific simulations. Since it is a major bottleneck in
applications in these areas, a lot of research has been carried out on this
subject, mainly focused on techniques that allow calculations to be performed
within pruned sets of objects. This paper focuses on how interaction
calculation (such as collisions) within these sets can be done more efficiently
than existing approaches. Two algorithms are proposed: a sequential algorithm
that has linear complexity at the cost of high memory usage; and a parallel
algorithm, mathematically proved to be correct, that manages to use GPU
resources more efficiently than existing approaches. The proposed and existing
algorithms were implemented, and experiments show a speedup of 21.7 for the
sequential algorithm (on small problem size), and 1.12 for the parallel
proposal (large problem size). By improving interaction calculation, this work
contributes to research areas that promote interconnection in the modern world,
such as computer graphics and robotics.
Comment: Accepted in ICCS 2019 and published in Springer's LNCS series.
Supplementary content at https://mjsaldanha.com/articles/1-hpc-ssp
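To make the problem statement concrete, here is a minimal sketch of the brute-force baseline: testing every pair inside one already-pruned set of objects. It is not the linear-time or GPU algorithm proposed in the paper. The circle representation and the name `count_collisions` are assumptions made for illustration only.

```python
from itertools import combinations


def count_collisions(circles):
    """Count overlapping pairs among circles given as (x, y, radius) tuples.

    Hypothetical baseline: checks all n*(n-1)/2 pairs, so it is quadratic
    in the number of objects in the set.
    """
    count = 0
    for (x1, y1, r1), (x2, y2, r2) in combinations(circles, 2):
        # Two circles collide when the squared distance between their
        # centres does not exceed the squared sum of their radii.
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (r1 + r2) ** 2:
            count += 1
    return count


circles = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (10.0, 10.0, 0.5)]
print(count_collisions(circles))  # only the first two circles overlap
```

The quadratic pair loop is the cost that pruning techniques and more efficient in-set algorithms aim to reduce.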
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library.
Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU
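The idea of bulk-synchronous operations on a vertex frontier can be sketched on the CPU with a level-by-level breadth-first search. This is a plain Python illustration of the abstraction only; it does not use Gunrock's API, and the function name `bfs_levels` and the adjacency-dictionary input are assumptions.

```python
def bfs_levels(adjacency, source):
    """Return the BFS depth of every vertex reachable from `source`.

    Each loop iteration processes the whole current frontier before moving
    on, mimicking one bulk-synchronous step over a vertex frontier.
    """
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:
            # Expand step: visit all neighbours of every frontier vertex.
            for v in adjacency.get(u, []):
                # Filter step: keep only vertices not seen before.
                if v not in depth:
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_levels(graph, 0))
```

On a GPU, the two inner steps would run in parallel over the frontier; here they are sequential loops for clarity.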
Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks
Recently, due to the rapid development of information and communication
technologies, data are being created and consumed at an avalanche-like rate.
Distributed computing creates the preconditions for analyzing and processing such
Big Data by distributing the computations among a number of compute nodes. In
this work, performance of distributed computing environments on the basis of
Hadoop and Spark frameworks is estimated for real and virtual versions of
clusters. As a test task, we chose the classic use case of word counting in
texts of various sizes. It was found that the running times grow very fast with
the dataset size, even faster than a power function. For the real and
virtual versions of cluster implementations, this tendency is similar for
both Hadoop and Spark frameworks. Moreover, speedup values decrease
significantly with the growth of dataset size, especially for virtual version
of cluster configuration. The problem of growing data generated by IoT and
multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye
tracking, etc.) interaction channels is presented. In the context of this
problem, the current observations as to the running times and speedup on Hadoop
and Spark frameworks in real and virtual cluster configurations can be very
useful for the proper scaling-up and efficient job management, especially for
machine learning and Deep Learning applications, where Big Data are widely
present.
Comment: 5 pages, 1 table, 2017 IEEE International Young Scientists Forum on
Applied Physics and Engineering (YSF-2017) (Lviv, Ukraine
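The word-counting test task follows a map-then-reduce pattern, which can be sketched in plain Python without Hadoop or Spark. This is a single-process illustration under stated assumptions: the helper names `map_chunk` and `word_count` are hypothetical, and splitting on whitespace is a simplification of real tokenization.

```python
from collections import Counter
from functools import reduce


def map_chunk(text):
    """Map step: count the words of one text chunk independently."""
    return Counter(text.lower().split())


def word_count(chunks):
    """Reduce step: merge the per-chunk counts into one global count."""
    return reduce(lambda a, b: a + b, map(map_chunk, chunks), Counter())


chunks = ["big data big cluster", "data node"]
print(word_count(chunks))
```

In a distributed framework, each chunk would be counted on a different compute node and the partial counts merged over the network, which is where the scaling behaviour discussed in the abstract comes from.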
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighborsearching, and we discuss the present and future challenges we see for
exascale simulation - in particular a very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.
Comment: EASC 2014 conference proceedin
GPU accelerated cone based shooting bouncing ray tracing
2019 Summer. Includes bibliographical references.
Ray tracing can be used as an alternative method to solve complex Computational Electromagnetics (CEM) problems that would require significant time using traditional full-wave CEM solvers. Ray tracing is considered a high-frequency asymptotic solver, sacrificing accuracy for speed via approximation. Two prominent categories of ray tracing exist today: image theory techniques and ray launching techniques. Image theory involves the calculation of image points for each continuous plane within a structure. Ray launching consists of spawning rays in numerous directions and tracking the intersections these rays have with the environment. While image theory ray tracing typically provides more accurate solutions than ray launching techniques, because its computations are more exact, it is much slower due to the exponential time complexity of the algorithm. This paper discusses a ray launching technique called shooting bouncing rays (SBR) ray tracing that applies NVIDIA graphics processing units (GPUs) to achieve significant performance benefits for solving CEM problems. The GPUs are used as a tool to parallelize the core ray tracing algorithm and also to provide access to the NVIDIA OptiX ray tracing application programming interface (API), which efficiently traces rays within complex structures. The algorithm presented enables quick and efficient simulations to optimize the placement of communication nodes within complex structures. The paper presents the processes and techniques used to develop the solver, demonstrations of its validation and application on various structures, and a comparison with commercially available ray tracing software.