ArborX: A Performance Portable Geometric Search Library
Searching for geometric objects that are close in space is a fundamental
component of many applications. The performance of search algorithms comes to
the forefront as the size of a problem increases, both in terms of the total
object count and the total number of search queries performed. Scientific
applications requiring modern leadership-class supercomputers also pose an
additional requirement of performance portability, i.e. being able to
efficiently utilize a variety of hardware architectures. In this paper, we
introduce a new open-source C++ search library, ArborX, which we have designed
for modern supercomputing architectures. We examine scalable search algorithms
with a focus on performance, including a highly efficient parallel bounding
volume hierarchy implementation, and propose a flexible interface making it
easy to integrate with existing applications. We demonstrate the performance
portability of ArborX on multi-core CPUs and GPUs, and compare it to
state-of-the-art libraries such as Boost.Geometry.Index and nanoflann.
Algorithmic patterns for H-matrices on many-core processors
In this work, we consider the reformulation of hierarchical (H-) matrix
algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). H-matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of H-matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing H-matrix CPU implementations by many-core
processors, we here aim at relying entirely on that processor type. As the main
contribution, we introduce the necessary parallel algorithmic patterns that
allow mapping the full H-matrix construction and the fast matrix-vector
product to many-core hardware. Here, crucial ingredients are space-filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation hmglib is, to the best of the authors'
knowledge, the first entirely GPU-based open-source H-matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard H-matrix library,
highlighting profound speedups of our many-core parallel approach.
A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, we here discuss the first fully GPU-based
distributed-memory parallel hierarchical matrix open-source library using the
traditional H-matrix format and adaptive cross approximation, with an
application to BEM problems.
Visualization and inspection of the geometry of particle packings
The aim of this dissertation is to find efficient techniques for visualizing and inspecting the geometry of
particle packings. Simulations of such packings are used e.g. in material sciences to predict properties of
granular materials. To better understand and supervise the behavior of these simulations, not only the
particles themselves but also special areas formed by the particles that can show the progress of the
simulation and spatial distribution of hot spots, should be visualized. This should be possible with a frame
rate that allows interaction even for large scale packings with millions of particles. Moreover, given the
simulation is conducted in the GPU, the visualization techniques should take full use of the data in the GPU
memory.
To improve the performance of granular materials like concrete, considerable attention has been paid to the
particle size distribution, which is the main determinant for the space filling rate and therefore affects two
of the most important properties of the concrete: the structural robustness and the durability. Given the
particle size distribution, the space filling rate can be determined by computer simulations, which are often
superior to analytical approaches due to irregularities of particles and the wide range of size distribution in
practice. One of the widely adopted simulation methods is collective
rearrangement, in which particles are first placed at random positions inside a
container; overlaps between particles are then resolved by pushing overlapping
particles away from each other to fill empty space in the container. By
cleverly adjusting the size of the container according to the progress of the
simulation, the collective rearrangement method can produce a fairly dense
particle packing in the end. However, it is very hard to fine-tune or debug
the whole simulation process without an interactive visualization tool.
Starting from the well-established rasterization-based method to render spheres, this dissertation first
provides new fast and pixel-accurate methods to visualize the overlaps and free spaces between spherical
particles inside a container. The rasterization-based techniques perform well
for small-scale particle packings but deteriorate for large-scale packings due
to large memory requirements that are hard to estimate correctly in advance. To
address this problem, new methods based on ray tracing are provided
along with two new kinds of bounding volume hierarchies (BVHs) to accelerate the ray tracing process ---
the first one can reuse the existing data structure for simulation and the second one is more memory efficient.
Both BVHs utilize the idea of a loose octree and are the first of their kind to
consider the size of primitives for interactive ray tracing with frequently
updated acceleration structures. Moreover, the visualization techniques
provided in this dissertation can also be adjusted to calculate properties such
as the volumes of specific areas.
All these visualization techniques are then extended to non-spherical
particles, where a non-spherical particle is approximated by a rigid system of
spheres to reuse the existing sphere-based simulation. To this end, a new
GPU-based method is presented to efficiently fill a non-spherical particle with
polydisperse, possibly overlapping spheres, so that a particle can be filled
with fewer spheres without sacrificing the space filling rate. This eases both
simulation and visualization.
Based on the approaches presented in this dissertation, more sophisticated
algorithms can be developed to visualize large-scale non-spherical particle
mixtures more efficiently. Furthermore, the hardware ray tracing of more recent
graphics cards could be exploited instead of the software ray tracing
maintained in this dissertation. The new techniques can also become the basis
for interactively visualizing other particle-based simulations where special
areas such as free spaces or overlaps between particles are of interest.
Faster Ray Tracing through Hierarchy Cut Code
We propose a novel ray reordering technique to accelerate the ray tracing
process by encoding and sorting rays prior to traversal. Instead of spatial
coordinates, our method encodes rays according to the cuts of the hierarchical
acceleration structure, which is called the hierarchy cut code. This approach
can better adapt to the acceleration structure and obtain a more reliable
encoding result. We also propose a compression scheme to decrease the sorting
overhead by a shorter sorting key. In addition, based on the phenomenon of
boundary drift, we theoretically explain the reason why existing reordering
methods cannot achieve better performance by using longer sorting keys.
Experiments demonstrate that our method can accelerate secondary ray tracing by
up to 1.81 times, outperforming existing methods. This result proves the
effectiveness of the hierarchy cut code and indicates that the reordering
technique can achieve greater performance improvements, which is worth further
research.
Sparse Volumetric Deformation
Volume rendering is becoming increasingly popular as applications require realistic solid shape representations with seamless texture mapping and accurate filtering. However, rendering sparse volumetric data is difficult because of the limited memory and processing capabilities of current hardware. To address these limitations, the volumetric information can be stored at progressive resolutions in the hierarchical branches of a tree structure, and sampled according to the region of interest. This means that only a partial region of the full dataset is processed, and therefore massive volumetric scenes can be rendered efficiently.
The problem with this approach is that it currently only supports static scenes. This is because it is difficult to accurately deform massive amounts of volume elements and reconstruct the scene hierarchy in real-time. Another problem is that deformation operations distort the shape where more than one volume element tries to occupy the same location, and similarly gaps occur where deformation stretches the elements further than one discrete location. It is also challenging to efficiently support sophisticated deformations at hierarchical resolutions, such as character skinning or physically based animation. These types of deformation are expensive and require a control structure (for example a cage or skeleton) that maps to a set of features to accelerate the deformation process. The problems with this technique are that the varying volume hierarchy reflects different feature sizes, and manipulating the features at the original resolution is too expensive; therefore the control structure must also hierarchically capture features according to the varying volumetric resolution.
This thesis investigates the area of deforming and rendering massive amounts of dynamic volumetric content. The proposed approach efficiently deforms hierarchical volume elements without introducing artifacts and supports both ray casting and rasterization renderers. This enables light transport to be modeled both accurately and efficiently with applications in the fields of real-time rendering and computer animation. Sophisticated volumetric deformation, including character animation, is also supported in real-time. This is achieved by automatically generating a control skeleton which is mapped to the varying feature resolution of the volume hierarchy. The output deformations are demonstrated in massive dynamic volumetric scenes.
A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing
This work introduces an innovative parallel, fully-distributed finite element
framework for growing geometries and its application to metal additive
manufacturing. It is well-known that virtual part design and qualification in
additive manufacturing requires highly-accurate multiscale and multiphysics
analyses. Only high performance computing tools are able to handle such
complexity in time frames compatible with time-to-market. However, efficiency,
without loss of accuracy, has rarely held the centre stage in the numerical
community. Here, in contrast, the framework is designed to adequately exploit
the resources of high-end distributed-memory machines. It is grounded on three
building blocks: (1) Hierarchical adaptive mesh refinement with octree-based
meshes; (2) a parallel strategy to model the growth of the geometry; (3)
state-of-the-art parallel iterative linear solvers. Computational experiments
consider the heat transfer analysis at the part scale of the printing process
by powder-bed technologies. After verification against a 3D benchmark, a
strong-scaling analysis assesses performance and identifies major sources of
parallel overhead. A third numerical example examines the efficiency and
robustness of (2) in a curved 3D shape. Unprecedented parallelism and
scalability were achieved in this work. Hence, this framework makes it possible
to take on higher complexity and/or accuracy, not only in part-scale
simulations of metal or polymer additive manufacturing, but also in welding,
sedimentation, atherosclerosis, or any other physical problem where the
physical domain of interest grows in time.
Methods for fast construction of bounding volume hierarchies
Department of Computer Graphics and Interaction