The Iray Light Transport Simulation and Rendering System
While ray tracing has become increasingly common and path tracing is well
understood by now, a major challenge lies in crafting an easy-to-use and
efficient system implementing these technologies. Following a purely
physically-based paradigm while still allowing for artistic workflows, the Iray
light transport simulation and rendering system renders complex
scenes at the push of a button and thus makes accurate light transport
simulation widely available. In this document we discuss the challenges and
implementation choices that follow from our primary design decisions,
demonstrating that such a rendering system can be made a practical, scalable,
and efficient real-world application that has been adopted by various companies
across many fields and is in use by many industry professionals today.
Extreme Scale De Novo Metagenome Assembly
Metagenome assembly is the process of transforming a set of short,
overlapping, and potentially erroneous DNA segments from environmental samples
into an accurate representation of the underlying microbiomes' genomes.
State-of-the-art tools require large shared-memory machines and cannot handle
contemporary metagenome datasets that exceed terabytes in size. In this paper,
we introduce the MetaHipMer pipeline, a high-quality and high-performance
metagenome assembler that employs an iterative de Bruijn graph approach.
MetaHipMer leverages a specialized scaffolding algorithm that produces long
scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is
end-to-end parallelized using the Unified Parallel C language and therefore can
run seamlessly on shared and distributed-memory systems. Experimental results
show that MetaHipMer matches or outperforms the state-of-the-art tools in terms
of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and
is able to assemble previously intractable grand challenge metagenomes. We
demonstrate the unprecedented capability of MetaHipMer by computing the first
full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion
reads, 2.6 TB in size. Comment: Accepted to SC1
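The core of the iterative de Bruijn graph approach can be made concrete with a toy single-k sketch. The Python below is our own minimal illustration, not MetaHipMer's UPC-parallelized implementation: it builds the graph from reads and greedily extends a contig until the first ambiguous branch.

```python
# Toy single-k de Bruijn graph assembly; illustrative only, not the
# distributed, iterative-k pipeline described in the abstract.
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers following it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def extend_contig(graph, start):
    """Greedily extend a contig while the walk stays unambiguous."""
    contig, node = start, start
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        contig += node[-1]
        if node == start:  # stop if the walk loops back on itself
            break
    return contig

g = build_de_bruijn(["ACGTACGTGA", "CGTGACCT"], k=4)
print(extend_contig(g, "ACG"))  # stops at the first branch: "ACGT"
```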
Hierarchical N-Body problem on graphics processor unit
Galactic simulation is an important cosmological computation and represents a classical N-body problem suitable for implementation on vector processors. The Barnes-Hut algorithm is a hierarchical N-body method used to simulate the evolution of such galactic systems.
Stream processing architectures expose data locality and concurrency available in multimedia applications. On the other hand, there are numerous compute-intensive scientific or engineering applications that can potentially benefit from such computational and communication models. These applications are traditionally implemented on vector processors.
Stream architecture based graphics processor units (GPUs) present a novel computational alternative for efficiently implementing such high-performance applications. Rendering on a stream architecture sustains high performance, while user-programmable modules allow implementing complex algorithms efficiently. GPUs have evolved over the years, from being fixed-function pipelines to user programmable processors.
In this thesis, we focus on the implementation of the Barnes-Hut algorithm on typical current-generation programmable GPUs. We analyze the computation and communication requirements of the Barnes-Hut algorithm to show its suitability for user-programmable GPUs. Our implementation of the Barnes-Hut algorithm is formulated as a fragment shader targeting the selected GPU. We discuss implementation details, design issues, results, and challenges encountered in programming the fragment shader.
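The far-field approximation at the heart of Barnes-Hut is compact enough to sketch directly. The following CPU-side Python (our own toy 2D version; the thesis instead encodes the traversal in a fragment shader, and tree construction and self-interaction handling are omitted here) shows the opening-angle test that decides when a whole cell may be treated as a single point mass.

```python
# Barnes-Hut force on one body: open a cell only if it is too close/large.
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    mass: float        # total mass of the bodies in this cell
    com: tuple         # centre of mass (x, y)
    size: float        # side length of the cell
    children: list = field(default_factory=list)  # empty list => leaf

THETA = 0.5            # opening angle: smaller is more accurate but slower
G = 6.674e-11

def force_on(pos, mass, node):
    dx, dy = node.com[0] - pos[0], node.com[1] - pos[1]
    dist = math.hypot(dx, dy) or 1e-12
    # Leaf, or far enough away: treat the whole cell as one point mass.
    if not node.children or node.size / dist < THETA:
        f = G * mass * node.mass / dist**2
        return f * dx / dist, f * dy / dist
    # Otherwise open the cell and accumulate forces from its children.
    fx = fy = 0.0
    for child in node.children:
        cfx, cfy = force_on(pos, mass, child)
        fx += cfx
        fy += cfy
    return fx, fy
```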
Structured Parallel Programming Using Trees
High-level abstractions for parallel programming are still immature. Computations on complicated data structures such as pointer structures are considered irregular algorithms. General graph structures, which irregular algorithms typically deal with, are difficult to divide and conquer. Because the divide-and-conquer paradigm is essential for load balancing in parallel algorithms and a key to parallel programming, general graphs are inherently difficult to handle. Trees, however, lend themselves to divide-and-conquer computations by definition and are sufficiently general and powerful as a programming tool. We therefore deal with abstractions of tree-based computations.

Our study started from Matsuzaki's work on tree skeletons. We have improved the usability of tree skeletons by enriching their implementation. Specifically, we have dealt with two issues. First, we implemented a loose coupling between skeletons and data structures and developed a flexible tree skeleton library. Second, we implemented a parallelizer that transforms sequential recursive functions in C into parallel programs that use tree skeletons implicitly. This parallelizer hides the complicated API of tree skeletons and lets programmers use them without extra burden. The practicality of tree skeletons, however, has not improved as much as hoped.

On the basis of observations from the practice of tree skeletons, we deal with two application domains: program analysis and neighborhood computation. In the domain of program analysis, compilers treat input programs as control-flow graphs (CFGs) and perform analysis on CFGs; program analysis is therefore difficult to divide and conquer. To resolve this problem, we have developed divide-and-conquer methods for program analysis in a syntax-directed manner on the basis of Rosen's high-level approach. Specifically, we have dealt with data-flow analysis based on Tarjan's formalization and value-graph construction based on a functional formalization. In the domain of neighborhood computations, a primary issue is locality: a naive parallel neighborhood computation without locality enhancement causes many cache misses. The divide-and-conquer paradigm is known to be useful for locality enhancement as well. We therefore applied algebraic formalizations and a tree-segmenting technique derived from tree skeletons to the locality enhancement of neighborhood computations.
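To make the skeleton idea concrete: a tree skeleton fixes the traversal and asks the programmer only for the operators. The sketch below is a sequential toy model in Python (our own illustration; Matsuzaki-style skeletons additionally divide the tree into segments for parallel load balancing, which is exactly what the library work above hides behind this kind of interface).

```python
# A tree-reduction "skeleton": the user supplies `leaf` and `combine`,
# the skeleton owns the traversal. In a parallel implementation the
# traversal is divided over processors, so `combine` must be associative.
from dataclasses import dataclass

@dataclass
class Tree:
    value: int
    children: tuple = ()

def reduce_tree(leaf, combine, t):
    """Apply `leaf` to every value and fold the results bottom-up."""
    acc = leaf(t.value)
    for c in t.children:
        acc = combine(acc, reduce_tree(leaf, combine, c))
    return acc

# Example: sum of all node values.
t = Tree(1, (Tree(2), Tree(3, (Tree(4),))))
print(reduce_tree(lambda v: v, lambda a, b: a + b, t))  # 10
```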
Visualization and inspection of the geometry of particle packings
The aim of this dissertation is to find efficient techniques for visualizing and inspecting the geometry of
particle packings. Simulations of such packings are used e.g. in material sciences to predict properties of
granular materials. To better understand and supervise the behavior of these simulations, not only the
particles themselves but also special areas formed by the particles, which can show the progress of the simulation and the spatial distribution of hot spots, should be visualized. This should be possible at a frame rate that allows interaction even for large-scale packings with millions of particles. Moreover, since the simulation is conducted on the GPU, the visualization techniques should make full use of the data residing in GPU memory.
To improve the performance of granular materials like concrete, considerable attention has been paid to the
particle size distribution, which is the main determinant for the space filling rate and therefore affects two
of the most important properties of the concrete: the structural robustness and the durability. Given the
particle size distribution, the space filling rate can be determined by computer simulations, which are often
superior to analytical approaches due to irregularities of particles and the wide range of size distribution in
practice. One widely adopted simulation method is collective rearrangement, in which particles are first placed at random positions inside a container; overlaps between particles are then resolved by pushing overlapping particles away from each other to fill empty space in the container. By cleverly adjusting the size of the container over the course of the simulation, the collective rearrangement method can produce a rather dense particle packing in the end. However, it is very hard to fine-tune or debug the whole simulation process without an interactive visualization tool.
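The overlap-resolution step described above is simple to sketch. The 2D Python below is our own minimal CPU illustration (the dissertation's simulation is GPU-based, and the gradual adjustment of the container size is omitted): overlapping pairs are pushed apart along their center line until no overlap remains.

```python
# One relaxation pass of a collective-rearrangement-style overlap solver.
import math

def resolve_overlaps(centers, radii, step=0.5):
    """Push each overlapping pair apart; return True if anything moved."""
    moved = False
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            dx = centers[j][0] - centers[i][0]
            dy = centers[j][1] - centers[i][1]
            d = math.hypot(dx, dy) or 1e-12
            overlap = radii[i] + radii[j] - d
            if overlap > 0:
                push = step * overlap / 2          # damped half-step each
                ux, uy = dx / d, dy / d
                centers[i][0] -= push * ux; centers[i][1] -= push * uy
                centers[j][0] += push * ux; centers[j][1] += push * uy
                moved = True
    return moved

centers, radii = [[0.0, 0.0], [0.5, 0.0]], [1.0, 1.0]
while resolve_overlaps(centers, radii):
    pass                                           # iterate to convergence
print(centers)                                     # spheres end up just touching
```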
Starting from the well-established rasterization-based method to render spheres, this dissertation first
provides new fast and pixel-accurate methods to visualize the overlaps and free spaces between spherical
particles inside a container. The rasterization-based techniques perform well for small-scale particle packings but deteriorate for large-scale packings due to large memory requirements that are hard to estimate correctly in advance. To address this problem, new methods based on ray tracing are provided, along with two new kinds of bounding volume hierarchies (BVHs) to accelerate the ray tracing process: the first can reuse the existing simulation data structure, and the second is more memory-efficient. Both BVHs build on the idea of the loose octree and are the first of their kind to consider the size of primitives for interactive ray tracing with frequently updated acceleration structures. Moreover, the visualization techniques provided in this dissertation can also be adapted to calculate properties such as the volumes of specific areas.
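The distinguishing idea, a loose octree that places a primitive at a depth matching its size, can be sketched briefly. The Python below is our own illustration of that placement rule only, not the dissertation's actual BVH construction or its GPU ray traversal.

```python
# Loose-octree placement: a sphere goes to the deepest level whose
# loosened cell (k times the cell size) still covers its diameter.
def loose_level(radius, world_size, max_depth, k=2.0):
    for depth in range(max_depth, -1, -1):         # try deepest level first
        cell = world_size / (1 << depth)
        if k * cell >= 2 * radius:
            return depth
    return 0

def cell_of(center, radius, world_size, max_depth):
    depth = loose_level(radius, world_size, max_depth)
    cell = world_size / (1 << depth)
    idx = tuple(min(int(c / cell), (1 << depth) - 1) for c in center)
    return depth, idx

# A small sphere lands deep in the tree, a large one near the root.
print(cell_of((3.0, 1.0, 7.0), 0.4, 16.0, 4))  # (4, (3, 1, 7))
print(cell_of((3.0, 1.0, 7.0), 3.0, 16.0, 4))  # (2, (0, 0, 1))
```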
All these visualization techniques are then extended to non-spherical particles, where a non-spherical
particle is approximated by a rigid system of spheres so that the existing sphere-based simulation can be reused. To this end, a new GPU-based method is presented to efficiently fill a non-spherical particle with polydisperse, possibly overlapping spheres, so that a particle can be filled with fewer spheres without sacrificing the space-filling rate. This eases both simulation and visualization.
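The premise that fewer, larger, possibly overlapping spheres suffice can be illustrated with a greedy 2D caricature. The Python below is our own CPU sketch under strong simplifications (shapes given as signed distance functions, greedy largest-sphere-first placement), not the GPU method presented in the dissertation.

```python
# Greedily fill a shape with polydisperse, possibly overlapping circles.
import math, random

def circle_sdf(cx, cy, R):
    """Signed distance to a circle; negative inside."""
    return lambda p: math.hypot(p[0] - cx, p[1] - cy) - R

def union_sdf(a, b):
    # min() is exact outside the union and conservative inside, so a
    # circle of radius -sdf(p) around p is guaranteed to fit the shape.
    return lambda p: min(a(p), b(p))

def fill_with_spheres(sdf, bbox, n_samples=4000, min_r=0.05, seed=1):
    rng = random.Random(seed)
    (x0, y0), (x1, y1) = bbox
    pts = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n_samples)]
    pts = [p for p in pts if sdf(p) < -min_r]      # interior samples only
    spheres = []
    while pts:
        c = min(pts, key=sdf)                      # deepest interior point
        r = -sdf(c)                                # largest radius fitting there
        spheres.append((c, r))
        # Discard samples covered by the new circle; overlaps stay allowed.
        pts = [p for p in pts if math.hypot(p[0] - c[0], p[1] - c[1]) > r]
    return spheres

# A "peanut" particle: the union of two circles.
shape = union_sdf(circle_sdf(-0.7, 0.0, 1.0), circle_sdf(0.7, 0.0, 1.0))
print(len(fill_with_spheres(shape, ((-2.0, -1.2), (2.0, 1.2)))))
```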
Based on approaches presented in this dissertation, more sophisticated algorithms can be developed to
visualize large-scale non-spherical particle mixtures more efficiently. Furthermore, the hardware ray tracing of more recent graphics cards could be exploited instead of the software ray tracing used in this dissertation. The new techniques can also become the basis for interactively visualizing other particle-based simulations in which special areas such as free spaces or overlaps between particles are of interest.
QuadStack: An Efficient Representation and Direct Rendering of Layered Datasets
We introduce QuadStack, a novel algorithm for volumetric data compression and direct rendering. Our algorithm exploits the data redundancy often found in layered datasets, which are common in science and engineering fields such as geology, biology, mechanical engineering, and medicine. QuadStack first compresses the volumetric data into vertical stacks, which are then compressed into a quadtree that identifies and represents the layered structures at the internal nodes. The associated data (color, material, density, etc.) and the shape of these layer structures are decoupled and encoded independently, leading to high compression rates (4× to 54× of the original voxel model memory footprint in our experiments). We also introduce an algorithm for value retrieval from the QuadStack representation, and we show that access has logarithmic complexity. Because of this fast access, QuadStack is suitable for efficient data representation and direct rendering. We show that our GPU implementation performs comparably in speed with the state-of-the-art algorithms (18-79 MRays/s in our implementation) while maintaining a significantly smaller memory footprint.
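The logarithmic access the abstract claims comes from pairing a quadtree descent over (x, y) with a binary search over the vertical stack. The sketch below uses our own illustrative names and layout, not the paper's actual encoding.

```python
# Value lookup in a QuadStack-like structure: quadtree over (x, y),
# per-leaf stack of layer boundaries in z. Both steps are logarithmic.
import bisect

class QNode:
    def __init__(self, children=None, tops=None, values=None):
        self.children = children  # four children, or None for a leaf
        self.tops = tops          # ascending top z of each layer in the stack
        self.values = values      # one value (material, density, ...) per layer

def lookup(node, x, y, z, size, ox=0.0, oy=0.0):
    # Descend the quadtree to the leaf covering (x, y)...
    while node.children is not None:
        size /= 2
        i = 0
        if x >= ox + size: i |= 1; ox += size
        if y >= oy + size: i |= 2; oy += size
        node = node.children[i]
    # ...then binary-search the stack for the layer containing z.
    return node.values[bisect.bisect_left(node.tops, z)]

leaf = QNode(tops=[2.0, 5.0, 9.0], values=["soil", "clay", "rock"])
root = QNode(children=[leaf, leaf, leaf, leaf])
print(lookup(root, 3.0, 1.0, 4.0, size=8.0))  # z=4 falls in the "clay" layer
```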
Adaptive GPU-accelerated force calculation for interactive rigid molecular docking using haptics
Molecular docking systems model and simulate in silico the interactions of intermolecular binding. Haptics-assisted docking enables the user to interact with the simulation via their sense of touch, but the sensitivity of the human haptic system imposes a stringent time constraint on the computation of forces. To deliver high-fidelity, smooth, and stable feedback, the haptic feedback loop should run at rates of 500 Hz to 1 kHz. We present an adaptive force calculation approach that can be executed in parallel on a wide range of Graphics Processing Units (GPUs) for interactive haptics-assisted docking, with wider applicability to molecular simulations. Prior to the interactive session, either a regular grid or an octree is selected, according to the available GPU memory, to determine the set of interatomic interactions within a cutoff distance. The total force is then calculated from this set. The approach can achieve force updates in less than 2 ms for molecular structures comprising hundreds of thousands of atoms each, with performance improvements of up to 90 times the speed of current CPU-based force calculation approaches used in interactive docking. Furthermore, it overcomes several computational limitations of previous approaches, such as pre-computed force grids, and could potentially be used to model receptor flexibility at haptic refresh rates.
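The regular-grid variant of the cutoff search can be sketched compactly: binning atoms into cells of the cutoff size means each atom only has to test the 27 surrounding cells. The Python below is our own serial illustration with a toy pair potential; the paper's approach runs these loops in parallel on the GPU and can substitute an octree for the grid.

```python
# Cell-list neighbour search: only atoms in adjacent cells can be in range.
import math
from collections import defaultdict

def cell_key(p, cutoff):
    return tuple(int(math.floor(c / cutoff)) for c in p)

def build_grid(positions, cutoff):
    grid = defaultdict(list)
    for idx, p in enumerate(positions):
        grid[cell_key(p, cutoff)].append(idx)
    return grid

def pair_force(p, q, cutoff):
    """Toy inverse-square repulsion, zero beyond the cutoff distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    if d2 == 0 or d2 > cutoff * cutoff:
        return (0.0, 0.0, 0.0)
    f, d = 1.0 / d2, math.sqrt(d2)
    return tuple(f * (a - b) / d for a, b in zip(p, q))

def total_force(i, positions, grid, cutoff):
    cx, cy, cz = cell_key(positions[i], cutoff)
    total = [0.0, 0.0, 0.0]
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if j != i:
                        f = pair_force(positions[i], positions[j], cutoff)
                        for axis in range(3):
                            total[axis] += f[axis]
    return tuple(total)

positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 5.0, 5.0)]
grid = build_grid(positions, cutoff=1.0)
print(total_force(0, positions, grid, cutoff=1.0))  # only atom 1 is in range
```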
Overview of database projects
The use of entity and object-oriented data modeling techniques for managing Computer-Aided Design (CAD) is explored.