Parallel Rendering and Large Data Visualization
We are living in the big data age: an ever-increasing amount of data is being
produced through data acquisition and computer simulations. While large scale
analysis and simulations have received significant attention for cloud and
high-performance computing, software to efficiently visualise large data sets
is struggling to keep up.
Visualization has proven to be an efficient tool for understanding data; in
particular, visual analysis is a powerful tool to gain intuitive insight into
the spatial structure and relations of 3D data sets. Large-scale visualization
setups are becoming ever more affordable, and high-resolution tiled display
walls are in reach even for small institutions. Virtual reality has arrived in
the consumer space, making it accessible to a large audience.
This thesis addresses these developments by advancing the field of parallel
rendering. We formalise the design of system software for large data
visualization through parallel rendering, provide a reference implementation of
a parallel rendering framework, introduce novel algorithms to accelerate the
rendering of large amounts of data, and validate this research and development
with new applications for large data visualization. Applications built using
our framework enable domain scientists and large data engineers to better
extract meaning from their data, making it feasible to explore more data and
enabling the use of high-fidelity visualization installations to see more
detail of the data. Comment: PhD thesi
GPU Accelerated Particle Visualization with Splotch
Splotch is a rendering algorithm for exploration and visual discovery in
particle-based datasets coming from astronomical observations or numerical
simulations. The strengths of the approach are production of high quality
imagery and support for very large-scale datasets through an effective mix of
the OpenMP and MPI parallel programming paradigms. This article reports our
experience in redesigning Splotch to exploit emerging HPC architectures,
which are increasingly populated with GPUs. A performance model is introduced
for data transfers, computations and memory access, to guide our re-factoring
of Splotch. A number of parallelization issues are discussed, in particular
relating to race conditions and workload balancing, with the aim of achieving
optimal performance. Our implementation uses the CUDA programming paradigm. Our
strategy is founded on novel schemes for optimized data organisation and
classification of particles. We use a reference simulation to present
performance results on acceleration gains and scalability. Finally, we outline
our vision for future developments, including possibilities for further
optimisations and exploitation of emerging technologies. Comment: 25 pages, 9 figures. Astronomy and Computing (2014
A Power-Aware Framework for Executing Streaming Programs on Networks-on-Chip
Nilesh Karavadara, Simon Folie, Michael Zolda, Vu Thien Nga Nguyen, Raimund Kirner, 'A Power-Aware Framework for Executing Streaming Programs on Networks-on-Chip'. Paper presented at the Int'l Workshop on Performance, Power and Predictability of Many-Core Embedded Systems (3PMCES'14), Dresden, Germany, 24-28 March 2014. Software developers are discovering that practices which have successfully served single-core platforms for decades no longer work for multi-cores. Stream processing is a parallel execution model that is well-suited for architectures with multiple computational elements that are connected by a network. We propose a power-aware streaming execution layer for network-on-chip architectures that addresses the energy constraints of embedded devices. Our proof-of-concept implementation targets the Intel SCC processor, which connects 48 cores via a network-on-chip. We motivate our design decisions and describe the status of our implementation
GPU Cost Estimation for Load Balancing in Parallel Ray Tracing
Interactive ray tracing has seen enormous progress in recent years. However, advanced rendering techniques requiring many millions of rays per second are still not feasible at interactive speed, and are only possible by means of highly parallel ray tracing. When using compute clusters, good load balancing is crucial in order to fully exploit the available computational power and to avoid the overhead incurred by synchronization barriers. In this paper, we present a novel GPU method to compute a cost map: a per-pixel cost estimate of the ray tracing rendering process. We show that the cost map is a powerful tool to improve load balancing in parallel ray tracing, and that it can be used for adaptive task partitioning and enhanced dynamic load balancing. Its effectiveness has been demonstrated in a parallel ray tracer implementation tailored for a cluster of workstations
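As an illustration of how a per-pixel cost map could drive task partitioning, the following minimal sketch splits an image into contiguous column strips of roughly equal estimated cost. It is not the method from the paper: the function name `partition_columns`, the strip-based layout, and the representation of the cost map as a list of rows with non-negative costs are all assumptions made for this example.

```python
from itertools import accumulate


def partition_columns(cost_map, num_workers):
    """Split image columns into contiguous strips of roughly equal total cost.

    cost_map is a hypothetical per-pixel cost estimate, given as a list of
    rows with non-negative entries. Returns one (start, end) column range
    per worker, with end exclusive. A strip may be empty when a single
    column dominates the total cost.
    """
    width = len(cost_map[0])
    # Total estimated cost of each pixel column.
    column_cost = [sum(row[x] for row in cost_map) for x in range(width)]
    # prefix[x] is the cumulative cost of columns 0..x.
    prefix = list(accumulate(column_cost))
    total = prefix[-1]

    bounds = [0]
    x = 0
    for k in range(1, num_workers):
        target = total * k / num_workers
        # Advance to the first column whose cumulative cost reaches the target.
        while x < width and prefix[x] < target:
            x += 1
        bounds.append(min(x + 1, width))
    bounds.append(width)
    return [(bounds[i], bounds[i + 1]) for i in range(num_workers)]


# The two right-most columns are four times as expensive as the others,
# so each of them gets a strip of its own.
strips = partition_columns([[1, 1, 1, 1, 4, 4]], 3)
print(strips)  # [(0, 4), (4, 5), (5, 6)]
```

With a uniform cost map the same function falls back to equal-width strips, which is the behaviour of a static partitioning that ignores cost.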
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
The importance of stencil-based algorithms in computational science has
focused attention on optimized parallel implementations for multilevel
cache-based processors. Temporal blocking schemes leverage the large bandwidth
and low latency of caches to accelerate stencil updates and approach
theoretical peak performance. A key ingredient is the reduction of data traffic
across slow data paths, especially the main memory interface. In this work we
combine the ideas of multi-core wavefront temporal blocking and diamond tiling
to arrive at stencil update schemes that show large reductions in memory
pressure compared to existing approaches. The resulting schemes show
performance advantages in bandwidth-starved situations, which are exacerbated
by the high bytes per lattice update case of variable coefficients. Our thread
groups concept provides a controllable trade-off between concurrency and memory
usage, shifting the pressure between the memory interface and the CPU. We
present performance results on a contemporary Intel processor
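To make the idea of temporal blocking concrete, here is a minimal sketch in one dimension. It uses overlapped (ghost-zone) tiles, which is a simpler scheme than the wavefront diamond blocking described in the abstract, and it only demonstrates that several time steps can be applied to one tile before moving on while still reproducing the plain sweep. The 3-point averaging stencil, the fixed boundary values, and the function names are assumptions chosen for this example.

```python
def jacobi_naive(u, steps):
    """Apply a 3-point averaging stencil for `steps` sweeps; endpoints stay fixed."""
    u = list(u)
    for _ in range(steps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = new
    return u


def jacobi_overlapped(u, steps, block):
    """Temporal blocking with overlapped tiles (illustrative, not diamond tiling).

    Each tile is extended by a halo of `steps` cells on both sides, advanced
    `steps` time steps locally, and only its interior is written back. Errors
    from the artificial tile edges travel one cell per step, so they never
    reach the interior within `steps` sweeps.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, block):
        end = min(start + block, n)
        lo = max(start - steps, 0)  # halo clipped at the domain boundary
        hi = min(end + steps, n)
        tile = jacobi_naive(u[lo:hi], steps)  # all time steps on one tile
        out[start:end] = tile[start - lo:end - lo]
    return out


grid = [float(i * i % 7) for i in range(20)]
reference = jacobi_naive(grid, 3)
blocked = jacobi_overlapped(grid, 3, 5)
print(max(abs(a - b) for a, b in zip(reference, blocked)))
```

The overlapped variant recomputes the halo cells redundantly in exchange for touching each tile only once per group of time steps. Schemes such as diamond tiling are designed to avoid that redundant work.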