2,307 research outputs found
The Iray Light Transport Simulation and Rendering System
While ray tracing has become increasingly common and path tracing is well
understood by now, a major challenge lies in crafting an easy-to-use and
efficient system implementing these technologies. Following a purely
physically-based paradigm while still allowing for artistic workflows, the Iray
light transport simulation and rendering system allows for rendering complex
scenes at the push of a button and thus makes accurate light transport
simulation widely available. In this document we discuss the challenges and
implementation choices that follow from our primary design decisions,
demonstrating that such a rendering system can be made a practical, scalable,
and efficient real-world application that has been adopted by various companies
across many fields and is in use by many industry professionals today.
Scalable data abstractions for distributed parallel computations
The ability to express a program as a hierarchical composition of parts is an
essential tool in managing the complexity of software, and a key abstraction it
provides is the separation of the representation of data from the computation.
Many current parallel programming models use a shared memory model to provide
data abstraction, but this doesn't scale well with large numbers of cores due to
non-determinism and access latency. This paper proposes a simple programming
model that allows scalable parallel programs to be expressed with distributed
representations of data, and it provides the programmer with the flexibility to
employ shared or distributed styles of data-parallelism where applicable. It
admits an efficient implementation and, with the provision of a small set
of primitive capabilities in the hardware, it can be compiled to operate
directly on the hardware, in the same way stack-based allocation operates for
subroutines in sequential machines.
A framework for efficient execution of data parallel irregular applications on heterogeneous systems
Exploiting the computing power of the diverse resources available on heterogeneous
systems is mandatory but very challenging. The diversity of architectures, execution
models and programming tools, together with disjoint address spaces and different
computing capabilities, raises a number of challenges that severely impact application
performance and programming productivity. This problem is further compounded in the
presence of data parallel irregular applications.
This paper presents a framework that addresses development and execution of data
parallel irregular applications in heterogeneous systems. A unified task-based programming
and execution model is proposed, together with inter- and intra-device scheduling,
which, coupled with a data management system, aim to achieve performance scalability
across multiple devices, while maintaining high programming productivity. Intra-device
scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels,
which, by allowing dynamic generation and rescheduling of new work units, enable
balancing irregular workloads and increase resource utilization.
Results show that regular and irregular applications scale well with the number of
devices, while requiring minimal programming effort. Consumer-producer kernels are
able to sustain significant performance gains as long as the workload per basic work
unit is enough to compensate overheads associated with intra-device scheduling. This
not being the case, consumer kernels can still be used for the irregular application.
Comparisons with an alternative framework, StarPU, which targets regular workloads,
consistently demonstrate significant speedups. This is, to the best of our knowledge, the
first published integrated approach that successfully handles irregular workloads over
heterogeneous systems.
This work is funded by National Funds through the FCT - Fundação para a Ciência
e a Tecnologia (Portuguese Foundation for Science and Technology) and by ERDF -
European Regional Development Fund through the COMPETE Programme (operational
programme for competitiveness) within projects PEst-OE/EEI/UI0752/2014
and FCOMP-01-0124-FEDER-010067. Also by the School of Engineering, Universidade
do Minho within project P2SHOCS - Performance Portability on Scalable
Heterogeneous Computing Systems.
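The idea of work units that dynamically generate and reschedule new work units, as mentioned in the abstract above, can be illustrated with a small CPU-thread analogue. This is only a sketch under stated assumptions, not the framework's SIMD/SIMT implementation: the class `DynamicWorkQueue`, the method `run`, and the toy workload (splitting an integer until it reaches 1) are all hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: workers consume units from a shared queue and
// may produce new units while doing so, which balances an irregular workload.
class DynamicWorkQueue {

    // Processes a root unit of the given size (assumed >= 1) with the given
    // number of worker threads and returns how many leaf units were reached.
    static long run(int rootSize, int workers) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger(1); // units queued or in flight
        AtomicLong leaves = new AtomicLong();
        queue.add(rootSize);

        Runnable worker = () -> {
            while (pending.get() > 0) {
                Integer unit = queue.poll();
                if (unit == null) {      // nothing available right now
                    Thread.yield();
                    continue;
                }
                if (unit <= 1) {
                    leaves.incrementAndGet();   // leaf: no further work
                } else {
                    int half = unit / 2;
                    pending.addAndGet(2);       // register children before enqueueing
                    queue.add(half);
                    queue.add(unit - half);
                }
                pending.decrementAndGet();      // this unit is done
            }
        };

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return leaves.get();
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 4));
    }
}
```

The `pending` counter is incremented for child units before the parent is marked done, so it only reaches zero once every generated unit has been processed; that is what lets the workers terminate safely.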
A semi-automatic parallelization tool for Java based on fork-join synchronization patterns
Because of the increasing availability of multi-core machines, clusters, Grids, and combinations of these environments, there is now plenty of computational power available for executing compute-intensive applications. However, because of the overwhelming and rapid advances in distributed and parallel hardware and environments, today's programmers are not fully prepared to exploit distribution and parallelism. In this sense, the Java language has helped in handling the heterogeneity of such environments, but there is a lack of facilities and tools for easily distributing and parallelizing applications. One solution to mitigate this problem and make some progress towards producing general tools seems to be the synthesis of semi-automatic parallelism and Parallelism as a Concern (PaaC), which allows parallelizing applications with as few modifications to the sequential code as possible. In this paper, we discuss a new approach that aims at overcoming the drawbacks of current Java-based parallel and distributed development tools, which precisely exploit these new concepts.
Affiliation: Hirsch, Matias. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico - CONICET - Tandil. Instituto Superior de Ingenieria del Software; Argentina
Affiliation: Zunino, Alejandro. Consejo Nacional de Invest.cientif.y Tecnicas. Ctro Cientifico Tecnologico Conicet - Tandil. Instituto Superior de Ingenieria del Software
Affiliation: Mateos Diaz, Cristian Maximiliano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico - CONICET - Tandil. Instituto Superior de Ingenieria del Software
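For readers unfamiliar with the fork-join synchronization pattern named in the title above, the following is a minimal hand-written sketch using the standard `java.util.concurrent` fork/join classes. It is not output of the tool described in the abstract; the class `ForkJoinSum`, the `THRESHOLD` cutoff, and the array-summing workload are assumptions chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sums an array by recursively forking subtasks
// and joining their results.
class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cutoff for sequential work
    private final long[] data;
    private final int lo;
    private final int hi;

    ForkJoinSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: do the work sequentially.
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        ForkJoinSum left = new ForkJoinSum(data, lo, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, hi);
        left.fork();                      // fork: run the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // join: wait for the forked half
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```

The sequential version of this computation is just the loop in the base case; the fork and join calls are the only parallel-specific additions, which is the kind of separation a semi-automatic approach would aim to generate rather than require by hand.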
Multilayered Heterogeneous Parallelism Applied to Atmospheric Constituent Transport Simulation
Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Effective use of parallelism in these new chipsets constitutes the challenge facing a new generation of large-scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the Cell Broadband Engine Architecture (CBEA). A function offloading approach is used in a 2D transport module, and a vector stream processing approach is used in a 3D transport module. Two methods for transferring non-contiguous data between main memory and accelerator local storage are compared. By leveraging the heterogeneous parallelism of the CBEA, the 3D transport module achieves performance comparable to two nodes of an IBM BlueGene/P, or eight Intel Xeon cores, on a single PowerXCell 8i chip. Module performance on two CBEA systems, an IBM BlueGene/P, and an eight-core shared-memory Intel Xeon workstation is given.