1,705 research outputs found

    The Iray Light Transport Simulation and Rendering System

    Full text link
    While ray tracing has become increasingly common and path tracing is well understood by now, a major challenge lies in crafting an easy-to-use and efficient system implementing these technologies. Following a purely physically-based paradigm while still allowing for artistic workflows, the Iray light transport simulation and rendering system allows for rendering complex scenes by the push of a button and thus makes accurate light transport simulation widely available. In this document we discuss the challenges and implementation choices that follow from our primary design decisions, demonstrating that such a rendering system can be made a practical, scalable, and efficient real-world application that has been adopted by various companies across many fields and is in use by many industry professionals today

    Somoclu: An Efficient Parallel Library for Self-Organizing Maps

    Get PDF
    Somoclu is a massively parallel tool for training self-organizing maps on large data sets written in C++. It builds on OpenMP for multicore execution, and on MPI for distributing the workload across the nodes in a cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling training large emergent maps even on a single computer.Comment: 26 pages, 9 figures. The code is available at https://peterwittek.github.io/somoclu

    Analysing Astronomy Algorithms for GPUs and Beyond

    Full text link
    Astronomy depends on ever increasing computing power. Processor clock-rates have plateaued, and increased performance is now appearing in the form of additional processor cores on a single chip. This poses significant challenges to the astronomy software community. Graphics Processing Units (GPUs), now capable of general-purpose computation, exemplify both the difficult learning-curve and the significant speedups exhibited by massively-parallel hardware architectures. We present a generalised approach to tackling this paradigm shift, based on the analysis of algorithms. We describe a small collection of foundation algorithms relevant to astronomy and explain how they may be used to ease the transition to massively-parallel computing architectures. We demonstrate the effectiveness of our approach by applying it to four well-known astronomy problems: Hogbom CLEAN, inverse ray-shooting for gravitational lensing, pulsar dedispersion and volume rendering. Algorithms with well-defined memory access patterns and high arithmetic intensity stand to receive the greatest performance boost from massively-parallel architectures, while those that involve a significant amount of decision-making may struggle to take advantage of the available processing power.Comment: 10 pages, 3 figures, accepted for publication in MNRA

    Scalable data clustering using GPUs

    Get PDF
    The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, like those found in flow cytometry data, is very time consuming on a single CPU. Fortunately these techniques lend themselves naturally to large scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA\u27s CUDA framework and Tesla architecture, were investigated as a low-cost, high performance solution to a number of clustering algorithms. C-means and Expectation Maximization with Gaussian mixture models were implemented using the CUDA framework. The algorithm implementations use a hybrid of CUDA, OpenMP, and MPI to scale to many GPUs on multiple nodes in a high performance computing environment. This framework is envisioned as part of a larger cloud-based workflow service where biologists can apply multiple algorithms and parameter sweeps to their data sets and quickly receive a thorough set of results that can be further analyzed by experts. Improvements over previous GPU-accelerated implementations range from 1.42x to 21x for C-means and 3.72x to 5.65x for the Gaussian mixture model on non-trivial data sets. Using a single NVIDIA GTX 260 speedups are on average 90x for C-means and 74x for Gaussians with flow cytometry files compared to optimized C code running on a single core of a modern Intel CPU. Using the TeraGrid Lincoln high performance cluster at NCSA C-means achieves 42% parallel efficiency and a CPU speedup of 4794x with 128 Tesla C1060 GPUs. The Gaussian mixture model achieves 72% parallel efficiency and a CPU speedup of 6286x
    corecore