2,919 research outputs found

    The Iray Light Transport Simulation and Rendering System

    Full text link
    While ray tracing has become increasingly common and path tracing is well understood by now, a major challenge lies in crafting an easy-to-use and efficient system implementing these technologies. Following a purely physically-based paradigm while still allowing for artistic workflows, the Iray light transport simulation and rendering system allows for rendering complex scenes by the push of a button and thus makes accurate light transport simulation widely available. In this document we discuss the challenges and implementation choices that follow from our primary design decisions, demonstrating that such a rendering system can be made a practical, scalable, and efficient real-world application that has been adopted by various companies across many fields and is in use by many industry professionals today

    Massively-Parallel Feature Selection for Big Data

    Full text link
    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS PFBP partitions the data matrix both in terms of rows (samples, training examples) as well as columns (features). By employing the concepts of pp-values of conditional independence tests and meta-analysis techniques PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class

    Path-tracing Monte Carlo Library for 3D Radiative Transfer in Highly Resolved Cloudy Atmospheres

    Full text link
    Interactions between clouds and radiation are at the root of many difficulties in numerically predicting future weather and climate and in retrieving the state of the atmosphere from remote sensing observations. The large range of issues related to these interactions, and in particular to three-dimensional interactions, motivated the development of accurate radiative tools able to compute all types of radiative metrics, from monochromatic, local and directional observables, to integrated energetic quantities. In the continuity of this community effort, we propose here an open-source library for general use in Monte Carlo algorithms. This library is devoted to the acceleration of path-tracing in complex data, typically high-resolution large-domain grounds and clouds. The main algorithmic advances embedded in the library are those related to the construction and traversal of hierarchical grids accelerating the tracing of paths through heterogeneous fields in null-collision (maximum cross-section) algorithms. We show that with these hierarchical grids, the computing time is only weakly sensitivive to the refinement of the volumetric data. The library is tested with a rendering algorithm that produces synthetic images of cloud radiances. Two other examples are given as illustrations, that are respectively used to analyse the transmission of solar radiation under a cloud together with its sensitivity to an optical parameter, and to assess a parametrization of 3D radiative effects of clouds.Comment: Submitted to JAMES, revised and submitted again (this is v2

    Exploiting Graphics Processing Units for Massively Parallel Multi-Dimensional Indexing

    Get PDF
    Department of Computer EngineeringScientific applications process truly large amounts of multi-dimensional datasets. To efficiently navigate such datasets, various multi-dimensional indexing structures, such as the R-tree, have been extensively studied for the past couple of decades. Since the GPU has emerged as a new cost-effective performance accelerator, now it is common to leverage the massive parallelism of the GPU in various applications such as medical image processing, computational chemistry, and particle physics. However, hierarchical multi-dimensional indexing structures are inherently not well suited for parallel processing because their irregular memory access patterns make it difficult to exploit massive parallelism. Moreover, recursive tree traversal often fails due to the small run-time stack and cache memory in the GPU. First, we propose Massively Parallel Three-phase Scanning (MPTS) R-tree traversal algorithm to avoid the irregular memory access patterns and recursive tree traversal so that the GPU can access tree nodes in a sequential manner. The experimental study shows that MPTS R-tree traversal algorithm consistently outperforms traditional recursive R-Tree search algorithm for multi-dimensional range query processing. Next, we focus on reducing the query response time and extending n-ary multi-dimensional indexing structures - R-tree, so that a large number of GPU threads cooperate to process a single query in parallel. Because the number of submitted concurrent queries in scientific data analysis applications is relatively smaller than that of enterprise database systems and ray tracing in computer graphics. Hence, we propose a novel variant of R-trees Massively Parallel Hilbert R-Tree (MPHR-Tree), which is designed for a novel parallel tree traversal algorithm Massively Parallel Restart Scanning (MPRS). The MPRS algorithm traverses the MPHR-Tree in mostly contiguous memory access patterns without recursion, which offers more chances to optimize the parallel SIMD algorithm. Our extensive experimental results show that the MPRS algorithm outperforms the other stackless tree traversal algorithms, which are designed for efficient ray tracing in computer graphics community. Furthermore, we develop query co-processing scheme that makes use of both the CPU and GPU. In this approach, we store the internal and leaf nodes of upper tree in CPU host memory and GPU device memory, respectively. We let the CPU traverse internal nodes because the conditional branches in hierarchical tree structures often cause a serious warp divergence problem in the GPU. For leaf nodes, the GPU scans a large number of leaf nodes in parallel based on the selection ratio of a given range query. It is well known that the GPU is superior to the CPU for parallel scanning. The experimental results show that our proposed multi-dimensional range query co-processing scheme improves the query response time by up to 12x and query throughput by up to 4x compared to the state-of-the-art GPU tree traversal algorithm.ope

    Time-varying volume visualization

    Get PDF
    Volume rendering is a very active research field in Computer Graphics because of its wide range of applications in various sciences, from medicine to flow mechanics. In this report, we survey a state-of-the-art on time-varying volume rendering. We state several basic concepts and then we establish several criteria to classify the studied works: IVR versus DVR, 4D versus 3D+time, compression techniques, involved architectures, use of parallelism and image-space versus object-space coherence. We also address other related problems as transfer functions and 2D cross-sections computation of time-varying volume data. All the papers reviewed are classified into several tables based on the mentioned classification and, finally, several conclusions are presented.Preprin

    Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning

    Get PDF
    Large scale machine learning requires tradeoffs. Commonly this tradeoff has led practitioners to choose simpler, less powerful models, e.g. linear models, in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff--approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger data sets and significantly more training parallelism by strategically introducing approximation in certain optimization steps. For gradient boosted regression tree ensembles, we replace precise selection of tree splits with a coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, in the distributed setting in particular. For metric learning with nearest neighbor classification, rather than explicitly train a neighborhood structure we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries. We believe these optimization tradeoffs are widely applicable wherever machine learning is put in practice in large scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and consequently can process more training examples for more iterations than competing exact methods. While seemingly learning the model with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training time budget
    • 

    corecore