23 research outputs found

    A Bulk-Parallel Priority Queue in External Memory with STXXL

    Get PDF
    We propose the design and an implementation of a bulk-parallel external memory priority queue to take advantage of both shared-memory parallelism and high external memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape

    The dimension of C1C^1 splines of arbitrary degree on a tetrahedral partition

    No full text
    We consider the linear space of piecewise polynomials in three variables which are globally smooth, i.e., trivariate C1C^1 splines. The splines are defined on a uniform tetrahedral partition Δ\Delta, which is a natural generalization of the four-directional mesh. By using Bernstein-B{\´e}zier techniques, we establish formulae for the dimension of the C1C^1 splines of arbitrary degree

    Algorithmic ramifications of prefetching in memory hierarchy

    Get PDF
    External Memory models, most notable being the I-O Model [3], capture the effects of memory hierarchy and aid in algorithm design. More than a decade of architectural advancements have led to new features not captured in the I-O model - most notably the prefetching capability. We propose a relatively simple Prefetch model that incorporates data prefetching in the traditional I-O models and show how to design algorithms that can attain close to peak memory bandwidth. Unlike (the inverse of) memory latency, the memory bandwidth is much closer to the processing speed, thereby, intelligent use of prefetching can considerably mitigate the I-O bottleneck. For some fundamental problems, our algorithms attain running times approaching that of the idealized Random Access Machines under reasonable assumptions. Our work also explains the significantly superior performance of the I-O efficient algorithms in systems that support prefetching compared to ones that do not

    An Efficient External-Memory Sorting Algorithm

    Get PDF
    External-memory sorting is a well-versed subject, with a history going back several decades. However, current implementations of external-memory sorting algorithms are not able to fully take advantage of the power of modern hardware. The reason for this is twofold: (1) they use a computationally-expensive mergesort approach, which can create a bottleneck during processing; (2) their management of I/O resources prevents them from achieving maximum I/O bandwidth. In this paper, we outline the construction of an external-memory distribution sort algorithm that utilizes available resources much more effectively

    A custom designed density estimation method for light transport

    No full text
    We present a new Monte Carlo method for solving the global illumination problem in environments with general geometry descriptions and light emission and scattering properties. Current Monte Carlo global illumination algorithms are based on generic density estimation techniques that do not take into account any knowledge about the nature of the data points --- light and potential particle hit points --- from which a global illumination solution is to be reconstructed. We propose a novel estimator, especially designed for solving linear integral equations such as the rendering equation. The resulting single-pass global illumination algorithm promises to combine the flexibility and robustness of bi-directional path tracing with the efficiency of algorithms such as photon mapping

    Duality Between Prefetching and Queued Writing with Parallel Disks

    Get PDF
    AMS subject classifications. 68W10, 68W20, 68W40, 68M20, 68P10, 68P20, 68Q17 DOI. 10.1137/S0097539703431573Parallel disks promise to be a cost effective means for achieving high bandwidth in applications involving massive data sets, but algorithms for parallel disks can be difficult to devise. To combat this problem, we define a useful and natural duality between writing to parallel disks and the seemingly more difficult problem of prefetching. We first explore this duality for applications involving read-once accesses using parallel disks. We get a simple linear time algorithm for computing optimal prefetch schedules and analyze the efficiency of the resulting schedules for randomly placed data and for arbitrary interleaved accesses to striped sequences. Duality also provides an optimal schedule for prefetching plus caching, where blocks can be accessed multiple times. Another application of this duality gives us the first parallel disk sorting algorithms that are provably optimal up to lower-order terms. One of these algorithms is a simple and practical variant of multiway mergesort, addressing a question that had been open for some time

    Duality Between Prefetching and Queued Writing with Parallel Disks

    Get PDF
    This is the published version, made available with the permission of the publisher. Copyright © 2005 Society for Industrial and Applied Mathematics.Parallel disks promise to be a cost effective means for achieving high bandwidth in applications involving massive data sets, but algorithms for parallel disks can be difficult to devise. To combat this problem, we define a useful and natural duality between writing to parallel disks and the seemingly more difficult problem of prefetching. We first explore this duality for applications involving read-once accesses using parallel disks. We get a simple linear time algorithm for computing optimal prefetch schedules and analyze the efficiency of the resulting schedules for randomly placed data and for arbitrary interleaved accesses to striped sequences. Duality also provides an optimal schedule for prefetching plus caching, where blocks can be accessed multiple times. Another application of this duality gives us the first parallel disk sorting algorithms that are provably optimal up to lower-order terms. One of these algorithms is a simple and practical variant of multiway mergesort, addressing a question that had been open for some time

    Reflectance from images: a model-based approach for human faces

    No full text
    In this paper, we present an image-based framework that acquires the reflectance properties of a human face. A range scan of the face is not required. Based on a morphable face model, the system estimates the 3D shape, and establishes point-to-point correspondence across images taken from different viewpoints, and across different individuals' faces. This provides a common parameterization of all reconstructed surfaces that can be used to compare and transfer BRDF data between different faces. Shape estimation from images compensates deformations of the face during the measurement process, such as facial expressions. In the common parameterization, regions of homogeneous materials on the face surface can be defined a-priori. We apply analytical BRDF models to express the reflectance properties of each region, and we estimate their parameters in a least-squares fit from the image data. For each of the surface points, the diffuse component of the BRDF is locally refined, which provides high detail. We present results for multiple analytical BRDF models, rendered at novelorientations and lighting conditions

    Building a parallel pipelined external memory algorithm library

    Full text link
    corecore