12 research outputs found

    Distributed interactive ray tracing for large volume visualization

    Journal Article. We have constructed a distributed parallel ray tracing system that interactively produces isosurface renderings from large data sets on a cluster of commodity PCs. The program was derived from the SCI Institute's interactive ray tracer (*-Ray), which utilizes small to large shared memory platforms, such as the SGI Origin series, to interact with very large-scale data sets. Making this approach work efficiently on a cluster requires attention to numerous system-level issues, especially when rendering data sets larger than the address space of each cluster node.

    Memory sharing for interactive ray tracing on clusters

    Manuscript. We present recent results in the application of distributed shared memory to image parallel ray tracing on clusters. Image parallel rendering is traditionally limited to scenes that are small enough to be replicated in the memory of each node, because any processor may require access to any piece of the scene. We solve this problem by making all of a cluster's memory available through software distributed shared memory layers. With Gigabit Ethernet connections, this mechanism is sufficiently fast for interactive rendering of multi-gigabyte datasets. Object- and page-based distributed shared memories are compared, and optimizations for efficient memory use are discussed.

    A Study in Akka-based Distributed Ray-tracing of Large Scenes

    This project creates a ray-tracing and geometry distribution framework based on the actor model of parallelism, then extends it to a cluster of machines to demonstrate effective data distribution across a network. The approach is shown to be feasible and generally scales well, but problems internal to the actor framework, together with design failures, prevent it from effectively and consistently increasing usable memory and producing larger ray traces. Despite this, the project compares several methods of ray organization across the geometry and shows that more complex methods generally scale better with the amount of geometry. A photometric renderer was added with very little modification, showing the generality of the geometry distribution framework, and the performance benefits of alternative serialization methods are shown to outweigh the drawback of a more difficult implementation.

    Interactive isosurface ray tracing of large octree volumes

    Journal Article. We present a technique for ray tracing isosurfaces of large compressed structured volumes. Data is first converted into a lossless-compression octree representation that occupies a fraction of the original memory footprint. An isosurface is then dynamically rendered by tracing rays through a min/max hierarchy inside interior octree nodes. By embedding the acceleration tree and scalar data in a single structure and employing optimized octree hash schemes, we achieve competitive frame rates on common multicore architectures, and render large time-variant data that could not otherwise be accommodated.
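The min/max hierarchy mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a toy binary tree over 1D cells in place of an octree, and the names `Node`, `cells_from_samples`, `build`, and `find_cells` are hypothetical. The point it shows is that a subtree can be skipped whenever the isovalue lies outside that subtree's stored [min, max] range.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    vmin: float                      # smallest scalar value under this node
    vmax: float                      # largest scalar value under this node
    cell: Optional[int] = None       # cell index for leaves, None for interior nodes
    children: List["Node"] = field(default_factory=list)


def cells_from_samples(samples: List[float]) -> List[Tuple[float, float]]:
    """Each cell spans two adjacent samples; store its (min, max) range."""
    return [(min(a, b), max(a, b)) for a, b in zip(samples, samples[1:])]


def build(cells: List[Tuple[float, float]], start: int = 0) -> Node:
    """Build a binary min/max tree (a 1D stand-in for an octree)."""
    if len(cells) == 1:
        lo, hi = cells[0]
        return Node(lo, hi, cell=start)
    mid = len(cells) // 2
    kids = [build(cells[:mid], start), build(cells[mid:], start + mid)]
    return Node(min(k.vmin for k in kids), max(k.vmax for k in kids), children=kids)


def find_cells(node: Node, iso: float) -> List[int]:
    """Return indices of cells that may contain the isovalue, pruning by min/max."""
    if not (node.vmin <= iso <= node.vmax):
        return []                    # isovalue outside range: skip the whole subtree
    if node.cell is not None:
        return [node.cell]
    found: List[int] = []
    for child in node.children:
        found.extend(find_cells(child, iso))
    return found


tree = build(cells_from_samples([0, 1, 2, 3, 4, 5, 6, 7, 8]))
candidates = find_cells(tree, 2.5)   # only the cell between samples 2 and 3 qualifies
```

In a real ray tracer the same range test would be applied only to the nodes a ray passes through, so empty regions of the volume are rejected near the top of the tree.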

    Doctor of Philosophy

    Dissertation. High-performance supercomputers on the Top500 list are commonly designed around commodity CPUs. Most of the codes executed on these machines are message-passing codes using the Message Passing Interface (MPI). Thus it makes sense to look at these machines from a holistic systems architecture perspective and consider optimizations to commodity processors that make them more efficient in message-passing architectures. Described herein is a new User-Level Notification (ULN) architecture that significantly improves message-passing performance. The architecture integrates a simultaneous multithreaded (SMT) processor with a user-level network interface (NI) that can directly control the execution scheduling of threads on the processor. By allowing the network interface to control the execution of message handling code at the user level, the operating system (OS) related overhead for handling interrupts and user code dispatch related to notifications is eliminated. By using an SMT processor, message handling can be performed in one thread concurrent to user computation in other threads, thus most of the overhead of executing message handlers can be hidden. This dissertation presents measurements showing the OS overheads related to message-passing are significant in modern architectures and describes a new architecture that significantly reduces these overheads. On a communication-intensive real-world application, the ULN architecture provides a 50.9% performance improvement over a more traditional OS-based NIC and a 5.29-31.9% improvement over a best-of-class user-level NIC due to the user-level notifications.

    Doctor of Philosophy

    Dissertation. This dissertation explores three key facets of software algorithms for custom hardware ray tracing: primitive intersection, shading, and acceleration structure construction. For the first, primitive intersection, we show how nearly all of the existing direct three-dimensional (3D) ray-triangle intersection tests are mathematically equivalent. Based on this, a genetic algorithm can automatically tune a ray-triangle intersection test for maximum speed on a particular architecture. We also analyze the components of the intersection test to determine how much floating point precision is required and design a numerically robust intersection algorithm. Next, for shading, we deconstruct Perlin noise into its basic parts and show how these can be modified to produce a gradient noise algorithm that improves the visual appearance. This improved algorithm serves as the basis for a hardware noise unit. Lastly, we show how an existing bounding volume hierarchy can be postprocessed using tree rotations to further reduce the expected cost to traverse a ray through it. This postprocessing also serves as the basis for an efficient update algorithm for animated geometry. Together, these contributions should improve the efficiency of both software- and hardware-based ray tracers.

    Distributed Interactive Ray Tracing for Large Volume Visualization

    Figure 1: Richtmyer-Meshkov Instability time steps: 0, 45, 180, 270. With 32 Linux PCs we are able to isosurface the full resolution 7.5 GB volume on the left at 6.7 frames per second and on the right at 2.1 frames per second.

    We have constructed a distributed parallel ray tracing system that interactively produces isosurface renderings from large data sets on a cluster of commodity PCs. The program was derived from the SCI Institute’s interactive ray tracer (*-Ray), which utilizes small to large shared memory platforms, such as the SGI Origin series, to interact with very large-scale data sets. Making this approach work efficiently on a cluster requires attention to numerous system-level issues, especially when rendering data sets larger than the address space of each cluster node. The rendering engine is an image parallel ray tracer with a supervisor/workers organization. Each node in the cluster runs a multi-threaded application. A minimal abstraction layer on top of TCP links the nodes and enables asynchronous message handling. For large volumes, render threads obtain data bricks on demand from an object-based software distributed shared memory. Caching improves performance by reducing the number of data transfers for a reasonable working set size. For large data sets, the cluster-based interactive ray tracer performs comparably with an SGI Origin system. We examine the parameter space of the renderer and provide experimental results for interactive rendering of large (7.5 GB) data sets.
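The on-demand brick fetching and caching described in this abstract can be sketched as below. This is an illustration under stated assumptions, not the system's actual design: the abstract does not name an eviction policy, so least-recently-used eviction is assumed here, and `BrickCache` and the `fetch` callback (standing in for a request to the node that owns a brick) are hypothetical names.

```python
from collections import OrderedDict
from typing import Any, Callable


class BrickCache:
    """Per-node cache of data bricks, fetched on demand (LRU eviction is an assumption)."""

    def __init__(self, fetch: Callable[[int], Any], capacity: int) -> None:
        self.fetch = fetch            # hypothetical: requests a brick from its owning node
        self.capacity = capacity      # maximum number of bricks held locally
        self.bricks: "OrderedDict[int, Any]" = OrderedDict()
        self.misses = 0               # number of remote transfers performed

    def get(self, brick_id: int) -> Any:
        if brick_id in self.bricks:
            self.bricks.move_to_end(brick_id)   # mark as most recently used
            return self.bricks[brick_id]
        self.misses += 1                        # cache miss: transfer the brick
        data = self.fetch(brick_id)
        self.bricks[brick_id] = data
        if len(self.bricks) > self.capacity:
            self.bricks.popitem(last=False)     # evict the least recently used brick
        return data


# Toy usage: the "remote fetch" just computes a value from the brick id.
cache = BrickCache(fetch=lambda brick_id: brick_id * 10, capacity=2)
for brick_id in (1, 2, 1, 3, 2):
    cache.get(brick_id)
```

With a capacity of two bricks, the access sequence above causes four transfers: the repeated access to brick 1 is served locally, while brick 2 has been evicted by the time it is requested again. A working set that fits in the cache avoids such repeat transfers.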