    Scalable Interactive Volume Rendering Using Off-the-shelf Components

    This paper describes an application of a second-generation implementation of the Sepia architecture (Sepia-2) to interactive volumetric visualization of large rectilinear scalar fields. By employing pipelined associative blending operators in a sort-last configuration, a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels, and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and a variety of GL rendering techniques including surface rendering and shadow mapping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators.
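
    The abstract relies on blending operators being associative, which is what lets partial images from different rendering nodes be combined in a pipeline or a tree rather than strictly one after another. The following is a minimal sketch of that property for the Porter-Duff "over" operator on premultiplied RGBA values. It is not the Sepia-2 implementation; the function names and the sample pixel values are assumptions made for illustration.

```python
from functools import reduce


def over(front, back):
    """Porter-Duff 'over' for premultiplied (r, g, b, a) tuples."""
    k = 1.0 - front[3]  # fraction of the back layer that shows through
    return tuple(f + k * b for f, b in zip(front, back))


def composite(layers):
    """Blend layers that are already sorted front to back."""
    return reduce(over, layers)


# Hypothetical partial results for one pixel from four rendering nodes,
# ordered front to back (premultiplied alpha).
layers = [
    (0.25, 0.0, 0.0, 0.25),
    (0.0, 0.5, 0.0, 0.5),
    (0.0, 0.0, 0.125, 0.25),
    (0.5, 0.5, 0.5, 1.0),
]

# Strictly sequential blending: ((l0 over l1) over l2) over l3.
sequential = composite(layers)

# Pairwise grouping: (l0 over l1) over (l2 over l3).
tree = over(over(layers[0], layers[1]), over(layers[2], layers[3]))
```

    Both groupings give the same pixel as long as the front-to-back order is preserved. The operator is associative but not commutative, so the layers may be regrouped but not reordered.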

    Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

    Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost, respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled in the sense that they all need to determine the allocations of workloads and migrate computational states at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the problem of load balancing as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.
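
    The abstract does not give the Mixed-Integer Linear Program itself. As a rough illustration of the kind of problem such a model encodes, the sketch below assigns each operator instance to exactly one node and minimizes the largest per-node load. It uses exhaustive search on a tiny instance instead of an MILP solver, and it is neither the paper's formulation nor ALBIC; the function name, the objective, and the load values are all assumptions.

```python
from itertools import product


def balance(loads, num_nodes):
    """Try every assignment of instances to nodes and keep the one
    with the smallest maximum node load (only viable for tiny inputs)."""
    best_assignment, best_peak = None, float("inf")
    for assignment in product(range(num_nodes), repeat=len(loads)):
        totals = [0.0] * num_nodes
        for load, node in zip(loads, assignment):
            totals[node] += load
        peak = max(totals)
        if peak < best_peak:
            best_assignment, best_peak = assignment, peak
    return best_assignment, best_peak


# Hypothetical per-instance workloads to spread over two nodes.
loads = [4.0, 3.0, 3.0, 2.0, 2.0]
assignment, peak = balance(loads, 2)
```

    In an MILP the same decision would typically be expressed with binary assignment variables and linear constraints, which lets a solver handle instances far too large for enumeration.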

    The computation of flow past an oblique wing using the thin-layer Navier-Stokes equations

    Essential aspects are presented for computing flow past an oblique wing with the thin-layer Navier-Stokes equations. A new method is developed for generating a grid system around a realistic wing. This method utilizes a series of conformal transformations. The thin-shear-layer approximation and an algebraic eddy-viscosity turbulence model are used to simplify the Reynolds-averaged Navier-Stokes equations. An implicit, factored numerical scheme and the concept of a pencil data structure are utilized. For the first time, some flow fields caused by the oblique wing in a supersonic free stream are discussed, emphasizing the separated vortex flows associated with such a wing.

    Architecture-Aware Optimization on a 1600-core Graphics Processor

    The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated compute node requires deep technical knowledge of the underlying architecture. Although significant literature exists on how to optimize GPU performance on the more mature NVIDIA CUDA architecture, the converse is true for OpenCL on the AMD GPU. Consequently, we present and evaluate architecture-aware optimizations for the AMD GPU. The most prominent optimizations include (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our AMD GPU optimizations by applying each optimization in isolation as well as in concert to a large-scale, molecular modeling application called GEM. Via these AMD-specific GPU optimizations, the AMD Radeon HD 5870 GPU delivers 65% better performance than with the well-known NVIDIA-specific optimizations.

    An experimental study of the turbulent boundary layer on a transport wing in subsonic and transonic flow

    The upper surface boundary layer on a transport wing model was extensively surveyed with miniature yaw probes at a subsonic and a transonic cruise condition. Additional data were obtained at a second transonic test condition, for which a separated region was present at mid-semispan, aft of mid-chord. Significant variation in flow direction with distance from the surface was observed near the trailing edge except at the wing root and tip. The data collected at the transonic cruise condition show boundary layer growth associated with shock wave/boundary layer interaction, followed by recovery of the boundary layer downstream of the shock. Measurements of fluctuating surface pressure and wingtip acceleration were also obtained. The influence of flow field unsteadiness on the boundary layer data is discussed. Comparisons among the data and predictions from a variety of computational methods are presented. The computed predictions are in reasonable agreement with the experimental data in the outboard regions where 3-D effects are moderate and adverse pressure gradients are mild. In the more highly loaded mid-span region near the trailing edge, displacement thickness growth was significantly underpredicted, except when unrealistically severe adverse pressure gradients associated with inviscid calculations were used to perform boundary layer calculations.