
    GPUs as Storage System Accelerators

    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, peak performance one order of magnitude higher than traditional CPUs. This drop in the cost of computation, like any order-of-magnitude drop in the cost per unit of performance for a class of system components, creates the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file, and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.
    Comment: IEEE Transactions on Parallel and Distributed Systems, 201
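
    To make the hashing primitive concrete, the sketch below computes per-block hashes of two file versions and reports how many blocks of the new version already exist in the old one, which is the similarity signal a content addressable store can exploit. It is a plain-CPU Python illustration, not the prototype's GPU-offloaded implementation; the block size, hash function, and function names are assumptions.

        import hashlib

        BLOCK_SIZE = 64 * 1024  # assumed fixed block size

        def block_hashes(path, block_size=BLOCK_SIZE):
            """Per-block digests of a file, the unit of similarity detection."""
            hashes = []
            with open(path, "rb") as f:
                while True:
                    block = f.read(block_size)
                    if not block:
                        break
                    hashes.append(hashlib.sha1(block).digest())
            return hashes

        def similarity(old_path, new_path):
            """Fraction of the new version's blocks already stored for the old one."""
            old = set(block_hashes(old_path))
            new = block_hashes(new_path)
            return sum(h in old for h in new) / len(new) if new else 1.0

    Blocks whose hashes are already known need not be transferred or stored again, which is what makes the hash computation itself the bottleneck worth offloading.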

    Mesoscale Optoelectronic Design of Wire-Based Photovoltaic and Photoelectrochemical Devices

    The overarching theme of this thesis is mesoscale optical and optoelectronic design of photovoltaic and photoelectrochemical devices. In a photovoltaic device, light absorption and charge carrier transport are coupled together on the mesoscale, and in a photoelectrochemical device, light absorption, charge carrier transport, catalysis, and solution species transport are all coupled together on the mesoscale. The work discussed herein demonstrates that simulation-based mesoscale optical and optoelectronic modeling can lead to detailed understanding of the operation and performance of these complex mesostructured devices, serve as a powerful tool for device optimization, and efficiently guide device design and experimental fabrication efforts. In-depth studies of two mesoscale wire-based device designs illustrate these principles: (i) an optoelectronic study of a tandem Si|WO3 microwire photoelectrochemical device, and (ii) an optical study of III-V nanowire arrays. The study of the monolithic, tandem, Si|WO3 microwire photoelectrochemical device begins with development and validation of an optoelectronic model with experiment. This study capitalizes on synergy between experiment and simulation to demonstrate the model's predictive power for extractable device voltage and light-limited current density. The developed model is then used to understand the limiting factors of the device and optimize its optoelectronic performance. The results of this work reveal that high fidelity modeling can facilitate unequivocal identification of limiting phenomena, such as parasitic absorption via excitation of a surface plasmon-polariton mode, and quick design optimization, achieving over a 300% enhancement in optoelectronic performance over a nominal design for this device architecture, which would be time-consuming and challenging to do via experiment. The work on III-V nanowire arrays also starts as a collaboration of experiment and simulation aimed at gaining understanding of unprecedented, experimentally observed absorption enhancements in sparse arrays of vertically-oriented GaAs nanowires. To explain this resonant absorption in periodic arrays of high index semiconductor nanowires, a unified framework that combines a leaky waveguide theory perspective and that of photonic crystals supporting Bloch modes is developed in the context of silicon, using both analytic theory and electromagnetic simulations. This detailed theoretical understanding is then applied to a simulation-based optimization of light absorption in sparse arrays of GaAs nanowires. Near-unity absorption in sparse, 5% fill fraction arrays is demonstrated via tapering of nanowires and multiple wire radii in a single array. Finally, experimental efforts are presented towards fabrication of the optimized array geometries. A hybrid self-catalyzed and selective area MOCVD growth method is used to establish morphology control of GaP nanowire arrays. Similarly, morphology and pattern control of nanowires is demonstrated with ICP-RIE of InP. Optical characterization of the InP nanowire arrays gives proof of principle that tapering and multiple wire radii can lead to near-unity absorption in sparse arrays of InP nanowires.
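
    As a point of reference for the sparseness figure quoted above, the areal fill fraction of a periodic array of vertical cylindrical wires follows from simple geometry. The sketch below assumes a square lattice, which is an illustrative choice rather than the geometry used in the thesis, and the example dimensions are arbitrary.

        import math

        def fill_fraction(radius_nm, pitch_nm):
            """Areal fill fraction of a square array of vertical cylindrical wires."""
            return math.pi * radius_nm ** 2 / pitch_nm ** 2

        # Example values: 80 nm radius wires on a 710 nm pitch give ~4%,
        # i.e. "sparse" in the sense used above.
        print(f"{fill_fraction(80, 710):.3f}")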

    An input centric paradigm for program dynamic optimizations and lifetime evolvement

    Accurately predicting program behaviors (e.g., memory locality, method calling frequency) is fundamental for program optimizations and runtime adaptations. Despite decades of remarkable progress, prior studies have not systematically exploited program inputs, a deciding factor of program behaviors, to help in program dynamic optimizations. Triggered by the strong and predictive correlations between program inputs and program behaviors that recent studies have uncovered, this dissertation aims to bring program inputs into the focus of program behavior analysis and program dynamic optimization, cultivating a new paradigm named input-centric program behavior analysis and dynamic optimization. The new optimization paradigm consists of three components, forming a three-layer pyramid. At the base is program input characterization, a component for resolving the complexity in program raw inputs and extracting important features. In the middle is input-behavior modeling, a component for recognizing and modeling the correlations between characterized input features and program behaviors. These two components constitute input-centric program behavior analysis, which (ideally) is able to predict the large-scope behaviors of a program's execution as soon as the execution starts. The top layer is input-centric adaptation, which capitalizes on the novel opportunities created by the first two components to facilitate proactive adaptation for program optimizations. This dissertation develops the paradigm in two stages. In the first stage, we concentrate on exploring the implications of program inputs for program behaviors and dynamic optimization. We construct the basic input-centric optimization framework based on offline training to realize the basic functionalities of the three major components of the paradigm. In the second stage, we focus on making the paradigm practical by addressing multi-faceted issues in handling input complexities, transparent training data collection, and predictive model evolution across production runs. The techniques proposed in this stage together cultivate a lifelong continuous optimization scheme with cross-input adaptivity. Fundamentally, the new optimization paradigm provides a brand new solution for program dynamic optimization. The techniques proposed in the dissertation together resolve the adaptivity-proactivity dilemma that has been limiting the effectiveness of existing optimization techniques. Its benefits are demonstrated through proactive dynamic optimizations in Jikes RVM and version selection using the IBM XL C Compiler, yielding significant performance improvement on a set of Java and C/C++ programs. It may open new opportunities for a broad range of runtime optimizations and adaptations. The evaluation results on both Java and C/C++ applications demonstrate that the new paradigm is promising in advancing the current state of program optimizations.
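
    A toy rendering of the three layers described above: characterize a raw input as a small feature vector, fit a model from those features to an observed behavior metric, and use the prediction at startup to pick an optimization level proactively. The feature choices, the linear model, and the call-count threshold are illustrative assumptions, not the dissertation's actual components.

        import numpy as np

        # Layer 1: input characterization -- reduce a raw input to a few features.
        def characterize(raw_input: str) -> np.ndarray:
            return np.array([len(raw_input),                # input size
                             raw_input.count("\n") + 1,     # record count
                             len(set(raw_input.split()))])  # rough vocabulary size

        # Layer 2: input-behavior modeling -- learn features -> behavior from training runs.
        def fit_model(feature_rows, behaviors):
            X = np.column_stack([np.ones(len(feature_rows)), np.asarray(feature_rows)])
            coeffs, *_ = np.linalg.lstsq(X, np.asarray(behaviors), rcond=None)
            return coeffs

        # Layer 3: input-centric adaptation -- predict at startup, adapt proactively.
        def choose_opt_level(coeffs, raw_input, threshold=1e6):
            feats = np.concatenate(([1.0], characterize(raw_input)))
            predicted_calls = float(feats @ coeffs)
            return "aggressive" if predicted_calls > threshold else "baseline"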

    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
    Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming
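
    The higher-order finite differencing mentioned above can be illustrated with the standard fourth-order central stencil for a first derivative on a uniform grid. This is a generic NumPy sketch of the discretization, not code produced by Chemora, and boundary handling (ghost zones or one-sided stencils) is elided.

        import numpy as np

        def d_dx_4th(u, dx):
            """Fourth-order central difference du/dx at interior grid points."""
            du = np.full_like(u, np.nan)  # boundaries need ghost zones or one-sided stencils
            du[2:-2] = (-u[4:] + 8 * u[3:-1] - 8 * u[1:-3] + u[:-4]) / (12 * dx)
            return du

        # Convergence check against a known derivative: the error falls as dx**4.
        x = np.linspace(0.0, 2 * np.pi, 201)
        err = np.nanmax(np.abs(d_dx_4th(np.sin(x), x[1] - x[0]) - np.cos(x)))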

    Interactive Visualization of the Largest Radioastronomy Cubes

    3D visualization is an important data analysis and knowledge discovery tool; however, interactive visualization of large 3D astronomical datasets poses a challenge for many existing data visualization packages. We present a solution to interactively visualize larger-than-memory 3D astronomical data cubes by utilizing a heterogeneous cluster of CPUs and GPUs. The system partitions the data volume into smaller sub-volumes that are distributed over the rendering workstations. GPU-based ray-casting volume rendering generates an image for each sub-volume; these images are composited into the whole-volume output and returned to the user. Datasets including the HI Parkes All Sky Survey (HIPASS - 12 GB) southern sky and the Galactic All Sky Survey (GASS - 26 GB) data cubes were used to demonstrate our framework's performance. The framework can render the GASS data cube with a maximum render time of < 0.3 seconds at 1024 x 1024 pixels output resolution using 3 rendering workstations and 8 GPUs. Our framework will scale to visualize larger datasets, even of terabyte order, if proper hardware infrastructure is available.
    Comment: 15 pages, 12 figures, Accepted New Astronomy July 201
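
    The partition-render-composite pipeline described above reduces to a few steps. In the sketch below, GPU ray casting is stood in for by a per-sub-volume maximum-intensity projection and the cube is split along a single axis for brevity, so it shows the data flow rather than the cluster implementation.

        import numpy as np

        def render_subvolume(subvol):
            # Stand-in for GPU ray casting: maximum-intensity projection along the view axis.
            return subvol.max(axis=0)

        def render_cube(cube, n_workers=8):
            # Partition the cube along the view axis, one sub-volume per worker/GPU.
            subvolumes = np.array_split(cube, n_workers, axis=0)
            partial_images = [render_subvolume(sv) for sv in subvolumes]
            # Composite the partial images into the final frame (max is order-independent for MIP).
            return np.maximum.reduce(partial_images)

        cube = np.random.rand(256, 512, 512).astype(np.float32)  # placeholder for a data cube
        image = render_cube(cube)                                 # final 512 x 512 frame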

    PCLIPS

    CLIPS is an expert system tool, created specifically to allow rapid implementation of expert systems. CLIPS is written in C and thus needs very little memory to run. Parallel CLIPS (PCLIPS) is an extension to CLIPS intended for situations where a group of expert systems is expected to run simultaneously and occasionally communicate over an integrated network. PCLIPS is a coarse-grained data distribution system. Its main goal is to take information in one knowledge base and distribute it to the other knowledge bases so that all of the executing expert systems can use that knowledge to solve their disparate problems.
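
    The coarse-grained distribution described above amounts to forwarding newly asserted facts from one knowledge base to its peers, with duplicate detection stopping re-broadcast cycles. The sketch below captures that pattern with in-process Python objects; the class and method names are assumptions, not PCLIPS's actual network interface.

        class KnowledgeBase:
            def __init__(self, name):
                self.name = name
                self.facts = set()
                self.peers = []          # other KnowledgeBase instances on the "network"

            def assert_fact(self, fact):
                if fact in self.facts:
                    return               # already known; stops re-broadcast cycles
                self.facts.add(fact)
                for peer in self.peers:  # distribute the new fact to every peer
                    peer.assert_fact(fact)

        # Two expert systems sharing knowledge:
        kb_a, kb_b = KnowledgeBase("a"), KnowledgeBase("b")
        kb_a.peers.append(kb_b)
        kb_a.assert_fact(("temperature", "reactor-1", 415))
        assert ("temperature", "reactor-1", 415) in kb_b.facts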

    Beyond XSPEC: Towards Highly Configurable Analysis

    We present a quantitative comparison between software features of the de facto standard X-ray spectral analysis tool, XSPEC, and ISIS, the Interactive Spectral Interpretation System. Our emphasis is on customized analysis, with ISIS offered as a strong example of configurable software. While noting that XSPEC has been of immense value to astronomers, and that its scientific core is moderately extensible (most commonly via the inclusion of user-contributed "local models"), we identify a series of limitations with its use beyond conventional spectral modeling. We argue that from the viewpoint of the astronomical user, the XSPEC internal structure presents a Black Box Problem, with many of its important features hidden from the top-level interface, thus discouraging user customization. Drawing from examples in custom modeling, numerical analysis, parallel computation, visualization, data management, and automated code generation, we show how a numerically scriptable, modular, and extensible analysis platform such as ISIS facilitates many forms of advanced astrophysical inquiry.
    Comment: Accepted by PASP, for July 2008 (15 pages)
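
    A flavor of what a scriptable, extensible analysis environment buys is the ability to define a custom model as an ordinary function and hand it directly to a fitter. The sketch below is a generic Python/SciPy illustration of that idea, not ISIS (which is scripted in S-Lang) or an XSPEC local model; the power-law-plus-Gaussian form and the parameter names are illustrative.

        import numpy as np
        from scipy.optimize import curve_fit

        def powerlaw_plus_line(energy_kev, norm, index, line_norm, line_e, line_sigma):
            """Toy user model: power-law continuum plus a Gaussian emission line."""
            continuum = norm * energy_kev ** (-index)
            line = line_norm * np.exp(-0.5 * ((energy_kev - line_e) / line_sigma) ** 2)
            return continuum + line

        # energies, counts = load_spectrum(...)   # placeholder for real data
        energies = np.linspace(1.0, 10.0, 200)
        counts = powerlaw_plus_line(energies, 10.0, 1.8, 3.0, 6.4, 0.2)
        popt, pcov = curve_fit(powerlaw_plus_line, energies, counts,
                               p0=[5.0, 1.5, 1.0, 6.0, 0.3])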