    Understanding the efficiency of GPU algorithms for matrix-matrix multiplication

    Utilizing graphics hardware for general-purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an in-depth analysis of dense matrix-matrix multiplication, which reuses each element of the input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrix-matrix multiplication as an obvious candidate for efficient evaluation on GPUs, but, surprisingly, we find that even near-optimal GPU implementations are markedly less efficient than current cache-aware CPU approaches. We find that the key cause of this inefficiency is that the GPU can fetch less data, yet execute more arithmetic operations, per clock than the CPU when both are operating out of their closest caches. The lack of high-bandwidth access to cached data will impair the performance of GPU implementations of any computation featuring significant input reuse.
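
    The reuse argument above can be made concrete with a small counting sketch. It is not the paper's GPU or CPU code. It assumes a simple two-level memory model in which a hypothetical `blocked_matmul` copies tile × tile sub-blocks of A and B into fast memory and counts those copies. With tile size t, the sketch moves 2n³/t elements of A and B while performing 2n³ floating-point operations, so each fetched element feeds about t operations. If a processor's closest cache delivers fewer words per clock than its arithmetic units consume, only a larger reuse factor can keep those units busy.

```python
def blocked_matmul(A, B, tile):
    """Multiply square matrices A and B (lists of lists) using tile x tile blocks.

    Returns (C, words_loaded, flops). words_loaded counts elements of A and B
    copied into tiles, a stand-in for fills of fast memory in this simplified
    model; writes to C are not counted. flops counts each multiply-add as 2.
    """
    n = len(A)
    if n % tile != 0:
        raise ValueError("n must be a multiple of the tile size in this sketch")
    C = [[0.0] * n for _ in range(n)]
    words_loaded = 0
    flops = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # "Load" one tile of A and one tile of B into fast memory.
                a_tile = [row[k0:k0 + tile] for row in A[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in B[k0:k0 + tile]]
                words_loaded += 2 * tile * tile
                # Each loaded element is reused `tile` times in this tile product.
                for i in range(tile):
                    c_row = C[i0 + i]
                    for k in range(tile):
                        a = a_tile[i][k]
                        b_row = b_tile[k]
                        for j in range(tile):
                            c_row[j0 + j] += a * b_row[j]
                flops += 2 * tile ** 3
    return C, words_loaded, flops
```

    In this model, halving the tile size doubles the traffic for the same arithmetic. Cache associativity, register limits, and the cost of writing C are deliberately ignored.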

    Self-refining games using player analytics

    Data-driven simulation demands good training data drawn from a vast space of possible simulations. While fully sampling these large spaces is infeasible, we observe that in practical applications, such as gameplay, users explore only a vanishingly small subset of the dynamical state space. In this paper we present a sampling approach that takes advantage of this observation by concentrating precomputation around the states that users are most likely to encounter. We demonstrate our technique in a prototype self-refining game whose dynamics improve with play, ultimately providing realistically rendered, rich fluid dynamics in real time on a mobile device. Our results show that our analytics-driven training approach yields lower model error and fewer visual artifacts than a heuristic training strategy. Copyright © ACM
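
    One way to picture concentrating precomputation around the states users are most likely to encounter is to split a fixed budget of precomputed simulations in proportion to how often logged play sessions visited each state. This is a hypothetical illustration, not the paper's sampler. It assumes states have already been discretized into hashable labels (the labels `calm`, `splash`, and `storm` are made up), and the function name `allocate_precomputation` is invented for this example.

```python
from collections import Counter


def allocate_precomputation(visited_states, budget):
    """Split `budget` precomputed simulations across states in proportion to
    how often players visited them, using largest-remainder rounding so the
    allocations sum exactly to `budget`. (Hypothetical helper.)"""
    counts = Counter(visited_states)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one logged visit")
    # Ideal fractional share of the budget for each state.
    quotas = {s: budget * c / total for s, c in counts.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = budget - sum(alloc.values())
    # Hand the remaining samples to the states with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc
```

    With logged visits split 6/3/1 across three states and a budget of 10 simulations, the allocation is 6/3/1. A uniform strategy would instead spread the budget evenly, regardless of how players actually behave.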

    A performance and energy evaluation of many-light rendering algorithms

    Recently, the performance of many-light algorithms, where thousands of light sources are used to compute the lighting in a scene, has improved so much that they have reached the realm of real-time rendering. In general, the algorithm considered “best” is the one that is fastest in terms of time per frame. Given that power efficiency may already be, or may soon become, one of the most important optimization factors for graphics hardware and software vendors, we take a different route and instead measure both energy usage per frame and frame time for a number of popular many-light rendering algorithms on an Intel Iris Pro. We use Pareto frontiers for each configuration to examine the possible trade-offs between rendering time and energy consumption. Furthermore, we examine the optimal algorithms for each configuration and are able to draw generalized conclusions on when each algorithm is most efficient. We also record several other statistics for the algorithms (e.g., bandwidth), which allow us to draw further conclusions regarding energy consumption.
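
    Given (frame time, energy per frame) measurements for several algorithms in one configuration, the Pareto frontier is the set of algorithms that no other algorithm beats on both metrics at once. What follows is a minimal sketch. It assumes lower is better for both metrics and that each measurement is a `(time, energy, name)` tuple. The function name and the sample numbers in the usage note are hypothetical, not results from the paper.

```python
def pareto_frontier(points):
    """Return the (time, energy, name) tuples not dominated by any other point.

    A point is dominated if another point is no worse in both frame time and
    energy and strictly better in at least one. Lower is better for both.
    """
    frontier = []
    best_energy = float("inf")
    # Scan in order of increasing time (ties broken by energy). A point is on
    # the frontier only if it uses strictly less energy than every faster point.
    for t, e, name in sorted(points, key=lambda p: (p[0], p[1])):
        if e < best_energy:
            frontier.append((t, e, name))
            best_energy = e
    return frontier
```

    For example, `pareto_frontier([(10.0, 5.0, "alg_a"), (8.0, 7.0, "alg_b"), (12.0, 4.0, "alg_c"), (9.0, 8.0, "alg_d")])` keeps `alg_b`, `alg_a`, and `alg_c`. `alg_d` is dropped because `alg_b` is both faster and cheaper. Sorting first makes the scan O(k log k) for k algorithms.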

    LED Street Light Research Project Part II: New Findings

    Many cities are converting their existing street lighting to Light Emitting Diode (LED) source luminaires due to anticipated energy savings of 40 to 80 percent, as compared to high intensity discharge (HID) source luminaires, and maintenance savings estimated to be 50 to 75 percent due to the longer life of LED luminaires. Addressable electronic lighting controls and sensors are now available that can transform a basic streetlight into an intelligent, smart city device with public safety and other benefits. The number of variables that civic officials must consider for any street lighting conversion project has increased as a result of the rate of technological advances in LED luminaires, control systems, and optional components.

    The purpose of this report is to provide an understanding of recent industry and technology changes, address common concerns raised when using LED light sources, recommend model specifications for LED luminaires and lighting controls in the public right of way, make suggestions for improving industry norms and code changes, comment on add-on features that show promise, and discuss what to expect as technology advances and the LED lighting industry matures.

    Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU

    A wide class of geometry processing and PDE-solving methods needs to solve a linear system in which the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. GPUs, with their ever-growing amount of parallel horsepower, are a tempting resource for such numerical computations. New APIs (CTM from ATI and CUDA from NVIDIA) help by giving direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. However, existing GPU linear solvers are restricted to specific types of matrices or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a sparse general-purpose linear solver that outperforms leading-edge CPU counterparts (MKL / ACML).
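
    The core kernel behind such a solver is sparse matrix-vector multiplication over block compressed row storage (BCSR). What follows is a minimal sketch of that format, not the paper's GPU kernel. The function name `bcsr_spmv` and the array layout (`row_ptr`, `col_idx`, dense `blocks`) are assumptions for illustration. Plain Python can only mirror register blocking by keeping a block-row's partial sums in local variables.

```python
def bcsr_spmv(block_size, row_ptr, col_idx, blocks, x):
    """Compute y = A @ x for A stored in block compressed row storage (BCSR).

    row_ptr[bi]:row_ptr[bi + 1] indexes the stored blocks of block-row bi,
    col_idx[k] is the block-column of block k, and blocks[k] is a dense
    block_size x block_size list of lists. (Hypothetical layout.)
    """
    r = block_size
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * r)
    for bi in range(n_block_rows):
        # Mirrors register blocking: the r outputs of this block-row are
        # accumulated in a small local array and written to y once.
        acc = [0.0] * r
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            xs = x[bj * r:(bj + 1) * r]  # contiguous slice of x for this block
            blk = blocks[k]
            for i in range(r):
                row = blk[i]
                s = 0.0
                for j in range(r):
                    s += row[j] * xs[j]
                acc[i] += s
        y[bi * r:(bi + 1) * r] = acc
    return y
```

    For the 4 × 4 matrix [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 5, 6], [7, 8, 0, 0]] with 2 × 2 blocks, row_ptr = [0, 1, 3], col_idx = [0, 0, 1], and blocks = [[[1, 2], [3, 4]], [[0, 0], [7, 8]], [[5, 6], [0, 0]]]. Multiplying by x = [1, 2, 3, 4] gives [5, 11, 39, 23]. The explicitly stored zero rows are the price of fixed-size blocks. In exchange, each block reads a contiguous slice of x and its partial sums stay local.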