185 research outputs found

    Computational Physics on Graphics Processing Units

    Full text link
    The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up various different algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics, and on quantum simulations for electronic structure calculations using the density functional theory, wave function techniques, and quantum field theory.Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 201

    Fast and Robust Parametric Estimation of Jointly Sparse Channels

    Get PDF
    We consider the joint estimation of multipath channels obtained with a set of receiving antennas and uniformly probed in the frequency domain. This scenario fits most of the modern outdoor communication protocols for mobile access or digital broadcasting among others. Such channels verify a Sparse Common Support property (SCS) which was used in a previous paper to propose a Finite Rate of Innovation (FRI) based sampling and estimation algorithm. In this contribution we improve the robustness and computational complexity aspects of this algorithm. The method is based on projection in Krylov subspaces to improve complexity and a new criterion called the Partial Effective Rank (PER) to estimate the level of sparsity to gain robustness. If P antennas measure a K-multipath channel with N uniformly sampled measurements per channel, the algorithm possesses an O(KPNlogN) complexity and an O(KPN) memory footprint instead of O(PN^3) and O(PN^2) for the direct implementation, making it suitable for K << N. The sparsity is estimated online based on the PER, and the algorithm therefore has a sense of introspection being able to relinquish sparsity if it is lacking. The estimation performances are tested on field measurements with synthetic AWGN, and the proposed algorithm outperforms non-sparse reconstruction in the medium to low SNR range (< 0dB), increasing the rate of successful symbol decodings by 1/10th in average, and 1/3rd in the best case. The experiments also show that the algorithm does not perform worse than a non-sparse estimation algorithm in non-sparse operating conditions, since it may fall-back to it if the PER criterion does not detect a sufficient level of sparsity. The algorithm is also tested against a method assuming a "discrete" sparsity model as in Compressed Sensing (CS). The conducted test indicates a trade-off between speed and accuracy.Comment: 11 pages, 9 figures, submitted to IEEE JETCAS special issue on Compressed Sensing, Sep. 201

    Low power techniques and architectures for multicarrier wireless receivers

    Get PDF

    Get Out of the Valley: Power-Efficient Address Mapping for GPUs

    Get PDF
    GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem causing low performance and poor power-efficiency. The key issue is that it is highly application-dependent which memory address bits exhibit high variability. To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit of the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower order address bits. This indicates that efficient GPU-address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate the entropy into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme which concentrates the entropy of the row, channel and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR-gates. PAE improves performance by 1.31 x and power-efficiency by 1.25 x compared to state-of-the-art permutation-based address mapping
    • 

    corecore