
    Generating optimized Fourier interpolation routines for density functional theory using SPIRAL

    © 2015 IEEE. Upsampling of a multi-dimensional data set is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, as seen in the quantum chemistry code ONETEP, a time-shift based implementation can be a good choice: it shifts samples by a fraction of the original grid spacing and fills in the intermediate values using the Fourier shift property in the frequency domain. Readily available, highly optimized multidimensional FFT implementations are leveraged at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift based upsampling. Since ONETEP handles threading, we address the memory hierarchy and SIMD vectorization, and focus on problem dimensions relevant for ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations using highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups averaging 3x in isolation and of up to 15% within ONETEP.
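The time-shift idea in the abstract above can be illustrated in one dimension for an upsampling factor of 2. The following is a minimal sketch, not the paper's SPIRAL-generated code: the function names `dft` and `upsample2_time_shift` are hypothetical, the input is assumed to be real and periodic, and a naive O(N^2) DFT stands in for an optimized FFT library to keep the example dependency-free.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive O(N^2) discrete Fourier transform (sign=+1 gives the unnormalized inverse)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def upsample2_time_shift(x):
    """Upsample a real periodic signal by 2 using a half-sample shift in the frequency domain."""
    n = len(x)
    spectrum = dft(x)
    shifted_spectrum = []
    for k, coeff in enumerate(spectrum):
        # Map bin index to a signed frequency so the shift is applied symmetrically.
        freq = k if k < (n + 1) // 2 else k - n
        # Shift property: delaying by -1/2 sample multiplies bin `freq` by exp(i*pi*freq/n).
        shifted_spectrum.append(coeff * cmath.exp(1j * math.pi * freq / n))
    # Inverse transform; taking the real part assumes real input and drops the Nyquist bin's
    # purely imaginary contribution for even n.
    shifted = [v.real / n for v in dft(shifted_spectrum, sign=+1)]
    # Interleave original samples with the half-sample-shifted ones.
    out = []
    for original, between in zip(x, shifted):
        out.extend((original, between))
    return out
```

For a band-limited input such as one period of a cosine sampled at 8 points, the interleaved output matches the same cosine sampled at 16 points to within rounding error. Each additional shift costs another pass over the data, which is the overhead the abstract refers to.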

    Efficient Fast-Convolution-Based Waveform Processing for 5G Physical Layer

    This paper investigates the application of fast-convolution (FC) filtering schemes for flexible and effective waveform generation and processing in the fifth generation (5G) systems. FC-based filtering is presented as a generic multimode waveform processing engine while, following the progress of 5G new radio standardization in the Third-Generation Partnership Project, the main focus is on efficient generation and processing of subband-filtered cyclic prefix orthogonal frequency-division multiplexing (CP-OFDM) signals. First, a matrix model for analyzing FC filter processing responses is presented and used for designing optimized multiplexing of filtered groups of CP-OFDM physical resource blocks (PRBs) in a spectrally well-localized manner, i.e., with narrow guardbands. Subband filtering is able to suppress interference leakage between adjacent subbands, thus supporting independent waveform parametrization and different numerologies for different groups of PRBs, as well as asynchronous multiuser operation in uplink. These are central ingredients in the 5G waveform developments, particularly at sub-6-GHz bands. The FC filter optimization criterion is passband error vector magnitude minimization subject to a given subband band-limitation constraint. Optimized designs with different guardband widths, PRB group sizes, and essential design parameters are compared in terms of interference levels and implementation complexity. Finally, extensive coded 5G radio link simulation results are presented to compare the proposed approach with other subband-filtered CP-OFDM schemes and time-domain windowing methods, considering cases with different numerologies or asynchronous transmissions in adjacent subbands. The feasibility of using independent transmitter and receiver processing for CP-OFDM spectrum control is also demonstrated.

    Evaluating parametric holonomic sequences using rectangular splitting

    We adapt the rectangular splitting technique of Paterson and Stockmeyer to the problem of evaluating terms in holonomic sequences that depend on a parameter. This approach allows computing the $n$-th term in a recurrent sequence of suitable type using $O(n^{1/2})$ "expensive" operations at the cost of an increased number of "cheap" operations. Rectangular splitting has little overhead and can perform better than either naive evaluation or asymptotically faster algorithms for ranges of $n$ encountered in applications. As an example, fast numerical evaluation of the gamma function is investigated. Our work generalizes two previous algorithms of Smith. Comment: 8 pages, 2 figures
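For orientation, the classical polynomial form of rectangular splitting (Paterson and Stockmeyer) can be sketched as below. This is not the paper's algorithm for holonomic sequences, only the underlying technique it adapts. The function name `rectangular_splitting_eval` is hypothetical, and the cost model is an assumption of the sketch: multiplications between two values that depend on `x` are treated as "expensive", and multiplications by a coefficient as "cheap".

```python
import math
from fractions import Fraction

def rectangular_splitting_eval(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with about 2*sqrt(n) expensive multiplications."""
    n = len(coeffs)
    if n == 0:
        return 0
    m = math.isqrt(n - 1) + 1  # block width, roughly sqrt(n)

    # Baby steps: the table x^0 .. x^m costs about m expensive multiplications.
    powers = [1]
    for _ in range(m):
        powers.append(powers[-1] * x)
    x_m = powers[m]

    # Giant steps: Horner's rule over blocks of m coefficients, highest block first.
    result = 0
    for start in reversed(range(0, n, m)):
        # Inside a block only cheap coefficient-by-power products are needed.
        block = sum(c * powers[j] for j, c in enumerate(coeffs[start:start + m]))
        # One expensive multiplication by x^m per block.
        result = result * x_m + block
    return result
```

With exact arithmetic, for example `x = Fraction(3, 7)`, the result agrees with direct term-by-term evaluation. The number of cheap operations stays linear in `n`, which is the trade-off the abstract describes.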

    Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications

    Convolution and cross-correlation are the basis of filtering and pattern or template matching in multimedia signal processing. We propose two throughput scaling options for any one-dimensional convolution kernel in programmable processors by adjusting the imprecision (distortion) of computation. Our approach is based on scalar quantization, followed by two forms of tight packing in floating-point (one of which is proposed in this paper) that allow for concurrent calculation of multiple results. We illustrate how our approach can operate as an optional pre- and post-processing layer for off-the-shelf optimized convolution routines. This is useful for multimedia applications that are tolerant to processing imprecision and for cases where the input signals are inherently noisy (error-tolerant multimedia applications). Indicative experimental results with a digital music matching system and an MPEG-7 audio descriptor system demonstrate that the proposed approach offers up to a 175% increase in processing throughput over optimized (full-precision) convolution with virtually no effect on the accuracy of the results. Based on marginal statistics of the input data, it is also shown how the throughput and distortion can be adjusted per input block of samples under constraints on the signal-to-noise ratio against the full-precision convolution. Comment: IEEE Trans. on Multimedia, 201