Generating optimized Fourier interpolation routines for density functional theory using SPIRAL
© 2015 IEEE. Upsampling of a multidimensional data set is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, such as those used in the quantum chemistry code ONETEP, a time-shift-based implementation can be a good choice: it shifts the samples by a fraction of the original grid spacing, using the Fourier shift property in the frequency domain, to fill in the intermediate values. This approach leverages readily available, highly optimized multidimensional FFT implementations at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift-based upsampling. Since ONETEP handles threading itself, we address the memory hierarchy and SIMD vectorization, and we focus on the problem dimensions relevant to ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations using highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups averaging 3x in isolation and of up to 15% within ONETEP.
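To make the time-shift idea concrete, here is a minimal one-dimensional sketch, not the paper's SPIRAL-generated code. It assumes a real, periodic, band-limited signal with an odd number of samples (so the Nyquist bin needs no special handling), and it uses a naive DFT as a stand-in for FFTW or MKL. The names `dft`, `idft`, `signed_freq` and `time_shift_upsample` are hypothetical. Each intermediate phase x(t + j/r) costs one extra inverse transform over the whole data set, which is where the extra passes through the working set come from.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) forward DFT; a stand-in for an optimized FFT library."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive O(n^2) inverse DFT, including the 1/n normalization."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def signed_freq(k, n):
    """Map DFT bin k in [0, n) to its signed frequency in (-n/2, n/2)."""
    return k if k <= n // 2 else k - n


def time_shift_upsample(x, factor):
    """Upsample a real, periodic, band-limited 1-D signal by an integer factor.

    Each intermediate phase x(t + j/factor) is obtained by multiplying the
    spectrum by a linear phase (Fourier shift property) and transforming back.
    """
    n = len(x)
    if n % 2 == 0:
        # Assumption of this sketch: odd n avoids treating the Nyquist bin specially.
        raise ValueError("this sketch assumes an odd number of samples")
    spectrum = dft(x)
    phases = [list(x)]  # phase 0: the original samples
    for j in range(1, factor):
        tau = j / factor  # shift as a fraction of the original grid spacing
        shifted = [spectrum[k] * cmath.exp(2j * math.pi * signed_freq(k, n) * tau / n)
                   for k in range(n)]
        phases.append([v.real for v in idft(shifted)])  # imaginary part is round-off
    # Interleave the phases: output[t * factor + j] = x(t + j / factor).
    return [phases[j][t] for t in range(n) for j in range(factor)]
```

For example, `time_shift_upsample([math.cos(2 * math.pi * t / 9) for t in range(9)], 2)` returns 18 samples of cos(2πm/18). A multidimensional version would apply shifts along each axis and their combinations (for factor 2 in 2-D: (0, ½), (½, 0) and (½, ½)), multiplying the number of full-size transforms.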
Efficient Fast-Convolution-Based Waveform Processing for 5G Physical Layer
This paper investigates the application of fast-convolution (FC) filtering
schemes for flexible and effective waveform generation and processing in the
fifth generation (5G) systems. FC-based filtering is presented as a generic
multimode waveform processing engine while, following the progress of 5G new
radio standardization in the Third-Generation Partnership Project, the main
focus is on efficient generation and processing of subband-filtered cyclic
prefix orthogonal frequency-division multiplexing (CP-OFDM) signals. First, a
matrix model for analyzing FC filter processing responses is presented and used
for designing optimized multiplexing of filtered groups of CP-OFDM physical
resource blocks (PRBs) in a spectrally well-localized manner, i.e., with narrow
guardbands. Subband filtering is able to suppress interference leakage between
adjacent subbands, thus supporting independent waveform parametrization and
different numerologies for different groups of PRBs, as well as asynchronous
multiuser operation in uplink. These are central ingredients in the 5G waveform
developments, particularly at sub-6-GHz bands. The FC filter optimization
criterion is passband error vector magnitude minimization subject to a given
subband band-limitation constraint. Optimized designs with different guardband
widths, PRB group sizes, and essential design parameters are compared in terms
of interference levels and implementation complexity. Finally, extensive coded
5G radio link simulation results are presented to compare the proposed approach
with other subband-filtered CP-OFDM schemes and time-domain windowing methods,
considering cases with different numerologies or asynchronous transmissions in
adjacent subbands. Also, the feasibility of using independent transmitter and
receiver processing for CP-OFDM spectrum control is demonstrated.
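Fast-convolution filtering is built on block-wise frequency-domain processing. The sketch below shows the basic overlap-save form under simplifying assumptions: a single transform size and the plain frequency response of a short FIR filter as the frequency-domain weights. The paper's FC engine, with its optimized weights per subband and separate forward/inverse transform sizes, is not reproduced here. The names `dft` and `overlap_save` and the block size of 16 are illustrative, and the naive DFT stands in for a real FFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT (inverse includes 1/n); a stand-in for an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def overlap_save(x, h, block=16):
    """Linear convolution of x with FIR h by block-wise frequency-domain filtering."""
    m = len(h)
    if block < m:
        raise ValueError("block size must be at least the filter length")
    step = block - (m - 1)  # new input samples consumed per block
    weights = dft(list(h) + [0.0] * (block - m))  # frequency-domain filter weights
    padded = [0.0] * (m - 1) + list(x) + [0.0] * block
    total = len(x) + m - 1  # length of the full linear convolution
    y = []
    start = 0
    while len(y) < total:
        segment = padded[start:start + block]
        spectrum = [a * w for a, w in zip(dft(segment), weights)]
        out = dft(spectrum, inverse=True)
        # The first m-1 outputs are corrupted by circular wrap-around; keep the rest.
        y.extend(v.real for v in out[m - 1:])
        start += step
    return y[:total]
```

The output matches direct time-domain convolution to round-off. The design point that FC exploits is that the pointwise multiplication by `weights` is where any frequency-domain response, such as a subband filter, can be imposed cheaply.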
Evaluating parametric holonomic sequences using rectangular splitting
We adapt the rectangular splitting technique of Paterson and Stockmeyer to
the problem of evaluating terms in holonomic sequences that depend on a
parameter. This approach allows computing the n-th term in a recurrent
sequence of suitable type using O(√n) "expensive" operations at the cost
of an increased number of "cheap" operations.
Rectangular splitting has little overhead and can perform better than either
naive evaluation or asymptotically faster algorithms for ranges of n
encountered in applications. As an example, fast numerical evaluation of the
gamma function is investigated. Our work generalizes two previous algorithms of
Smith.
Comment: 8 pages, 2 figures
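A small instance of the idea is the rising factorial x(x+1)...(x+n-1), a parametric holonomic sequence (t_{k+1} = (x + k) t_k) relevant to the gamma-function example. The product is split into chunks of m factors. Each chunk is a polynomial in x with integer coefficients (cheap to form) that is evaluated from precomputed powers x^0..x^m using only integer-by-power products, and the chunks are then multiplied together. This gives roughly m + n/m expensive multiplications, about 2√n for m ≈ √n. This is a minimal sketch assuming `Fraction` values play the role of "expensive" numbers and Python integers the "cheap" ones; `rising_factorial`, `poly_mul_linear` and the choice m = ⌊√n⌋ are hypothetical, not the paper's general algorithm.

```python
import math
from fractions import Fraction


def poly_mul_linear(coeffs, a):
    """Multiply an integer polynomial (ascending coefficients) by (X + a); cheap ops only."""
    out = [0] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        out[i] += c * a
        out[i + 1] += c
    return out


def rising_factorial(x, n, m=None):
    """Return x(x+1)...(x+n-1) and the count of x-dependent (expensive) multiplications."""
    if m is None:
        m = max(1, math.isqrt(n))
    # Precompute x^0..x^m: m - 1 expensive multiplications (x^0 and x^1 are free).
    powers = [Fraction(1), x]
    expensive = 0
    for _ in range(2, m + 1):
        powers.append(powers[-1] * x)
        expensive += 1
    result = Fraction(1)
    k = 0
    while k < n:
        width = min(m, n - k)
        # Integer coefficients of (X+k)(X+k+1)...(X+k+width-1): cheap integer arithmetic.
        coeffs = [1]
        for i in range(width):
            coeffs = poly_mul_linear(coeffs, k + i)
        # Evaluate from the precomputed powers: integer * power products only.
        chunk = sum(c * powers[i] for i, c in enumerate(coeffs))
        result *= chunk  # one expensive multiplication per chunk
        expensive += 1
        k += width
    return result, expensive
```

For n = 100 this performs 19 expensive multiplications instead of the 99 needed by naive evaluation, while the number of cheap integer operations grows; that trade-off is the one the abstract describes.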
Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications
Convolution and cross-correlation are the basis of filtering and pattern or
template matching in multimedia signal processing. We propose two throughput
scaling options for any one-dimensional convolution kernel in programmable
processors by adjusting the imprecision (distortion) of the computation. Our
approach is based on scalar quantization, followed by two forms of tight
packing in floating-point (one of which is proposed in this paper) that allow
for concurrent calculation of multiple results. We illustrate how our approach
can operate as an optional pre- and post-processing layer for off-the-shelf
optimized convolution routines. This is useful for multimedia applications that
are tolerant to processing imprecision and for cases where the input signals
are inherently noisy (error-tolerant multimedia applications). Indicative
experimental results with a digital music matching system and an MPEG-7 audio
descriptor system demonstrate that the proposed approach offers up to 175%
increase in processing throughput against optimized (full-precision)
convolution with virtually no effect on the accuracy of the results. Based on
marginal statistics of the input data, it is also shown how the throughput and
distortion can be adjusted per input block of samples under constraints on the
signal-to-noise ratio against the full-precision convolution.
Comment: IEEE Trans. on Multimedia, 201
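The abstract does not spell out the packing formats, so the following is only a generic illustration of the principle: after scalar quantization, two integer input blocks share one double-precision value (x1 + 2^s·x2), a single convolution pass produces both results, and they are separated by rounding. Exactness relies on IEEE-754 doubles representing integers below 2^53 exactly and on each low-part result staying below 2^(s-1) in magnitude. The shift of 24 bits and the names `quantize`, `convolve` and `packed_convolve` are assumptions made for this sketch.

```python
def quantize(signal, step):
    """Scalar quantization: map real samples to integers (the source of imprecision)."""
    return [round(v / step) for v in signal]


def convolve(x, h):
    """Plain full-length 1-D convolution; any optimized routine could be used instead."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def packed_convolve(x1, x2, h, shift_bits=24):
    """Convolve two integer blocks with one integer kernel in a single pass."""
    if len(x1) != len(x2):
        raise ValueError("blocks must have equal length")
    scale = float(2 ** shift_bits)
    # Worst-case magnitude of any output (or partial sum) for either block.
    bound = max((abs(v) for v in x1 + x2), default=0) * sum(abs(v) for v in h)
    # Low results must fit in (-scale/2, scale/2), and packed sums must stay
    # below 2**53 so that every double-precision operation is exact.
    if 2 * bound >= scale or (bound + 1) * scale >= 2.0 ** 53:
        raise ValueError("shift_bits too small or inputs too large for exact packing")
    packed = [float(a) + scale * float(b) for a, b in zip(x1, x2)]
    packed_out = convolve(packed, [float(v) for v in h])
    y1, y2 = [], []
    for c in packed_out:
        high = round(c / scale)  # recovers the x2 result exactly
        y2.append(high)
        y1.append(int(c - scale * high))  # the remainder is the x1 result
    return y1, y2
```

Real-valued inputs would first pass through `quantize` with some step for the signal and for the kernel, and the integer outputs would be scaled back by the product of the two steps; the step sizes set the trade-off between distortion and how much headroom remains for packing.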