Search CORE

6,882 research outputs found

Generating optimized Fourier interpolation routines for density function theory using SPIRAL

Author: Franchetti F
Kelly PHJ
Popovici T
Russell FP
Skylaris CK
Wilkinson KA
Publication venue
Publication date: 12/12/2014
Field of study

© 2015 IEEE.Upsampling of a multi-dimensional data-set is an operation with wide application in image processing and quantum mechanical calculations using density functional theory. For small up sampling factors as seen in the quantum chemistry code ONETEP, a time-shift based implementation that shifts samples by a fraction of the original grid spacing to fill in the intermediate values using a frequency domain Fourier property can be a good choice. Readily available highly optimized multidimensional FFT implementations are leveraged at the expense of extra passes through the entire working set. In this paper we present an optimized variant of the time-shift based up sampling. Since ONETEP handles threading, we address the memory hierarchy and SIMD vectorization, and focus on problem dimensions relevant for ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations using highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups in isolation averaging 3x and within ONETEP of up to 15%

Spiral - Imperial College Digital Repository

Efficient Fast-Convolution-Based Waveform Processing for 5G Physical Layer

Author: Levanen Toni
Pajukoski Kari
Pirskanen Juho
Renfors Markku
Valkama Mikko
Valkonen Sami
Yli-Kaakinen Juha
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/06/2017
Field of study

This paper investigates the application of fast-convolution (FC) filtering schemes for flexible and effective waveform generation and processing in the fifth generation (5G) systems. FC-based filtering is presented as a generic multimode waveform processing engine while, following the progress of 5G new radio standardization in the Third-Generation Partnership Project, the main focus is on efficient generation and processing of subband-filtered cyclic prefix orthogonal frequency-division multiplexing (CP-OFDM) signals. First, a matrix model for analyzing FC filter processing responses is presented and used for designing optimized multiplexing of filtered groups of CP-OFDM physical resource blocks (PRBs) in a spectrally well-localized manner, i.e., with narrow guardbands. Subband filtering is able to suppress interference leakage between adjacent subbands, thus supporting independent waveform parametrization and different numerologies for different groups of PRBs, as well as asynchronous multiuser operation in uplink. These are central ingredients in the 5G waveform developments, particularly at sub-6-GHz bands. The FC filter optimization criterion is passband error vector magnitude minimization subject to a given subband band-limitation constraint. Optimized designs with different guardband widths, PRB group sizes, and essential design parameters are compared in terms of interference levels and implementation complexity. Finally, extensive coded 5G radio link simulation results are presented to compare the proposed approach with other subband-filtered CP-OFDM schemes and time-domain windowing methods, considering cases with different numerologies or asynchronous transmissions in adjacent subbands. Also the feasibility of using independent transmitter and receiver processing for CP-OFDM spectrum control is demonstrated

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

Evaluating parametric holonomic sequences using rectangular splitting

Author: Johansson Fredrik
Publication venue
Publication date: 14/10/2013
Field of study

We adapt the rectangular splitting technique of Paterson and Stockmeyer to the problem of evaluating terms in holonomic sequences that depend on a parameter. This approach allows computing the

n

-th term in a recurrent sequence of suitable type using

O(n^{1/2})

"expensive" operations at the cost of an increased number of "cheap" operations. Rectangular splitting has little overhead and can perform better than either naive evaluation or asymptotically faster algorithms for ranges of

n

encountered in applications. As an example, fast numerical evaluation of the gamma function is investigated. Our work generalizes two previous algorithms of Smith.Comment: 8 pages, 2 figure

arXiv.org e-Print Archive

Crossref

Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications

Author: Anam Mohammad Ashraful
Andreopoulos Yiannis
Publication venue
Publication date: 01/01/2012
Field of study

Convolution and cross-correlation are the basis of filtering and pattern or template matching in multimedia signal processing. We propose two throughput scaling options for any one-dimensional convolution kernel in programmable processors by adjusting the imprecision (distortion) of computation. Our approach is based on scalar quantization, followed by two forms of tight packing in floating-point (one of which is proposed in this paper) that allow for concurrent calculation of multiple results. We illustrate how our approach can operate as an optional pre- and post-processing layer for off-the-shelf optimized convolution routines. This is useful for multimedia applications that are tolerant to processing imprecision and for cases where the input signals are inherently noisy (error tolerant multimedia applications). Indicative experimental results with a digital music matching system and an MPEG-7 audio descriptor system demonstrate that the proposed approach offers up to 175% increase in processing throughput against optimized (full-precision) convolution with virtually no effect in the accuracy of the results. Based on marginal statistics of the input data, it is also shown how the throughput and distortion can be adjusted per input block of samples under constraints on the signal-to-noise ratio against the full-precision convolution.Comment: IEEE Trans. on Multimedia, 201

arXiv.org e-Print Archive

UCL Discovery