789 research outputs found

    Generating optimized Fourier interpolation routines for density functional theory using SPIRAL

    © 2015 IEEE. Upsampling of a multi-dimensional data set is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, as seen in the quantum chemistry code ONETEP, a time-shift based implementation can be a good choice: it shifts samples by a fraction of the original grid spacing, using a frequency-domain Fourier property, to fill in the intermediate values. This approach leverages readily available, highly optimized multidimensional FFT implementations at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift based upsampling. Since ONETEP handles threading, we address the memory hierarchy and SIMD vectorization, and focus on problem dimensions relevant for ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations using highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups averaging 3x in isolation and of up to 15% within ONETEP.
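The time-shift idea in this abstract can be illustrated in one dimension. The following is a minimal sketch, not the SPIRAL-generated code: it assumes a periodic, real-valued signal and an upsampling factor of 2, and it uses a naive O(n²) DFT in place of an optimized FFT library. The function names `dft`, `shift_fraction` and `upsample_by_two` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out


def shift_fraction(x, delta):
    """Sample the band-limited periodic interpolant of x at positions j + delta."""
    n = len(x)
    shifted = []
    for k, c in enumerate(dft(x)):
        if n % 2 == 0 and k == n // 2:
            # Nyquist bin: average of the +/- phase ramps keeps the output real.
            shifted.append(c * math.cos(math.pi * delta))
        else:
            freq = k if k < (n + 1) // 2 else k - n  # signed frequency index
            # Shift property: a shift by delta is a linear phase ramp in frequency.
            shifted.append(c * cmath.exp(2j * math.pi * freq * delta / n))
    return [v.real for v in dft(shifted, inverse=True)]


def upsample_by_two(x):
    """Interleave the original samples with the half-sample-shifted ones."""
    half = shift_fraction(x, 0.5)
    fine = []
    for original, between in zip(x, half):
        fine.extend([original, between])
    return fine


coarse = [math.cos(2 * math.pi * j / 8) for j in range(8)]
fine = upsample_by_two(coarse)  # 16 samples of the same cosine on a finer grid
```

Each additional shift costs a further forward/inverse transform pair over the data, which is the "extra passes through the entire working set" trade-off the abstract mentions.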

    Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

    Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems, based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose, to the best of our knowledge, the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128-antenna, 8-user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance. Comment: To appear in the IEEE Journal of Selected Topics in Signal Processing
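A Neumann series approximation of a matrix inverse can be sketched as follows. This is an illustration under stated assumptions, not the paper's VLSI algorithm: the matrix is split as A = D + E with D its diagonal (an assumption of this sketch), entries are real for simplicity, and the series is truncated after a chosen number of terms. The names `matmul` and `neumann_inverse` are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def neumann_inverse(a, terms):
    """Approximate inv(A) by sum_{n < terms} (-inv(D) E)^n inv(D), with A = D + E.

    Assumes D = diag(A) (a choice made for this sketch). The series converges
    when the spectral radius of inv(D) E is below 1, e.g. for strongly
    diagonally dominant matrices.
    """
    n = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    e = [[0.0 if i == j else a[i][j] for j in range(n)] for i in range(n)]
    m = [[-v for v in row] for row in matmul(d_inv, e)]  # -inv(D) E
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # M^0
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        term = matmul(power, d_inv)
        acc = [[acc[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, m)
    return acc


a = [[4.0, 1.0], [1.0, 3.0]]
one_term = neumann_inverse(a, 1)      # equals inv(D): only the diagonal is inverted
many_terms = neumann_inverse(a, 30)   # close to the exact inverse for this matrix
```

Truncating early replaces a full inversion with a few matrix products, at the cost of an approximation error that shrinks as the diagonal becomes more dominant.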

    Realizing In-Memory Baseband Processing for Ultra-Fast and Energy-Efficient 6G

    To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges: transistor scaling and the von Neumann bottleneck. To address these challenges, in-memory computing-based baseband processors using resistive random-access memory (RRAM) present an attractive solution. In this paper, we propose and demonstrate RRAM-implemented in-memory baseband processing for the widely adopted multiple-input-multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) air interface. Its key feature is to execute the key operations, including the discrete Fourier transform (DFT) and MIMO detection using linear minimum mean square error (L-MMSE) and zero forcing (ZF), in one step. In addition, an RRAM-based channel estimation module is proposed and discussed. By prototyping and simulations, we demonstrate the feasibility of a full-fledged RRAM-based communication system in hardware, and show through large-scale simulations that it can outperform state-of-the-art baseband processors with a gain of 91.2× in latency and 671× in energy efficiency. Our results pave a potential pathway for RRAM-based in-memory computing to be implemented in the era of sixth generation (6G) mobile communications. Comment: arXiv admin note: text overlap with arXiv:2205.0356

    An FPGA-based Embedded System For Fingerprint Matching Using Phase Only Correlation Algorithm

    There is an increasing interest in inexpensive and reliable personal identification in many emerging civilian, commercial and financial applications. Traditional means such as passwords, PINs, badges, smart cards and tokens may be stolen, easily guessed or forgotten, and in some cases lost by the user who carries them, any of which can compromise identification. Fingerprint-based identification is one of the most used biometric techniques in automated systems for personal identification, and it is becoming socially acceptable and cost-effective, since a fingerprint is univocally related to a particular individual. Typical fingerprint identification methods employ feature-based image matching, where minutiae points in the ridge lines (i.e., ridge endings and bifurcations) are identified. Unfortunately, this approach is highly influenced by the condition of the fingertip surface. Fingerprint recognition is a complex pattern recognition problem. Efforts to automate the matching process based on digital representations of fingerprints led to the development of Automatic Fingerprint Identification Systems (AFIS). Typically, a database holds millions of fingerprint records, all of which must be searched for a match to establish the identity of the individual. To provide a reasonable response time for each query, it is preferable to develop special hardware solutions that implement matching and/or classification algorithms efficiently.
    In this work we built an FPGA-based system able to outperform modern PCs in recognising and classifying fingerprints. This work placed second in the Altera Contest 2009 Innovate Italy, an annual competition organized by Altera among projects by teams of young university students from across the country. Giovanni Danese; Mauro Giachero; Francesco Leporati; Giulia Matrone; Nelson Nazzicari

    Non-Uniform Time Sampling for Multiple-Frequency Harmonic Balance Computations

    A time-domain harmonic balance method for the analysis of almost-periodic (multi-harmonic) flows is presented. This method relies on Fourier analysis to derive an efficient alternative to classical time marching schemes for such flows. It has recently received significant attention, especially in the turbomachinery field, where the flow spectrum is essentially a combination of the blade passing frequencies. Up to now, harmonic balance methods have used a uniform time sampling of the period of interest, but in the case of several frequencies that are not necessarily multiples of each other, harmonic balance methods can face stability issues due to a poor condition number of the Fourier operator. Two algorithms are derived to find a non-uniform time sampling that minimizes this condition number. Their behavior is studied on a wide range of frequencies and on a model problem of a 1D flow with pulsating outlet pressure, which demonstrates their efficiency. Finally, the flow in a multi-stage axial compressor is analyzed with different frequency sets. It demonstrates the stability and robustness of the present non-uniform harmonic balance method regardless of the frequency set.

    Quantization errors in overlapped block digital filtering methods

    In digital signal processing applications that involve filtering long sequences, block filtering methods such as overlap-save and overlap-add are widely used. Like all finite-precision applications, the overlap-save and overlap-add methods are affected by quantization errors. The goal of this paper is to calculate and quantitatively compare the overall quantization noise resulting from the two methods in terms of the power (variance) of the quantization noise. Multiple quantization noise sources are taken into consideration in the computation of the variances. The calculations reveal that the overlap-add approach is more prone to quantization noise than the overlap-save approach, due to the addition of overlapping sections between overlap-add output blocks. Copyright © 2013 IARIA
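The structural difference between the two block methods can be sketched in a few lines. This is an illustrative floating-point sketch, not the paper's analysis: it uses direct time-domain convolution per block rather than FFTs, it does not model quantization, and the function names are hypothetical.

```python
def linear_convolve(x, h):
    """Direct linear convolution, used here as the per-block filter."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def circular_convolve(x, h, n):
    """n-point circular convolution of x (length n) with h (length <= n)."""
    return [sum(hj * x[(i - j) % n] for j, hj in enumerate(h)) for i in range(n)]


def overlap_add(x, h, block):
    """Filter non-overlapping input blocks and add the overlapping output tails."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        for i, v in enumerate(linear_convolve(x[start:start + block], h)):
            y[start + i] += v  # the extra additions happen on the overlapped tails
    return y


def overlap_save(x, h, block):
    """Filter overlapping input blocks and discard the first len(h) - 1 outputs."""
    m = len(h)
    n = block + m - 1                   # circular convolution length
    total = len(x) + m - 1
    padded = [0.0] * (m - 1) + list(x)  # leading zeros prime the first block
    y = []
    start = 0
    while len(y) < total:
        seg = padded[start:start + n]
        seg = seg + [0.0] * (n - len(seg))            # zero-fill past the end of x
        y.extend(circular_convolve(seg, h, n)[m - 1:])  # drop the wrapped samples
        start += block
    return y[:total]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h = [1.0, -1.0, 0.5]
reference = linear_convolve(x, h)
ola = overlap_add(x, h, 3)
ols = overlap_save(x, h, 3)
```

Both routines reproduce the direct convolution; the only place where overlap-add does extra arithmetic on its outputs is the accumulation of overlapping tails, which is the step the abstract identifies as the source of its additional quantization noise.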

    CPM-SC-IFDMA--A Power Efficient Transmission Scheme for Uplink LTE

    In this thesis we propose a power-efficient transmission scheme, CPM-SC-IFDMA, for uplink LTE. In uplink LTE, the efficiency of the transmitter power amplifier is a major concern, as the transmitter is placed in the mobile device, which has a limited power supply. The proposed scheme, CPM-SC-IFDMA, combines the key advantages of CPM (continuous phase modulation) with SC-IFDMA (single carrier frequency division multiple access with interleaved subcarrier mapping) in order to increase the power amplifier efficiency of the transmitter. In this work, we analyze the bit error rate (BER) performance of the proposed scheme in LTE-specified channels. The BER performance of two CPM-SC-IFDMA schemes is compared with that of an LTE-specified transmission scheme, QPSK-LFDMA (QPSK-modulated SC-FDMA with localized subcarrier mapping), combined with convolutional coding (CC-QPSK-LFDMA). We first show that CPM-SC-IFDMA has a much higher power efficiency than CC-QPSK-LFDMA by simulating the PAPR (peak-to-average-power-ratio) plots. Then, using the data from the PAPR plots and the conventional BER plots (BER as a function of signal-to-noise ratio), we show that, when the net BER, obtained by compensating for the power efficiency loss, is considered, CPM-SC-IFDMA outperforms CC-QPSK-LFDMA by up to 3.8 dB in the LTE-specified channels.
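The PAPR figure used in this abstract is the ratio of peak instantaneous power to mean power. A minimal sketch of that definition follows; the helper `papr_db` and the two test signals are illustrative assumptions, not the thesis's simulation setup.

```python
import cmath
import math


def papr_db(samples):
    """Peak-to-average power ratio of a block of complex samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10 * math.log10(max(powers) / mean_power)


# A constant-envelope signal (only the phase varies, as in CPM) has 0 dB PAPR.
constant_envelope = [cmath.exp(1j * 0.3 * k * k) for k in range(64)]
# A single impulse in a block of four concentrates all power in one sample.
impulse = [1.0, 0.0, 0.0, 0.0]

papr_constant = papr_db(constant_envelope)
papr_impulse = papr_db(impulse)  # 10 * log10(4)
```

A lower PAPR lets the power amplifier operate closer to saturation with less back-off, which is why the abstract treats it as a proxy for power efficiency.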