12 research outputs found

    Performance analysis of multichannel lattice equalization in coherent underwater communications

    This work examines the numerical fixed-point performance of a new multichannel lattice RLS filtering algorithm using data from two underwater acoustic communication experiments. The algorithm may be an appealing choice for underwater equalization due to its robust numerical behavior and linear scaling of the computational complexity with filter order. Simple modifications to widely-used methods for carrier/timing synchronization and symbol slicing in transversal equalizers are proposed. Experimental results show that the algorithm is as accurate as the similarly array-based QR-RLS, tolerating word lengths as low as 16-20 bits with minor degradation relative to floating-point benchmarks. These features, coupled with a very modular and regular structure, are highly desirable in energy-efficient hardware or embedded implementations.

    Efficient arithmetic for high speed DSP implementation on FPGAs

    The author was sponsored by EnTegra Ltd, a company that develops hardware and software products and services for the real-time implementation of DSP and RF systems. The field programmable gate array (FPGA) is being used increasingly in the field of DSP, because the parallel computing power of such devices is ideal for today’s truly demanding DSP algorithms. Algorithms such as the QR-RLS update are computationally intensive and must be carried out at extremely high speeds (MHz). This means that the DSP processor is simply not an option. ASICs can be used, but the expense of developing custom logic is prohibitive. The increased use of the FPGA in DSP means that there is a significant requirement for efficient arithmetic cores that utilise the resources on such devices. This thesis presents the research and development effort that was carried out to produce fixed-point division and square root cores for use in a new Electronic Design Automation (EDA) tool for EnTegra, which is targeted at FPGA implementation of DSP systems. Further to this, a new technique for predicting the accuracy of CORDIC systems computing vector magnitudes and cosines/sines is presented. This work allows the most efficient CORDIC design for a specified level of accuracy to be found quickly and easily without the need to run lengthy simulations, as was the case before. The CORDIC algorithm is a technique using mainly shifts and additions to compute many arithmetic functions and is thus ideal for FPGA implementation.
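
As an illustration of the shift-and-add idea mentioned in this abstract, the following is a minimal sketch of a fixed-point CORDIC in Python. It is not the thesis's implementation: the function names, the 16-iteration count, and the 16 fractional bits are assumptions chosen for illustration, and the final gain correction in the magnitude routine uses a floating-point multiply for brevity, where hardware would typically use a constant multiplier.

```python
import math


def cordic_gain(iterations):
    """Scale factor K = prod(1 / sqrt(1 + 2^-2i)) that undoes CORDIC growth."""
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_sin_cos(angle, iterations=16, frac_bits=16):
    """Rotation mode: return (cos, sin) of angle, for |angle| <= pi/2."""
    scale = 1 << frac_bits
    # Table of arctan(2^-i) in fixed point (a small ROM in hardware).
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    x = round(cordic_gain(iterations) * scale)  # pre-scaled unit vector
    y = 0
    z = round(angle * scale)                    # residual angle
    for i in range(iterations):
        # Only shifts and additions appear inside the loop.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return x / scale, y / scale


def cordic_magnitude(x0, y0, iterations=16, frac_bits=16):
    """Vectoring mode: return sqrt(x0^2 + y0^2) by rotating y towards zero."""
    scale = 1 << frac_bits
    x = round(abs(x0) * scale)  # start in the right half-plane
    y = round(y0 * scale)
    for i in range(iterations):
        if y >= 0:
            x, y = x + (y >> i), y - (x >> i)
        else:
            x, y = x - (y >> i), y + (x >> i)
    # Floating-point gain correction, used here only for brevity (assumption).
    return x * cordic_gain(iterations) / scale
```

Calling `cordic_sin_cos(0.5)` or `cordic_magnitude(3.0, 4.0)` and comparing against `math.cos`, `math.sin`, and `math.hypot` gives a feel for how accuracy depends on the iteration count and word length, which is the trade-off the abstract's prediction technique addresses.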

    Adaptive Beamforming Using the Recursive Least Squares Algorithm on an FPGA

    This thesis describes the design and implementation of a five-channel beamformer using a Space-Time Adaptive Processing (STAP) filter with Recursive Least Squares (RLS) as the adaptive algorithm. The objective of the algorithm is to compute a set of filter weights for a STAP filter, such that the channels are filtered and combined into a signal with minimized power. Two test signal sets containing a high-powered jammer signal and a noise floor are used for performance evaluation. Three goals are set for this thesis: comparison of RLS to the Sample Matrix Inversion (SMI) algorithm when used in a beamformer, comparison of various architectures which implement RLS, and the implementation and test of one of the architectures for a Xilinx Virtex 6 XC6VLX240T-1 Field-Programmable Gate Array (FPGA). Simulations comparing RLS to SMI show that a beamformer using RLS performs the same as a beamformer using SMI for 3-5 antennas (channels) and 1-4 temporal taps in the STAP filter. Literature review shows that conventional RLS is unsuitable for FPGA implementation due to numerical instability. Comparison of the IQRD-RLS, FQRD-RLS and MCFQRD-RLS architectures, which are claimed to be stable RLS variants, shows that IQRD-RLS is the least computationally expensive of the algorithms. IQRD-RLS is implemented using Givens rotations in a systolic array architecture. Floating point, fixed point and CORDIC-based Givens rotation algorithms are compared with regard to speed and area, and floating point is chosen. Hardware simulations reveal that the filter weights returned by IQRD-RLS exhibit a drift and are not stable in finite-precision arithmetic. The main cause is accumulated quantization error from the forgetting factor and its inverse (λ^(±1/2)). The IQRD-RLS systolic array is reduced to a (stable) QRD-RLS systolic array, approximately halving the number of systolic array nodes.
Filter weights are not computed directly by QRD-RLS, and are instead recovered from the QRD-RLS least squares filtering error output by the method of weight flushing. Results show that the QRD-RLS systolic array using 14 mantissa bits is sufficient as it performs equivalently to conventional RLS using double precision (53 mantissa bits). If only 11 mantissa bits are used, the output power increases by 3.3 dB. The final design can operate at sample rates from 19.4 MHz to 24.6 MHz, for a mantissa precision range of 14 to 11 bits. At this rate, the QRD-RLS systolic array can converge and output filter weights in 5.3 µs, significantly faster than the target of 100 µs. It is found that the current design has fully utilized its speed potential/limit due to the recursive nature of the algorithm. Processing of signals at the desired rate of 125 MHz would require changes to the algorithm itself. The implementation size is such that a 5-channel QRD-RLS array with one tap can fit on the FPGA. Channel-interleaving is proposed as a method to reduce system size, at the expense of slower operation. All hardware is designed, simulated and tested using Simulink together with Xilinx System Generator and its co-simulation and hardware-in-the-loop features
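
To make the Givens-rotation update described in this abstract more concrete, here is a minimal sketch of one QRD-RLS step in plain Python, for real-valued data. It is an illustration under stated assumptions, not the thesis's design: the function names `qrd_rls_update` and `back_substitute` are hypothetical, the loops run sequentially rather than as a systolic array, and the weights are recovered by back substitution instead of the weight-flushing method the thesis uses.

```python
import math


def qrd_rls_update(R, u, x, d, lam=0.99):
    """One QRD-RLS step, updating R (n x n upper triangular) and u in place.

    x is the new input vector, d the desired sample, lam the forgetting
    factor. Returns the rotated residual left over after annihilating x.
    """
    n = len(x)
    root_lam = math.sqrt(lam)  # the lambda^(1/2) scaling named in the abstract
    for i in range(n):
        for j in range(i, n):
            R[i][j] *= root_lam
        u[i] *= root_lam
    x = list(x)  # work on a copy; each rotation zeroes one entry of it
    for i in range(n):
        r = math.hypot(R[i][i], x[i])
        if r == 0.0:
            continue  # nothing to rotate
        c, s = R[i][i] / r, x[i] / r  # Givens rotation coefficients
        for j in range(i, n):
            rij, xj = R[i][j], x[j]
            R[i][j] = c * rij + s * xj
            x[j] = -s * rij + c * xj
        u[i], d = c * u[i] + s * d, -s * u[i] + c * d
    return d


def back_substitute(R, u):
    """Solve R w = u for the weights w (assumes a nonzero diagonal in R)."""
    n = len(u)
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = u[i] - sum(R[i][j] * w[j] for j in range(i + 1, n))
        w[i] = acc / R[i][i]
    return w
```

A typical use would initialise `R` to a small multiple of the identity and `u` to zeros, call `qrd_rls_update` once per sample, and call `back_substitute` only when the weights are actually needed. Because the update works on the triangular factor rather than on the inverse correlation matrix, it avoids the explicit use of λ^(-1/2) that the abstract identifies as a source of drift in IQRD-RLS.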

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, a hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such as matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.

    MIMO equalization.

    Thesis (M.Sc.Eng.)-University of KwaZulu-Natal, Durban, 2005. In recent years, space-time block codes (STBC) for multi-antenna wireless systems have emerged as attractive encoding schemes for wireless communications. These codes provide full diversity gain and achieve good performance with simple receiver structures without any additional increase in bandwidth or power requirements. When implemented over broadband channels, STBCs can be combined with orthogonal frequency division multiplexing (OFDM) or single carrier frequency domain (SC-FD) transmission schemes to achieve multi-path diversity and to decouple the broadband frequency selective channel into independent flat fading channels. This dissertation focuses on the SC-FD transmission schemes that exploit the STBC structure to provide computationally cost-efficient receivers in terms of equalization and channel estimation. The main contributions in this dissertation are as follows:
    • The original SC-FD STBC receiver that benchmarks STBC in a frequency selective channel is limited to coherent detection, where knowledge of the channel state information (CSI) is assumed at the receiver. We extend this receiver to a multiple access system. Through analysis and simulations we prove that the extended system does not incur any performance penalty. This key result implies that the SC-FD STBC scheme is suitable for multiple-user systems where higher data rates are possible.
    • The problem of channel estimation is considered in a time and frequency selective environment. The existing receiver is based on a recursive least squares (RLS) adaptive algorithm and provides joint equalization and interference suppression. We utilize a system with perfect CSI to show from simulations how various design parameters for the RLS algorithm can be selected in order to get near perfect CSI performance.
    • The RLS receiver has two modes of operation, viz. training mode and direct decision mode. In training mode, a block of known symbols is used to make the initial estimate. To ensure convergence of the algorithm, a re-training interval must be predefined. This results in an increase in the system overhead. A linear predictor that utilizes knowledge of the autocorrelation function for a Rayleigh fading channel is developed. The predictor is combined with the adaptive receiver to provide a bandwidth-efficient receiver by decreasing the training block size. The simulation results show that the performance penalty for the new system is negligible.
    • Finally, a new Q-R based receiver is developed to provide a more robust solution than the RLS adaptive receiver. The simulation results clearly show that the new receiver outperforms the RLS-based receiver at higher Doppler frequencies, where rapid channel variations result in numerical instability of the RLS algorithm. The linear predictor is also added to the new receiver, which results in a more robust and bandwidth-efficient receiver.

    MIMO Systems

    In recent years, it has been realized that MIMO communication systems appear inevitable in the accelerated evolution of high-data-rate applications, owing to their potential to dramatically increase spectral efficiency and to send individual information simultaneously to the corresponding users in wireless systems. This book intends to provide highlights of the current research topics in the field of MIMO systems and to offer a snapshot of the recent advances and major issues faced today by researchers in MIMO-related areas. The book is written by specialists working in universities and research centers all over the world to cover the fundamental principles and main advanced topics of high-data-rate wireless communication systems over MIMO channels. Moreover, the book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report); 2017. Since their introduction, FPGAs have appeared in more and more fields of application. Their key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field introduces special requirements for the computational architecture used. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units such as CPUs.

    Reconfigurable floating-point cellular arrays dedicated to signal processing (Matrices cellulaires reconfigurables en point flottant dédiées au traitement des signaux)

    Scalar processors are the most commonly used today for digital signal processing, in contrast with array processors, which nevertheless offer a higher computation speed thanks to a parallel architecture that handles large amounts of data in real time. Several cellular array architectures exist. However, the vast majority are highly specialized for the computation of one or two signal processing functions, and only a few are reconfigurable to handle most signal processing functions. This thesis presents the architecture of an array processor constructed from complex computation cells named Universal Processing Modules (UPM). This array processor may serve as an intellectual property block (IP block) to be used in an FPGA for signal processing. The same UPM arrays are reconfigured to perform most digital signal processing (DSP) operations, including recursive and non-recursive adaptive filtering and spectral analysis functions. The processor can be reconfigured to compute transforms, adaptive filters, lattice filters, function generation, correlations and recursive functions, all performed at high speed. For greater accuracy, the processor is designed to operate on data in floating-point arithmetic. To enable the computation of recursive functions, the UPM is built with a recursion control module. In addition, the UPM is designed to be cascaded in order to increase the order of the processing operations.
    The software designs of a 2x2 UPM array and a 6x4 UPM array, programmed in Verilog-HDL, are simulated and tested with the same cells reconfigured to compute DSP algorithms such as adaptive filtering, spectral analysis and recursive functions. The same array of cells is also simulated in Matlab Simulink under different configurations. The processor is tested with all proposed reconfigurations and offers an acceptable computing precision.

    Hardware architectures for W-CDMA technology extended to multi-antenna systems (Architectures matérielles pour la technologie W-CDMA étendue aux systèmes multi-antennes)

    Over the past decade, the advent of multi-antenna (MIMO) techniques for wireless communications, mobile or fixed, has revolutionized the possibilities offered in many telecommunications application domains. Placing several antennas on both sides of the link considerably increases the capacity of wireless systems. However, the digital algorithms required to realize these systems are far more complex and pose a challenge for the definition of high-performance hardware architectures. The objective of the present work is precisely the optimal definition of architectural solutions, in a CDMA context, to address this problem. The first aspect of this work is an in-depth study of space-time algorithms and design methods with a view to efficient hardware implementation. Many detection schemes are proposed in the literature and are applicable according to three criteria: quality of service, bit rate, and algorithmic complexity. The last of these is a strong constraint on the low-cost deployment of mobile terminals integrating these applications. It is therefore necessary to have effective tools to simulate, evaluate, and refine (rapid prototyping) these new systems, which are likely candidates for fourth-generation telecommunications. The second aspect concerns the realization of a multi-antenna transceiver without channel coding, integrating code-division multiple access technology in the case of a wideband channel. A single-antenna WCDMA system, generalizable to any number of antennas, was integrated and simulated on the Lyrtech rapid prototyping platform. The developed architecture integrates the main baseband processing modules, namely Nyquist filtering and multipath detection followed by the detection stage.
    The developed MIMO-WCDMA prototype is characterized by its flexibility with respect to the number of input channels, the input sample format, the characteristics of the wireless channel, and the target technology (ASIC, FPGA). The third aspect is more forward-looking, detailing new mechanisms to reduce the hardware cost of multi-antenna systems. The principle of adaptive fixed-point allocation is presented, with the aim of adapting the data encoding to the characteristics of the wireless channel and consequently minimizing circuit complexity. In addition, the concept of adaptive architectures is proposed in order to minimize the energy consumed within an embedded system according to the application context.