121 research outputs found

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Get PDF
    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve the dense or sparse linear system of equations in bioinformatics, power system and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions: • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices. • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices. • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each. • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool-Latent Semantic Indexing with an FPGA-based architecture. • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update. Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5ÃÂ to 43.6ÃÂ in terms of computation time

    Turbo Bayesian Compressed Sensing

    Get PDF
    Compressed sensing (CS) theory specifies a new signal acquisition approach, potentially allowing the acquisition of signals at a much lower data rate than the Nyquist sampling rate. In CS, the signal is not directly acquired but reconstructed from a few measurements. One of the key problems in CS is how to recover the original signal from measurements in the presence of noise. This dissertation addresses signal reconstruction problems in CS. First, a feedback structure and signal recovery algorithm, orthogonal pruning pursuit (OPP), is proposed to exploit the prior knowledge to reconstruct the signal in the noise-free situation. To handle the noise, a noise-aware signal reconstruction algorithm based on Bayesian Compressed Sensing (BCS) is developed. Moreover, a novel Turbo Bayesian Compressed Sensing (TBCS) algorithm is developed for joint signal reconstruction by exploiting both spatial and temporal redundancy. Then, the TBCS algorithm is applied to a UWB positioning system for achieving mm-accuracy with low sampling rate ADCs. Finally, hardware implementation of BCS signal reconstruction on FPGAs and GPUs is investigated. Implementation on GPUs and FPGAs of parallel Cholesky decomposition, which is a key component of BCS, is explored. Simulation results on software and hardware have demonstrated that OPP and TBCS outperform previous approaches, with UWB positioning accuracy improved by 12.8x. The accelerated computation helps enable real-time application of this work

    MFPA: Mixed-Signal Field Programmable Array for Energy-Aware Compressive Signal Processing

    Get PDF
    Compressive Sensing (CS) is a signal processing technique which reduces the number of samples taken per frame to decrease energy, storage, and data transmission overheads, as well as reducing time taken for data acquisition in time-critical applications. The tradeoff in such an approach is increased complexity of signal reconstruction. While several algorithms have been developed for CS signal reconstruction, hardware implementation of these algorithms is still an area of active research. Prior work has sought to utilize parallelism available in reconstruction algorithms to minimize hardware overheads; however, such approaches are limited by the underlying limitations in CMOS technology. Herein, the MFPA (Mixed-signal Field Programmable Array) approach is presented as a hybrid spin-CMOS reconfigurable fabric specifically designed for implementation of CS data sampling and signal reconstruction. The resulting fabric consists of 1) slice-organized analog blocks providing amplifiers, transistors, capacitors, and Magnetic Tunnel Junctions (MTJs) which are configurable to achieving square/square root operations required for calculating vector norms, 2) digital functional blocks which feature 6-input clockless lookup tables for computation of matrix inverse, and 3) an MRAM-based nonvolatile crossbar array for carrying out low-energy matrix-vector multiplication operations. The various functional blocks are connected via a global interconnect and spin-based analog-to-digital converters. Simulation results demonstrate significant energy and area benefits compared to equivalent CMOS digital implementations for each of the functional blocks used: this includes an 80% reduction in energy and 97% reduction in transistor count for the nonvolatile crossbar array, 80% standby power reduction and 25% reduced area footprint for the clockless lookup tables, and roughly 97% reduction in transistor count for a multiplier built using components from the analog blocks. Moreover, the proposed fabric yields 77% energy reduction compared to CMOS when used to implement CS reconstruction, in addition to latency improvements

    Rapid Digital Architecture Design of Computationally Complex Algorithms

    Get PDF
    Traditional digital design techniques hardly keep up with the rising abundance of programmable circuitry found on recent Field-Programmable Gate Arrays. Therefore, the novel Rapid Data Type-Agnostic Digital Design Methodology (RDAM) elevates the design perspective of digital design engineers away from the register-transfer level to the algorithmic level. It is founded on the capabilities of High-Level Synthesis tools. By consequently working with data type-agnostic source codes, the RDAM brings significant simplifications to the fixed-point conversion of algorithms and the design of complex-valued architectures. Signal processing applications from the field of Compressed Sensing illustrate the efficacy of the RDAM in the context of multi-user wireless communications. For instance, a complex-valued digital architecture of Orthogonal Matching Pursuit with rank-1 updating has successfully been implemented and tested

    Adaptive Baseband Pro cessing and Configurable Hardware for Wireless Communication

    Get PDF
    The world of information is literally at one’s fingertips, allowing access to previously unimaginable amounts of data, thanks to advances in wireless communication. The growing demand for high speed data has necessitated theuse of wider bandwidths, and wireless technologies such as Multiple-InputMultiple-Output (MIMO) have been adopted to increase spectral efficiency.These advanced communication technologies require sophisticated signal processing, often leading to higher power consumption and reduced battery life.Therefore, increasing energy efficiency of baseband hardware for MIMO signal processing has become extremely vital. High Quality of Service (QoS)requirements invariably lead to a larger number of computations and a higherpower dissipation. However, recognizing the dynamic nature of the wirelesscommunication medium in which only some channel scenarios require complexsignal processing, and that not all situations call for high data rates, allowsthe use of an adaptive channel aware signal processing strategy to provide adesired QoS. Information such as interference conditions, coherence bandwidthand Signal to Noise Ratio (SNR) can be used to reduce algorithmic computations in favorable channels. Hardware circuits which run these algorithmsneed flexibility and easy reconfigurability to switch between multiple designsfor different parameters. These parameters can be used to tune the operations of different components in a receiver based on feedback from the digitalbaseband. This dissertation focuses on the optimization of digital basebandcircuitry of receivers which use feedback to trade power and performance. Aco-optimization approach, where designs are optimized starting from the algorithmic stage through the hardware architectural stage to the final circuitimplementation is adopted to realize energy efficient digital baseband hardwarefor mobile 4G devices. These concepts are also extended to the next generation5G systems where the energy efficiency of the base station is improved.This work includes six papers that examine digital circuits in MIMO wireless receivers. Several key blocks in these receiver include analog circuits thathave residual non-linearities, leading to signal intermodulation and distortion.Paper-I introduces a digital technique to detect such non-linearities and calibrate analog circuits to improve signal quality. The concept of a digital nonlinearity tuning system developed in Paper-I is implemented and demonstratedin hardware. The performance of this implementation is tested with an analogchannel select filter, and results are presented in Paper-II. MIMO systems suchas the ones used in 4G, may employ QR Decomposition (QRD) processors tosimplify the implementation of tree search based signal detectors. However,the small form factor of the mobile device increases spatial correlation, whichis detrimental to signal multiplexing. Consequently, a QRD processor capableof handling high spatial correlation is presented in Paper-III. The algorithm and hardware implementation are optimized for carrier aggregation, which increases requirements on signal processing throughput, leading to higher powerdissipation. Paper-IV presents a method to perform channel-aware processingwith a simple interpolation strategy to adaptively reduce QRD computationcount. Channel properties such as coherence bandwidth and SNR are used toreduce multiplications by 40% to 80%. These concepts are extended to usetime domain correlation properties, and a full QRD processor for 4G systemsfabricated in 28 nm FD-SOI technology is presented in Paper-V. The designis implemented with a configurable architecture and measurements show thatcircuit tuning results in a highly energy efficient processor, requiring 0.2 nJ to1.3 nJ for each QRD. Finally, these adaptive channel-aware signal processingconcepts are examined in the scope of the next generation of communicationsystems. Massive MIMO systems increase spectral efficiency by using a largenumber of antennas at the base station. Consequently, the signal processingat the base station has a high computational count. Paper-VI presents a configurable detection scheme which reduces this complexity by using techniquessuch as selective user detection and interpolation based signal processing. Hardware is optimized for resource sharing, resulting in a highly reconfigurable andenergy efficient uplink signal detector

    A fast engineering approach to high efficiency power amplifier linearization for avionics applications

    Get PDF
    This PhD thesis provides a fast engineering approach to the design of digital predistortion (DPD) linearizers from several perspectives: i) enhancing the off-line training performance of open-loop DPD, ii) providing robustness and reducing the computational complexity of the parameters identification subsystem and, iii) importing machine learning techniques to favor the automatic tuning of power amplifiers (PAs) and DPD linearizers with several free-parameters to maximize power efficiency while meeting the linearity specifications. One of the essential parts of unmanned aerial vehicles (UAV) is the avionics, being the radio control one of the earliest avionics present in the UAV. Unlike the control signal, for transferring user data (such as images, video, etc.) real-time from the drone to the ground station, large transmission rates are required. The PA is a key element in the transmitter chain to guarantee the data transmission (video, photo, etc.) over a long range from the ground station. The more linear output power, the better the coverage or alternatively, with the same coverage, better SNR allows the use of high-order modulation schemes and thus higher transmission rates are achieved. In the context of UAV wireless communications, the power consumption, size and weight of the payload is of significant importance. Therefore, the PA design has to take into account the compromise among bandwidth, output power, linearity and power efficiency (very critical in battery-supplied devices). The PA can be designed to maximize its power efficiency or its linearity, but not both. Therefore, a way to deal with this inherent trade-off is to design high efficient amplification topologies and let the PA linearizers take care of the linearity requirements. Among the linearizers, DPD linearization is the preferred solution to both academia and industry, for its high flexibility and linearization performance. In order to save as many computational and power resources as possible, the implementation of an open-loop DPD results a very attractive solution for UAV applications. This thesis contributes to the PA linearization, especially on off-line training for open-loop DPD, by presenting two different methods for reducing the design and operating costs of an open-loop DPD, based on the analysis of the DPD function. The first method focuses on the input domain analysis, proposing mesh-selecting (MeS) methods to accurately select the proper samples for a computationally efficient DPD parameter estimation. Focusing in the MeS method with better performance, the memory I-Q MeS method is combined with feature extraction dimensionality reduction technique to allow a computational complexity reduction in the identification subsystem by a factor of 65, in comparison to using the classical QR-LS solver and consecutive samples selection. In addition, the memory I-Q MeS method has been proved to be of crucial interest when training artificial neural networks (ANN) for DPD purposes, by significantly reducing the ANN training time. The second method involves the use of machine learning techniques in the DPD design procedure to enlarge the capacity of the DPD algorithm when considering a high number of free parameters to tune. On the one hand, the adaLIPO global optimization algorithm is used to find the best parameter configuration of a generalized memory polynomial behavioral model for DPD. On the other hand, a methodology to conduct a global optimization search is proposed to find the optimum values of a set of key circuit and system level parameters, that properly combined with DPD linearization and crest factor reduction techniques, can exploit at best dual-input PAs in terms of maximizing power efficiency along wide bandwidths while being compliant with the linearity specifications. The advantages of these proposed techniques have been validated through experimental tests and the obtained results are analyzed and discussed along this thesis.Aquesta tesi doctoral proporciona unes pautes per al disseny de linealitzadors basats en predistorsió digital (DPD) des de diverses perspectives: i) millorar el rendiment del DPD en llaç obert, ii) proporcionar robustesa i reduir la complexitat computacional del subsistema d'identificació de paràmetres i, iii) incorporació de tècniques d'aprenentatge automàtic per afavorir l'auto-ajustament d'amplificadors de potència (PAs) i linealitzadors DPD amb diversos graus de llibertat per poder maximitzar l’eficiència energètica i al mateix temps acomplir amb les especificacions de linealitat. Una de les parts essencials dels vehicles aeris no tripulats (UAV) _es l’aviònica, sent el radiocontrol un dels primers sistemes presents als UAV. Per transferir dades d'usuari (com ara imatges, vídeo, etc.) en temps real des del dron a l’estació terrestre, es requereixen taxes de transmissió grans. El PA _es un element clau de la cadena del transmissor per poder garantir la transmissió de dades a grans distàncies de l’estació terrestre. A major potència de sortida, més cobertura o, alternativament, amb la mateixa cobertura, millor relació senyal-soroll (SNR) la qual cosa permet l’ús d'esquemes de modulació d'ordres superiors i, per tant, aconseguir velocitats de transmissió més altes. En el context de les comunicacions sense fils en UAVs, el consum de potència, la mida i el pes de la càrrega útil són de vital importància. Per tant, el disseny del PA ha de tenir en compte el compromís entre ample de banda, potència de sortida, linealitat i eficiència energètica (molt crític en dispositius alimentats amb bateries). El PA es pot dissenyar per maximitzar la seva eficiència energètica o la seva linealitat, però no totes dues. Per tant, per afrontar aquest compromís s'utilitzen topologies amplificadores d'alta eficiència i es deixa que el linealitzador s'encarregui de garantir els nivells necessaris de linealitat. Entre els linealitzadors, la linealització DPD és la solució preferida tant per al món acadèmic com per a la indústria, per la seva alta flexibilitat i rendiment. Per tal d'estalviar tant recursos computacionals com consum de potència, la implementació d'un DPD en lla_c obert resulta una solució molt atractiva per a les aplicacions UAV. Aquesta tesi contribueix a la linealització del PA, especialment a l'entrenament fora de línia de linealitzadors DPD en llaç obert, presentant dos mètodes diferents per reduir el cost computacional i augmentar la fiabilitat dels DPDs en llaç obert. El primer mètode se centra en l’anàlisi de l’estadística del senyal d'entrada, proposant mètodes de selecció de malla (MeS) per seleccionar les mostres més significatives per a una estimació computacionalment eficient dels paràmetres del DPD. El mètode proposat IQ MeS amb memòria es pot combinar amb tècniques de reducció del model del DPD i d'aquesta manera poder aconseguir una reducció de la complexitat computacional en el subsistema d’identificació per un factor de 65, en comparació amb l’ús de l'algoritme clàssic QR-LS i selecció de mostres d'entrenament consecutives. El segon mètode consisteix en l’ús de tècniques d'aprenentatge automàtic pel disseny del DPD quan es considera un gran nombre de graus de llibertat (paràmetres) per sintonitzar. D'una banda, l'algorisme d’optimització global adaLIPO s'utilitza per trobar la millor configuració de paràmetres d'un model polinomial amb memòria generalitzat per a DPD. D'altra banda, es proposa una estratègia per l’optimització global d'un conjunt de paràmetres clau per al disseny a nivell de circuit i sistema, que combinats amb linealització DPD i les tècniques de reducció del factor de cresta, poden maximitzar l’eficiència de PAs d'entrada dual de gran ample de banda, alhora que compleixen les especificacions de linealitat. Els avantatges d'aquestes tècniques proposades s'han validat mitjançant proves experimentals i els resultats obtinguts s'analitzen i es discuteixen al llarg d'aquesta tesi

    Sensor Signal and Information Processing II

    Get PDF
    In the current age of information explosion, newly invented technological sensors and software are now tightly integrated with our everyday lives. Many sensor processing algorithms have incorporated some forms of computational intelligence as part of their core framework in problem solving. These algorithms have the capacity to generalize and discover knowledge for themselves and learn new information whenever unseen data are captured. The primary aim of sensor processing is to develop techniques to interpret, understand, and act on information contained in the data. The interest of this book is in developing intelligent signal processing in order to pave the way for smart sensors. This involves mathematical advancement of nonlinear signal processing theory and its applications that extend far beyond traditional techniques. It bridges the boundary between theory and application, developing novel theoretically inspired methodologies targeting both longstanding and emergent signal processing applications. The topic ranges from phishing detection to integration of terrestrial laser scanning, and from fault diagnosis to bio-inspiring filtering. The book will appeal to established practitioners, along with researchers and students in the emerging field of smart sensors processing

    Design of large polyphase filters in the Quadratic Residue Number System

    Full text link

    ワイヤレス通信のための先進的な信号処理技術を用いた非線形補償法の研究

    Get PDF
    The inherit nonlinearity in analogue front-ends of transmitters and receivers have had primary impact on the overall performance of the wireless communication systems, as it gives arise of substantial distortion when transmitting and processing signals with such circuits. Therefore, the nonlinear compensation (linearization) techniques become essential to suppress the distortion to an acceptable extent in order to ensure sufficient low bit error rate. Furthermore, the increasing demands on higher data rate and ubiquitous interoperability between various multi-coverage protocols are two of the most important features of the contemporary communication system. The former demand pushes the communication system to use wider bandwidth and the latter one brings up severe coexistence problems. Having fully considered the problems raised above, the work in this Ph.D. thesis carries out extensive researches on the nonlinear compensations utilizing advanced digital signal processing techniques. The motivation behind this is to push more processing tasks to the digital domain, as it can potentially cut down the bill of materials (BOM) costs paid for the off-chip devices and reduce practical implementation difficulties. The work here is carried out using three approaches: numerical analysis & computer simulations; experimental tests using commercial instruments; actual implementation with FPGA. The primary contributions for this thesis are summarized as the following three points: 1) An adaptive digital predistortion (DPD) with fast convergence rate and low complexity for multi-carrier GSM system is presented. Albeit a legacy system, the GSM, however, has a very strict requirement on the out-of-band emission, thus it represents a much more difficult hurdle for DPD application. It is successfully implemented in an FPGA without using any other auxiliary processor. A simplified multiplier-free NLMS algorithm, especially suitable for FPGA implementation, for fast adapting the LUT is proposed. Many design methodologies and practical implementation issues are discussed in details. Experimental results have shown that the DPD performed robustly when it is involved in the multichannel transmitter. 2) The next generation system (5G) will unquestionably use wider bandwidth to support higher throughput, which poses stringent needs for using high-speed data converters. Herein the analog-to-digital converter (ADC) tends to be the most expensive single device in the whole transmitter/receiver systems. Therefore, conventional DPD utilizing high-speed ADC becomes unaffordable, especially for small base stations (micro, pico and femto). A digital predistortion technique utilizing spectral extrapolation is proposed in this thesis, wherein with band-limited feedback signal, the requirement on ADC speed can be significantly released. Experimental results have validated the feasibility of the proposed technique for coping with band-limited feedback signal. It has been shown that adequate linearization performance can be achieved even if the acquisition bandwidth is less than the original signal bandwidth. The experimental results obtained by using LTE-Advanced signal of 320 MHz bandwidth are quite satisfactory, and to the authors’ knowledge, this is the first high-performance wideband DPD ever been reported. 3) To address the predicament that mobile operators do not have enough contiguous usable bandwidth, carrier aggregation (CA) technique is developed and imported into 4G LTE-Advanced. This pushes the utilization of concurrent dual-band transmitter/receiver, which reduces the hardware expense by using a single front-end. Compensation techniques for the respective concurrent dual-band transmitter and receiver front-ends are proposed to combat the inter-band modulation distortion, and simultaneously reduce the distortion for the both lower-side band and upper-side band signals.電気通信大学201
    corecore