13 research outputs found

    Power-Efficient Hardware Architecture for Computing Split-Radix FFT on Highly Sparse Spectrum

    The problem of efficiently transforming a time-domain signal into the frequency domain when its frequency content is sparsely distributed is the research topic of this thesis. The well-known Fast Fourier Transform (FFT) is the most common signal processing algorithm for observing the frequency content of incoming signals in telecommunication transceivers. It is notably used in cognitive and software-defined radio, which usually require monitoring the spectrum over a wide frequency band. This can impose a heavy computational burden on processors when the ordinary FFT algorithm is implemented, and hence lead to considerable power consumption. Power and energy supply is a limited resource in mobile devices, so efficient execution of the Fourier transform is critical for mobile telecommunication devices. With the purpose of developing a power-efficient processor for time-frequency transformation, the most computationally efficient Fourier transform algorithm is selected from the existing algorithms after studying them in terms of required arithmetic operations, i.e. complex multiplications and additions. Indeed, the Split-Radix Fast Fourier Transform (SRFFT) outperforms the conventional FFT in terms of the number of complex multiplications required and hence can reduce power consumption. Applying the concept of pruning unnecessary computations, i.e. complex multiplications with either zero inputs or zero outputs, throughout the whole algorithm may reduce the power consumption even further. Accordingly, a power-efficient hardware architecture based on pruning of unnecessary computations is developed for computing the SRFFT. To exploit the potential of the SRFFT, a new configurable SRFFT processor architecture is first designed, and the architecture is then extended to eliminate the unnecessary computations through appropriate use of a pruning matrix.
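
The pruning idea in the abstract above (skipping complex multiplications whose input is zero) can be sketched in software. The example below is a minimal illustration, not the thesis architecture: it uses a plain recursive radix-2 decimation-in-time FFT in place of the split-radix algorithm, prunes on zero inputs only, and the names `pruned_fft` and `counter` are hypothetical.

```python
import cmath


def pruned_fft(x, counter):
    """Radix-2 DIT FFT that skips twiddle multiplications on zero operands.

    x       -- sequence of numbers whose length is a power of two
    counter -- dict with keys "done" and "skipped", updated in place
    """
    n = len(x)
    if n == 1:
        return list(x)

    # Transform the even- and odd-indexed halves recursively.
    even = pruned_fft(x[0::2], counter)
    odd = pruned_fft(x[1::2], counter)

    out = [0j] * n
    for k in range(n // 2):
        if odd[k] == 0:
            # Pruned: multiplying a zero operand by a twiddle factor is wasted work.
            t = 0j
            counter["skipped"] += 1
        else:
            t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
            counter["done"] += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Compare the multiplication counts for a dense and a highly sparse input.
dense_count = {"done": 0, "skipped": 0}
pruned_fft([1, 2, 3, 4, 5, 6, 7, 8], dense_count)

sparse_count = {"done": 0, "skipped": 0}
pruned_fft([1, 0, 0, 0, 0, 0, 0, 0], sparse_count)

print("dense :", dense_count)
print("sparse:", sparse_count)
```

The counter stands in, very roughly, for the switching activity that a hardware design would avoid. Real power savings depend on the architecture and are not modelled here.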

    Design and Implementation of Software Defined Radios on a Homogeneous Multi-Processor Architecture

    In the wireless communications domain, multi-mode and multi-standard platforms are increasingly becoming the central focus of system architects. Mobile terminal users require more and more mobility and throughput, pushing towards a fully integrated radio system able to support different communication protocols running concurrently on the platform. A new concept of radio system was introduced to meet the users' expectations, and flexible radio platforms have become an indispensable requirement for meeting those expectations today and in the future. This thesis deals with issues related to the design of flexible radio platforms. In particular, the flexibility of the radio system is achieved through the concept of software defined radios (SDRs). The research work focuses on the utilization of homogeneous multi-processor (MP) architectures as a feasible way to efficiently implement SDR platforms. Platforms based on MP architectures are able to deliver high performance together with a high degree of flexibility. Moreover, homogeneous MP platforms are able to reduce design and verification costs as well as provide high scalability in terms of software and hardware. However, homogeneous MP architectures provide less computational efficiency than heterogeneous solutions. This thesis can be divided into two parts: the first part is related to the implementation of a reference platform, while the second part introduces the design and implementation of flexible, high-performance, power- and energy-efficient algorithms for wireless communications. The proposed reference platform, Ninesilica, is a homogeneous MP architecture composed of a 3x3 mesh of processing nodes (PNs), interconnected by a hierarchical Network-on-Chip (NoC). Each PN hosts a processor core as its Processing Element (PE). To improve the computational efficiency of the platform, different power and energy saving techniques have been investigated. In the design, implementation and mapping of the algorithms, the following constraints were considered: energy and power efficiency, high scalability of the platform, portability of the solutions across similar platforms, and parallelization efficiency. The Ninesilica architecture, together with the proposed algorithm implementations, showed that homogeneous MP architectures are highly scalable platforms, both in terms of hardware and software. Furthermore, the Ninesilica architecture demonstrated that homogeneous MPs are able to achieve high parallelization efficiency as well as high energy and power savings, meeting the requirements of SDRs as well as enabling cognitive radios. Ninesilica can be utilized as a stand-alone block or as an elementary building block to realize clustered many-core architectures. Moreover, the obtained results, in terms of parallelization efficiency as well as power and energy efficiency, are independent of the type of PE utilized, ensuring the portability of the results to similar architectures based on a different type of processing element

    Low power FFT processor design considerations for OFDM communications

    Today's emerging communication technologies require fast processing as well as efficient use of resources. This project specifically addresses the power-efficient design of an FFT processor as it relates to OFDM communications such as cognitive radio. The Fast Fourier Transform (FFT) processor is what enables the efficient modulation in OFDM. As the FFT processor is the most computationally intensive component in OFDM communication, the power efficiency improvement of this component can have great impacts on the overall system. These impacts are significant considering the number of mobile and remote communication devices that rely on limited battery-powered operation. This project explores current FFT processor algorithms and architectures as well as optimization techniques that aim to reduce the power consumption of these devices. A floating point as well as a fixed point dynamically size-configurable FFT processor was designed in VHDL for FPGA applications, and power-saving modifications were implemented while analyzing the results

    Digital and Mixed Domain Hardware Reduction Algorithms and Implementations for Massive MIMO

    Emerging 5G and 6G based wireless communications systems largely rely on multiple-input-multiple-output (MIMO) systems to reduce inherently extensive path losses and to facilitate high data rates and high spatial diversity. Massive MIMO systems used in mmWave and sub-THz applications consist of hundreds, perhaps thousands, of antenna elements at base stations. Digital beamforming techniques provide the highest flexibility and better degrees of freedom for phased antenna arrays as compared to their analog and hybrid alternatives, but have the highest hardware complexity. Conventional digital beamformers at the receiver require a dedicated analog to digital converter (ADC) for every antenna element, leading to ADCs for elements. The number of ADCs is the key determining factor for the power consumption of an antenna array system. The digital hardware consists of fast Fourier transform (FFT) cores with a multiplier complexity of (N log2 N) for an element system to generate multiple beams. The mixed and digital hardware complexities in MIMO systems must be reduced to lower the cost and the power consumption while maintaining high performance. The well-known concept has been in use for ADCs to achieve reduced complexities. An extension of the architecture to multi-dimensional domain is explored in this dissertation to implement a single port ADC to replace ADCs in an element system, using the correlation of received signals in the spatial domain. This concept has applications in conventional uniform linear arrays (ULAs) as well as in focal plane array (FPA) receivers. Our analysis has shown that sparsity in the spatio-temporal frequency domain can be exploited to reduce the number of ADCs from N to where . By using the limited field of view of practical antennas, multiple sub-arrays are combined without interference to achieve a factor of K increase in the information carrying capacity of the ADC systems. Applications of this concept include ULAs and rectangular array systems. Experimental verifications were done for a element, 1.8 - 2.1 GHz wideband array system to sample using ADCs. This dissertation proposes that frequency division multiplexing (FDM) receiver outputs at an intermediate frequency (IF) can pack multiple (M) narrowband channels with a guard band to avoid interference. The combined output is then sampled using a single wideband ADC and baseband channels are retrieved in the digital domain. Measurement results were obtained by employing a element, 28 GHz antenna array system to combine channels together to achieve a 75% reduction of ADC requirement. Implementation of FFT cores in the digital domain is not always exact because of the finite precision. Therefore, this dissertation explores the possibility of approximating the discrete Fourier transform (DFT) matrix to achieve reduced hardware complexities at an allowable cost of accuracy. A point approximate DFT (ADFT) core was implemented on digital hardware using radix-32 to achieve savings in cost, size, weight and power (C-SWaP) and synthesized for ASIC at 45-nm technology

    Spectrum Optimisation in Wireless Communication Systems: Technology Evaluation, System Design and Practical Implementation

    Two key technology enablers for next generation networks are examined in this thesis, namely Cognitive Radio (CR) and Spectrally Efficient Frequency Division Multiplexing (SEFDM). The first part proposes the use of traffic prediction in CR systems to improve the Quality of Service (QoS) for CR users. A framework is presented which allows CR users to capture a frequency slot in an idle licensed channel occupied by primary users. This is achieved by using CR to sense and select target spectrum bands combined with traffic prediction to determine the optimum channel-sensing order. The latter part of this thesis considers the design, practical implementation and performance evaluation of SEFDM. The key challenge that arises in SEFDM is the self-created interference, which complicates the design of receiver architectures. Previous work has focused on the development of sophisticated detection algorithms; however, these suffer from impractical computational complexity. Consequently, the aim of this work is two-fold: first, to reduce the complexity of existing algorithms to make them better suited for application in the real world; second, to develop hardware prototypes to assess the feasibility of employing SEFDM in practical systems. The impact of oversampling and fixed-point effects on the performance of SEFDM is initially determined, followed by the design and implementation of linear detection techniques using Field Programmable Gate Arrays (FPGAs). The performance of these FPGA based linear receivers is evaluated in terms of throughput, resource utilisation and Bit Error Rate (BER). Finally, variants of the Sphere Decoding (SD) algorithm are investigated to improve the error performance of SEFDM systems with a targeted reduction in complexity. The Fixed SD (FSD) algorithm is implemented on a Digital Signal Processor (DSP) to measure its computational complexity. Modified sorting and decomposition strategies are then applied to this FSD algorithm, offering trade-offs between execution speed and BER

    Convergence of packet communications over the evolved mobile networks; signal processing and protocol performance

    In this thesis, the convergence of packet communications over the evolved mobile networks is studied. The Long Term Evolution (LTE) process is dominating the Third Generation Partnership Project (3GPP) in order to bring technologies to the markets in the spirit of continuous innovation. The global markets of mobile information services are growing towards the Mobile Information Society. The thesis begins with the principles and theories of the multiple-access transmission schemes, transmitter and receiver techniques, and signal processing algorithms. Next, packet communications and Internet protocols are reviewed with reference to the IETF standards, with a focus on the characteristics of mobile communications. The mobile network architecture and protocols bind together the evolved packet system of Internet communications and the radio access network technologies. Specifics of the traffic models are briefly visited for their statistical meaning in the radio performance analysis. Radio resource management algorithms, protocols, and procedures are covered with regard to their relevance for system performance. Throughout these chapters, the commonalities and differentiators of WCDMA, WCDMA/HSPA and LTE are covered. The main outcome of the thesis is the performance analysis of the LTE technology, beginning from the early discoveries, continuing to the analysis of various system features, and finally converging to an extensive system analysis campaign. The system performance is analysed with the characteristics of voice over the Internet and best effort traffic of the Internet. These traffic classes represent the majority of the mobile traffic in the converged packet networks, and yet they are simple enough for a fair and generic analysis of technologies. The thesis consists of publications and inventions created by the author that proposed several improvements to the 3G technologies towards LTE. In the system analysis, LTE showed system measures at least 2.5 to 3 times higher than the WCDMA/HSPA reference. The WCDMA/HSPA networks are currently available with over 400 million subscribers and show increasing growth; meanwhile, the first LTE roll-outs are scheduled to begin in 2010. Sophisticated 3G LTE mobile devices are expected to appear steadily for all consumer segments in the following years

    Efficient computation of CPW2000 using a CPU-GPU heterogeneous platform

    Master's dissertation in Informatics Engineering. The modelling and simulation of complex systems in natural science usually require powerful and expensive computational resources. The study of plane wave properties in crystals, based on quantum mechanics, poses challenging questions to computer scientists seeking to improve the efficiency of numerical methods and algorithms. Numerical libraries have had a significant boost in recent years by taking advantage of multi-threaded environments. This dissertation addresses efficiency improvements in a plane wave package, CPW2000, developed by a physicist and targeted at a heterogeneous platform with a multicore CPU and CUDA-enabled GPU devices. The performance bottlenecks had previously been identified as the module functions with FFT computations, and the study started with application analysis and profiling. This study shows that (i) over 90% of the code execution time was spent in two functions, DGEMM and FFT, (ii) the code efficiency of current numerical libraries is hard to improve, and (iii) DGEMM function calls were spread throughout the code, while FFT was concentrated in a single function. These features were exploited to develop a new code version in which parts of the code are computed on a multicore CPU while others take advantage of the GPU's multistreaming and parallel computing power. Experimental results show that combined CPU-GPU solutions offer a speedup of nearly 10x on the program routines that we set out to improve, which makes future work promising
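
The figures quoted in the abstract above (over 90% of run time in DGEMM and FFT, a speedup near 10x on the improved routines) can be related to a whole-program speedup with Amdahl's law. The sketch below is illustrative only and is not a result from the dissertation: it assumes the accelerated routines account for exactly 90% of the run time and are sped up by exactly 10x, neither of which the abstract states, and the function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(accelerated_fraction, routine_speedup):
    """Whole-program speedup when only part of the run time is accelerated.

    accelerated_fraction -- share of the original run time spent in the
                            accelerated routines (0.0 to 1.0)
    routine_speedup      -- speedup factor achieved on those routines
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if routine_speedup <= 0:
        raise ValueError("routine_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / routine_speedup)


# Assumed inputs: 90% of run time accelerated by 10x (illustrative values).
print(f"{amdahl_speedup(0.9, 10.0):.2f}")  # prints 5.26
```

Under these assumptions the untouched 10% of the run time caps the overall gain at roughly half of the per-routine speedup.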