Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

Author: Cavallaro Joseph R.
Dick Chris
Studer Christoph
Wang Guohui
Wu Michael
Yin Bei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems, based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose, to the best of our knowledge, the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance. Comment: To appear in the IEEE Journal of Selected Topics in Signal Processing
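The Neumann-series idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the matrix is split into its diagonal part D and off-diagonal part E, so that the inverse is approximated by a truncated sum of powers of -D⁻¹E applied to D⁻¹; this splitting, the function names and the small real-valued test matrix are illustrative assumptions, not the authors' design.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def neumann_inverse(a, terms):
    """Approximate inv(a) with the first `terms` terms of a Neumann series.

    Assumes a = D + E with D the diagonal of a, and that a is diagonally
    dominant so that the series in (-inv(D) E) converges.
    """
    n = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    e = [[0.0 if i == j else a[i][j] for j in range(n)] for i in range(n)]
    theta = [[-v for v in row] for row in matmul(d_inv, e)]  # -inv(D) E
    term = d_inv                       # theta^0 inv(D)
    approx = [row[:] for row in d_inv]
    for _ in range(terms - 1):
        term = matmul(theta, term)     # next power of theta times inv(D)
        approx = [[approx[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return approx


def residual(a, approx):
    """Largest absolute entry of a @ approx - I."""
    prod = matmul(a, approx)
    n = len(a)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


# Hypothetical diagonally dominant test matrix.
A = [[4.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 6.0]]
print(residual(A, neumann_inverse(A, 1)), residual(A, neumann_inverse(A, 12)))
```

Each extra term costs one more matrix product, which is the kind of accuracy/complexity trade-off the abstract refers to; the residual shrinks as terms are added only when the diagonal dominates.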

arXiv.org e-Print Archive

CiteSeerX

Repository for Publications and Research Data

On the synthesis and processing of high quality audio signals by parallel computers

Author: Bailey Nicholas James
Publication date: 01/01/1991

This work concerns the application of new computer architectures to the creation and manipulation of high-quality audio bandwidth signals. The configuration of both the hardware and software in such systems falls under consideration in the three major sections which present increasing levels of algorithmic concurrency. In the first section, the programs which are described are distributed in identical copies across an array of processing elements; these programs run autonomously, generating data independently, but with control parameters peculiar to each copy: this type of concurrency is referred to as isonomic. The central section presents a structure which distributes tasks across an arbitrary network of processors; the flow of control in such a program is quasi-indeterminate, and controlled on a demand basis by the rate of completion of the slave tasks and their irregular interaction with the master. Whilst that interaction is, in principle, deterministic, it is also data-dependent; the dynamic nature of task allocation demands that no a priori knowledge of the rate of task completion be required. This type of concurrency is called dianomic. Finally, an architecture is described which will support a very high level of algorithmic concurrency. The programs which make efficient use of such a machine are designed not by considering flow of control, but by considering flow of data. Each atomic algorithmic unit is made as simple as possible, which results in the extensive distribution of a program over very many processing elements. Programs designed by considering only the optimum data exchange routes are said to exhibit systolic concurrency. Often neglected in the study of system design are those provisions necessary for practical implementations.
It was intended to provide users with useful application programs in fulfilment of this study; the target group is electroacoustic composers, who use digital signal processing techniques in the context of musical composition. Some of the algorithms in use in this field are highly complex, often requiring a quantity of processing for each sample which exceeds that currently available even from very powerful computers. Consequently, applications tend to operate not in 'real-time' (where the output of a system responds to its input apparently instantaneously), but by the manipulation of sounds recorded digitally on a mass storage device. The first two sections adopt existing, public-domain software, and seek to increase its speed of execution significantly by parallel techniques, with the minimum compromise of functionality and ease of use. Those chosen are the general-purpose direct synthesis program CSOUND, from M.I.T., and a stand-alone phase vocoder system from the C.D.P. In each case, the desired aim is achieved: to increase speed of execution by two orders of magnitude over the systems currently in use by composers. This requires substantial restructuring of the programs, and careful consideration of the best computer architectures on which they are to run concurrently. The third section examines the rationale behind the use of computers in music, and begins with the implementation of a sophisticated electronic musical instrument capable of a degree of expression at least equal to its acoustic counterparts. It seems that the flexible control of such an instrument demands a greater computing resource than the sound synthesis part.
A machine has been constructed with the intention of enabling the 'gestural capture' of performance information in real-time; the structure of this computer, which has one hundred and sixty high-performance microprocessors running in parallel, is expounded; and the systolic programming techniques required to take advantage of such an array are illustrated in the Occam programming language.

Durham e-Theses

CiteSeerX

Combining Synthesis of Cardiorespiratory Signals and Artifacts with Deep Learning for Robust Vital Sign Estimation

Author: Silva Diogo Filipe Pereira Fontes Fernandes
Publication date: 01/01/2019

Healthcare has been changing remarkably on account of Big Data. As Machine Learning (ML) consolidates its place in simpler clinical chores, more complex Deep Learning (DL) algorithms have struggled to keep up, despite their superior capabilities. This is mainly attributed to the need for large amounts of training data, which the scientific community is unable to satisfy. The number of promising DL algorithms is considerable, but solutions directly targeting the shortage of data are lacking. Currently, dynamical generative models are the best bet, but they focus on single, classical modalities and tend to become significantly more complicated as the number of physiological effects they simulate grows. This thesis aims at providing and validating a framework that specifically addresses the data deficit in the scope of cardiorespiratory signals. Firstly, a multimodal statistical synthesizer was designed to generate large, annotated artificial signals. By expressing data through coefficients of pre-defined, fitted functions and describing their dependence with Gaussian copulas, inter- and intra-modality associations were learned. Thereafter, new coefficients are sampled to generate artificial, multimodal signals with the original physiological dynamics. Moreover, normal and pathological beats along with artifacts were included by employing Markov models. Secondly, a convolutional neural network (CNN) was conceived with a novel sensor-fusion architecture and trained with synthesized data under real-world experimental conditions to evaluate how its performance is affected. Both the synthesizer and the CNN not only performed at state-of-the-art level but also innovated with multiple types of generated data and detection error improvements, respectively.
Cardiorespiratory data augmentation corrected performance drops when not enough data is available, and enhanced the CNN's ability to perform on noisy signals and to carry out new tasks when introduced to otherwise unavailable types of data. Ultimately, the framework was successfully validated, showing potential to move future DL research on Cardiology into clinical standards.
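The copula step described in this abstract can be sketched as follows. This is a hedged illustration of sampling from a bivariate Gaussian copula only: the correlation value, function names and seed are assumptions, and the mapping from the resulting uniform values back to fitted coefficient distributions (via each marginal's inverse CDF) is left out because the abstract does not specify it.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_gaussian_copula(rho, count, seed=0):
    """Draw `count` pairs (u1, u2) in [0, 1] whose dependence follows a
    bivariate Gaussian copula with correlation parameter `rho`."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate the second normal draw with the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Pushing each normal through its CDF gives uniform marginals
        # while keeping the dependence structure.
        pairs.append((normal_cdf(z1), normal_cdf(z2)))
    return pairs


def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)


samples = sample_gaussian_copula(rho=0.9, count=2000)
print(correlation(samples))
```

In a synthesizer of the kind described, each uniform value would then be passed through the inverse CDF of the corresponding coefficient's fitted distribution to obtain a new, dependent set of coefficients.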

Repositório da Universidade Nova de Lisboa

Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA

Author: Sazish Abdul Naser
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2011

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In the past two decades, there has been a huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Some of these applications, such as medical imaging, are computationally intensive, power hungry and require large amounts of memory, which creates a high demand for efficient algorithm implementation, low-power architectures and acceleration. Recently, some MAAs such as the Finite Ridgelet Transform (FRIT) and the Haar Wavelet Transform (HWT) have become very popular, and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms, particularly when addressing large problems, is becoming very challenging and consumes a lot of power, which leads to a number of issues including mobility and reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimisation and awareness at all levels of abstraction in the design flow. The most important achievements of the work presented in this thesis are summarised here. Two factorisation methodologies for HWT, called HWT Factorisation Method 1 (HWTFM1) and HWT Factorisation Method 2 (HWTFM2), have been explored to increase the number of zeros and reduce hardware resources. In addition, two novel efficient and optimised architectures for the proposed methodologies based on Distributed Arithmetic (DA) principles have been proposed.
The evaluation of the architectural results has shown that the proposed architectures reduce the arithmetic calculations (additions/subtractions) by 33% and 25% respectively compared to a direct implementation of HWT and outperform existing results. The proposed HWTFM2 is implemented on advanced and low-power FPGA devices using the Handel-C language. The FPGA implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for the Finite Radon Transform (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient implementation on different FPGA devices. The performance of the proposed FRIT architecture has been evaluated and the results outperformed some other existing architectures. Both FRAT and FRIT architectures have been implemented on FPGAs using the Handel-C language. The evaluation of both architectures has shown that the obtained results outperformed existing results by almost 10% in terms of frequency and area. The proposed architectures are also applied to image data (256 × 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes. Two architectures for cyclic convolution based on systolic arrays using parallelism and pipelining, which can be used as the main building block for the proposed FRIT architecture, have been proposed. The first proposed architecture is a linear systolic array with a pipelined process and the second architecture is a systolic array with a parallel process. The second architecture reduces the number of registers by 42% compared to the first architecture, and both architectures outperformed other existing results.
The proposed pipelined architecture has been implemented on different FPGA devices with vector sizes (N) of 4, 8, 16 and 32 and word-length W = 8. The implementation results have shown a significant improvement and outperformed other existing results. Ultimately, an in-depth evaluation of a high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called the functional level power modelling approach, has been presented. The mathematical techniques that form the basis of the proposed power modelling have been validated on a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behaviour of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on various FPGA platforms. Based on the results achieved, the proposed model accuracy is almost 99% for all IP cores' Dynamic Power (DP) components. Thomas Gerald Gray Charitable Trust
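For readers unfamiliar with the Haar Wavelet Transform that this thesis accelerates, a minimal sketch of one decomposition level follows. It uses only additions, subtractions and a halving, which is why the operation count quoted in the abstract is expressed in additions/subtractions; the divide-by-two normalisation and the function names are assumptions, and the thesis's factorisation methods and Distributed Arithmetic architectures are not reproduced here.

```python
def haar_step(signal):
    """One level of an (unnormalised) Haar transform on an even-length list.

    Returns pairwise averages (coarse approximation) and pairwise half
    differences (detail coefficients).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    half = len(signal) // 2
    averages = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    details = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return averages, details


def haar_inverse_step(averages, details):
    """Rebuild the original samples from one level of averages and details."""
    signal = []
    for a, d in zip(averages, details):
        signal.extend([a + d, a - d])
    return signal


avg, det = haar_step([4, 2, 6, 8])
print(avg, det)                      # [3.0, 7.0] [1.0, -1.0]
print(haar_inverse_step(avg, det))   # [4.0, 2.0, 6.0, 8.0]
```

Applying `haar_step` again to the averages gives the next, coarser level; for images the same step is applied along rows and then columns.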

Brunel University Research Archive

Design & Implementation of FPGA-based Multi-standard Software Radio Receiver

Author: Alam Muhammad Mahtab
Awan Mehmood
Publication venue: Aalborg University
Publication date: 01/01/2007

VBN

Solution of partial differential equations on vector and parallel computers

Author: Ortega J. M.
Voigt R. G.

The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

NASA Technical Reports Server

Mathematical Systems Theory: Conceptual Framework and Application Examples

Author: Pichler Franz
Publication venue: 'Universidade da Coruna'
Publication date: 01/01/2010

Repositorio da Universidade da Coruña

Energy efficient hardware acceleration of multimedia processing tools

Author: Kinane Andrew
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 01/01/2006

The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work in this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation; namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions. Results show an improvement on state-of-the-art algorithms with future potential for even greater savings.

Irish Universities

DCU Online Research Access Service

Phone based heart and lung functions monitor

Author: Silva João Filipe Trindade da
Publication date: 01/01/2011

Integrated Master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201

Repositório Aberto da Universidade do Porto

Conformação de pulso de formas de onda OFDM para a interface aérea 5G

Author: Luque Quispe Jaime Junior, 1987-
Publication venue: [s.n.]
Publication date: 06/01/2021

Advisor: Luís Geraldo Pedroso Meloni. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.

Abstract: Orthogonal frequency division multiplexing (OFDM) waveforms have been used successfully in the 3GPP Long Term Evolution (LTE) air interface to overcome the channel selectivity and to provide good spectrum efficiency and high transmission data rates. The forthcoming 5G communication system aims to support more services than its predecessor, such as enhanced mobile broadband, machine-type communications and low latency communications, and considers many other application scenarios such as fragmented spectrum use. This diversity of services with different requirements cannot be supported by conventional OFDM, since OFDM configures the entire bandwidth with parameters that suit one particular service. Also, substantial intercarrier interference (ICI) can occur when conventional OFDM is used with asynchronous multiuser multiplexing, and this is due to the high out-of-band (OOB) emissions of the subcarriers and the violation of the signal orthogonality constraint. Therefore, to meet the requirements of future 5G wireless applications, the development of an innovative air interface with new capabilities becomes necessary, in particular, a new waveform more spectrally agile than OFDM, capable of supporting multiple configurations, suppressing the inter-user interference effectively, and with straightforward integration with the upper layers. This work focuses on two pulse shaping techniques to reduce the OOB emission and improve the in-band and OOB performances of OFDM-based waveforms. Pulse shaping can enable the use of multiple parameterizations within the waveform and abandon the strict paradigms of orthogonality and synchronism with relatively low performance degradation caused by intersymbol interference (ISI) and ICI.
The first part addresses a pulse shaping method based on per-subcarrier filtering to reduce both OOB emission in the transmitter and adjacent channel interference (ACI) in the receiver. It can be implemented using window functions and some window formats are presented in this part. The first uses the existing cyclic prefix (CP) of OFDM symbols to smooth abrupt transitions of the signal, thus reducing the large sinc spectral sidelobes caused by the rectangular filters. This guarantees backwards compatibility in systems using conventional cyclic prefixed OFDM (CP-OFDM). The second window format extends the CP length to retain the waveform's ability to combat channel delay spread. The effects on performance of ISI and ICI are studied in terms of the signal to interference ratio (SIR) and bit error rate (BER) using LTE waveforms in a multi-user fragmented spectrum scenario.
The second part of this work addresses the design and analysis of filters for flexible spectral containment in subband-based filtering transceivers. This filter, called here semi-equiripple, exhibits better stopband attenuation to reduce the inter-subband interferences than equiripple and windowed truncated sinc filters and also has good impulse response characteristics to reduce ISI. The design is based on the Parks-McClellan algorithm to obtain different stopband decay rates and meet arbitrary spectrum emission mask (SEM) specifications with low in-band distortion. Therefore, it can be useful to achieve low OOB emission and configure subbands with independent parameters, since the asynchronous interference is contained by the filters. Three ISI distortions in the filter are studied: symbol spreading related to the filter causality, symbol echoes due to in-band ripples, and ISI amplification due to outlier samples in the tails of its impulse response. The performance of the filter is assessed in terms of the power spectrum density (PSD) and compliance with tight SEMs, modulation error rate (MER) and operation in a multi-service asynchronous scheme using a single waveform. The SIR and the effect of filtering on the modulation accuracy are evaluated using OFDM ISDB-T and LTE waveforms. Flexible hardware structures are also proposed for actual implementations. The results show that these pulse shaping methods enable the waveform to exploit the available spectrum fragments and support multiple services without significant performance penalty, which can allow a more flexible air interface.

Master's. Telecommunications and Telematics. Master in Electrical Engineering. CAPES
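The windowing idea in the first part of this abstract (smoothing the abrupt edges of each OFDM symbol) can be sketched with a raised-cosine edge window. This is an illustrative assumption about the window shape: the abstract only states that window functions are used, and the lengths below (64 useful samples, 16 CP samples, 8-sample ramps) are hypothetical.

```python
import math


def edge_window(symbol_len, ramp_len):
    """Window of `symbol_len` samples that is 1.0 in the middle and has
    raised-cosine ramps of `ramp_len` samples at both edges."""
    if ramp_len < 1 or 2 * ramp_len > symbol_len:
        raise ValueError("ramps must fit inside the symbol")
    ramp = [0.5 * (1.0 - math.cos(math.pi * (n + 0.5) / ramp_len))
            for n in range(ramp_len)]
    flat = [1.0] * (symbol_len - 2 * ramp_len)
    return ramp + flat + ramp[::-1]


def apply_window(samples, window):
    """Multiply time-domain samples (real or complex) by the window."""
    if len(samples) != len(window):
        raise ValueError("samples and window must have the same length")
    return [s * w for s, w in zip(samples, window)]


# Hypothetical symbol: 16 cyclic-prefix samples followed by 64 useful samples.
window = edge_window(symbol_len=80, ramp_len=8)
shaped = apply_window([1.0 + 0.0j] * 80, window)
print(window[:8])
```

The rising and falling ramps are complementary (they sum to one sample by sample), so consecutive symbols can be overlapped by `ramp_len` samples without changing the total amplitude in the overlap region.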

Repositorio da Producao Cientifica e Intelectual da Unicamp