
    Domain specific high performance reconfigurable architecture for a communication platform

    Reconfigurable architectures for beyond 3G wireless communication systems

    A hardware implementation of a Viterbi decoder for a (3,2/3) TCM code

    The report details the design of a dedicated Viterbi decoder chip set for an Ungerboeck (3,2/3) Trellis Coded Modulation (TCM) code. The specific intention of the thesis was to design a system that could be implemented on standard Field Programmable Gate Arrays (FPGAs) yet still cope with high bit rates. The research focused both on evaluating and modifying existing VLSI design techniques and on developing new techniques to make this possible. Trellis Coded Modulation refers to a specific sub-class of convolutional codes that are an example of coded modulation. In coded modulation, the encoding and modulation processes are directly linked, with the aim of improving the performance of the code by introducing redundancy in the signal set used to transmit it. Ungerboeck developed a technique for mapping the encoded words onto points in the signal set, called mapping by set partitioning, that maximises the Euclidean distance between adjacent codewords and hence maximises the minimum distance between any two output sequences in the code. The Viterbi algorithm is a maximum likelihood decoder for convolutional codes such as TCM. Its operation is based on soft decision decoding, which produces an estimate of how well the received sequence corresponds to each of the allowed code sequences. The code sequence that most closely matches the received sequence is then decoded to form the output of the decoder. A central problem in implementing systems using TCM with Viterbi decoding is that, although the encoder is a relatively simple device, the decoder is not. The complexity of the Viterbi decoder for any given TCM scheme will be the major drawback in implementing the scheme. As such, techniques for reducing the complexity of Viterbi decoders are of interest to developers of communication systems. The algorithms describing the implementation and operation of the Viterbi algorithm can be categorised into three main layers. The top layer holds the theoretical algorithm itself; the second layer holds the algorithms that describe the broad techniques used to manipulate the theoretical algorithm into an implementable form; and the third layer holds the algorithms that describe the implementations themselves. The work contained in this thesis concentrates on the second and third layers.
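    The add-compare-select recursion at the heart of the Viterbi algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the decoder designed in the thesis: it uses a hypothetical rate-1/2, constraint-length-3 convolutional code with octal generators (7, 5), BPSK symbols (bit 0 to +1, bit 1 to -1) and squared Euclidean distance as the soft-decision branch metric, instead of the Ungerboeck (3,2/3) 8-PSK TCM code. It also assumes every message ends with K-1 = 2 zero tail bits, so traceback can start from the all-zero state.

```python
from typing import List, Sequence

# Assumed example code, NOT the thesis's TCM code: rate 1/2, K = 3, generators (7, 5) octal.
GENERATORS = (0b111, 0b101)
K = 3
N_STATES = 1 << (K - 1)


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") % 2


def branch_bits(state: int, bit: int) -> List[int]:
    """Coded bits emitted when `bit` enters an encoder currently in `state`."""
    reg = (bit << (K - 1)) | state  # newest input bit sits in the MSB
    return [parity(reg & g) for g in GENERATORS]


def next_state(state: int, bit: int) -> int:
    return ((bit << (K - 1)) | state) >> 1


def encode(bits: Sequence[int]) -> List[int]:
    state, out = 0, []
    for b in bits:
        out.extend(branch_bits(state, b))
        state = next_state(state, b)
    return out


def to_bpsk(coded: Sequence[int]) -> List[float]:
    return [1.0 - 2.0 * c for c in coded]  # 0 -> +1.0, 1 -> -1.0


def viterbi_decode(received: Sequence[float]) -> List[int]:
    """Soft-decision Viterbi decoding of a zero-terminated message."""
    n = len(GENERATORS)
    inf = float("inf")
    metric = [0.0] + [inf] * (N_STATES - 1)  # encoder starts in state 0
    survivors = []  # survivors[t][s] = (previous state, input bit) of the best path into s
    for t in range(len(received) // n):
        r = received[n * t : n * (t + 1)]
        new_metric = [inf] * N_STATES
        back = [(0, 0)] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                expected = to_bpsk(branch_bits(s, b))
                bm = sum((ri - ei) ** 2 for ri, ei in zip(r, expected))  # squared Euclidean distance
                ns = next_state(s, b)
                # Add-compare-select: keep only the better path into each state.
                if metric[s] + bm < new_metric[ns]:
                    new_metric[ns] = metric[s] + bm
                    back[ns] = (s, b)
        metric = new_metric
        survivors.append(back)
    # Trace back from state 0, which the zero tail bits force the encoder into.
    state, decoded = 0, []
    for back in reversed(survivors):
        state, b = back[state]
        decoded.append(b)
    return decoded[::-1]


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 1, 0, 0]  # the final two zeros are tail bits
    rx = to_bpsk(encode(msg))
    rx[4] = -rx[4]  # corrupt one received symbol
    print(viterbi_decode(rx) == msg)  # expected: True
```

Because the (7, 5) code has free distance 5, a single sign-flipped symbol leaves the transmitted path strictly closest to the received sequence, so the survivor traced back from state 0 recovers the message. A hardware decoder would replace the floating-point metrics and stored survivor lists with fixed-width metrics and a bounded traceback window.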

    Ultra low-power, high-performance accelerator for speech recognition

    Automatic Speech Recognition (ASR) is undoubtedly one of the most important and interesting applications in the cutting-edge era of deep-learning deployment, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, requiring huge memory storage and computational power that the tiny power budget of mobile devices cannot afford. Hardware acceleration can reduce the power consumption and memory pressure of ASR systems while delivering high performance. In this thesis, we present a customized accelerator for large-vocabulary, speaker-independent, continuous speech recognition. A state-of-the-art ASR system consists of two major components: acoustic scoring using a DNN and speech-graph decoding using Viterbi search. As the first step, we focus on the Viterbi search algorithm, which represents the main bottleneck in the ASR system. The accelerator includes several innovative techniques to improve the memory subsystem, which is the main bottleneck for performance and power, such as a prefetching scheme and a novel bandwidth-saving technique tailored to the needs of ASR. Furthermore, as the speech graph is vast, taking more than 1 GB of memory, we propose to change its representation by partitioning it into several sub-graphs and composing them on the fly during the Viterbi search. This approach, together with some simple yet efficient compression techniques, results in a 31x memory footprint reduction, providing a 155x real-time speedup and orders-of-magnitude power and energy savings compared to CPUs and GPUs. In the next step, we propose a novel hardware-based ASR system that effectively integrates a DNN accelerator for pruned/quantized models with the Viterbi accelerator. We show that, when pruning and/or quantizing the DNN model used for acoustic scoring, ASR accuracy is maintained but the execution time of the whole ASR system increases by up to 33%. Although pruning and quantization improve the efficiency of the DNN, they cause a huge increase of activity in the Viterbi search, since the output scores of the pruned model are less reliable. To avoid this increase in Viterbi search workload, our system loosely selects the N-best hypotheses at every time step, exploring only the N most likely paths. Our final solution efficiently combines both DNN and Viterbi accelerators with all their optimizations, delivering 222x real-time ASR with a small power budget of 1.26 W, a small memory footprint of 41 MB, and a peak memory bandwidth of 381 MB/s, making it amenable to low-power mobile platforms.
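    Restricting a frame-synchronous Viterbi search to the N most likely hypotheses per time step can be illustrated with a small software sketch. It is a toy under explicit assumptions, not the accelerator's design: the decoding graph, labels and scores are hypothetical, the per-frame acoustic log-likelihoods stand in for DNN outputs, and pruning uses an exact top-N selection (`heapq.nlargest`), whereas the abstract describes the hardware as selecting the N-best hypotheses only loosely.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy decoding graph: state -> [(next_state, label, log transition probability)].
Graph = Dict[int, List[Tuple[int, str, float]]]


def viterbi_n_best(
    graph: Graph,
    acoustic: Sequence[Dict[str, float]],
    start: int,
    n_best: int,
) -> Tuple[List[str], float]:
    """Frame-synchronous Viterbi search keeping at most `n_best` active states per frame.

    acoustic[t][label] is an assumed acoustic log-likelihood of `label` at frame t
    (in a real recognizer these scores would come from the DNN).
    """
    active: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for frame in acoustic:
        expanded: Dict[int, Tuple[float, List[str]]] = {}
        for state, (score, labels) in active.items():
            for nxt, label, log_p in graph.get(state, []):
                cand = score + log_p + frame[label]
                # Viterbi recursion: keep only the best-scoring path into each state.
                if nxt not in expanded or cand > expanded[nxt][0]:
                    expanded[nxt] = (cand, labels + [label])
        if not expanded:
            raise ValueError("search died: no outgoing arcs from the active states")
        # N-best pruning: only the N highest-scoring states survive to the next frame.
        active = dict(heapq.nlargest(n_best, expanded.items(), key=lambda kv: kv[1][0]))
    _, (best_score, best_labels) = max(active.items(), key=lambda kv: kv[1][0])
    return best_labels, best_score


if __name__ == "__main__":
    toy_graph: Graph = {
        0: [(1, "a", -0.1), (2, "b", -0.1)],
        1: [(1, "a", -0.5), (3, "c", -0.5)],
        2: [(3, "c", -0.2)],
    }
    frames = [
        {"a": -0.2, "b": -1.0, "c": -3.0},
        {"a": -2.0, "b": -2.0, "c": -0.3},
    ]
    labels, score = viterbi_n_best(toy_graph, frames, start=0, n_best=2)
    print(labels)  # expected: ['a', 'c']
```

The cost per frame is bounded by N times the out-degree of the graph rather than by the number of states that happen to be reachable, which is the property that keeps the search workload in check when less reliable acoustic scores would otherwise keep many hypotheses alive.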

    Realizing Software Defined Radio - A Study in Designing Mobile Supercomputers.

    The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These implementations are time-consuming to design and difficult to verify. A programmable hardware platform capable of supporting software implementations of the physical layer, or Software Defined Radio (SDR), has a number of advantages. These include support for multiple protocols, faster time-to-market, higher chip volumes, and support for late implementation changes. The challenge is to achieve this within the power budget of a mobile device. Wireless communications belong to an emerging class of applications with the processing requirements of a supercomputer but the power constraints of a mobile device: mobile supercomputing. This thesis presents a set of design proposals for building a programmable wireless communication solution. To design a solution that can meet the demanding requirements of SDR, this thesis takes an application-centric design approach, evaluating and optimizing all aspects of the design based on the characteristics of wireless communication protocols. This includes a DSP processor architecture optimized for wireless baseband processing, wireless algorithm optimizations, and language and compilation tool support for the algorithm software and the processor hardware. This thesis first analyzes the software characteristics of SDR. Based on this analysis, it proposes the Signal-Processing On-Demand Architecture (SODA), a fully programmable multi-core architecture that can support the computation requirements of third-generation wireless protocols while operating within the power budget of a mobile device. The thesis then presents wireless algorithm implementations and optimizations for the SODA processor architecture.
A signal processing language extension (SPEX) is proposed to support the development of software for wireless communication protocols on SODA-like multi-core architectures. Finally, the SPIR compiler is proposed to automatically map SPEX code onto the multi-core processor hardware.
    Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/61760/1/linyz_1.pd

    Optimisation of Iterative Multi-user Receivers using Analytical Tools

    The objective of this thesis is to develop tools for the analysis and optimization of an iterative receiver. These tools can be applied to most soft-in soft-out (SISO) receiver components. For illustration purposes, we consider a multi-user DS-CDMA system with forward error correction that employs iterative multi-user detection based on soft interference cancellation and single-user decoding. Optimized power levels combined with adaptive scheduling allow efficient utilization of receiver resources in heavily loaded systems.
    Metric transfer analysis has been shown to be an accurate method of predicting the convergence behavior of iterative receivers. EXtrinsic Information Transfer (EXIT), fidelity transfer (FT) and variance transfer (VT) analyses are well-known methods; however, the relationship between these approaches has not been explored in detail. We compare the metrics numerically and analytically and derive functions that closely approximate the relationships between them. The result allows easy translation between the EXIT, FT and VT methods. Furthermore, we extend the J function, which describes mutual information as a function of variance, to fidelity and symbol error variance, to the Rayleigh fading channel model, and to a channel estimate. ...
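    The function that describes mutual information in terms of variance is commonly defined in EXIT-chart analysis as J(σ): the mutual information between a BPSK bit X and a consistent Gaussian log-likelihood ratio L with L | (X = +1) ~ N(σ²/2, σ²), which gives J(σ) = 1 - E[log2(1 + e^(-L))]. The sketch below evaluates that standard definition numerically; it is an assumption-based illustration and does not reproduce the thesis's extensions to fidelity, symbol error variance, Rayleigh fading or channel estimates.

```python
import math


def log2_1p_exp_neg(x: float) -> float:
    """Numerically stable log2(1 + exp(-x))."""
    if x > 0:
        return math.log1p(math.exp(-x)) / math.log(2)
    return (-x + math.log1p(math.exp(x))) / math.log(2)


def j_function(sigma: float, steps: int = 4000) -> float:
    """Mutual information J(sigma) between a BPSK bit and a consistent Gaussian LLR.

    Assumes L | (X = +1) ~ N(sigma**2 / 2, sigma**2), so sigma**2 is the LLR variance.
    The expectation is evaluated with the midpoint rule over mean +/- 10 standard deviations.
    """
    if sigma <= 0:
        return 0.0  # an LLR that is identically zero carries no information
    mean, var = sigma ** 2 / 2, sigma ** 2
    lo, hi = mean - 10 * sigma, mean + 10 * sigma
    h = (hi - lo) / steps
    expectation = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        pdf = math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        expectation += pdf * log2_1p_exp_neg(x) * h
    return 1.0 - expectation


if __name__ == "__main__":
    for s in (0.5, 1.0, 2.0, 4.0):
        print(f"J({s}) = {j_function(s):.4f}")
```

J rises monotonically from 0 (no information) towards 1 (perfectly reliable LLRs) as σ grows, which is what lets EXIT analysis convert between a variance-type metric and mutual information.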

    Soft detection and decoding in wideband CDMA systems

    A major shift is taking place in the world of telecommunications towards a communications environment where a range of new data services will be available to mobile users. This shift is already visible in several areas of wireless communications, including cellular systems, wireless LANs, and satellite systems. The provision of flexible, high-quality wireless data services requires a new approach to both the radio interface specification and the design and implementation of the various transceiver algorithms. At the same time, as the processing power available in receivers increases, more complex receiver algorithms become feasible. The general problem addressed in this thesis is the application of soft detection and decoding algorithms in wideband code division multiple access (WCDMA) receivers, in both base stations and mobile terminals, so that good performance is achieved while the computational complexity remains acceptable. In particular, two applications of soft detection and soft decoding are studied: coded multiuser detection in the CDMA base station, and improved RAKE-based reception employing soft detection in the mobile terminal. For coded multiuser detection, we propose a novel receiver structure that uses the decoding information for multiuser detection. We analyze its performance and derive lower bounds for the capacity of interference cancellation CDMA receivers that use channel coding to improve the reliability of tentative decisions. For soft decision and decoding techniques in the CDMA downlink, we propose a modified maximal ratio combining (MRC) scheme that is more suitable for RAKE receivers in WCDMA mobile terminals than the conventional MRC scheme. We also introduce an improved soft-output RAKE detector that is especially suitable for low spreading gains and high-order modulation schemes. Finally, we analyze the gain obtained through the use of Brennan's MRC scheme and our modified MRC scheme.
Throughout this thesis, Bayesian networks are used to develop algorithms for soft detection and decoding problems. This approach originates from the initial stages of this research, where Bayesian networks and algorithms using such graphical models (e.g. the so-called sum-product algorithm) were used to identify new receiver algorithms. This viewpoint may not be readily apparent in the final form of the algorithms, mainly because practical efficiency considerations forced us to select simplified variants. However, it is important for emphasizing the underlying connection between the apparently different soft detection and decision algorithms described in this thesis.
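    For reference, conventional maximal ratio combining (not the modified MRC scheme proposed in the thesis, whose details are not given here) weights each RAKE finger output r_k = h_k s + n_k by the complex conjugate of its channel estimate h_k before summing. The sketch below assumes BPSK symbols s in {+1, -1}, perfect channel estimates, and independent circularly symmetric complex Gaussian noise with the same variance E|n_k|² = N0 on every finger; under those assumptions the soft output (LLR) of s is 4 Re(sum_k conj(h_k) r_k) / N0. The function name and parameters are illustrative only.

```python
from typing import Sequence, Tuple


def mrc_combine(
    fingers: Sequence[complex],
    channel: Sequence[complex],
    noise_var: float,
) -> Tuple[complex, float]:
    """Conventional MRC of RAKE finger outputs r_k = h_k * s + n_k.

    Returns (normalized symbol estimate, BPSK log-likelihood ratio), assuming
    s in {+1, -1} and equal complex noise variance `noise_var` on every finger.
    """
    if len(fingers) != len(channel) or not fingers:
        raise ValueError("need one channel estimate per finger and at least one finger")
    if noise_var <= 0:
        raise ValueError("noise variance must be positive")
    # Each finger is weighted by conj(h_k): strong paths count more, and phases are aligned.
    combined = sum(h.conjugate() * r for h, r in zip(channel, fingers))
    energy = sum(abs(h) ** 2 for h in channel)
    estimate = combined / energy
    llr = 4.0 * combined.real / noise_var  # soft output for s
    return estimate, llr


if __name__ == "__main__":
    h = [1 + 0j, 0.5j]
    rx = [hk * -1 for hk in h]  # noise-free reception of s = -1
    est, llr = mrc_combine(rx, h, noise_var=1.0)
    print(est, llr)
```

With noise-free inputs the normalized estimate equals the transmitted symbol, and the sign of the LLR gives the hard decision while its magnitude carries the reliability that a soft-input decoder would consume.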