Search CORE

211 research outputs found

Domain specific high performance reconfigurable architecture for a communication platform

Author: Ahmed Imran
Publication venue: The University of Edinburgh
Publication date: 01/01/2007
Field of study

Edinburgh Research Archive

Reconfigurable architectures for beyond 3G wireless communication systems

Author: Zhan Cheng
Publication venue: The University of Edinburgh
Publication date: 01/01/2007
Field of study

Edinburgh Research Archive

A hardware implementation of a Viterbi decoder for a (3,2/3) TCM code

Author: Horwitz Michael Richard
Publication venue: Department of Electrical Engineering
Publication date: 01/01/1997
Field of study

The report details the design of a dedicated Viterbi decoder chip set for an Ungerboek (3,2/3) Trellis Coded Modulation code. It was the specific intention of the thesis to design a system that could be implemented on standard Field Programmable Gate Arrays (FPGA) yet still be able to cope with high bit rates. The focus of the research was to both evaluate and modify the existing VLSI design techniques and to develop new techniques to make this possible. Trellis Coded Modulation refers to a specific sub-class of convolutional codes that ire an example of coded modulation. In coded modulation there is a direct link between the encoding and modulation processes aimed at improving the performance of the code by introducing redundancy in the signal set used to transmit the code. Ungerboek developed a technique for mapping the encoded words onto points in the signal set, called mapping by set partitioning, that maximises the Euclidian distance between adjacent codewords, and hence maximises the minimum distance between any two output sequences in the code. The Viterbi algorithm is a maximum likelihood decoder for convolutional codes such as TCM. The operation of the Viterbi algorithm is based on using soft decision decoding to produce an estimate of how well the received sequence corresponds with any of the allowed code sequences. The code sequences which most closely matches the received sequence is then decoded to form the output of the decoder. A central problem in implementing systems using TCM with Viterbi decoding is that although the encoder is a relatively simple device, the decoder is not. The complexity of the Viterbi decoder for any given TCM scheme will be the major drawback in implementing the scheme. As such techniques for reducing the complexity of Viterbi decoders are of interest to developers of communication systems. The algorithms describing the implementation and operation of the Viterbi algorithm can be categorised into three main layers. The top layer holds the theoretical algorithm itself, in the second layer are the set of algorithms that describe the broad techniques used to manipulate the theoretical algorithm into a form in which it can be implemented, and the third layer of algorithms describe the implementations themselves. The work contained in this thesis concentrates on the second two layers of algorithms

Cape Town University OpenUCT

Quality of service improvement for WiMAX physical layer

Author: Bashar Faouq Khalil Musa
بشار فاروق خليل موسى
Publication venue: جامعة القدس
Publication date: 02/07/2011
Field of study

Al-Quds University Digital Repository

Ultra low-power, high-performance accelerator for speech recognition

Author: Yazdani Aminabadi Reza
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019
Field of study

Automatic Speech Recognition (ASR) is undoubtedly one of the most important and interesting applications in the cutting-edge era of Deep-learning deployment, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, requiring huge memory storage and computational power, which is not affordable for the tiny power budget of mobile devices. Hardware acceleration can reduce power consumption of ASR systems as well as reducing its memory pressure, while delivering high-performance. In this thesis, we present a customized accelerator for large-vocabulary, speaker-independent, continuous speech recognition. A state-of-the-art ASR system consists of two major components: acoustic-scoring using DNN and speech-graph decoding using Viterbi search. As the first step, we focus on the Viterbi search algorithm, that represents the main bottleneck in the ASR system. The accelerator includes some innovative techniques to improve the memory subsystem, which is the main bottleneck for performance and power, such as a prefetching scheme and a novel bandwidth saving technique tailored to the needs of ASR. Furthermore, as the speech graph is vast taking more than 1-Gigabyte memory space, we propose to change its representation by partitioning it into several sub-graphs and perform an on-the-fly composition during the Viterbi run-time. This approach together with some simple yet efficient compression techniques result in 31x memory footprint reduction, providing 155x real-time speedup and orders of magnitude power and energy saving compared to CPUs and GPUs. In the next step, we propose a novel hardware-based ASR system that effectively integrates a DNN accelerator for the pruned/quantized models with the Viterbi accelerator. We show that, when either pruning or quantizing the DNN model used for acoustic scoring, ASR accuracy is maintained but the execution time of the ASR system is increased by 33%. Although pruning and quantization improves the efficiency of the DNN, they result in a huge increase of activity in the Viterbi search since the output scores of the pruned model are less reliable. In order to avoid the aforementioned increase in Viterbi search workload, our system loosely selects the N-best hypotheses at every time step, exploring only the N most likely paths. Our final solution manages to efficiently combine both DNN and Viterbi accelerators using all their optimizations, delivering 222x real-time ASR with a small power budget of 1.26 Watt, small memory footprint of 41 MB, and a peak memory bandwidth of 381 MB/s, being amenable for low-power mobile platforms.Los sistemas de reconocimiento automático del habla (ASR por sus siglas en inglés, Automatic Speech Recognition) son sin lugar a dudas una de las aplicaciones más relevantes en el área emergente de aprendizaje profundo (Deep Learning), specialmente en el segmento de los dispositivos móviles. Realizar el reconocimiento del habla de forma rápida y precisa tiene un elevado coste en energía, requiere de gran capacidad de memoria y de cómputo, lo cual no es deseable en sistemas móviles que tienen severas restricciones de consumo energético y disipación de potencia. El uso de arquitecturas específicas en forma de aceleradores hardware permite reducir el consumo energético de los sistemas de reconocimiento del habla, al tiempo que mejora el rendimiento y reduce la presión en el sistema de memoria. En esta tesis presentamos un acelerador específicamente diseñado para sistemas de reconocimiento del habla de gran vocabulario, independientes del orador y que funcionan en tiempo real. Un sistema de reconocimiento del habla estado del arte consiste principalmente en dos componentes: el modelo acústico basado en una red neuronal profunda (DNN, Deep Neural Network) y la búsqueda de Viterbi basada en un grafo que representa el lenguaje. Como primer objetivo nos centramos en la búsqueda de Viterbi, ya que representa el principal cuello de botella en los sistemas ASR. El acelerador para el algoritmo de Viterbi incluye técnicas innovadoras para mejorar el sistema de memoria, que es el mayor cuello de botella en rendimiento y energía, incluyendo técnicas de pre-búsqueda y una nueva técnica de ahorro de ancho de banda a memoria principal específicamente diseñada para sistemas ASR. Además, como el grafo que representa el lenguaje requiere de gran capacidad de almacenamiento en memoria (más de 1 GB), proponemos cambiar su representación y dividirlo en distintos grafos que se componen en tiempo de ejecución durante la búsqueda de Viterbi. De esta forma conseguimos reducir el almacenamiento en memoria principal en un factor de 31x, alcanzar un rendimiento 155 veces superior a tiempo real y reducir el consumo energético y la disipación de potencia en varios órdenes de magnitud comparado con las CPUs y las GPUs. En el siguiente paso, proponemos un novedoso sistema hardware para reconocimiento del habla que integra de forma efectiva un acelerador para DNNs podadas y cuantizadas con el acelerador de Viterbi. Nuestros resultados muestran que podar y/o cuantizar el DNN para el modelo acústico permite mantener la precisión pero causa un incremento en el tiempo de ejecución del sistema completo de hasta el 33%. Aunque podar/cuantizar mejora la eficiencia del DNN, éstas técnicas producen un gran incremento en la carga de trabajo de la búsqueda de Viterbi ya que las probabilidades calculadas por el DNN son menos fiables, es decir, se reduce la confianza en las predicciones del modelo acústico. Con el fin de evitar un incremento inaceptable en la carga de trabajo de la búsqueda de Viterbi, nuestro sistema restringe la búsqueda a las N hipótesis más probables en cada paso de la búsqueda. Nuestra solución permite combinar de forma efectiva un acelerador de DNNs con un acelerador de Viterbi incluyendo todas las optimizaciones de poda/cuantización. Nuestro resultados experimentales muestran que dicho sistema alcanza un rendimiento 222 veces superior a tiempo real con una disipación de potencia de 1.26 vatios, unos requisitos de memoria modestos de 41 MB y un uso de ancho de banda a memoria principal de, como máximo, 381 MB/s, ofreciendo una solución adecuada para dispositivos móviles

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

VLSI design and implementation of a Turbo Decoder

Author: Dielissen J.T.M.H.
Publication venue
Publication date: 01/01/2000
Field of study

Repository TU/e

Pure OAI Repository

Realizing Software Defined Radio - A Study in Designing Mobile Supercomputers.

Author: Lin Yuan
Publication venue
Publication date: 01/01/2008
Field of study

The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These implementations are time consuming to design and difficult to verify. A programmable hardware platform capable of supporting software implementations of the physical layer, or Software Defined Radio (SDR), has a number of advantages. These include support for multiple protocols, faster time-to-market, higher chip volumes, and support for late implementation changes. The challenge is to achieve this under the power budget of a mobile device. Wireless communications belong to an emerging class of applications with the processing requirements of a supercomputer but the power constraints of a mobile device -- mobile supercomputing. This thesis presents a set of design proposals for building a programmable wireless communication solution. In order to design a solution that can meet the lofty requirements of SDR, this thesis takes an application-centric design approach -- evaluate and optimize all aspects of the design based on the characteristics of wireless communication protocols. This includes a DSP processor architecture optimized for wireless baseband processing, wireless algorithm optimizations, and language and compilation tool support for the algorithm software and the processor hardware. This thesis first analyzes the software characteristics of SDR. Based on the analysis, this thesis proposes the Signal-Processing On-Demand Architecture (SODA), a fully programmable multi-core architecture that can support the computation requirements of third generation wireless protocols, while operating within the power budget of a mobile device. This thesis then presents wireless algorithm implementations and optimizations for the SODA processor architecture. A signal processing language extension (SPEX) is proposed to help the software development efforts of wireless communication protocols on SODA-like multi-core architecture. And finally, the SPIR compiler is proposed to automatically map SPEX code onto the multi-core processor hardware.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61760/1/linyz_1.pd

CiteSeerX

Deep Blue Documents at the University of Michigan

Optimisation of Iterative Multi-user Receivers using Analytical Tools

Author: Shepherd David Peter
Publication venue
Publication date: 01/01/2008
Field of study

The objective of this thesis is to develop tools for the analysis and optimization of an iterative receiver. These tools can be applied to most soft-in soft-out (SISO) receiver components. For illustration purposes we consider a multi-user DS-CDMA system with forward error correction that employs iterative multi-user detection based on soft interference cancellation and single user decoding. Optimized power levels combined with adaptive scheduling allows for efficient utilization of receiver resources for heavily loaded systems.¶ Metric transfer analysis has been shown to be an accurate method of predicting the convergence behavior of iterative receivers. EXtrinsic Information (EXIT), fidelity (FT) and variance (VT) transfer analysis are well-known methods, however the relationship between the different approaches has not been explored in detail. We compare the metrics numerically and analytically and derive functions to closely approximate the relationship between them. The result allows for easy translation between EXIT, FT and VT methods. Furthermore, we extend the

J

function, which describes mutual information as a function of variance, to fidelity and symbol error variance, the Rayleigh fading channel model and a channel estimate. ...

The Australian National University

Soft detection and decoding in wideband CDMA systems

Author: Kettunen Kimmo
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2003
Field of study

A major shift is taking place in the world of telecommunications towards a communications environment where a range of new data services will be available for mobile users. This shift is already visible in several areas of wireless communications, including cellular systems, wireless LANs, and satellite systems. The provision of flexible high-quality wireless data services requires a new approach on both the radio interface specification and the design and the implementation of the various transceiver algorithms. On the other hand, when the processing power available in the receivers increases, more complex receiver algorithms become feasible. The general problem addressed in this thesis is the application of soft detection and decoding algorithms in the wideband code division multiple access (WCDMA) receivers, both in the base stations and in the mobile terminals, so that good performance is achieved but that the computational complexity remains acceptable. In particular, two applications of soft detection and soft decoding are studied: coded multiuser detection in the CDMA base station and improved RAKE-based reception employing soft detection in the mobile terminal. For coded multiuser detection, we propose a novel receiver structure that utilizes the decoding information for multiuser detection. We analyze the performance and derive lower bounds for the capacity of interference cancellation CDMA receivers when using channel coding to improve the reliability of tentative decisions. For soft decision and decoding techniques in the CDMA downlink, we propose a modified maximal ratio combining (MRC) scheme that is more suitable for RAKE receivers in WCDMA mobile terminals than the conventional MRC scheme. We also introduce an improved soft-output RAKE detector that is especially suitable for low spreading gains and high-order modulation schemes. Finally we analyze the gain obtained through the use of Brennan's MRC scheme and our modified MRC scheme. Throughout this thesis Bayesian networks are utilized to develop algorithms for soft detection and decoding problems. This approach originates from the initial stages of this research, where Bayesian networks and algorithms using such graphical models (e.g. the so-called sum-product algorithm) were used to identify new receiver algorithms. In the end, this viewpoint may not be easily noticeable in the final form of the algorithms, mainly because the practical efficiency considerations forced us to select simplified variants of the algorithms. However, this viewpoint is important to emphasize the underlying connection between the apparently different soft detection and decision algorithms described in this thesis.reviewe

CiteSeerX

Aaltodoc Publication Archive

Recommended from our members

ENERGY EFFICIENCY EXPLORATION OF COARSE-GRAIN RECONFIGURABLE ARCHITECTURE WITH EMERGING NONVOLATILE MEMORY

Author: Liu Xiaobin
Publication venue: ScholarWorks@UMass Amherst
Publication date: 18/03/2015
Field of study

With the rapid growth in consumer electronics, people expect thin, smart and powerful devices, e.g. Google Glass and other wearable devices. However, as portable electronic products become smaller, energy consumption becomes an issue that limits the development of portable systems due to battery lifetime. In general, simply reducing device size cannot fully address the energy issue. To tackle this problem, we propose an on-chip interconnect infrastructure and pro- gram storage structure for a coarse-grained reconfigurable architecture (CGRA) with emerging non-volatile embedded memory (MRAM). The interconnect is composed of a matrix of time-multiplexed switchboxes which can be dynamically reconfigured with the goal of energy reduction. The number of processors performing computation can also be adapted. The use of MRAM provides access to high-density storage and lower memory energy consumption versus more standard SRAM technologies. The combination of CGRA, MRAM, and flexible on-chip interconnection is considered for signal processing. This application domain is of interest based on its time-varying computing demands. To evaluate CGRA architectural features, prototype architectures have been pro- totyped in a field-programmable gate array (FPGA). Measurements of energy, power, instruction count, and execution time performance are considered for a scalable num- ber of processors. Applications such as adaptive Viterbi decoding and Reed Solomon coding are used for evaluation. To complete this thesis, a time-scheduled switchbox was integrated into our CGRA model. This model was prototyped on an FPGA. It is shown that energy consumption can be reduced by about 30% if dynamic design reconfiguration is performed

ScholarWorks@UMass Amherst