1,080 research outputs found

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements

    Residue Number Systems: a Survey

    Get PDF

    Implementation of Carrier Phase Recovery Circuits for Optical Communication

    Get PDF
    Fiber-optic links form a vital part of our increasingly connected world, and as the number of Internet users and the network traffic increases, reducing the power dissipation of these links becomes more important. A considerable part of the total link power is dissipated in the digital signal processing (DSP) subsystems, which show a growing complexity as more advanced modulation formats are introduced. Since DSP designers can no longer take reduced power dissipation with each new CMOS process node for granted, the design of more efficient DSPalgorithms in conjunction with circuit implementation strategies focused on power efficiency is required.One part of the DSP for a coherent fiber-optic link is the carrier phase recovery (CPR) unit, which can account for a significant portion of the DSP power dissipation, especially for shorter links. A wide range of CPR algorithms is available, but reliable estimates of their power efficiency is missing, making accurate comparisons impossible. Furthermore, much of the current literature does not account for the limited precision arithmetic of the DSP.In this thesis, we develop circuit implementations based on a range of suggested CPR algorithms, focusing on power efficiency. These circuits allow us to contrast different CPR solutions based not only on power dissipation, but also on the quality of the phase estimation, including fixed-point arithmetic aspects. We also show how different parameter settings affect the power efficiency and the implementation penalty. Additionally, the thesis includes a description of our field-programmable gate-array fiber-emulation environment, which can be used to study rare phenomena in DSP implementations, or to reach very low bit-error rates. We use this environment to evaluate the cycle-slip probability of a CPR implementation

    Studies on Implementation of . . . High Throughput and Low Power Consumption

    Get PDF
    In this thesis we discuss design and implementation of frequency selective digital filters with high throughput and low power consumption. The thesis includes proposed arithmetic transformations of lattice wave digital filters that aim at increasing the throughput and reduce the power consumption of the filter implementation. The thesis also includes two case studies where digital filters with high throughput and low power consumption are required. A method for obtaining high throughput as well as reduced power consumption of digital filters is arithmetic transformation of the filter structure. In this thesis arithmetic transformations of first- and second-order Richards’ allpass sections composed by symmetric two-port adaptors and implemented using carry-save arithmetic are proposed. Such filter sections can be used for implementation of lattice wave digital filters and bireciprocal lattice wave digital filters. The latter structures are efficient for implementation of interpolators and decimators by factors of two. Th

    Serial-data computation in VLSI

    Get PDF

    Towards adaptive balanced computing (ABC) using reconfigurable functional caches (RFCs)

    Get PDF
    The general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, certain software technologies like multimedia and digital signal processing applications demand ever more computing power. Reconfigurable computing has emerged to combine the versatility of general-purpose processors with the customization ability of ASICs. The basic premise of reconfigurability is to provide better performance and higher computing density than fixed configuration processors. Most of the research in reconfigurable computing is dedicated to on-chip functional logic. If computing resources are adaptable to the computing requirement, the maximum performance can be achieved. To overcome the gap between processor and memory technology, the size of on-chip cache memory has been consistently increasing. The larger cache memory capacity, though beneficial in general, does not guarantee a higher performance for all the applications as they may not utilize all of the cache efficiently. To utilize on-chip resources effectively and to accelerate the performance of multimedia applications specifically, we propose a new architecture---Adaptive Balanced Computing (ABC). ABC uses dynamic resource configuration of on-chip cache memory by integrating Reconfigurable Functional Caches (RFC). RFC can work as a conventional cache or as a specialized computing unit when necessary. In order to convert a cache memory to a computing unit, we include additional logic to embed multi-bit output LUTs into the cache structure. We add the reconfigurability of cache memory to a conventional processor with minimal modification to the load/store microarchitecture and with minimal compiler assistance. ABC architecture utilizes resources more efficiently by reconfiguring the cache memory to computing units dynamically. The area penalty for this reconfiguration is about 50--60% of the memory cell cache array-only area with faster cache access time. In a base array cache (parallel decoding caches), the area penalty is 10--20% of the data array with 1--2% increase in the cache access time. However, we save 27% for FIR and 44% for DCT/IDCT in area with respect to memory cell array cache and about 80% for both applications with respect to base array cache if we were to implement all these units separately (such as ASICs). The simulations with multimedia and DSP applications (DCT/IDCT and FIR/IIR) show that the resource configuration with the RFC speedups ranging from 1.04X to 3.94X in overall applications and from 2.61X to 27.4X in the core computations. The simulations with various parameters indicate that the impact of reconfiguration can be minimized if an appropriate cache organization is selected

    Image Processing Using FPGAs

    Get PDF
    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs

    Low power digital signal processing

    Get PDF
    • …
    corecore