1,285 research outputs found

    Characterization of Speech Recognition Systems on GPU Architectures

    Get PDF
    This master thesis characterizes the performance and energy bottlenecks of speech recognition systems when running on modern GPU, with the aim of providing useful information for designing future GPU architectures, as well as proposing a GPU configuration more well-suited for speech recognition

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF

    A long constraint length VLSI Viterbi decoder for the DSN

    Get PDF
    A Viterbi decoder, capable of decoding convolutional codes with constraint lengths up to 15, is under development for the Deep Space Network (DSN). The objective is to complete a prototype of this decoder by late 1990, and demonstrate its performance using the (15, 1/4) encoder in Galileo. The decoder is expected to provide 1 to 2 dB improvement in bit SNR, compared to the present (7, 1/2) code and existing Maximum Likelihood Convolutional Decoder (MCD). The decoder will be fully programmable for any code up to constraint length 15, and code rate 1/2 to 1/6. The decoder architecture and top-level design are described

    A low-power, high-performance speech recognition accelerator

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at high energy cost, not being affordable for the tiny power-budgeted mobile devices. Hardware acceleration reduces energy-consumption of ASR systems, while delivering high-performance. In this paper, we present an accelerator for largevocabulary, speaker-independent, continuous speech-recognition. It focuses on the Viterbi search algorithm representing the main bottleneck in an ASR system. The proposed design consists of innovative techniques to improve the memory subsystem, since memory is the main bottleneck for performance and power in these accelerators' design. It includes a prefetching scheme tailored to the needs of ASR systems that hides main memory latency for a large fraction of the memory accesses, negligibly impacting area. Additionally, we introduce a novel bandwidth-saving technique that removes off-chip memory accesses by 20 percent. Finally, we present a power saving technique that significantly reduces the leakage power of the accelerators scratchpad memories, providing between 8.5 and 29.2 percent reduction in entire power dissipation. Overall, the proposed design outperforms implementations running on the CPU by orders of magnitude, and achieves speedups between 1.7x and 5.9x for different speech decoders over a highly optimized CUDA implementation running on Geforce-GTX-980 GPU, while reducing the energy by 123-454x.Peer ReviewedPostprint (author's final draft

    A Viterbi decoder with low-power trace-back memory structure for wireless pervasive communications

    Get PDF
    This paper presents a new trace-back memory structure for Viterbi decoders that reduces power consumption by 63% compared to the conventional RAM based design. Instead of the intensive read and write operations as required in RAM based designs, the new memory is based on an array of registers connected with trace-back signals that decode the output bits on the fly. The structure is used together with appropriate clock and power-aware control signals. Based on a 0.35 /spl mu/m CMOS implementation the trace-back back memory consumes energy of 182 pJ

    Advanced modulation technology development for earth station demodulator applications. Coded modulation system development

    Get PDF
    A jointly optimized coded modulation system is described which was designed, built, and tested by COMSAT Laboratories for NASA LeRC which provides a bandwidth efficiency of 2 bits/s/Hz at an information rate of 160 Mbit/s. A high speed rate 8/9 encoder with a Viterbi decoder and an Octal PSK modem are used to achieve this. The BER performance is approximately 1 dB from the theoretically calculated value for this system at a BER of 5 E-7 under nominal conditions. The system operates in burst mode for downlink applications and tests have demonstrated very little degradation in performance with frequency and level offset. Unique word miss rate measurements were conducted which demonstrate reliable acquisition at low values of Eb/No. Codec self tests have verified the performance of this subsystem in a stand alone mode. The codec is capable of operation at a 200 Mbit/s information rate as demonstrated using a codec test set which introduces noise digitally. The measured performance is within 0.2 dB of the computer simulated predictions. A gate array implementation of the most time critical element of the high speed Viterbi decoder was completed. This gate array add-compare-select chip significantly reduces the power consumption and improves the manufacturability of the decoder. This chip has general application in the implementation of high speed Viterbi decoders

    Compressed television transmission: A market survey

    Get PDF
    NASA's compressed television transmission technology is described, and its potential market is considered; a market that encompasses teleconferencing, remote medical diagnosis, patient monitoring, transit station surveillance, as well as traffic management and control. In addition, current and potential television transmission systems and their costs and potential manufacturers are considered

    A high-speed, low-power interleaved trace-back memory for Viterbi decoder

    Get PDF
    This paper presents a high-speed, low-power trace-back memory structure for a Viterbi decoder. The new memory is based on an array of registers connected with trace-back signals that decode the output bits on the fly. The trace-back memory is internally interleaved such that high-speed characteristic is achieved while low-power consumption is maintained. The structure is used together with appropriate clock and power-aware control signals. The design is 100% portable and is suitable for a SoftIP approach. Based on the AMS 0.35 /spl mu/m CMOS implementation the trace-back memory is estimated to consume energy of 232 pJ, which is 53.6% less than a conventional RAM based design, with a maximum throughput of 1.1 Gbps

    An ultra low-power hardware accelerator for automatic speech recognition

    Get PDF
    Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost which is not affordable for the tiny power budget of mobile devices. Hardware acceleration can reduce power consumption of ASR systems, while delivering high-performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, that represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is identified as the main bottleneck for performance and power in the design of these accelerators. We propose a prefetching scheme tailored to the needs of an ASR system that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. In addition, we introduce a novel bandwidth saving technique that removes 20% of the off-chip memory accesses issued during the Viterbi search. The proposed design outperforms software implementations running on the CPU by orders of magnitude and achieves 1.7x speedup over a highly optimized CUDA implementation running on a high-end Geforce GTX 980 GPU, while reducing by two orders of magnitude (287x) the energy required to convert the speech into text.Peer ReviewedPostprint (author's final draft

    An Iteratively Decodable Tensor Product Code with Application to Data Storage

    Full text link
    The error pattern correcting code (EPCC) can be constructed to provide a syndrome decoding table targeting the dominant error events of an inter-symbol interference channel at the output of the Viterbi detector. For the size of the syndrome table to be manageable and the list of possible error events to be reasonable in size, the codeword length of EPCC needs to be short enough. However, the rate of such a short length code will be too low for hard drive applications. To accommodate the required large redundancy, it is possible to record only a highly compressed function of the parity bits of EPCC's tensor product with a symbol correcting code. In this paper, we show that the proposed tensor error-pattern correcting code (T-EPCC) is linear time encodable and also devise a low-complexity soft iterative decoding algorithm for EPCC's tensor product with q-ary LDPC (T-EPCC-qLDPC). Simulation results show that T-EPCC-qLDPC achieves almost similar performance to single-level qLDPC with a 1/2 KB sector at 50% reduction in decoding complexity. Moreover, 1 KB T-EPCC-qLDPC surpasses the performance of 1/2 KB single-level qLDPC at the same decoder complexity.Comment: Hakim Alhussien, Jaekyun Moon, "An Iteratively Decodable Tensor Product Code with Application to Data Storage
    corecore