
    Effects of noise suppression and envelope dynamic range compression on the intelligibility of vocoded sentences for a tonal language

    Vocoder simulation studies have suggested that the carrier signal type employed affects the intelligibility of vocoded speech. The present work further assessed how carrier signal type interacts with additional signal processing, namely single-channel noise suppression and envelope dynamic range compression, in determining the intelligibility of vocoder simulations. In Experiment 1, Mandarin sentences that had been corrupted by speech-spectrum-shaped noise (SSN) or two-talker babble (2TB) were processed by one of four single-channel noise-suppression algorithms before undergoing tone-vocoded (TV) or noise-vocoded (NV) processing. In Experiment 2, the dynamic ranges of multiband envelope waveforms were compressed by scaling the mean-removed envelope waveforms with a compression factor before TV or NV processing. TV Mandarin sentences yielded higher intelligibility scores with normal-hearing (NH) listeners than did NV sentences. The intelligibility advantage of noise-suppressed vocoded speech depended on the masker type (SSN vs 2TB). NV speech was more negatively affected by envelope dynamic range compression than was TV speech. These findings suggest an interaction effect between the carrier signal type employed in the vocoding process and the envelope distortion introduced by signal processing.
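    The envelope-compression manipulation in Experiment 2 reduces to one line of arithmetic: scale the mean-removed envelope by a compression factor. A minimal NumPy sketch follows; the Hilbert-plus-low-pass envelope extraction and the cutoff frequency are illustrative assumptions, since the abstract does not specify how the multiband envelopes are obtained.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_envelope(band_signal, fs, cutoff=50.0):
    """Temporal envelope of one analysis band: Hilbert magnitude
    followed by low-pass smoothing (assumed extraction method)."""
    mag = np.abs(hilbert(band_signal))
    b, a = butter(4, cutoff / (fs / 2))
    return filtfilt(b, a, mag)

def compress_envelope(env, c):
    """Scale the mean-removed envelope by compression factor c.
    c < 1 shrinks the envelope's dynamic range about its mean;
    c = 1 leaves it unchanged."""
    m = env.mean()
    return np.maximum(m + c * (env - m), 0.0)  # envelopes stay non-negative
```

    In a standard vocoder pipeline, the compressed envelope of each band would then modulate either a tone carrier (TV) or a noise carrier (NV) before the bands are summed.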

    Learning to detect dysarthria from raw speech

    Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted low-level features by selecting the information relevant to the task at hand. We explore an alternative to this selection by jointly learning the classifier and the feature extraction. Recent work on speech recognition has shown improved performance over hand-crafted speech features by learning from the waveform. We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor, and a compression power from raw speech, jointly with the rest of the architecture. We apply this model to dysarthria detection from sentence-level audio recordings. Starting from a strong attention-based baseline on which mel-filterbanks outperform standard low-level descriptors, we show that learning the filters or the normalization and compression improves over fixed features by 10% absolute accuracy. We also observe a gain over OpenSmile features when the feature extraction, normalization, and compression factor are learned jointly with the architecture. This constitutes a first attempt at learning all these operations jointly from raw audio for a speech classification task. Comment: 5 pages, 3 figures, submitted to ICASSP.
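    A minimal PyTorch sketch of the core idea, a front end trained end to end: a convolutional filterbank over the raw waveform with a per-channel learnable gain (normalization) and compression exponent. The layer sizes, initializations, and pooling below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LearnableFrontEnd(nn.Module):
    """Trainable replacement for fixed mel-filterbank features:
    learned FIR filters plus a per-channel normalization gain and
    compression power, all updated by backpropagation together
    with the downstream classifier."""

    def __init__(self, n_filters=40, filter_len=400, hop=160):
        super().__init__()
        # Learnable filterbank applied directly to the waveform.
        self.filters = nn.Conv1d(1, n_filters, filter_len, stride=hop, bias=False)
        # Per-channel normalization factor and compression exponent.
        self.gain = nn.Parameter(torch.ones(n_filters))
        self.power = nn.Parameter(torch.full((n_filters,), 0.5))

    def forward(self, wav):                   # wav: (batch, samples)
        x = self.filters(wav.unsqueeze(1))    # (batch, n_filters, frames)
        energy = x.pow(2) + 1e-6              # band energies, kept positive
        g = self.gain.abs().view(1, -1, 1)    # keep the gain non-negative
        p = self.power.view(1, -1, 1)
        return (g * energy).pow(p)            # learned dynamic-range compression
```

    Initializing the compression exponent near 0.5 mimics the fixed root compression often applied to filterbank energies, so training starts from a familiar operating point.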

    Adapting End-to-End Speech Recognition for Readable Subtitles

    Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases, such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. This work therefore focuses on ASR with output compression, a task that is challenging for supervised approaches because training data are scarce. We first investigate a cascaded system, in which an unsupervised compression model post-edits the transcribed speech. We then compare several methods of end-to-end speech recognition under output length constraints. The experiments show that with limited data, far less than is needed to train a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities. Furthermore, the best performance in terms of WER and ROUGE scores is achieved by explicitly modeling the length constraints within the end-to-end ASR system. Comment: IWSLT 2020.
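    One common way to expose an output-length constraint to a sequence-to-sequence model is to prepend a length token to each training target so the decoder can be steered at inference time. The sketch below illustrates that general technique only; the token names and ratio buckets are invented for the example, and the paper's actual length-modeling method may differ.

```python
def add_length_token(transcript_words, subtitle_words):
    """Tag a training target with a coarse compression-ratio token.
    At inference, forcing e.g. '<len:short>' as the first decoder
    token requests a compressed, subtitle-length output.
    Bucket edges and token names are illustrative assumptions."""
    ratio = len(subtitle_words) / max(len(transcript_words), 1)
    if ratio > 0.9:
        tag = "<len:full>"
    elif ratio > 0.7:
        tag = "<len:medium>"
    else:
        tag = "<len:short>"
    return [tag] + list(subtitle_words)

# Example: a 10-word verbatim transcript paired with a 6-word
# subtitle (ratio 0.6) is tagged '<len:short>'.
```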

    A user's guide for the signal processing software for image and speech compression developed in the Communications and Signal Processing Laboratory (CSPL), version 1

    Complete documentation is provided for the software developed in the Communications and Signal Processing Laboratory (CSPL) from July 1985 to March 1986. Utility programs and subroutines developed for a user-friendly image and speech processing environment are described, along with additional programs for data compression of image and speech signals. Programs for zero-memory and block transform quantization in the presence of channel noise are also described. Finally, several routines for simulating the performance of image compression algorithms are included.
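    As a point of reference for the quantizers the guide documents: a zero-memory quantizer maps each sample independently of its neighbors. Below is a minimal sketch of a uniform midrise version in Python; it is a generic stand-in for illustration, not a reproduction of the CSPL routines.

```python
import numpy as np

def uniform_quantize(x, n_bits, x_max):
    """Zero-memory (sample-by-sample) uniform midrise quantizer:
    each sample is mapped independently to one of 2**n_bits levels
    spanning [-x_max, x_max]. Returns the level indices (what would
    be transmitted) and the reconstructed values."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    idx = np.clip(np.floor(x / step) + levels // 2, 0, levels - 1)
    return idx.astype(int), (idx - levels // 2 + 0.5) * step

# Example: 3-bit quantization of four samples in [-1, 1].
indices, reconstructed = uniform_quantize(
    np.array([-0.8, -0.1, 0.3, 0.95]), n_bits=3, x_max=1.0)
```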

    Speech Compression Using Discrete Wavelet Transform

    Speech compression is an area of digital signal processing that focuses on reducing the bit rate of the speech signal for transmission or storage without significant loss of quality. The wavelet transform has recently been proposed for signal analysis, and speech signal compression using the wavelet transform is given considerable attention in this thesis. Speech coding is a lossy scheme and is implemented here to compress the one-dimensional speech signal. The scheme consists of four operations: the transform, thresholding (level-dependent and global), quantization, and entropy encoding. The reconstruction of the compressed signal, as well as the detailed steps needed, is discussed. The performance of wavelet compression is compared against Linear Predictive Coding and Global System for Mobile Communications (GSM) algorithms using SNR, PSNR, NRMSE, and compression ratio. Software simulating the lossy compression scheme was developed using Matlab 6; it provides basic speech analysis as well as the compression and decompression operations. The results show a reasonably high compression ratio and good signal quality.
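    The transform-and-threshold stages of such a scheme are compact enough to sketch. Below is a minimal Python illustration using PyWavelets with a global magnitude threshold; the wavelet, decomposition level, and retention fraction are illustrative choices, and the quantization and entropy-coding stages are omitted.

```python
import numpy as np
import pywt

def dwt_compress(signal, wavelet="db4", level=5, keep=0.1):
    """Wavelet-transform compression sketch: decompose, apply a
    global hard threshold that retains roughly the largest `keep`
    fraction of coefficients, then reconstruct. Quantization and
    entropy coding of the surviving coefficients are omitted."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    flat = np.concatenate(coeffs)
    thresh = np.quantile(np.abs(flat), 1.0 - keep)  # global threshold
    kept = [pywt.threshold(c, thresh, mode="hard") for c in coeffs]
    return pywt.waverec(kept, wavelet), thresh

def snr_db(x, y):
    """SNR of reconstruction y against original x, in dB."""
    y = y[: len(x)]
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2))
```

    Sweeping the retention fraction trades compression ratio against SNR/PSNR, which is how the quality-versus-rate comparison described above would be generated.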

    Speech Development by Imitation

    The Double Cone Model (DCM) is a model of how the brain transforms sensory input into motor commands through successive stages of data compression and expansion. We have tested a subset of the DCM on speech recognition, production, and imitation. The experiments show that the DCM is a good candidate for an artificial speech processing system that can develop autonomously. We show that the DCM can learn a repertoire of speech sounds by listening to speech input. It is also able to link the individual elements of speech into sequences that can be recognized or reproduced, allowing the system to imitate spoken language.

    Voice technology and BBN

    The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Work in experimental psychology, control systems, and human factors engineering, which is often relevant to the proper design and operation of speech systems, is also described.

    Level discrimination of speech sounds by hearing-impaired individuals with and without hearing amplification

    Objectives: The current study was designed to examine how hearing-impaired individuals judge level differences between speech sounds with and without hearing amplification. It was hypothesized that hearing aid compression would adversely affect the user's ability to judge level differences. Design: Thirty-eight hearing-impaired participants performed an adaptive tracking procedure to determine their level-discrimination thresholds for different word and sentence tokens, as well as for speech-spectrum noise, with and without their hearing aids. Eight normal-hearing participants performed the same task for comparison. Results: Level discrimination of different word and sentence tokens was more difficult than discrimination of stationary noises, and level discrimination was significantly more difficult for words than for sentences. There were, however, no significant differences between mean performance with and without hearing aids, and no correlations between performance and various hearing aid measurements. Conclusions: There is a clear difficulty in judging level differences between words or sentences relative to differences between broadband noises, but this difficulty was found for both hearing-impaired and normal-hearing individuals and bore no relation to hearing aid compression measures. The lack of a clear adverse effect of hearing aid compression on level discrimination is suggested to be due to the low effective compression ratios of currently fitted hearing aids.
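    Adaptive tracking here means a staircase that adjusts the level difference trial by trial until responses converge on a criterion point of the psychometric function. A minimal sketch of a generic 2-down/1-up track follows; the rule, step size, stopping criterion, and trial interface are illustrative assumptions, as the abstract does not state the exact procedure used.

```python
def level_discrimination_track(present_trial, start_db=6.0,
                               step_db=1.0, n_reversals=8):
    """Generic 2-down/1-up adaptive staircase converging on the
    ~70.7%-correct point. `present_trial(delta_db)` is a caller-
    supplied callable (hypothetical interface) that runs one trial
    with a level difference of `delta_db` dB and returns True if
    the listener responded correctly."""
    delta, correct_run, direction = start_db, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if present_trial(delta):
            correct_run += 1
            if correct_run == 2:             # two in a row: make it harder
                correct_run = 0
                if direction == +1:          # up-to-down turn = reversal
                    reversals.append(delta)
                direction = -1
                delta = max(delta - step_db, 0.1)
        else:
            correct_run = 0
            if direction == -1:              # down-to-up turn = reversal
                reversals.append(delta)
            direction = +1
            delta += step_db
    return sum(reversals) / len(reversals)   # threshold estimate in dB
```

    Averaging the level differences at the reversal points gives the threshold estimate; running the same track with and without hearing aids yields the paired comparison reported above.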