Effects of noise suppression and envelope dynamic range compression on the intelligibility of vocoded sentences for a tonal language
Vocoder simulation studies have suggested that the carrier signal type employed affects the intelligibility of vocoded speech. The present work further assessed how carrier signal type interacts with additional signal processing, namely single-channel noise suppression and envelope dynamic range compression, in determining the intelligibility of vocoder simulations. In Experiment 1, Mandarin sentences that had been corrupted by speech-spectrum-shaped noise (SSN) or two-talker babble (2TB) were processed by one of four single-channel noise-suppression algorithms before undergoing tone-vocoded (TV) or noise-vocoded (NV) processing. In Experiment 2, the dynamic ranges of multiband envelope waveforms were compressed by scaling the mean-removed envelope waveforms with a compression factor before TV or NV processing. TV Mandarin sentences yielded higher intelligibility scores with normal-hearing (NH) listeners than did NV sentences. The intelligibility advantage of noise-suppressed vocoded speech depended on the masker type (SSN vs 2TB). NV speech was more negatively affected by envelope dynamic range compression than was TV speech. These findings suggest an interaction between the carrier signal type employed in the vocoding process and the envelope distortion caused by signal processing.
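The envelope manipulation in Experiment 2 can be sketched as follows. This is a minimal illustration, assuming the compression factor simply scales the mean-removed envelope as the abstract describes; the function name and toy values are hypothetical, not the authors' code:

```python
import numpy as np

def compress_envelope(env, k):
    """Compress the dynamic range of an envelope waveform by scaling its
    mean-removed fluctuations with a compression factor k.
    k < 1 reduces the range; k = 1 leaves the envelope unchanged."""
    m = np.mean(env)
    return m + k * (env - m)

# Toy envelope: k = 0.5 halves the swing around the mean
# while leaving the mean level itself untouched.
env = np.array([0.2, 0.8, 0.4, 1.0])
out = compress_envelope(env, 0.5)
```

In a multiband vocoder this operation would be applied to each band's envelope before it modulates the tone or noise carrier.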
Learning to detect dysarthria from raw speech
Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted low-level features, by selecting the relevant information for the task at hand. We explore an alternative to this selection, by learning jointly the classifier and the feature extraction. Recent work on speech recognition has shown improved performance over speech features by learning from the waveform. We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture. We apply this model to dysarthria detection from sentence-level audio recordings. Starting from a strong attention-based baseline on which mel-filterbanks outperform standard low-level descriptors, we show that learning the filters or the normalization and compression improves over fixed features by 10% absolute accuracy. We also observe a gain over OpenSmile features by learning jointly the feature extraction, the normalization, and the compression factor with the architecture. This constitutes a first attempt at learning jointly all these operations from raw audio for a speech classification task.
Comment: 5 pages, 3 figures, submitted to ICASS
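The front end described above, filterbank energies followed by per-channel normalization and power-law compression, can be sketched as a forward pass. This is a toy NumPy illustration, not the paper's implementation: the shapes, the random placeholder filterbank, and the parameter values are assumptions, and in the actual model these quantities are trained jointly with the classifier:

```python
import numpy as np

# Hypothetical shapes: 40 filters over 201 FFT power-spectrum bins.
rng = np.random.default_rng(0)
n_filters, n_bins = 40, 201
filters = np.abs(rng.normal(size=(n_filters, n_bins)))  # stand-in for a learnable filterbank
norm = np.ones(n_filters)          # learnable per-channel normalization factor
alpha = np.full(n_filters, 0.3)    # learnable compression power (fixed log or cube root in classic features)

def features(power_spectrum):
    """Forward pass: filterbank energies, normalized, then power-law compressed.
    In the paper's setting the filters, norm and alpha are learned end to end;
    here they are fixed placeholders."""
    e = filters @ power_spectrum       # (n_filters,) band energies
    return (e / norm) ** alpha

spec = np.abs(rng.normal(size=n_bins)) ** 2   # toy power spectrum of one frame
feat = features(spec)
```

Making `filters`, `norm` and `alpha` trainable parameters of the network is what distinguishes this front end from fixed mel-filterbank features.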
Adapting End-to-End Speech Recognition for Readable Subtitles
Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task challenging for supervised approaches due to the scarcity of training data. We first investigate a cascaded system, where an unsupervised compression model is used to post-edit the transcribed speech. We then compare several methods of end-to-end speech recognition under output length constraints. The experiments show that with limited data far less than needed for training a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities. Furthermore, the best performance in terms of WER and ROUGE scores is achieved by explicitly modeling the length constraints within the end-to-end ASR system.
Comment: IWSLT 202
A user's guide for the signal processing software for image and speech compression developed in the Communications and Signal Processing Laboratory (CSPL), version 1
Complete documentation of the software developed in the Communications and Signal Processing Laboratory (CSPL) during the period of July 1985 to March 1986 is provided. Utility programs and subroutines that were developed for a user-friendly image and speech processing environment are described. Additional programs for data compression of image- and speech-type signals are included. Also, programs for zero-memory and block transform quantization in the presence of channel noise are described. Finally, several routines for simulating the performance of image compression algorithms are included.
Speech Compression Using Discrete Wavelet Transform
Speech compression is an area of digital signal processing that focuses on reducing the bit rate of the speech signal for transmission or storage without significant loss of quality. The wavelet transform has recently been proposed for signal analysis, and speech signal compression using the wavelet transform receives considerable attention in this thesis. Speech coding is a lossy scheme and is implemented here to compress the one-dimensional speech signal. Basically, this scheme consists of four operations: the transform, thresholding (by-level and global thresholds), quantization, and entropy encoding. The reconstruction of the compressed signal, as well as the detailed steps needed, is discussed. The performance of wavelet compression is compared against Linear Predictive Coding (LPC) and Global System for Mobile Communications (GSM) algorithms using SNR, PSNR, NRMSE and compression ratio. Software simulating the lossy compression scheme is developed using Matlab 6. This software provides basic speech analysis as well as the compression and decompression operations. The results obtained show a reasonably high compression ratio and good signal quality.
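The four-stage lossy scheme (transform, threshold, quantize, entropy-encode) can be illustrated in miniature with a single-level Haar wavelet transform. This is a sketch, not the thesis code: the wavelet choice, threshold and quantization step are arbitrary assumptions, and the entropy-coding stage is omitted for brevity:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar wavelet transform: approximation and detail bands."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    """Inverse single-level Haar transform."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

# Toy signal: one-dimensional sinusoid standing in for a speech frame.
x = np.sin(2 * np.pi * np.arange(64) / 16)

a, d = haar_dwt(x)                      # 1) transform
d[np.abs(d) < 0.05] = 0.0               # 2) global threshold on detail coefficients
step = 0.01
a_q = np.round(a / step)                # 3) uniform quantization to integer symbols
d_q = np.round(d / step)                #    (these symbols would then be entropy-coded)
x_hat = haar_idwt(a_q * step, d_q * step)   # dequantize and reconstruct

# Quality metric of the kind reported in the thesis.
snr_db = 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2))
```

Zeroed and coarsely quantized coefficients are what the entropy coder exploits to reduce the bit rate; the SNR quantifies the resulting reconstruction error.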
Speech Development by Imitation
The Double Cone Model (DCM) is a model of how the brain transforms sensory input to motor commands through successive stages of data compression and expansion. We have tested a subset of the DCM on speech recognition, production and imitation. The experiments show that the DCM is a good candidate for an artificial speech processing system that can develop autonomously. We show that the DCM can learn a repertoire of speech sounds by listening to speech input. It is also able to link the individual elements of speech into sequences that can be recognized or reproduced, thus allowing the system to imitate spoken language.
Voice technology and BBN
The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are also described.
Level discrimination of speech sounds by hearing-impaired individuals with and without hearing amplification
Objectives: The current study was designed to examine how hearing-impaired individuals judge level differences between speech sounds with and without hearing amplification. It was hypothesized that hearing aid compression would adversely affect the user's ability to judge level differences.
Design: Thirty-eight hearing-impaired participants performed an adaptive tracking procedure to determine their level-discrimination thresholds for different word and sentence tokens, as well as speech-spectrum noise, with and without their hearing aids. Eight normal-hearing participants performed the same task for comparison.
Results: Level discrimination for different word and sentence tokens was more difficult than the discrimination of stationary noises. Word level discrimination was significantly more difficult than sentence level discrimination. There were no significant differences, however, between mean performance with and without hearing aids and no correlations between performance and various hearing aid measurements.
Conclusions: There is a clear difficulty in judging the level differences between words or sentences relative to differences between broadband noises, but this difficulty was found for both hearing-impaired and normal-hearing individuals and had no relation to hearing aid compression measures. The lack of a clear adverse effect of hearing aid compression on level discrimination is suggested to be due to the low effective compression ratios of currently fitted hearing aids.
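An adaptive tracking procedure of the kind used to measure these level-discrimination thresholds can be sketched as a standard 2-down-1-up staircase. This is an assumption for illustration only: the study's exact tracking rule, step sizes and stopping criterion are not given here, and the simulated listener is hypothetical:

```python
def staircase_2down1up(respond, start_db=6.0, step_db=1.0, n_reversals=8):
    """Minimal 2-down-1-up adaptive track. `respond(delta_db)` returns True
    for a correct level-discrimination response. Two consecutive correct
    responses make the level difference smaller (harder); one error makes it
    larger (easier), so the track converges near the 70.7%-correct point.
    The threshold estimate is the mean of the reversal levels."""
    delta, correct_run, direction = start_db, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(delta):
            correct_run += 1
            if correct_run == 2:              # two correct in a row -> harder
                correct_run = 0
                if direction == +1:
                    reversals.append(delta)   # turning point going up -> down
                direction = -1
                delta = max(delta - step_db, 0.1)
        else:                                 # one error -> easier
            correct_run = 0
            if direction == -1:
                reversals.append(delta)       # turning point going down -> up
            direction = +1
            delta += step_db
    return sum(reversals) / len(reversals)

# Hypothetical deterministic listener: correct whenever the level
# difference is at least 3 dB, so the track should settle near 3 dB.
thr = staircase_2down1up(lambda d: d >= 3.0)
```

With a real participant, `respond` would present the two speech tokens at levels differing by `delta` dB and record the button press.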