1,752 research outputs found
Estimation of Severity of Speech Disability through Speech Envelope
In this paper, envelope detection of speech is discussed to distinguish the
pathological cases of speech disabled children. The speech signal samples of
children of age between five to eight years are considered for the present
study. These speech signals are digitized and are used to determine the speech
envelope. The envelope is subjected to ratio mean analysis to estimate the
disability. This analysis is conducted on ten speech signal samples which are
related to both place of articulation and manner of articulation. Overall
speech disability of a pathological subject is estimated based on the results
of above analysis.Comment: 8 pages,4 Figures,Signal & Image Processing Journal AIRC
A robust speech enhancement method in noisy environments
Speech enhancement aims to eliminate or reduce undesirable noises and distortions, this processing should keep features of the speech to enhance the quality and intelligibility of degraded speech signals. In this study, we investigated a combined approach using single-frequency filtering (SFF) and a modified spectral subtraction method to enhance single-channel speech. The SFF method involves dividing the speech signal into uniform subband envelopes, and then performing spectral over-subtraction on each envelope. A smoothing parameter, determined by the a-posteriori signal-to-noise ratio (SNR), is used to estimate and update the noise without the need for explicitly detecting silence. To evaluate the performance of our algorithm, we employed objective measures such as segmental SNR (segSNR), extended short-term objective intelligibility (ESTOI), and perceptual evaluation of speech quality (PESQ). We tested our algorithm with various types of noise at different SNR levels and achieved results ranging from 4.24 to 15.41 for segSNR, 0.57 to 0.97 for ESTOI, and 2.18 to 4.45 for PESQ. Compared to other standard and existing speech enhancement methods, our algorithm produces better results and performs well in reducing undesirable noises
Improving the robustness of the usual fbe-based asr front-end
All speech recognition systems require some form of signal representation that parametrically models the
temporal evolution of the spectral envelope. Current parameterizations involve, either explicitly or implicitly, a
set of energies from frequency bands which are often distributed in a mel scale. The computation of those filterbank
energies (FBE) always includes smoothing of basic spectral measurements and non-linear amplitude
compression. A variety of linear transformations are typically applied to this time-frequency representation prior
to the Hidden Markov Model (HMM) pattern-matching stage of recognition. In the paper, we will discuss some
robustness issues involved in both the computation of the FBEs and the posterior linear transformations,
presenting alternative techniques that can improve robustness in additive noise conditions. In particular, the root
non-linearity, a voicing-dependent FBE computation technique and a time&frequency filtering (tiffing)
technique will be considered. Recognition results for the Aurora database will be shown to illustrate the potential
application of these alternatives techniques for enhancing the robustness of speech recognition systems.Peer ReviewedPostprint (published version
- …