Evaluation of the Importance of Time-Frequency Contributions to Speech Intelligibility in Noise
Recent studies on binary masking techniques assume that each time-frequency (T-F) unit contributes equally to the overall intelligibility of speech. The present study demonstrates that the importance of each T-F unit to speech intelligibility varies with speech content. Specifically, T-F units are categorized into two classes: speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit to speech intelligibility is highly related to the loudness of its target component, while the importance of each speech-absent T-F unit varies according to the loudness of its masker component. Two types of mask error are also considered: miss errors and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance of the two types of error depends on the SNR level of the input speech signal. Based on these observations, a mask-based objective measure, the loudness weighted hit-false, is proposed for predicting speech intelligibility. The proposed objective measure shows significantly higher correlation with intelligibility than two existing mask-based objective measures.
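The abstract does not give the formula for the loudness weighted hit-false measure, so the following is only a minimal sketch of one plausible reading: a hit rate weighted by target loudness over speech-present units, minus a false alarm rate weighted by masker loudness over speech-absent units. The function name, the flattened-list input layout, and the normalization are assumptions made for illustration, not the authors' definition.

```python
def weighted_hit_minus_fa(ideal_mask, estimated_mask, target_loudness, masker_loudness):
    """Loudness-weighted hit rate minus false alarm rate (illustrative assumption).

    All arguments are flat sequences with one entry per T-F unit.
    ideal_mask / estimated_mask hold 0 or 1; loudness values are non-negative.
    """
    hit_num = hit_den = fa_num = fa_den = 0.0
    for ideal, est, t_loud, m_loud in zip(
        ideal_mask, estimated_mask, target_loudness, masker_loudness
    ):
        if ideal == 1:
            # Speech-present unit: weight by the loudness of the target component.
            hit_den += t_loud
            if est == 1:
                hit_num += t_loud
        else:
            # Speech-absent unit: weight by the loudness of the masker component.
            fa_den += m_loud
            if est == 1:
                fa_num += m_loud
    hit_rate = hit_num / hit_den if hit_den > 0 else 0.0
    fa_rate = fa_num / fa_den if fa_den > 0 else 0.0
    return hit_rate - fa_rate
```

With unit weights this reduces to the ordinary hit rate minus false alarm rate; with loudness weights, an error in a loud unit costs more than an error in a quiet one.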
A new mask-based objective measure for predicting the intelligibility of binary masked speech
Mask-based objective speech-intelligibility measures have been successfully proposed for evaluating the performance of binary masking algorithms. These objective measures are computed directly by comparing the estimated binary mask against the ground truth ideal binary mask (IdBM). Most of these objective measures, however, assign equal weight to all time-frequency (T-F) units. In this study, we propose to improve the existing mask-based objective measures by weighting each T-F unit according to its target or masker loudness. The proposed objective measure shows significantly better performance than two other existing mask-based objective measures.
Preference for 20-40 ms window duration in speech analysis
In speech processing, the short-time magnitude spectrum is believed to contain most of the information about speech intelligibility, and it is normally computed using the short-time Fourier transform over a 20-40 ms window duration. In this paper, we investigate the effect of the analysis window duration on speech intelligibility in a systematic way. For this purpose, both subjective and objective experiments are conducted. The subjective experiment takes the form of a consonant recognition task by human listeners, whereas the objective experiment takes the form of an automatic speech recognition (ASR) task. Various analysis window durations are investigated in our experiments. For the subjective experiment we construct speech stimuli based purely on the short-time magnitude information. The results of the subjective experiment show that an analysis window duration of 15–35 ms is the optimum choice when speech is reconstructed from the short-time magnitude spectrum. Similar conclusions were drawn from the results of the objective (ASR) experiment. The ASR results were found to have a statistically significant correlation with the subjective intelligibility results. Index Terms: Analysis window duration, magnitude spectrum, automatic speech recognition, speech intelligibility
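To make the window durations concrete, the sketch below converts a duration in milliseconds into a frame length in samples and computes the magnitude spectrum of a single frame with a plain DFT. The 8 kHz sampling rate in the usage lines, the Hamming window, and both function names are assumptions chosen for illustration; the abstract does not state which settings the authors used.

```python
import cmath
import math


def frame_length(duration_ms, sample_rate_hz):
    """Number of samples covered by an analysis window of the given duration."""
    return int(round(duration_ms * sample_rate_hz / 1000.0))


def magnitude_spectrum(frame):
    """Magnitude of the DFT of one Hamming-windowed frame (bins 0..N/2).

    Uses a direct O(N^2) DFT to stay dependency-free; expects len(frame) > 1.
    """
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(w * cmath.exp(-2j * math.pi * k * i / n) for i, w in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


# Usage: a 32 ms window at an assumed 8 kHz sampling rate spans 256 samples.
sample_rate = 8000
n_samples = frame_length(32, sample_rate)
tone = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate) for i in range(n_samples)]
spectrum = magnitude_spectrum(tone)
```

A longer window gives finer frequency resolution (more bins per Hz) at the cost of coarser time resolution, which is the trade-off the window-duration experiments explore.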
The Effect of the Additivity Assumption on Time and Frequency Domain Wiener Filtering for Speech Enhancement
In this paper, we investigate the validity of the common assumption made in Wiener filtering that the clean speech and noise signals are uncorrelated under the short-time analysis typically used for speech enhancement. To this end, we performed speech enhancement experiments in which speech corrupted by additive white Gaussian noise is enhanced by a Wiener filter designed in the time domain as well as in the frequency domain. Results of oracle-style experiments confirm that the inclusion of the additivity assumption in Wiener filtering results in negligible degradation of enhanced speech quality. Informal listening tests show that the background noise resulting from time domain enhancement is more tolerable than the background noise resulting from the frequency domain framework. Index Terms: Wiener filtering, speech enhancement
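When speech and noise are taken to be uncorrelated, the power spectrum of the noisy signal is the sum of the speech and noise power spectra, and the frequency domain Wiener gain in each bin becomes the ratio of speech power to total power. The sketch below shows this textbook gain for the oracle case in which both power spectra are known; the function name and the list-based interface are assumptions, and it is not the authors' implementation.

```python
def wiener_gain(speech_power, noise_power):
    """Per-bin Wiener gain P_s / (P_s + P_n), assuming uncorrelated speech and noise.

    speech_power and noise_power are sequences of non-negative power values,
    one per frequency bin. Bins with zero total power get a gain of 0.
    """
    gains = []
    for ps, pn in zip(speech_power, noise_power):
        total = ps + pn
        gains.append(ps / total if total > 0 else 0.0)
    return gains
```

The enhanced spectrum is obtained by multiplying each noisy bin by its gain, so bins dominated by noise are attenuated towards zero while bins dominated by speech pass almost unchanged.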
A mutation in mouse Pak1ip1 causes orofacial clefting while human PAK1IP1 maps to 6p24 translocation breaking points associated with orofacial clefting.
Orofacial clefts are among the most common birth defects and result in an improper formation of the mouth or the roof of the mouth. Monosomy of the distal aspect of human chromosome 6p has been recognized as causative in congenital malformations affecting the brain and cranial skeleton, including orofacial clefts. Among the genes located in this region is PAK1IP1, which encodes a nucleolar factor involved in ribosomal stress response. Here, we report the identification of a novel mouse line that carries a point mutation in the Pak1ip1 gene. Homozygous mutants show severe developmental defects of the brain and craniofacial skeleton, including a median orofacial cleft. We recovered this line of mice in a forward genetic screen and named the allele manta-ray (mray). Our findings prompted us to examine human cases of orofacial clefting for mutations in the PAK1IP1 gene or association with the locus. No deleterious variants in the PAK1IP1 gene coding region were recognized; however, we identified a borderline association effect for SNP rs494723, suggesting a possible role for the PAK1IP1 gene in human orofacial clefting.