
    Sparse Gammatone Signal Model Predicts Perceived Noise Intrusiveness

    Is it possible to predict the intrusiveness of background noise in speech signals as perceived by humans? This question is important to the automatic evaluation of speech enhancement systems, including those designed for new wideband speech telephony, and answering it is the goal of a future ITU quality assessment standard. In this paper, we show that this is possible by modeling the encoding of the noise signal at the auditory nerve. Indeed, recent research suggests that sparse signal representations may be indicative of the encoding process in the auditory system, making them interesting for modeling human sound perception. Here, we further explore this hypothesis and decompose background noise in the speech signal into a sparse combination of gammatone functions, resulting in a sparse, physiologically grounded representation of the noise. We then show that the number of gammatones required to encode the noise is directly correlated with the perception of noise intrusiveness. Furthermore, we show that an established measure of noise intrusiveness based on this new representation outperforms the same measure based on the traditional loudness model.

    "Can you hear me now?": Automatic assessment of background noise intrusiveness and speech intelligibility in telecommunications

    This thesis deals with signal-based methods that predict how listeners perceive speech quality in telecommunications. Such tools, called objective quality measures, are of great interest in the telecommunications industry to evaluate how new or deployed systems affect the end-user quality of experience. Two widely used measures, ITU-T Recommendations P.862 "PESQ" and P.863 "POLQA", predict the overall listening quality of a speech signal as it would be rated by an average listener, but do not provide further insight into the composition of that score. This is in contrast to modern telecommunication systems, in which components such as noise reduction or speech coding process speech and non-speech signal parts differently. Therefore, there has been a growing interest in objective measures that assess different quality features of speech signals, allowing for a more nuanced analysis of how these components affect quality. In this context, the present thesis addresses the objective assessment of two quality features: background noise intrusiveness and speech intelligibility. The perception of background noise is investigated with newly collected datasets, including signals that go beyond the traditional telephone bandwidth, as well as Lombard (effortful) speech. We analyze listener scores for noise intrusiveness, and their relation to scores for perceived speech distortion and overall quality. We then propose a novel objective measure of noise intrusiveness that uses a sparse representation of noise as a model of high-level auditory coding. The proposed approach is shown to yield results that highly correlate with listener scores, without requiring training data. With respect to speech intelligibility, we focus on the case where the signal is degraded by strong background noises or very low bit-rate coding. Considering that listeners use prior linguistic knowledge in assessing intelligibility, we propose an objective measure that works at the phoneme level and performs a comparison of phoneme class-conditional probability estimations. The proposed approach is evaluated on a large corpus of recordings from public safety communication systems that use low bit-rate coding, and further extended to the assessment of synthetic speech, showing its applicability to a large range of distortion types. The effectiveness of both measures is evaluated with standardized performance metrics, using corpora that follow established recommendations for subjective listening tests.

    The perceptual flow of phonetic feature processing

    Cross-spectral synergy and consonant identification (A)
