
    Text-Independent Automatic Speaker Identification Using Partitioned Neural Networks

    This dissertation introduces a binary partitioned approach to statistical pattern classification, applied to talker identification using neural networks. In recent years, artificial neural networks have been shown to work exceptionally well for small but difficult pattern classification tasks. However, their application to large tasks (i.e., those with more than ten to twenty categories) is limited by a dramatic increase in required training time: the time required to train a single network to perform N-way classification grows nearly exponentially with N. In contrast, the binary partitioned approach requires training times on the order of N^2. Beyond partitioning, related issues were also investigated, such as acoustic feature selection for speaker identification and neural network optimization. The binary partitioned approach was used to develop an automatic speaker identification system for 120 male and 130 female speakers of a standard speech database. The system performs with 100% accuracy in a text-independent mode when trained with about nine to fourteen seconds of speech and tested with six to eight seconds of speech.
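    The N^2 scaling above can be illustrated with a one-vs-one reading of the partitioned scheme: one small binary model per unordered speaker pair, with majority voting at identification time. The sketch below is an assumption-laden toy (nearest-centroid classifiers stand in for the dissertation's per-pair neural networks; all names are hypothetical):

```python
from itertools import combinations

def num_binary_models(n_classes):
    # One binary model per unordered class pair: N*(N-1)/2, i.e. O(N^2),
    # instead of one monolithic N-way network whose training time
    # grows nearly exponentially with N.
    return n_classes * (n_classes - 1) // 2

class CentroidPair:
    # Toy stand-in for one binary "partitioned" classifier: decides
    # between two speakers by nearest class centroid in feature space.
    def __init__(self, a, b, xs_a, xs_b):
        self.a, self.b = a, b
        self.ca = [sum(col) / len(xs_a) for col in zip(*xs_a)]
        self.cb = [sum(col) / len(xs_b) for col in zip(*xs_b)]

    def vote(self, x):
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, self.ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, self.cb))
        return self.a if da <= db else self.b

def train_partitioned(data):
    # data: {speaker_id: [feature vectors]} -> one model per speaker pair
    return [CentroidPair(a, b, data[a], data[b])
            for a, b in combinations(sorted(data), 2)]

def identify(models, x):
    # Each binary model casts one vote; the most-voted speaker wins.
    tally = {}
    for m in models:
        w = m.vote(x)
        tally[w] = tally.get(w, 0) + 1
    return max(tally, key=tally.get)
```

    For the 250 speakers in the study, this decomposition would need 250 * 249 / 2 = 31,125 small binary models, each cheap to train, rather than one 250-way network.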

    The impact of spectrally asynchronous delay on the intelligibility of conversational speech

    Conversationally spoken speech is rampant with rapidly changing and complex acoustic cues that individuals are able to hear, process, and encode to meaning. For many hearing-impaired listeners, a hearing aid is necessary to hear these spectral and temporal acoustic cues of speech. For listeners with mild-moderate high frequency sensorineural hearing loss, open-fit digital signal processing (DSP) hearing aids are the most common amplification option. Open-fit DSP hearing aids introduce a spectrally asynchronous delay to the acoustic signal: audible low frequency information passes to the eardrum unimpeded, while the aid delivers amplified high frequency sounds whose onset is delayed relative to the natural pathway of sound. These spectrally asynchronous delays may disrupt the natural acoustic pattern of speech. The primary goal of this study is to measure the effect of spectrally asynchronous delay on the intelligibility of conversational speech by normal-hearing and hearing-impaired listeners. A group of normal-hearing listeners (n = 25) and listeners with mild-moderate high frequency sensorineural hearing loss (n = 25) participated in this study. The acoustic stimuli included 200 conversationally-spoken recordings of the low predictability sentences from the revised speech perception in noise test (r-SPIN). These 200 sentences were modified to control for audibility for the hearing-impaired group and so that the acoustic energy above 2 kHz was delayed by either 0 ms (control), 4 ms, 8 ms, or 32 ms relative to the low frequency energy. The data were analyzed to find the effect of each of the four delay conditions on the intelligibility of the final key word of each sentence. Normal-hearing listeners were minimally affected by the asynchronous delay. However, the hearing-impaired listeners were deleteriously affected by increasing amounts of spectrally asynchronous delay. Although the hearing-impaired listeners performed well overall in their perception of conversationally spoken speech in quiet, the intelligibility of conversationally spoken sentences significantly decreased when the delay values were equal to or greater than 4 ms. Therefore, hearing aid manufacturers need to restrict the amount of delay introduced by DSP so that it does not distort the acoustic patterns of conversational speech.
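    The stimulus manipulation described above (delaying energy above 2 kHz relative to the low band) can be approximated offline with a crude FFT brick-wall band split. This is a minimal sketch, not the study's actual signal processing, and real open-fit aids use filter banks rather than ideal splits; the function name is an assumption:

```python
import numpy as np

def asynchronous_delay(signal, fs, split_hz=2000.0, delay_ms=4.0):
    """Delay the band above split_hz by delay_ms relative to the low
    band, mimicking the spectrally asynchronous delay conditions
    (0/4/8/32 ms) used in the study. Crude FFT brick-wall split."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    low = np.fft.irfft(np.where(freqs < split_hz, spec, 0), n=len(signal))
    high = np.fft.irfft(np.where(freqs >= split_hz, spec, 0), n=len(signal))
    d = int(round(delay_ms * fs / 1000.0))  # delay in whole samples
    delayed_high = np.concatenate([np.zeros(d), high])[:len(signal)]
    return low + delayed_high
```

    At a 16 kHz sample rate, the study's 4 ms condition corresponds to shifting the high band by 64 samples while the low band passes through untouched.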

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

    It has become urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while a lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing modeling needs to pay more attention to indistinguishable samples than to easily-classified ones, making their correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose to leverage a balanced focal loss function as the training objective, dynamically scaling the loss based on the traits of each sample. In addition, in the experiments we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods through comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than those trained with conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF of 0.0124 and an EER of 0.55%. Furthermore, we present and discuss evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research on anti-spoofing still has a long way to go.
    Comment: This work has been accepted by the 25th International Conference on Pattern Recognition (ICPR 2020).
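    The dynamic scaling idea can be sketched with the standard binary focal loss form (Lin et al.'s alpha-balanced variant), which down-weights easy, well-classified samples so that hard, indistinguishable ones dominate training. The exact hyperparameters and formulation of the paper's balanced focal loss may differ; this is an illustrative sketch only:

```python
import math

def balanced_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Alpha-balanced binary focal loss for one sample.
    p: predicted probability of the spoofed class, y: 1 = spoofed,
    0 = bonafide. alpha rebalances the two classes; gamma > 0
    suppresses the loss of easy samples via the (1 - p_t)^gamma
    modulating factor. gamma = 0, alpha = 0.5 recovers (half of)
    plain cross-entropy."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
```

    With gamma = 2, an easy spoofed sample scored at p = 0.95 contributes orders of magnitude less loss than a hard one scored at p = 0.55, which is exactly the reweighting toward indistinguishable samples that the abstract argues for.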

    Development of rotorcraft interior noise control concepts. Phase 1: Definition study

    A description of helicopter noise, diagnostic techniques for source and path identification, an interior noise prediction model, and a measurement program for model validation are provided.

    Evaluation of preprocessors for neural network speaker verification
