20,795 research outputs found

    Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification

    Full text link
    In this paper a novel cross-device text-independent speaker verification architecture is proposed. Majority of the state-of-the-art deep architectures that are used for speaker verification tasks consider Mel-frequency cepstral coefficients. In contrast, our proposed Siamese convolutional neural network architecture uses Mel-frequency spectrogram coefficients to benefit from the dependency of the adjacent spectro-temporal features. Moreover, although spectro-temporal features have proved to be highly reliable in speaker verification models, they only represent some aspects of short-term acoustic level traits of the speaker's voice. However, the human voice consists of several linguistic levels such as acoustic, lexicon, prosody, and phonetics, that can be utilized in speaker verification models. To compensate for these inherited shortcomings in spectro-temporal features, we propose to enhance the proposed Siamese convolutional neural network architecture by deploying a multilayer perceptron network to incorporate the prosodic, jitter, and shimmer features. The proposed end-to-end verification architecture performs feature extraction and verification simultaneously. This proposed architecture displays significant improvement over classical signal processing approaches and deep algorithms for forensic cross-device speaker verification.Comment: Accepted in 9th IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS 2018

    Goldilocks Forgetting in Cross-Situational Learning

    Get PDF
    Given that there is referential uncertainty (noise) when learning words, to what extent can forgetting filter some of that noise out, and be an aid to learning? Using a Cross Situational Learning model we find a U-shaped function of errors indicative of a "Goldilocks" zone of forgetting: an optimum store-loss ratio that is neither too aggressive nor too weak, but just the right amount to produce better learning outcomes. Forgetting acts as a high-pass filter that actively deletes (part of) the referential ambiguity noise, retains intended referents, and effectively amplifies the signal. The model achieves this performance without incorporating any specific cognitive biases of the type proposed in the constraints and principles account, and without any prescribed developmental changes in the underlying learning mechanism. Instead we interpret the model performance as more of a by-product of exposure to input, where the associative strengths in the lexicon grow as a function of linguistic experience in combination with memory limitations. The result adds a mechanistic explanation for the experimental evidence on spaced learning and, more generally, advocates integrating domain-general aspects of cognition, such as memory, into the language acquisition process

    Spatial evolution of human dialects

    Get PDF
    The geographical pattern of human dialects is a result of history. Here, we formulate a simple spatial model of language change which shows that the final result of this historical evolution may, to some extent, be predictable. The model shows that the boundaries of language dialect regions are controlled by a length minimizing effect analogous to surface tension, mediated by variations in population density which can induce curvature, and by the shape of coastline or similar borders. The predictability of dialect regions arises because these effects will drive many complex, randomized early states toward one of a smaller number of stable final configurations. The model is able to reproduce observations and predictions of dialectologists. These include dialect continua, isogloss bundling, fanning, the wave-like spread of dialect features from cities, and the impact of human movement on the number of dialects that an area can support. The model also provides an analytical form for S\'{e}guy's Curve giving the relationship between geographical and linguistic distance, and a generalisation of the curve to account for the presence of a population centre. A simple modification allows us to analytically characterize the variation of language use by age in an area undergoing linguistic change
    • …
    corecore