Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification
In this paper, a novel cross-device text-independent speaker verification
architecture is proposed. The majority of state-of-the-art deep architectures
used for speaker verification rely on Mel-frequency cepstral coefficients. In
contrast, our proposed Siamese convolutional neural network architecture uses
Mel-frequency spectrogram coefficients to benefit from the dependency between
adjacent spectro-temporal features. Although spectro-temporal features have
proved highly reliable in speaker verification models, they represent only
some aspects of the short-term, acoustic-level traits of a speaker's voice.
The human voice, however, carries information at several linguistic levels,
such as acoustics, lexicon, prosody, and phonetics, all of which can be
utilized in speaker verification models. To compensate for these inherent
shortcomings of spectro-temporal features, we propose to enhance the Siamese
convolutional neural network architecture with a multilayer perceptron network
that incorporates prosodic, jitter, and shimmer features. The proposed
end-to-end verification architecture performs feature extraction and
verification simultaneously, and it shows significant improvement over
classical signal processing approaches and deep algorithms for forensic
cross-device speaker verification.
Comment: Accepted at the 9th IEEE International Conference on Biometrics: Theory,
Applications, and Systems (BTAS 2018)
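The abstract does not define jitter and shimmer or show how the two branches are compared. The sketch below assumes the common "local" (relative) definitions, as used for example by Praat: the mean absolute difference between consecutive pitch periods (jitter) or peak amplitudes (shimmer), divided by the mean period or amplitude. It then fuses these with a toy two-value spectral vector and compares two inputs through one shared projection, which is the defining property of a Siamese setup. Everything else is hypothetical: `SHARED_WEIGHTS`, `fuse`, `same_speaker`, and the 0.8 threshold are illustrative stand-ins, and a single linear map replaces the paper's CNN and MLP branches.

```python
import math
from typing import List, Sequence


def _relative_mean_abs_diff(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods: Sequence[float]) -> float:
    """Relative cycle-to-cycle variation of pitch periods ('local' jitter)."""
    return _relative_mean_abs_diff(periods)


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Relative cycle-to-cycle variation of peak amplitudes ('local' shimmer)."""
    return _relative_mean_abs_diff(amplitudes)


# Hypothetical shared weights: both inputs pass through the SAME projection,
# which is what makes the comparison Siamese. This toy linear map stands in
# for the learned CNN (spectrogram) and MLP (prosodic) branches.
SHARED_WEIGHTS: List[List[float]] = [
    [0.5, -0.2, 0.1, 0.0],
    [0.1, 0.3, -0.4, 0.2],
    [0.0, 0.1, 0.2, 0.6],
]


def fuse(spectral: Sequence[float], periods: Sequence[float],
         amplitudes: Sequence[float]) -> List[float]:
    """Concatenate a toy 2-value spectral vector with jitter and shimmer."""
    return list(spectral) + [local_jitter(periods), local_shimmer(amplitudes)]


def embed(features: Sequence[float]) -> List[float]:
    """Apply the shared projection to one fused feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in SHARED_WEIGHTS]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.8) -> bool:
    """Accept the pair if the shared-branch embeddings are similar enough."""
    return cosine(embed(a), embed(b)) >= threshold


if __name__ == "__main__":
    probe = fuse([0.9, 0.4], [0.010, 0.011, 0.010, 0.012], [0.50, 0.48, 0.52, 0.49])
    print(same_speaker(probe, probe))
```

Because the weights are shared, swapping the two inputs cannot change the decision, and any difference in the score comes only from the inputs themselves.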
Goldilocks Forgetting in Cross-Situational Learning
Given that there is referential uncertainty (noise) when learning words, to what extent can forgetting filter some of that noise out and aid learning? Using a Cross-Situational Learning model, we find a U-shaped function of errors indicative of a "Goldilocks" zone of forgetting: an optimum store-loss ratio that is neither too aggressive nor too weak, but just right to produce better learning outcomes. Forgetting acts as a high-pass filter that actively deletes (part of) the referential ambiguity noise, retains intended referents, and effectively amplifies the signal. The model achieves this performance without incorporating any specific cognitive biases of the type proposed in the constraints and principles account, and without any prescribed developmental changes in the underlying learning mechanism. Instead, we interpret the model's performance as largely a by-product of exposure to input, where the associative strengths in the lexicon grow as a function of linguistic experience in combination with memory limitations. The result adds a mechanistic explanation for the experimental evidence on spaced learning and, more generally, advocates integrating domain-general aspects of cognition, such as memory, into the language acquisition process.
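To make the store-loss idea concrete, here is a minimal sketch of a generic associative cross-situational learner with forgetting. It is not the authors' model: the names `learn` and `accuracy`, the scene construction, and the linear "subtract `loss`, delete at zero" forgetting rule are all assumptions chosen for illustration. On each trial a word is heard with its intended referent plus `context_size - 1` randomly chosen distractor referents (the referential noise); every co-present pair gains `store`, and then every stored association loses `loss`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# (word, referent) -> association strength
Lexicon = Dict[Tuple[str, str], float]


def learn(words: List[str], n_trials: int, context_size: int,
          store: float, loss: float, seed: int = 0) -> Lexicon:
    """Hypothetical associative learner with linear forgetting (illustrative only)."""
    rng = random.Random(seed)
    lexicon: Lexicon = defaultdict(float)
    for _ in range(n_trials):
        word = rng.choice(words)
        others = [w for w in words if w != word]
        # Referents in view: the intended one (named after its word) plus distractors.
        scene = [word] + rng.sample(others, context_size - 1)
        for referent in scene:
            lexicon[(word, referent)] += store
        # Forgetting: every association decays by `loss`; exhausted ones are deleted.
        for key in list(lexicon):
            lexicon[key] -= loss
            if lexicon[key] <= 0.0:
                del lexicon[key]
    return lexicon


def accuracy(words: List[str], lexicon: Lexicon) -> float:
    """Fraction of words whose strongest stored referent is the intended one."""
    correct = 0
    for word in words:
        candidates = {r: s for (w, r), s in lexicon.items() if w == word}
        if candidates and max(candidates, key=candidates.get) == word:
            correct += 1
    return correct / len(words)


if __name__ == "__main__":
    vocabulary = [f"w{i}" for i in range(10)]
    for loss in (0.0, 0.005, 0.02, 0.1):
        lex = learn(vocabulary, n_trials=500, context_size=3, store=1.0, loss=loss)
        print(loss, accuracy(vocabulary, lex))
```

The two extremes are easy to reason about: with `loss=0.0` nothing is ever forgotten, and with `loss >= store` every association is erased within the same trial, so nothing can be learned. Whether an intermediate loss helps in this toy, and by how much, depends on the chosen parameters; sweeping `loss` and plotting errors is one way to probe for a Goldilocks zone, but this sketch is not a reproduction of the paper's results.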
Spatial evolution of human dialects
The geographical pattern of human dialects is a result of history. Here, we
formulate a simple spatial model of language change which shows that the final
result of this historical evolution may, to some extent, be predictable. The
model shows that the boundaries of language dialect regions are controlled by a
length-minimizing effect analogous to surface tension, mediated by variations
in population density, which can induce curvature, and by the shape of coastlines
or similar borders. The predictability of dialect regions arises because these
effects will drive many complex, randomized early states toward one of a
smaller number of stable final configurations. The model is able to reproduce
observations and predictions of dialectologists. These include dialect
continua, isogloss bundling, fanning, the wave-like spread of dialect features
from cities, and the impact of human movement on the number of dialects that an
area can support. The model also provides an analytical form for Séguy's
Curve, giving the relationship between geographical and linguistic distance, and
a generalisation of the curve to account for the presence of a population
centre. A simple modification allows us to analytically characterize the
variation of language use by age in an area undergoing linguistic change.
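The "length-minimizing effect analogous to surface tension" can be illustrated with a standard textbook model, swapped in here for intuition and not the authors' model. The sketch below uses a zero-temperature majority-rule lattice with two variants (0 and 1), where the grid edge plays the role of a coastline. It omits population density and everything related to Séguy's Curve. In each asynchronous sweep, a cell switches to the other variant only when a strict majority of its neighbours use it. Such a flip turns `d` disagreeing edges into `n - d` with `2d > n`, so the total boundary length can only shrink. The function names `boundary_length` and `sweep` are hypothetical.

```python
import random
from typing import Iterator, List

Grid = List[List[int]]  # each cell holds dialect variant 0 or 1


def neighbours(grid: Grid, i: int, j: int) -> Iterator[int]:
    """Variants of the up/down/left/right neighbours that lie inside the grid."""
    rows, cols = len(grid), len(grid[0])
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:  # the grid edge acts as a fixed border
            yield grid[ni][nj]


def boundary_length(grid: Grid) -> int:
    """Number of adjacent cell pairs that use different variants (isogloss length)."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows and grid[i][j] != grid[i + 1][j]:
                total += 1
            if j + 1 < cols and grid[i][j] != grid[i][j + 1]:
                total += 1
    return total


def sweep(grid: Grid, rng: random.Random) -> None:
    """One asynchronous sweep: visit cells in random order and apply majority rule."""
    rows, cols = len(grid), len(grid[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    rng.shuffle(cells)
    for i, j in cells:
        nbrs = list(neighbours(grid, i, j))
        disagree = sum(1 for v in nbrs if v != grid[i][j])
        if 2 * disagree > len(nbrs):  # strict majority; ties keep the current variant
            grid[i][j] = 1 - grid[i][j]


if __name__ == "__main__":
    rng = random.Random(1)
    grid = [[rng.randint(0, 1) for _ in range(30)] for _ in range(30)]
    print("before:", boundary_length(grid))
    for _ in range(20):
        sweep(grid, rng)
    print("after:", boundary_length(grid))
```

Starting from a random pattern, small islands disappear and the remaining boundaries straighten. This is the qualitative "surface tension" behaviour that drives many random early states toward a few stable configurations.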
- …