37 research outputs found

    Australian accent-based speaker classification


    Automatic classification of speaker characteristics


    VoxCeleb2: Deep Speaker Recognition

    The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media. Using a fully automated pipeline, we curate VoxCeleb2, which contains over a million utterances from over 6,000 speakers. This is several times larger than any publicly available speaker recognition dataset. Second, we develop and compare Convolutional Neural Network (CNN) models and training strategies that can effectively recognise identities from voice under various conditions. The models trained on the VoxCeleb2 dataset surpass the performance of previous works on a benchmark dataset by a significant margin.
    Comment: To appear in Interspeech 2018. The audio-visual dataset can be downloaded from http://www.robots.ox.ac.uk/~vgg/data/voxceleb2 . 1806.05622v2: minor fixes; 5 pages

    Syllable frequency effects in immediate but not delayed syllable naming in English

    Syllable frequency effects in production tasks are interpreted as evidence that speakers retrieve precompiled articulatory programs for high frequency syllables from a mental syllabary. They have not been found reliably in English, nor isolated to the phonetic encoding processes during which the syllabary is thought to be accessed. In this experiment, 48 participants produced matched high- and novel/low-frequency syllables in a near-replication of Laganaro and Alario’s [(2006) On the locus of the syllable frequency effect in speech production. Journal of Memory and Language, 55(2), 178–196, http://dx.doi.org/10.1016/j.jml.2006.05.001] production conditions: immediate naming, naming following an unfilled delay, and naming after a delay filled by concurrent articulation. Immediate naming was faster for high frequency syllables, demonstrating a robust syllable frequency effect in English. There was no high frequency advantage in either delayed naming condition, leaving open the question of whether syllable frequency effects arise during phonological or phonetic encoding.

    Estimation of Prior Probabilities in Speaker Recognition


    Temporal Hidden Markov Models


    The Automatic Neuroscientist: automated experimental design with real-time fMRI

    A standard approach in functional neuroimaging explores how a particular cognitive task activates a set of brain regions (one task-to-many regions mapping). Importantly though, the same neural system can be activated by inherently different tasks. To date, there is no approach available that systematically explores whether and how distinct tasks probe the same neural system (many tasks-to-region mapping). In the work presented here, we propose an alternative framework, the Automatic Neuroscientist, which turns the typical fMRI approach on its head. We use real-time fMRI in combination with state-of-the-art optimisation techniques to automatically design the optimal experiment to evoke a desired target brain state. Here, we present two proof-of-principle studies involving visual and auditory stimuli. The data demonstrate this closed-loop approach to be very powerful, hugely speeding up fMRI and providing an accurate estimation of the underlying relationship between stimuli and neural responses across an extensive experimental parameter space. Finally, we detail four scenarios where our approach can be applied, suggesting how it provides a novel description of how cognition and the brain interrelate.
    Comment: 22 pages, 7 figures, work presented at OHBM 201

    Fuzzy feature weighting techniques for vector quantisation


    Effects of Gender and Regional Dialect on Uptalk in the American Midwest

    This study compares the distribution of uptalk contours across male and female speakers of two Midwestern dialects of American English. Sixteen speakers, evenly divided between dialect and gender, were recorded reading ten passages in plain lab speech. The contours defined as uptalk in this study were H* H-H%, H* L-H%, L* H-H%, and L* L-H%. The results indicate that neither gender nor dialect had an effect on overall uptalk frequency, which could reflect prosodic similarities in the two dialects. The null results for gender are particularly surprising because they run contrary to many of the previous studies on uptalk, which found that women use uptalk more than men. Gender and dialect also had no significant effect on the types of uptalk contours used: speakers from both dialects used primarily three of the four uptalk contours that were examined (H* L-H%, L* H-H% and L* L-H%). These uptalk contours differed from the uptalk contours identified in other North American varieties of English, suggesting that there are regional differences in uptalk realization.
    Undergraduate Research Scholarship. National Science Foundation (BCS-1056409). No embargo. Academic Major: Linguistics. Academic Major: Spanish.