
    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement acoustic detection of the active speaker, thus improving system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, which keeps it consistent with the way learning proceeds during cognitive development. Instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method on a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting but significantly lower performance in a speaker-independent setting. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.
    Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
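    The self-supervision idea above (audio supplies the labels, vision learns from them) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a crude energy-threshold voice activity detector (`audio_pseudo_labels`, hypothetical) stands in for the acoustic speaker detector, per-frame face features are assumed to be already extracted, and plain logistic regression replaces whatever visual model the paper uses. The toy data are synthetic.

```python
import numpy as np

def audio_pseudo_labels(frame_energy, threshold_db=-30.0):
    """Hypothetical stand-in for acoustic speaker detection: a frame is labelled
    'speaking' (1.0) when its energy, in dB relative to the loudest frame,
    exceeds threshold_db; otherwise 0.0."""
    energy = np.asarray(frame_energy, dtype=float)
    rel_db = 10.0 * np.log10(energy / energy.max() + 1e-12)
    return (rel_db > threshold_db).astype(float)

def add_bias(features):
    """Append a constant column so the classifier learns an intercept."""
    return np.hstack([features, np.ones((len(features), 1))])

def train_visual_classifier(visual_feats, pseudo_labels, lr=0.1, epochs=500):
    """Logistic regression on per-frame visual features, trained by batch
    gradient descent with the audio-derived pseudo-labels as targets
    (no manual annotation is used)."""
    X = add_bias(visual_feats)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted P(speaking)
        w -= lr * X.T @ (p - pseudo_labels) / len(X)   # cross-entropy gradient step
    return w

def predict_speaking(w, visual_feats):
    """Vision-only decision: 1 if the face is predicted to be speaking."""
    return (add_bias(visual_feats) @ w > 0).astype(int)

# Synthetic toy data: 200 frames, 2-D "face features" that shift when speaking.
rng = np.random.default_rng(0)
speaking = rng.integers(0, 2, size=200)
energy = np.where(speaking == 1, 1.0, 1e-4) * rng.uniform(0.5, 1.0, size=200)
feats = 2.0 * speaking[:, None] + rng.normal(0.0, 0.3, size=(200, 2))

labels = audio_pseudo_labels(energy)          # supervision from the audio channel
w = train_visual_classifier(feats, labels)    # learning happens in the visual domain
accuracy = (predict_speaking(w, feats) == speaking).mean()
```

    In this sketch, once training is done `predict_speaking` needs only visual features, so it can keep working when the audio is too noisy to trust; running one such classifier per visible face lets several simultaneous speakers be flagged.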

    Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition

    In this paper, we investigate the use of invariant features for speaker recognition. Because of their invariance properties, these features are introduced to address sensor variability, a challenging problem and an inherent source of performance degradation in speaker recognition systems. Our experiments show (1) the effectiveness of these features in matched conditions, and (2) the benefit of combining these features with mel-frequency cepstral coefficients (MFCCs) to exploit their discriminative power under uncontrolled (mismatched) conditions. Consequently, the proposed invariant features improve performance, as demonstrated by reductions in the equal error rate (EER) and the minimum decision cost function (minDCF) relative to GMM-UBM speaker recognition systems based on MFCC features.
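    The two metrics cited above can be computed from lists of target and non-target trial scores. The sketch below is a generic implementation, not the paper's evaluation code; the threshold sweep over observed scores and the cost parameters (`p_target=0.01`, `c_miss=1`, `c_fa=1`) are assumptions chosen for illustration, and the cost is normalized by the cost of the better trivial system (accept everything or reject everything).

```python
import numpy as np

def error_rates(tar, non):
    """Miss and false-alarm rates at every observed score used as a threshold,
    plus +inf (reject everything). A trial is accepted when score >= threshold."""
    tar, non = np.asarray(tar, float), np.asarray(non, float)
    thresholds = np.append(np.sort(np.concatenate([tar, non])), np.inf)
    p_miss = np.array([(tar < t).mean() for t in thresholds])
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    return thresholds, p_miss, p_fa

def eer(tar, non):
    """Equal error rate: average of P_miss and P_fa where they are closest."""
    _, p_miss, p_fa = error_rates(tar, non)
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0

def min_dcf(tar, non, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost over all thresholds (assumed parameters)."""
    _, p_miss, p_fa = error_rates(tar, non)
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    return dcf.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))
```

    For example, `eer([0.6, 0.8], [0.2, 0.7])` returns 0.5, because no threshold separates the overlapping target and non-target scores.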

    A food chain approach to control of Shiga toxin-producing Escherichia coli in New Zealand : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Veterinary Science at Massey University, Palmerston North, New Zealand

    Part of Chapter 3 has been published as: Browne, A.S., Midwinter, A.C., Withers, H., Cookson, A.L., Biggs, P.J., Marshall J.C., Benschop, J., Hathaway, S., Haack, N., Akhter, R., & French, N.P. (2018). Molecular epidemiology of Shiga toxin-producing Escherichia coli (STEC) on New Zealand dairy farms: Application of a culture-independent assay and whole-genome sequencing. Applied and Environmental Microbiology, 84(14). DOI: 10.1128/AEM.00481-18

    This thesis describes the prevalence and molecular epidemiology of Shiga toxin-producing Escherichia coli (STEC) in New Zealand using microbiological, genomic, molecular, and statistical methods. STEC are zoonotic pathogens that can cause bloody diarrhoea and acute kidney failure. Cattle are a well-recognized STEC reservoir, and previous research has identified living near cattle and contact with their faeces as risk factors for human infection. Seven STEC serogroups (O157, O26, O45, O103, O111, O121, O145), known as the ‘Top 7’ STEC, have been identified as posing an increased risk to human health, and the New Zealand meat industry tests veal beef exports to some international markets to ensure they are free of these serogroups. A stratified random cross-sectional study of ‘Top 7’ STEC prevalence in young dairy calves (n=1,508) on New Zealand dairy farms (n=102) found that approximately 20% of calves and 75% of farms were positive for one or more of the ‘Top 7’ STEC. ‘Top 7’ STEC prevalence was positively associated with the number of calves in a calf pen and varied significantly by region. This study used a new culture-independent diagnostic test, NeoSEEK (a PCR/MALDI-TOF method), and applied statistical and microbiological techniques to evaluate the sensitivity and specificity of the method for this and further studies.
    A longitudinal study evaluating the prevalence and transmission of ‘Top 7’ STEC in animals and the dairy farm environment found evidence of calf-to-calf, dam-to-calf, and environment-to-calf transmission. Whole-genome sequencing analysis and prevalence data revealed that cross-contamination of young veal calf hides occurs during transport to, and lairage at, processing plants. Analysis of New Zealand serogroup O26 bacterial isolates (n=152), in comparison with publicly available genome sequence data (n=252) from 14 other countries, suggested that STEC and non-STEC O26 were introduced into New Zealand during a few periods in the 20th and early 21st centuries. New Zealand populations of serogroup O26 E. coli are monophyletic, possibly due to minimal live cattle imports into the country. Further research in this area should focus on effective interventions at the farm and meat processing levels to decrease the risk of veal beef contamination while protecting public health.
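    The abstract mentions evaluating the sensitivity and specificity of the NeoSEEK assay. As a generic illustration only (not the thesis's actual analysis), the sketch below computes sensitivity and specificity from a 2×2 comparison against a reference method, then applies the Rogan-Gladen estimator, a standard correction of apparent prevalence for an imperfect test. The function names and all counts are hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Se = TP/(TP+FN), Sp = TN/(TN+FP), treating a reference method as truth."""
    return tp / (tp + fn), tn / (tn + fp)

def rogan_gladen(apparent_prev, se, sp):
    """True-prevalence estimate (AP + Sp - 1) / (Se + Sp - 1), clipped to [0, 1]."""
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1 for the correction to be defined")
    return min(max((apparent_prev + sp - 1.0) / denom, 0.0), 1.0)

# Hypothetical example: 45 of 150 samples test positive with an assay whose
# Se and Sp were estimated from a separate validation panel.
se, sp = sensitivity_specificity(tp=90, fn=10, tn=95, fp=5)
true_prev = rogan_gladen(45 / 150, se, sp)
```

    Here an apparent prevalence of 0.30 with Se = 0.90 and Sp = 0.95 is corrected to about 0.29; the clipping handles samples where false positives alone could explain every positive result.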

    Constrained speaker linking

    In this paper we study speaker linking (a.k.a. partitioning) given constraints on the distribution of speaker identities over speech recordings. Specifically, we show that the intractable partitioning problem becomes tractable when the constraints pre-partition the data into smaller cliques with non-overlapping speakers. The surprisingly common case where the speakers in telephone conversations are known, but the assignment of channels to identities is unspecified, is treated in a Bayesian way. We show that for the Dutch CGN database, where this channel assignment task arises, a lightweight speaker recognition system can quite effectively solve the channel assignment problem, with 93% of the cliques solved. We further show that the posterior distribution over channel assignment configurations is well calibrated.
    Comment: Submitted to Interspeech 2014, some typos fixed
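    For a single conversation (one clique) with known speakers but an unknown channel-to-speaker assignment, a Bayesian treatment can be sketched as follows. This is an illustration under assumptions, not the paper's model: each entry `llr[i][j]` is taken to be a calibrated log-likelihood ratio that channel i carries speaker j, scores are assumed independent across channels, and the prior over one-to-one assignments is uniform unless supplied. Because each channel's non-target likelihood is the same under every assignment, an assignment's posterior is proportional to its prior times the product of the likelihood ratios it selects.

```python
import itertools
import math

def assignment_posterior(llr, prior=None):
    """Posterior over one-to-one channel->speaker assignments.

    llr[i][j] is an (assumed calibrated) log-likelihood ratio that channel i
    carries speaker j. Returns {permutation: probability}, where permutation[i]
    is the speaker index assigned to channel i."""
    n = len(llr)
    perms = list(itertools.permutations(range(n)))
    if prior is None:
        prior = {p: 1.0 / len(perms) for p in perms}   # uniform prior
    log_post = {p: math.log(prior[p]) + sum(llr[i][p[i]] for i in range(n))
                for p in perms}
    top = max(log_post.values())                        # log-sum-exp for stability
    z = sum(math.exp(v - top) for v in log_post.values())
    return {p: math.exp(v - top) / z for p, v in log_post.items()}

# Two-channel telephone call with known speakers A (index 0) and B (index 1).
posterior = assignment_posterior([[2.0, -1.0],
                                  [-1.0, 2.0]])
```

    In this sketch the number of permutations grows factorially, so exhaustive enumeration is only practical because each clique (here, a two-party call) is small.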

    NPLDA: A Deep Neural PLDA Model for Speaker Verification

    The state-of-the-art approach for speaker verification consists of a neural-network-based embedding extractor combined with a backend generative model such as Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose a neural network approach for backend modeling in speaker recognition. The likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function, and the learnable parameters of the score function are optimized using a verification cost. The proposed model, termed neural PLDA (NPLDA), is initialized using the generative PLDA model parameters. The loss function for the NPLDA model is an approximation of the minimum detection cost function (DCF). Speaker recognition experiments with the NPLDA model are performed on the speaker verification task of the VOiCES datasets as well as on the SITW challenge dataset. In these experiments, the NPLDA model optimized with the proposed loss function improves significantly over the state-of-the-art PLDA-based speaker verification system.
    Comment: Published in Odyssey 2020, the Speaker and Language Recognition Workshop (VOiCES Special Session). Link to GitHub Implementation: https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text overlap with arXiv:2001.0703
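    The two NPLDA ingredients described above, a PLDA-shaped scoring function with learnable parameters and a smooth approximation of the detection cost, can be sketched in NumPy. This is a forward-pass illustration only, not the linked implementation: the quadratic form `x_e'Q x_e + x_t'Q x_t + 2 x_e'P x_t + c` follows the usual two-covariance PLDA log-likelihood ratio, and the sigmoid sharpness `alpha`, threshold `theta`, and cost parameters are assumed values. In actual training, `P`, `Q`, and `c` would be initialized from a generative PLDA model and optimized through an automatic-differentiation framework; `theta` is simply treated as given here.

```python
import numpy as np

def plda_form_score(x_e, x_t, P, Q, c=0.0):
    """Quadratic similarity with the functional form of a PLDA log-likelihood
    ratio between an enrollment embedding x_e and a test embedding x_t."""
    x_e, x_t = np.asarray(x_e, float), np.asarray(x_t, float)
    return float(x_e @ Q @ x_e + x_t @ Q @ x_t + 2.0 * x_e @ P @ x_t + c)

def soft_detection_cost(scores, is_target, theta, alpha=10.0,
                        p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Differentiable DCF surrogate: the hard accept decision 1[s >= theta]
    is replaced by sigmoid(alpha * (s - theta)), so miss and false-alarm
    rates become smooth functions of the scores."""
    s = np.asarray(scores, float)
    tgt = np.asarray(is_target, bool)
    accept = 1.0 / (1.0 + np.exp(-alpha * (s - theta)))
    p_miss = np.mean(1.0 - accept[tgt])     # soft fraction of rejected targets
    p_fa = np.mean(accept[~tgt])            # soft fraction of accepted non-targets
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# Toy check with P = identity, Q = 0: matching embeddings score higher.
P, Q = np.eye(2), np.zeros((2, 2))
s_same = plda_form_score([1.0, 0.0], [1.0, 0.0], P, Q)   # target trial
s_diff = plda_form_score([1.0, 0.0], [0.0, 1.0], P, Q)   # non-target trial
cost = soft_detection_cost([s_same, s_diff], [True, False], theta=1.0, alpha=50.0)
```

    As `alpha` grows, the surrogate approaches the hard detection cost at threshold `theta`; a moderate value keeps gradients informative for trials whose scores lie near the threshold.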