Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement the acoustic detection of the active speaker, thereby
improving the system's robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their faces. Furthermore, the method does not rely on
external annotations, thus remaining consistent with how learning proceeds in cognitive development. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker-dependent setting. However, in a
speaker-independent setting the proposed method yields significantly lower
performance. We believe that the proposed method represents an essential
component of any artificial cognitive system or robotic platform engaging in
social interactions. Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
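To illustrate the cross-modal self-supervision idea (audio supplies the labels, vision learns from them), the sketch below uses a crude energy-based voice activity detector to label frames as speaking or silent and then fits a nearest-centroid classifier on per-frame visual features. This is a toy under stated assumptions, not the authors' method: the energy threshold, the frame length, and the hypothetical one-dimensional "lip motion" feature are placeholders, and the paper's audio labelling and visual model may differ.

```python
import math
from typing import Dict, List, Sequence


def frame_energy_db(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy (dB) of consecutive, non-overlapping audio frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # epsilon avoids log10(0)
    return energies


def audio_pseudo_labels(energies_db: Sequence[float], threshold_db: float) -> List[int]:
    """Crude energy-based VAD (assumption): 1 = speaking, 0 = silent."""
    return [1 if e > threshold_db else 0 for e in energies_db]


class NearestCentroid:
    """Toy visual classifier trained only on audio-derived labels (no manual annotation)."""

    def fit(self, feats: Sequence[Sequence[float]], labels: Sequence[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for cls in set(labels):
            rows = [f for f, y in zip(feats, labels) if y == cls]
            # Per-dimension mean of the visual features that audio assigned to this class
            self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, feats: Sequence[Sequence[float]]) -> List[int]:
        def dist2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(f, self.centroids[c])) for f in feats]


# Four audio frames: loud, quiet, loud, quiet
audio = [0.5, -0.5] * 2 + [0.001, -0.001] * 2 + [0.5, -0.5] * 2 + [0.001, -0.001] * 2
labels = audio_pseudo_labels(frame_energy_db(audio, frame_len=4), threshold_db=-30.0)
lip_motion = [[0.9], [0.1], [0.8], [0.2]]  # hypothetical per-frame visual feature for one face
clf = NearestCentroid().fit(lip_motion, labels)
print(labels, clf.predict([[0.7], [0.05]]))  # [1, 0, 1, 0] [1, 0]
```

Once trained, the visual classifier needs no audio at test time, which is what lets such a detector back up acoustic detection when the audio channel is noisy.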
Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition
In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the challenging problem of sensor variability, a source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in matched conditions; (2) the benefit of combining these features with mel-frequency cepstral coefficients (MFCCs) to exploit their discrimination power under uncontrolled (mismatched) conditions. Consequently, the proposed invariant features yield a performance improvement, demonstrated by a reduction in the equal error rate (EER) and the minimum detection cost function (minDCF) compared to GMM-UBM speaker recognition systems based on MFCC features.
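The gains above are reported as reductions in EER and minimum DCF. As a reminder of how both metrics are computed from verification scores, here is a minimal sketch assuming the convention that a trial is accepted when its score is at or above the threshold. The function names and the default cost parameters (p_target=0.01, c_miss=c_fa=1) are illustrative choices, not the paper's evaluation settings.

```python
from typing import List, Sequence, Tuple


def error_rates(target: Sequence[float], nontarget: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Miss rate (targets rejected) and false-alarm rate (non-targets accepted)."""
    p_miss = sum(s < threshold for s in target) / len(target)
    p_fa = sum(s >= threshold for s in nontarget) / len(nontarget)
    return p_miss, p_fa


def candidate_thresholds(target, nontarget) -> List[float]:
    # Every observed score, plus +inf (reject everything)
    return sorted(set(target) | set(nontarget)) + [float("inf")]


def eer(target, nontarget) -> float:
    """Approximate EER: the threshold where miss and false-alarm rates are closest."""
    best_gap, best_rate = float("inf"), 1.0
    for t in candidate_thresholds(target, nontarget):
        p_miss, p_fa = error_rates(target, nontarget, t)
        if abs(p_miss - p_fa) < best_gap:
            best_gap, best_rate = abs(p_miss - p_fa), (p_miss + p_fa) / 2
    return best_rate


def min_dcf(target, nontarget, p_target=0.01, c_miss=1.0, c_fa=1.0) -> float:
    """Minimum normalized detection cost over all thresholds."""
    norm = min(c_miss * p_target, c_fa * (1 - p_target))  # cost of the best trivial system
    best = float("inf")
    for t in candidate_thresholds(target, nontarget):
        p_miss, p_fa = error_rates(target, nontarget, t)
        cost = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa
        best = min(best, cost / norm)
    return best


target, nontarget = [2.0, 3.0, 4.0], [0.0, 1.0, 2.5]
print(eer(target, nontarget), min_dcf(target, nontarget, p_target=0.5))  # both about 0.333
```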
A food chain approach to control of Shiga toxin-producing Escherichia coli in New Zealand : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Veterinary Science at Massey University, Palmerston North, New Zealand
Part of Chapter 3 has been published as:
Browne, A.S., Midwinter, A.C., Withers, H., Cookson, A.L., Biggs, P.J., Marshall, J.C., Benschop, J., Hathaway, S., Haack, N., Akhter, R., & French, N.P. (2018). Molecular epidemiology of Shiga toxin-producing Escherichia coli (STEC) on New Zealand dairy farms: Application of a culture-independent assay and whole-genome sequencing. Applied and Environmental Microbiology, 84(14). DOI: 10.1128/AEM.00481-18
This thesis describes the prevalence and molecular epidemiology of Shiga toxin-producing Escherichia coli (STEC) in New Zealand using microbiological, genomic, molecular, and statistical methods. STEC are zoonotic pathogens that can cause bloody diarrhoea and acute kidney failure. Cattle are a well-recognized STEC reservoir, and previous research has identified living near cattle and contact with their faeces as risk factors for human infection. Seven STEC serogroups (O157, O26, O45, O103, O111, O121, O145), known as the ‘Top 7’ STEC, have been identified as posing an increased risk to human health, and the New Zealand meat industry undertakes testing to ensure that veal beef exports to some international markets are free of these ‘Top 7’ serogroups. A stratified random cross-sectional study of ‘Top 7’ STEC prevalence in young dairy calves (n=1,508) on New Zealand dairy farms (n=102) found that approximately 20% of calves and 75% of farms were positive for one or more of the ‘Top 7’ STEC. ‘Top 7’ STEC prevalence was positively associated with the number of calves in a calf pen, and prevalence varied significantly by region. This study used a new culture-independent diagnostic test, NeoSEEK (a PCR/MALDI-TOF method), and applied statistical and microbiological techniques to evaluate the sensitivity and specificity of the method for this and further studies. A longitudinal study of the prevalence and transmission of ‘Top 7’ STEC in animals and the dairy farm environment found evidence of calf-to-calf, dam-to-calf, and environment-to-calf transmission. Whole-genome sequencing analysis and prevalence data revealed that cross-contamination of young veal calf hides occurs during transport to, and lairage at, processing plants. Analysis of New Zealand serogroup O26 bacterial isolates (n=152), in comparison with publicly available genome sequence data (n=252) from other countries (n=14), suggested that STEC and non-STEC O26 were introduced into New Zealand during a few periods in the 20th and early 21st centuries. Populations of New Zealand serogroup O26 E. coli are monophyletic, possibly owing to minimal importation of live cattle into the country. Further research in this area should focus on effective interventions at the farm and meat-processing level to decrease the risk of veal beef contamination, while protecting public health.
Constrained speaker linking
In this paper we study speaker linking (a.k.a. partitioning) given
constraints on the distribution of speaker identities over speech recordings.
Specifically, we show that the intractable partitioning problem becomes
tractable when the constraints pre-partition the data into smaller cliques with
non-overlapping speakers. The surprisingly common case where speakers in
telephone conversations are known, but the assignment of channels to identities
is unspecified, is treated in a Bayesian way. We show that for the Dutch CGN
database, where this channel assignment task is at hand, a lightweight speaker
recognition system can quite effectively solve the channel assignment problem,
with 93% of the cliques solved. We further show that the posterior distribution
over channel assignment configurations is well calibrated. Comment: Submitted to Interspeech 2014, some typos fixed
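A minimal sketch of one Bayesian formulation of the channel assignment problem, not necessarily the authors': within a clique, every one-to-one mapping of channels to the known speakers is a configuration, its log-score is the sum of per-channel speaker-recognition log-likelihood ratios, and a uniform prior turns the scores into a posterior via a softmax. Using LLRs in place of likelihoods assumes each channel's LLRs share a common denominator (e.g. a population model); because every configuration uses each channel exactly once, that term cancels on normalization. The function name and input layout are hypothetical, and the enumeration grows as n!, which is fine for small cliques such as two-channel telephone calls.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple


def assignment_posterior(llr: Sequence[Sequence[float]]) -> Dict[Tuple[int, ...], float]:
    """Posterior over one-to-one channel -> speaker assignments within one clique.

    llr[c][s] is a log-likelihood ratio that channel c was spoken by known
    speaker s. A uniform prior over assignments is assumed.
    """
    n = len(llr)
    perms = list(itertools.permutations(range(n)))  # perm[c] = speaker index on channel c
    # Channels treated as independent given the assignment, so log-scores add
    log_scores = [sum(llr[c][perm[c]] for c in range(n)) for perm in perms]
    top = max(log_scores)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return {perm: w / total for perm, w in zip(perms, weights)}


# Two-channel call, two known speakers
post = assignment_posterior([[2.0, -1.0],
                             [-1.0, 1.0]])
print(post)  # (0, 1) gets about 0.993, (1, 0) about 0.007
```

For two channels the posterior reduces to a logistic function of the score difference between the two configurations (here 3 - (-2) = 5), which is one way to check the calibration claim on held-out calls.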
NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-the-art approach for speaker verification consists of a
neural-network-based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verification task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-the-art PLDA-based speaker verification system. Comment: Published in Odyssey 2020, the Speaker and Language Recognition
Workshop (VOiCES Special Session). Link to GitHub Implementation:
https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text
overlap with arXiv:2001.0703
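The two ingredients named in the abstract, a PLDA-form score used as a discriminative similarity and a loss approximating the detection cost, can be sketched as follows. This assumes the standard quadratic form of the two-covariance PLDA log-likelihood ratio, s = x_eᵀQx_e + x_tᵀQx_t + 2x_eᵀPx_t + const, and a sigmoid relaxation of the 0/1 miss and false-alarm counts with a warping factor alpha; all names and values are illustrative, and the gradient-based training of P, Q, and the threshold (which would normally use an autodiff framework) is omitted.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear(a: Vector, M: Matrix, b: Vector) -> float:
    """Compute a^T M b."""
    return sum(a[i] * M[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def plda_form_score(x_e: Vector, x_t: Vector, P: Matrix, Q: Matrix, const: float = 0.0) -> float:
    """PLDA-style quadratic similarity: x_e^T Q x_e + x_t^T Q x_t + 2 x_e^T P x_t + const."""
    return bilinear(x_e, Q, x_e) + bilinear(x_t, Q, x_t) + 2.0 * bilinear(x_e, P, x_t) + const


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_dcf(target: Sequence[float], nontarget: Sequence[float], threshold: float,
             alpha: float = 10.0, p_target: float = 0.01,
             c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Smooth DCF surrogate: step-function miss/false-alarm indicators become sigmoids.

    Larger alpha brings the surrogate closer to the hard DCF at this threshold.
    """
    p_miss = sum(sigmoid(alpha * (threshold - s)) for s in target) / len(target)
    p_fa = sum(sigmoid(alpha * (s - threshold)) for s in nontarget) / len(nontarget)
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


# Toy 2-D embeddings; P = identity and Q = 0 are placeholder values, not trained parameters
P = [[1.0, 0.0], [0.0, 1.0]]
Q = [[0.0, 0.0], [0.0, 0.0]]
same = plda_form_score([1.0, 0.0], [1.0, 0.0], P, Q)  # 2.0
diff = plda_form_score([1.0, 0.0], [0.0, 1.0], P, Q)  # 0.0
print(soft_dcf([same], [diff], threshold=1.0, alpha=50.0))  # near 0: both trials on the correct side
```

Because the sigmoid surrogate is differentiable in the score, gradients flow back to P and Q, which is what allows a backend initialized from generative PLDA parameters to be fine-tuned directly against a DCF-like objective.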