    Max-margin Metric Learning for Speaker Recognition

    Probabilistic linear discriminant analysis (PLDA) is a popular normalization approach for the i-vector model and has delivered state-of-the-art performance in speaker recognition. A potential problem of the PLDA model, however, is that it essentially assumes Gaussian distributions over speaker vectors, which does not always hold in practice. Additionally, its objective function is not directly related to the goal of the task, i.e., discriminating true speakers from imposters. In this paper, we propose a max-margin metric learning approach to address these problems. It learns a linear transform with the criterion that the margin between target and imposter trials is maximized. Experiments conducted on the SRE08 core test show that, compared to PLDA, the new approach can obtain comparable or even better performance, even though scoring is simply a cosine computation.
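    The abstract does not give the exact objective or optimizer, so the following is only a minimal sketch of the general idea, assuming a pairwise hinge loss: for every (target trial, imposter trial) combination, the cosine score of the target trial in the transformed space should exceed that of the imposter trial by a fixed margin. The transform W is learned by plain full-batch gradient descent from a near-identity start. The function names, the margin of 0.5, the learning rate and the toy 2-D vectors are all hypothetical; real i-vectors have hundreds of dimensions and would call for a linear-algebra library.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cosine_score(W, x, y):
    """Cosine similarity between the transformed vectors Wx and Wy."""
    a, b = matvec(W, x), matvec(W, y)
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def cosine_grad(W, x, y):
    """Gradient of cos(Wx, Wy) with respect to every entry of W."""
    a, b = matvec(W, x), matvec(W, y)
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    c = dot(a, b) / (na * nb)
    ga = [bi / (na * nb) - c * ai / na**2 for ai, bi in zip(a, b)]  # d cos / d a
    gb = [ai / (na * nb) - c * bi / nb**2 for ai, bi in zip(a, b)]  # d cos / d b
    return [[ga[i] * x[j] + gb[i] * y[j] for j in range(len(x))] for i in range(len(W))]

def hinge_loss(W, targets, imposters, margin=0.5):
    """Mean of max(0, margin - (s_target - s_imposter)) over all trial combinations."""
    total = 0.0
    for xt, yt in targets:
        st = cosine_score(W, xt, yt)
        for xi, yi in imposters:
            total += max(0.0, margin - (st - cosine_score(W, xi, yi)))
    return total / (len(targets) * len(imposters))

def train(targets, imposters, margin=0.5, lr=0.1, epochs=200, seed=0):
    """Hypothetical trainer: learn W by full-batch gradient descent on hinge_loss."""
    dim = len(targets[0][0])
    rng = random.Random(seed)
    # Near-identity start: initial scores are (almost) plain cosine similarity.
    W = [[(1.0 if i == j else 0.0) + 0.01 * rng.gauss(0.0, 1.0) for j in range(dim)]
         for i in range(dim)]
    n_combos = len(targets) * len(imposters)
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(dim)]
        for xt, yt in targets:
            st, gt = cosine_score(W, xt, yt), cosine_grad(W, xt, yt)
            for xi, yi in imposters:
                if margin - (st - cosine_score(W, xi, yi)) > 0:  # margin violated
                    gi = cosine_grad(W, xi, yi)
                    for i in range(dim):
                        for j in range(dim):
                            # d/dW of (margin - s_target + s_imposter)
                            grad[i][j] += gi[i][j] - gt[i][j]
        W = [[W[i][j] - lr * grad[i][j] / n_combos for j in range(dim)] for i in range(dim)]
    return W

# Toy trials: coordinate 0 carries "speaker" information, coordinate 1 is a nuisance.
TARGETS = [([1.0, 3.0], [1.0, -3.0]), ([-1.0, 2.0], [-1.0, -2.0])]
IMPOSTERS = [([1.0, 3.0], [-1.0, 3.0]), ([1.0, -2.0], [-1.0, -2.0])]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

W = train(TARGETS, IMPOSTERS)
print("loss with identity:", hinge_loss(IDENTITY, TARGETS, IMPOSTERS))
print("loss with learned W:", hinge_loss(W, TARGETS, IMPOSTERS))
```

    In the toy data the nuisance coordinate makes plain cosine scoring rank every imposter trial above every target trial; the learned W shrinks that coordinate relative to the first, so target trials end up with the higher scores while scoring remains a plain cosine.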

    System Independent Fault Diagnosis for Synchronous Generator

    Creating a unified fault diagnosis model that can detect faults across systems with different ratings (system-independent fault diagnosis) would be of great interest in making condition-based maintenance (CBM) more popular. In this work, three-phase synchronous generators with 3 kVA and 5 kVA ratings are used for detecting stator inter-turn short-circuit faults. Our baseline is a 3 kVA generator operating at a 1 A load during both training and testing, which emulates system/load-dependent fault diagnosis. We obtained classification accuracies of 99.75%, 100% and 100% for R-phase, Y-phase and B-phase faults, respectively. Subsequently, we evaluated the system's load-independent performance. Accuracy deteriorated because of load-specific variations (LSV) in the input feature vector (IFV). Since LSV is undesired, we used nuisance attribute projection (NAP) to remove it. Using NAP, we obtained performance improvements of 23.13%, 17.75% and 20.72% for the three fault models on the 3 kVA generator, and a similar improvement was obtained for the 5 kVA generator. Further, we experimented with load- and system-independent fault diagnosis. In this case, we treat both LSV and system-specific variations (SSV) in the IFV as undesired. We experimented with two types of NAP: (1) single-step NAP and (2) stacked NAP. Experimental results show that the two-stage stacked NAP outperforms single-step NAP. We obtained improvements of 23.99%, 16.06% and 28.39% in classification accuracy for the three fault models, resulting in overall classification accuracies of 89.22%, 94.67% and 94.59% for the R-phase, Y-phase and B-phase fault models, respectively.
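    NAP removes a nuisance subspace with the projection P = I - UU^T, where the columns of U span the unwanted variability. The abstract does not say how U is estimated, so the sketch below assumes one direction per stage (k = 1), taken as the top eigenvector (found by power iteration) of the scatter of feature vectors around their group means, where each group shares a fault class and differs only in the nuisance condition. Stacked NAP is modeled as two such stages, load first and then system (an assumed order). The functions, the 4-D toy IFVs and their made-up load and system terms are hypothetical.

```python
import math

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nap_fit(groups, iters=50):
    """Estimate one nuisance direction u from groups of same-class vectors.

    Deviations from each group mean are treated as nuisance; u is the top
    eigenvector of their scatter matrix, found by power iteration.
    """
    residuals = []
    for vectors in groups:
        m = mean(vectors)
        residuals.extend([a - b for a, b in zip(v, m)] for v in vectors)
    d = len(residuals[0])
    scatter = [[sum(r[i] * r[j] for r in residuals) for j in range(d)] for i in range(d)]
    u = [1.0] * d  # assumes the start vector is not orthogonal to the top eigenvector
    for _ in range(iters):
        u = [sum(scatter[i][j] * u[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u

def nap_project(v, u):
    """Apply P = I - u u^T to v, removing its component along u."""
    c = sum(a * b for a, b in zip(v, u))
    return [a - c * b for a, b in zip(v, u)]

# Hypothetical toy IFVs: [fault_0, fault_1, load_term, system_term]
BASE = {"R": [1.0, 0.0], "Y": [0.0, 1.0]}
data = {
    (fault, kva, load): BASE[fault] + [float(load), float(kva)]
    for fault in BASE for kva in (3, 5) for load in (1, 3)
}

# Stage 1: remove load-specific variation (group by fault and system).
u_load = nap_fit([[v for (f, s, _), v in data.items() if (f, s) == key]
                  for key in {(f, s) for f, s, _ in data}])
stage1 = {k: nap_project(v, u_load) for k, v in data.items()}

# Stage 2: remove system-specific variation (group by fault only).
u_sys = nap_fit([[v for (f, _, _), v in stage1.items() if f == fault] for fault in BASE])
stage2 = {k: nap_project(v, u_sys) for k, v in stage1.items()}

print(stage2[("R", 5, 3)])  # → [1.0, 0.0, 0.0, 0.0]
```

    Each stage here removes a single direction, so the rank k and the stage order are illustrative choices; the abstract does not state them.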

    Forensic and Automatic Speaker Recognition System

    Automatic speaker recognition (ASR) systems have emerged as an important means of identity verification in many businesses, e-commerce applications, forensics, and law enforcement. Specialists trained in forensic recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, an approach referred to as structured listening. Algorithm-based systems for forensic speaker recognition have been developed by physical scientists and forensic linguists to reduce the likelihood of contextual bias or preconceived understanding regarding the validity of an unknown audio sample and a reference model of a suspect. Many researchers continue to develop automatic signal-processing and machine-learning algorithms that improve performance in establishing a speaker's identity, with the goal that automatic systems perform on par with human listeners. In this paper, I examine the literature on speaker identification by machines and humans, emphasizing the key technical trends that have emerged in automatic speaker recognition over the last decade. I focus on several aspects of automatic speaker recognition systems, including speaker-specific features, speaker models, standard evaluation data sets, and performance metrics.

    Potential for social involvement modulates activity within the mirror and the mentalizing systems

    Processing biological motion is fundamental for everyday activities such as social interaction, motor learning and nonverbal communication. The ability to detect the nature of a motor pattern has been investigated by means of point-light displays (PLD), sets of moving light points that reproduce human kinematics and are easily recognizable as meaningful once in motion. Although PLD are rudimentary, the human brain can decipher their content, including social intentions. Neuroimaging studies suggest that inferring the social meaning conveyed by PLD could rely on both the Mirror Neuron System (MNS) and the Mentalizing System (MS), but their specific roles in this endeavor remain uncertain. We describe a functional magnetic resonance imaging experiment in which participants had to judge whether visually presented PLD and video clips of human-like walkers (HL) were facing towards or away from them. Results show that coding for stimulus direction specifically engages the MNS when considering PLD moving away from the observer, while the nature of the stimulus reveals a dissociation between the MNS, mainly involved in coding for PLD, and the MS, recruited by HL moving away. These results suggest that the contribution of the two systems can be modulated by the nature of the observed stimulus and its potential for social involvement.

    Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition

    In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the challenging problem of sensor variability, a source of performance degradation inherent in speaker recognition systems. Our experiments show (1) the effectiveness of these features in matched conditions and (2) the benefit of combining them with mel-frequency cepstral coefficients (MFCCs) to exploit their discriminative power under uncontrolled (mismatched) conditions. Consequently, the proposed invariant features yield a performance improvement, as demonstrated by reductions in the equal error rate (EER) and the minimum decision cost function (minDCF) compared with GMM-UBM speaker recognition systems based on MFCC features.
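    The EER and minimum DCF named in the abstract can both be read off a threshold sweep over trial scores. The sketch below is not from the paper: it accepts a trial when its score is at least the threshold, approximates the EER at the threshold where the miss and false-alarm rates are closest, and reports an unnormalized minimum DCF. The cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) are assumed defaults in the style of NIST SRE evaluations, and the scores are made up.

```python
def _rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when trials with score >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_miss, p_fa

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: mean of miss and false-alarm rates where they are closest."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        p_miss, p_fa = _rates(target_scores, impostor_scores, t)
        if abs(p_miss - p_fa) < best_gap:
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
    return eer

def min_dcf(target_scores, impostor_scores, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Unnormalized minimum detection cost over all candidate thresholds."""
    # The extra +inf threshold covers the "reject every trial" operating point.
    thresholds = sorted(set(target_scores) | set(impostor_scores)) + [float("inf")]
    costs = []
    for t in thresholds:
        p_miss, p_fa = _rates(target_scores, impostor_scores, t)
        costs.append(c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target))
    return min(costs)

# Made-up scores for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.5, 0.3, 0.2]
print("EER:", equal_error_rate(targets, impostors))  # → EER: 0.25
print("minDCF:", min_dcf(targets, impostors))
```

    A "reduction in EER and minDCF" means both numbers go down for the new system on the same trial list; with real GMM-UBM systems the scores would typically be log-likelihood ratios rather than values in [0, 1].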