3,589 research outputs found

    Local representations and random sampling for speaker verification

    Get PDF
    In text-independent speaker verification, studies focused on compensating intra-speaker variabilities at the modeling stage through the last decade. Intra-speaker variabilities may be due to channel effects, phonetic content or the speaker himself in the form of speaking style, emotional state, health or other similar factors. Joint Factor Analysis, Total Variability Space compensation, Nuisance Attribute Projection are some of the most successful approaches for inter-session variability compensation in the literature. In this thesis, we criticize the assumptions of low dimensionality of channel space in these methods and propose to partition the acoustic space into local regions. Intra-speaker variability compensation may be done in each local space separately. Two architectures are proposed depending on whether the subsequent modeling and scoring steps will also be done locally or globally. We have also focused on a particular component of intra-speaker variability, namely within-session variability. The main source of within-session variability is the differences in the phonetic content of speech segments in a single utterance. The variabilities in phonetic content may be either due to across acoustic event variabilities or due to differences in the actual realizations of the acoustic events. We propose a method to combat these variabilities through random sampling of training utterance. The method is shown to be effective both in short and long test utterances

    A Differential Approach for Gaze Estimation

    Full text link
    Non-invasive gaze estimation methods usually regress gaze directions directly from a single face or eye image. However, due to important variabilities in eye shapes and inner eye structures amongst individuals, universal models obtain limited accuracies and their output usually exhibit high variance as well as biases which are subject dependent. Therefore, increasing accuracy is usually done through calibration, allowing gaze predictions for a subject to be mapped to his/her actual gaze. In this paper, we introduce a novel image differential method for gaze estimation. We propose to directly train a differential convolutional neural network to predict the gaze differences between two eye input images of the same subject. Then, given a set of subject specific calibration images, we can use the inferred differences to predict the gaze direction of a novel eye sample. The assumption is that by allowing the comparison between two eye images, annoyance factors (alignment, eyelid closing, illumination perturbations) which usually plague single image prediction methods can be much reduced, allowing better prediction altogether. Experiments on 3 public datasets validate our approach which constantly outperforms state-of-the-art methods even when using only one calibration sample or when the latter methods are followed by subject specific gaze adaptation.Comment: Extension to our paper A differential approach for gaze estimation with calibration (BMVC 2018) Submitted to PAMI on Aug. 7th, 2018 Accepted by PAMI short on Dec. 2019, in IEEE Transactions on Pattern Analysis and Machine Intelligenc

    Speaker recognition by means of restricted Boltzmann machine adaptation

    Get PDF
    Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising a universal model training and model adaptation. Taking advantage of RBM unsupervised learning algorithm, a global model is trained based on all available background data. This general speaker-independent model, referred to as URBM, is further adapted to the data of a specific speaker to build speaker-dependent model. In order to show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has been also utilized to produce a vector-based representation for speakers. This vector-based representation, similar to i-vector, can be further used for speaker recognition using either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database.Peer ReviewedPostprint (author's final draft

    Advancing Pattern Recognition Techniques for Brain-Computer Interfaces: Optimizing Discriminability, Compactness, and Robustness

    Get PDF
    In dieser Dissertation formulieren wir drei zentrale Zielkriterien zur systematischen Weiterentwicklung der Mustererkennung moderner Brain-Computer Interfaces (BCIs). Darauf aufbauend wird ein Rahmenwerk zur Mustererkennung von BCIs entwickelt, das die drei Zielkriterien durch einen neuen Optimierungsalgorithmus vereint. Darüber hinaus zeigen wir die erfolgreiche Umsetzung unseres Ansatzes für zwei innovative BCI Paradigmen, für die es bisher keine etablierte Mustererkennungsmethodik gibt

    A Multiday Evaluation of Real-Time Intramuscular EMG Usability with ANN

    Get PDF
    Recent developments in implantable technology, such as high-density recordings, wireless transmission of signals to a prosthetic hand, may pave the way for intramuscular electromyography (iEMG)-based myoelectric control in the future. This study aimed to investigate the real-time control performance of iEMG over time. A novel protocol was developed to quantify the robustness of the real-time performance parameters. Intramuscular wires were used to record EMG signals, which were kept inside the muscles for five consecutive days. Tests were performed on multiple days using Fitts’ law. Throughput, completion rate, path efficiency and overshoot were evaluated as performance metrics using three train/test strategies. Each train/test scheme was categorized on the basis of data quantity and the time difference between training and testing data. An artificial neural network (ANN) classifier was trained and tested on (i) data from the same day (WDT), (ii) data collected from the previous day and tested on present-day (BDT) and (iii) trained on all previous days including the present day and tested on present-day (CDT). It was found that the completion rate (91.6 ± 3.6%) of CDT was significantly better (p < 0.01) than BDT (74.02 ± 5.8%) and WDT (88.16 ± 3.6%). For BDT, on average, the first session of each day was significantly better (p < 0.01) than the second and third sessions for completion rate (77.9 ± 14.0%) and path efficiency (88.9 ± 16.9%). Subjects demonstrated the ability to achieve targets successfully with wire electrodes. Results also suggest that time variations in the iEMG signal can be catered by concatenating the data over several days. This scheme can be helpful in attaining stable and robust performance
    corecore