Local representations and random sampling for speaker verification
In text-independent speaker verification, studies over the last decade have focused on compensating intra-speaker variability at the modeling stage. Intra-speaker variability may be caused by channel effects, by phonetic content, or by the speaker himself, in the form of speaking style, emotional state, health or similar factors. Joint Factor Analysis, Total Variability Space compensation and Nuisance Attribute Projection are among the most successful approaches to inter-session variability compensation in the literature. In this thesis, we criticize the assumption, made in these methods, that the channel space is low-dimensional, and we propose to partition the acoustic space into local regions. Intra-speaker variability compensation can then be done in each local region separately. Two architectures are proposed, depending on whether the subsequent modeling and scoring steps are also done locally or globally. We also focus on a particular component of intra-speaker variability, namely within-session variability. The main source of within-session variability is the difference in phonetic content between the speech segments of a single utterance. Variability in phonetic content may be due either to variability across acoustic events or to differences in the actual realizations of the acoustic events. We propose a method to combat this variability through random sampling of the training utterance. The method is shown to be effective for both short and long test utterances.
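The abstract does not say how the random sampling of the training utterance is carried out. As a minimal sketch, assuming an utterance is a sequence of per-frame feature vectors and that "random sampling" means drawing several random frame subsets (each of which could then train or adapt a separate model), the idea might look like the following. The function name `random_subsets`, the subset count and the sampling ratio are hypothetical illustrations, not the thesis's actual procedure.

```python
import random
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def random_subsets(frames: Sequence[Frame], n_subsets: int, ratio: float,
                   seed: int = 0) -> List[List[Frame]]:
    """Draw n_subsets random frame subsets from one training utterance.

    Hypothetical illustration: each subset keeps round(ratio * len(frames))
    frames, sampled without replacement and kept in their original time
    order, so every subset sees a different mix of phonetic content.
    """
    if not frames:
        raise ValueError("utterance has no frames")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the subsets are reproducible
    k = max(1, round(ratio * len(frames)))
    subsets = []
    for _ in range(n_subsets):
        idx = sorted(rng.sample(range(len(frames)), k))  # distinct frame indices
        subsets.append([frames[i] for i in idx])
    return subsets
```

For example, `random_subsets(list(range(100)), n_subsets=5, ratio=0.3)` returns five lists of 30 distinct frame indices each. Keeping the time order is a design choice for readability; frame-level models that ignore ordering would not depend on it.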
A Differential Approach for Gaze Estimation
Non-invasive gaze estimation methods usually regress gaze directions directly from a single face or eye image. However, due to important variability in eye shapes and inner eye structures among individuals, universal models obtain limited accuracy, and their output usually exhibits high variance as well as subject-dependent biases. Therefore, accuracy is usually increased through calibration, which allows the gaze predictions for a subject to be mapped to his/her actual gaze. In this paper, we introduce a novel image differential method for gaze estimation. We propose to directly train a differential convolutional neural network to predict the gaze differences between two eye input images of the same subject. Then, given a set of subject-specific calibration images, we can use the inferred differences to predict the gaze direction of a novel eye sample. The assumption is that by allowing the comparison between two eye images, nuisance factors (alignment, eyelid closing, illumination perturbations) which usually plague single-image prediction methods can be much reduced, allowing better prediction altogether. Experiments on 3 public datasets validate our approach, which consistently outperforms state-of-the-art methods even when using only one calibration sample or when the latter methods are followed by subject-specific gaze adaptation.
Comment: Extension to our paper "A differential approach for gaze estimation with calibration" (BMVC 2018). Submitted to PAMI on Aug. 7th, 2018; accepted by PAMI short on Dec. 2019, in IEEE Transactions on Pattern Analysis and Machine Intelligence.
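The inference step described above (combining inferred differences with calibration samples) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trained differential network is replaced by a hypothetical callable `predict_difference(a, b)` assumed to return gaze(a) minus gaze(b), gaze is assumed to be a (yaw, pitch) pair, and plain averaging over calibration samples is an assumption.

```python
from typing import Callable, Sequence, Tuple

Gaze = Tuple[float, float]  # assumed (yaw, pitch), e.g. in degrees


def predict_gaze(query_image: object,
                 calibration: Sequence[Tuple[object, Gaze]],
                 predict_difference: Callable[[object, object], Gaze]) -> Gaze:
    """Estimate the gaze of query_image from subject-specific calibration samples.

    predict_difference(a, b) is a hypothetical stand-in for the differential
    CNN and should return gaze(a) - gaze(b). Each calibration pair
    (image, known gaze) yields one estimate: known gaze + predicted difference.
    The estimates are averaged (an assumption of this sketch).
    """
    if not calibration:
        raise ValueError("at least one calibration sample is required")
    yaw_sum = pitch_sum = 0.0
    for ref_image, (ref_yaw, ref_pitch) in calibration:
        d_yaw, d_pitch = predict_difference(query_image, ref_image)
        yaw_sum += ref_yaw + d_yaw
        pitch_sum += ref_pitch + d_pitch
    n = len(calibration)
    return (yaw_sum / n, pitch_sum / n)
```

As a toy check, if each "image" is simply its true gaze and `predict_difference` subtracts exactly, a query at (10, -5) with calibration samples at (0, 0) and (20, 5) is recovered as (10.0, -5.0). With a real network, averaging over several calibration samples would tend to reduce the variance of individual difference predictions.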
Effects of steady state free precession parameters on cardiac mass, function, and volumes
G0400444/Medical Research Council/United Kingdom
Wellcome Trust/United Kingdom
Speaker recognition by means of restricted Boltzmann machine adaptation
Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising universal model training and model adaptation. Taking advantage of the unsupervised RBM learning algorithm, a global model is trained on all available background data. This general, speaker-independent model, referred to as URBM, is then adapted to the data of a specific speaker to build a speaker-dependent model. To show its effectiveness, we have applied this framework to two different tasks. First, it has been used to discriminatively model target and impostor spectral features for classification. Second, it has been used to produce a vector-based representation of speakers. This vector-based representation, similar to the i-vector, can then be used for speaker recognition with either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database.
Peer Reviewed. Postprint (author's final draft)
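Of the two back-ends mentioned, cosine scoring is simple enough to sketch. The code below assumes the speaker vectors have already been extracted from the adapted RBMs (that extraction, and any session compensation applied before scoring, are not shown); the function names and the decision threshold are hypothetical.

```python
import math
from typing import Sequence


def cosine_score(enroll: Sequence[float], test: Sequence[float]) -> float:
    """Cosine similarity between an enrollment vector and a test vector."""
    if len(enroll) != len(test):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    if norm == 0.0:
        raise ValueError("cannot score a zero vector")
    return dot / norm


def verify(enroll: Sequence[float], test: Sequence[float], threshold: float) -> bool:
    """Accept the identity claim when the cosine score reaches the threshold.

    The threshold is a hypothetical parameter; in practice it would be tuned
    on development data for the desired operating point.
    """
    return cosine_score(enroll, test) >= threshold
```

Because the score depends only on the angle between vectors, it is insensitive to vector length, which is one reason cosine scoring is a common lightweight alternative to PLDA for i-vector-like representations.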
Advancing Pattern Recognition Techniques for Brain-Computer Interfaces: Optimizing Discriminability, Compactness, and Robustness
In this dissertation, we formulate three central objective criteria for the systematic advancement of pattern recognition in modern brain-computer interfaces (BCIs). Building on these, we develop a pattern recognition framework for BCIs that unites the three criteria through a new optimization algorithm. In addition, we demonstrate the successful application of our approach to two innovative BCI paradigms for which no established pattern recognition methodology has existed so far.
A Multiday Evaluation of Real-Time Intramuscular EMG Usability with ANN
Recent developments in implantable technology, such as high-density recordings and wireless transmission of signals to a prosthetic hand, may pave the way for intramuscular electromyography (iEMG)-based myoelectric control in the future. This study aimed to investigate the real-time control performance of iEMG over time. A novel protocol was developed to quantify the robustness of the real-time performance parameters. EMG signals were recorded with intramuscular wires, which were kept inside the muscles for five consecutive days. Tests based on Fitts’ law were performed on multiple days. Throughput, completion rate, path efficiency and overshoot were evaluated as performance metrics under three train/test strategies, each defined by the quantity of training data and the time difference between training and testing data. An artificial neural network (ANN) classifier was (i) trained and tested on data from the same day (WDT), (ii) trained on data collected on the previous day and tested on the present day (BDT), and (iii) trained on data from all previous days, including the present day, and tested on the present day (CDT). The completion rate of CDT (91.6 ± 3.6%) was significantly better (p < 0.01) than that of BDT (74.02 ± 5.8%) and WDT (88.16 ± 3.6%). For BDT, on average, the first session of each day was significantly better (p < 0.01) than the second and third sessions in completion rate (77.9 ± 14.0%) and path efficiency (88.9 ± 16.9%). Subjects demonstrated the ability to reach targets successfully with wire electrodes. The results also suggest that time variations in the iEMG signal can be compensated for by concatenating data over several days. This scheme can help attain stable and robust performance.
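The three train/test schemes differ only in which days' recordings form the training set, with testing always on the present day. A minimal sketch, assuming days are numbered from 1 and that each day's training recordings are stored as a list under that day (the data layout and function names are hypothetical):

```python
from typing import Dict, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def training_days(scheme: str, day: int) -> List[int]:
    """Days (1-based) whose training recordings are used when testing on `day`."""
    if day < 1:
        raise ValueError("days are numbered from 1")
    if scheme == "WDT":  # within-day: train on the same day only
        return [day]
    if scheme == "BDT":  # between-day: train on the previous day only
        if day == 1:
            raise ValueError("BDT needs a previous day")
        return [day - 1]
    if scheme == "CDT":  # cumulative: all days up to and including the present day
        return list(range(1, day + 1))
    raise ValueError(f"unknown scheme: {scheme}")


def build_training_set(recordings: Dict[int, Sequence[Sample]],
                       scheme: str, day: int) -> List[Sample]:
    """Concatenate the per-day training recordings selected by the scheme."""
    data: List[Sample] = []
    for d in training_days(scheme, day):
        data.extend(recordings[d])
    return data
```

With `recordings = {1: ["a1", "a2"], 2: ["b1"], 3: ["c1"]}`, testing on day 3 uses `["c1"]` under WDT, `["b1"]` under BDT and `["a1", "a2", "b1", "c1"]` under CDT. The CDT case is the "concatenating the data over several days" idea: the training set grows each day.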
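The abstract names throughput and path efficiency without defining them. The sketch below uses definitions that are common in Fitts’ law studies of myoelectric control, but it is an assumption that this study computes them exactly this way: throughput as the Shannon index of difficulty, log2(D/W + 1) bits, divided by movement time, and path efficiency as the straight-line distance from start to end divided by the distance actually travelled.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def throughput(distance: float, width: float, movement_time_s: float) -> float:
    """Fitts' law throughput in bits/s, using the Shannon formulation of ID."""
    if width <= 0 or movement_time_s <= 0:
        raise ValueError("width and movement time must be positive")
    index_of_difficulty = math.log2(distance / width + 1.0)  # in bits
    return index_of_difficulty / movement_time_s


def path_efficiency(path: Sequence[Point]) -> float:
    """Straight-line distance divided by travelled path length, in percent."""
    if len(path) < 2:
        raise ValueError("path needs at least two points")
    travelled = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    if travelled == 0:
        raise ValueError("cursor did not move")
    return 100.0 * math.dist(path[0], path[-1]) / travelled
```

For instance, a target at distance 3 with width 1 reached in 2 s gives log2(4) / 2 = 1.0 bit/s, and a cursor that moves 3 units right and then 4 units up to a point 5 units away has a path efficiency of 500/7, about 71.4%. Per-trial values like these would typically be averaged over trials and sessions.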
- …