
    Robust speaker recognition in presence of non-trivial environmental noise (toward greater biometric security)

    The aim of this thesis is to investigate speaker recognition in the presence of environmental noise and to develop a robust speaker recognition method. Speaker recognition has recently been the object of considerable research due to its wide use in various areas, yet despite major developments in the field, many limitations and challenges remain. Environmental noise and its variations rank high among these challenges, since it is impossible to guarantee a noise-free environment. A novel approach is proposed to address the performance degradation caused by environmental noise. It is based on estimating the signal-to-noise ratio (SNR) and detecting the ambient noise in the recognition signal, then re-training the reference model of the claimed speaker to generate a new noise-adapted model that reduces the mismatch with the recognition utterances. This approach is termed “training on the fly” for robust speaker recognition in noisy environments. Two techniques are proposed to detect the noise in the recognition signal. The first generates an emulated noise from the estimated power spectrum of the original noise, using a 1/3-octave band filter bank and a white noise signal; the emulated noise becomes close enough to the original noise contained in the input (recognition) signal. The second extracts the noise from the input signal using a speech enhancement algorithm based on spectral subtraction. The training-on-the-fly approach (with both techniques) has been examined using two feature approaches and two different kinds of clean and noisy speech databases, artificial and collected in different environments; the speech samples were text independent. Training on the fly yields a significant improvement in performance compared with conventional speaker recognition based on clean reference models, and the variant based on noise extraction showed the best results for all types of noisy data.
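    The noise-emulation technique lends itself to a compact illustration. Below is a minimal Python sketch (not the thesis implementation) of the idea: white noise is passed through a 1/3-octave band filter bank and each band is scaled so that its power matches that of the noise estimated from the recognition signal. The band layout, filter order and starting frequency are illustrative assumptions.

```python
# Shape white noise so its 1/3-octave band powers match an estimated noise signal.
# Illustrative sketch; band layout, filter order and start frequency are assumptions.
import numpy as np
from scipy.signal import butter, sosfilt

def third_octave_bands(fs, f_start=100.0):
    """Yield (low, high) edges of 1/3-octave bands below the Nyquist frequency."""
    fc = f_start
    while fc * 2 ** (1 / 6) < fs / 2:
        yield fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)
        fc *= 2 ** (1 / 3)

def emulate_noise(noise_estimate, fs, length):
    """Return white noise shaped to the per-band power of noise_estimate."""
    white = np.random.randn(length)
    emulated = np.zeros(length)
    for lo, hi in third_octave_bands(fs):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        ref_band = sosfilt(sos, noise_estimate)   # band-passed reference noise
        wn_band = sosfilt(sos, white)             # band-passed white noise
        gain = np.sqrt(np.mean(ref_band ** 2) / (np.mean(wn_band ** 2) + 1e-12))
        emulated += gain * wn_band                # scale band to the reference power
    return emulated
```

    The emulated noise would then be added to the clean enrolment speech so that a noise-adapted reference model can be re-trained "on the fly" for the claimed speaker.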

    Efficient speaker recognition for mobile devices


    Classification and fusion methods for multimodal biometric authentication.

    Ouyang, Hua. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 81-89). Abstracts in English and Chinese.
    Chapter 1, Introduction: 1.1 Biometric Authentication; 1.2 Multimodal Biometric Authentication; 1.2.1 Combination of Different Biometric Traits; 1.2.2 Multimodal Fusion; 1.3 Audio-Visual Bi-modal Authentication; 1.4 Focus of This Research; 1.5 Organization of This Thesis.
    Chapter 2, Audio-Visual Bi-modal Authentication: 2.1 Audio-visual Authentication System; 2.1.1 Why Audio and Mouth?; 2.1.2 System Overview; 2.2 XM2VTS Database; 2.3 Visual Feature Extraction; 2.3.1 Locating the Mouth; 2.3.2 Averaged Mouth Images; 2.3.3 Averaged Optical Flow Images; 2.4 Audio Features; 2.5 Video Stream Classification; 2.6 Audio Stream Classification; 2.7 Simple Fusion.
    Chapter 3, Weighted Sum Rules for Multi-modal Fusion: 3.1 Measurement-Level Fusion; 3.2 Product Rule and Sum Rule; 3.2.1 Product Rule; 3.2.2 Naive Sum Rule (NS); 3.2.3 Linear Weighted Sum Rule (WS); 3.3 Optimal Weights Selection for WS; 3.3.1 Independent Case; 3.3.2 Identical Case; 3.4 Confidence Measure Based Fusion Weights.
    Chapter 4, Regularized k-Nearest Neighbor Classifier: 4.1 Motivations; 4.1.1 Conventional k-NN Classifier; 4.1.2 Bayesian Formulation of kNN; 4.1.3 Pitfalls and Drawbacks of kNN Classifiers; 4.1.4 Metric Learning Methods; 4.2 Regularized k-Nearest Neighbor Classifier; 4.2.1 Metric or Not Metric?; 4.2.2 Proposed Classifier: RkNN; 4.2.3 Hyperkernels and Hyper-RKHS; 4.2.4 Convex Optimization of RkNN; 4.2.5 Hyperkernel Construction; 4.2.6 Speeding up RkNN; 4.3 Experimental Evaluation; 4.3.1 Synthetic Data Sets; 4.3.2 Benchmark Data Sets.
    Chapter 5, Audio-Visual Authentication Experiments: 5.1 Effectiveness of Visual Features; 5.2 Performance of Simple Sum Rule; 5.3 Performances of Individual Modalities; 5.4 Identification Tasks Using Confidence-based Weighted Sum Rule; 5.4.1 Effectiveness of WS_M_C Rule; 5.4.2 WS_M_C vs. WS_M; 5.5 Speaker Identification Using RkNN.
    Chapter 6, Conclusions and Future Work: 6.1 Conclusions; 6.2 Important Follow-up Works.
    Bibliography. Appendix A: Proof of Proposition 3.1. Appendix B: Proof of Proposition 3.2.
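    The core of Chapter 3 is score-level fusion with sum rules. The Python sketch below illustrates a naive sum rule and a linear weighted sum rule over normalised audio and visual scores; the min-max normalisation and the example weights are assumptions for illustration, not the thesis's derived optimal or confidence-based weights.

```python
# Measurement-level fusion of two modalities with naive and weighted sum rules.
# Normalisation scheme and weights are illustrative assumptions.
import numpy as np

def minmax_norm(scores):
    """Map one matcher's raw scores to [0, 1] so modalities are comparable."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def naive_sum(audio_scores, visual_scores):
    return minmax_norm(audio_scores) + minmax_norm(visual_scores)

def weighted_sum(audio_scores, visual_scores, w_audio=0.6, w_visual=0.4):
    return w_audio * minmax_norm(audio_scores) + w_visual * minmax_norm(visual_scores)

# Example: one score per enrolled identity from the audio and mouth-image classifiers;
# in identification the identity with the highest fused score is selected.
audio = [2.1, 0.4, 1.7]
visual = [0.9, 0.2, 1.5]
print(np.argmax(weighted_sum(audio, visual)))
```

    In the thesis the weights are instead derived from confidence measures of each modality (Section 3.4); the fixed weights above only stand in for that step.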

    Signature and handwritten graphics verification: discriminative features and new biometric application scenarios (Verificación de firma y gráficos manuscritos: características discriminantes y nuevos escenarios de aplicación biométrica)

    Unpublished doctoral thesis defended at the Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: February 2015.
    The proliferation of handheld devices such as smartphones and tablets brings a new scenario for biometric authentication, and in particular for automatic signature verification. Research on signature verification has traditionally been carried out using signatures acquired on digitizing tablets or Tablet-PCs. This PhD thesis addresses the problem of user authentication on handheld devices using handwritten signatures and graphical passwords based on free-form doodles, as well as the effects of biometric aging on signatures. The thesis aims to analyze: (i) the effects of mobile conditions on signature and doodle verification, (ii) the most distinctive features in mobile conditions, extracted from the pen or fingertip trajectory, (iii) how different similarity computation (i.e. matching) algorithms behave with signatures and graphical passwords captured in mobile conditions, and (iv) the impact of aging on signature features and verification performance.
    Two novel datasets are presented in this thesis. The first contains free-form graphical passwords drawn with the fingertip on a smartphone; to the best of our knowledge, it is the first publicly available graphical password database. The second contains signatures from users captured over a period of 15 months, aimed at the study of biometric aging. State-of-the-art local and global matching algorithms are used, namely Hidden Markov Models, Gaussian Mixture Models, Dynamic Time Warping and distance-based classifiers. A large proportion of the features presented in the research literature is considered in this thesis.
    The experimental contribution of this thesis is divided into three main topics: signature verification on handheld devices, the effects of aging on signature verification, and free-form graphical password-based authentication. First, regarding signature verification in mobile conditions, we use a database captured both on a handheld device and on a digitizing tablet in an office-like scenario. We analyze the discriminative power of both global and local features using discriminant analysis and feature selection techniques. The effects of the lack of pen-up trajectories on handheld devices (when the stylus tip is not in contact with the screen) are also studied. We then analyze the effects of biometric aging on the signature trait. Using three different matching algorithms, Hidden Markov Models (HMM), Dynamic Time Warping (DTW) and distance-based classifiers, the impact on verification performance is studied. We also study the effects of aging on individual users and individual signature features. Template update techniques are analyzed as a way of mitigating the negative impact of aging. Regarding graphical passwords, the DooDB graphical password database is first presented. A statistical analysis is performed comparing the database samples (free-form doodles and simplified signatures) with handwritten signatures. The sample variability (inter-user, intra-user and inter-session) is analyzed, as well as the learning curve for each kind of trait. Benchmark results are reported using state-of-the-art classifiers. Graphical password verification is then studied using features and matching algorithms from the signature verification state of the art. Feature selection is also performed and the resulting feature sets are analyzed.
    The main contributions of this work can be summarized as follows. A thorough analysis of individual feature performance has been carried out, both for global and local features, on signatures acquired using pen tablets and handheld devices. We have found which individual features are the most robust and which have very low discriminative potential (pen inclination and pressure among others). Feature selection is found to increase verification performance dramatically, for example from EERs (Equal Error Rates) over 30% using all available local features, in the case of handheld devices and skilled forgeries, to rates below 20% after feature selection. We study the impact of the lack of trajectory information when the pen tip is not in contact with the acquisition surface (which happens when touchscreens are used for signature acquisition), and we find that the lack of pen-up trajectories negatively affects verification performance. As an example, the EER of the local system increases from 9.3% to 12.1% against skilled forgeries when pen-up trajectories are not available. We study the effects of biometric aging on signature verification and a number of ways to compensate for the observed performance degradation. Aging is found not to affect all users in the database equally, and features related to signature dynamics degrade more than static features. Comparing the performance using test signatures from the first months with those from the last months, a variable effect of aging on the EER against random forgeries is observed in the three evaluated systems: from 0.0% to 0.5% in the DTW system, from 1.0% to 5.0% in the distance-based system using global features, and from 3.2% to 27.8% in the HMM system. A new graphical password database has been acquired and made publicly available. Verification algorithms for finger-drawn graphical passwords and simplified signatures are compared and feature analysis is performed. We find that inter-session variability has a highly negative impact on verification performance, but that this can be mitigated by performing feature selection and applying fusion of different matchers. Some feature types are prevalent in the optimal feature vectors, and classifiers behave very differently against skilled and random forgeries. EERs of 3.4% against random forgeries and 22.1% against skilled forgeries are obtained for free-form doodles, which is a promising performance.
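    To make the matching and evaluation pipeline concrete, the following Python sketch shows a length-normalised DTW distance between two pen or fingertip trajectories and a simple Equal Error Rate estimate from genuine and forgery distances. The feature choice, normalisation and thresholding are illustrative assumptions and do not reproduce the thesis's exact systems.

```python
# DTW matching of two trajectories plus an EER estimate from score distributions.
# Illustrative sketch; features, normalisation and thresholding are assumptions.
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping over feature sequences (n, d) and (m, d)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)                       # length-normalised distance

def equal_error_rate(genuine, impostor):
    """EER: operating point where false rejection and false acceptance rates meet."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    frr = np.array([np.mean(genuine > t) for t in thresholds])    # genuine rejected
    far = np.array([np.mean(impostor <= t) for t in thresholds])  # forgeries accepted
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0
```

    Genuine comparisons produce small DTW distances and forgeries larger ones, so a sample is accepted when its distance falls below the chosen threshold; sweeping that threshold traces the FRR/FAR curves from which the EER is read.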

    Robust text independent closed set speaker identification systems and their evaluation

    PhD thesis. This thesis focuses upon text-independent closed-set speaker identification. The contributions relate to evaluation studies in the presence of various types of noise and handset effects. Extensive evaluations are performed on four databases. The first contribution is in the context of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) with original speech recordings from only the TIMIT database. Four main simulations for Speaker Identification Accuracy (SIA) are presented, covering different fusion strategies: late fusion (score based), early fusion (feature based), early-late fusion (a combination of feature and score based), late fusion using concatenated static and dynamic features (features with temporal derivatives such as the first-order delta and second-order delta-delta, namely acceleration, features), and finally fusion of statistically independent normalized scores.
    The second contribution is again based on the GMM-UBM approach. Comprehensive evaluations of the effect of Additive White Gaussian Noise (AWGN) and Non-Stationary Noise (NSN), with and without a G.712 type handset, upon identification performance are undertaken. In particular, three NSN types with varying Signal to Noise Ratios (SNRs) were tested, corresponding to street traffic, a bus interior and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3,600 speech utterances.
    The third contribution is based on the use of I-vectors; four combinations of I-vectors with 100 and 200 dimensions were employed. Various fusion techniques using maximum, mean, weighted sum and cumulative fusion with the same I-vector dimension were then used to improve the SIA. Similarly, both interleaved and concatenated I-vector fusion were exploited to produce 200 and 400 I-vector dimensions. The system was evaluated with four different databases using 120 speakers from each. The TIMIT, SITW and NIST 2008 databases were evaluated for various types of NSN, namely street-traffic NSN, bus-interior NSN and crowd-talking NSN, and the G.712 type handset at 16 kHz was also applied.
    As recommendations from the study, in terms of the GMM-UBM approach mean fusion is found to yield the best overall performance in terms of SIA with noisy speech, whereas linear weighted sum fusion is best overall for the original database recordings. In the I-vector approach, however, the best SIA was obtained from the weighted sum and the concatenated fusion. The author thanks the Ministry of Higher Education and Scientific Research (MoHESR), the Iraqi Cultural Attaché, and Al-Mustansiriya University College of Engineering in Iraq for supporting this PhD scholarship.
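    The late fusion rules compared in the thesis (mean, maximum and linear weighted sum of scores) can be sketched compactly. The Python snippet below fuses per-speaker score vectors from two systems, e.g. two feature streams or two I-vector configurations, for closed-set identification; the z-normalisation and the example weights are illustrative assumptions rather than the thesis settings.

```python
# Score-level (late) fusion for closed-set speaker identification.
# Normalisation and weights are illustrative assumptions, not the thesis settings.
import numpy as np

def znorm(scores):
    """Normalise one system's per-speaker scores so systems contribute comparably."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / (s.std() + 1e-12)

def fuse_and_identify(scores_a, scores_b, rule="mean", weights=(0.5, 0.5)):
    """Fuse two score vectors (one score per enrolled speaker) and pick the best match."""
    a, b = znorm(scores_a), znorm(scores_b)
    if rule == "mean":
        fused = (a + b) / 2.0
    elif rule == "max":
        fused = np.maximum(a, b)
    elif rule == "weighted":
        fused = weights[0] * a + weights[1] * b
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(fused))      # index of the identified speaker
```

    Concatenated or interleaved I-vector fusion, by contrast, operates before scoring: two I-vectors are combined into a single longer vector (e.g. two 100-dimensional vectors yielding a 200-dimensional one), which is then scored once.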

    Getting Past the Language Gap: Innovations in Machine Translation

    In this chapter, we review state-of-the-art machine translation systems and discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful, complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods were introduced by Chinese researchers, which allowed syntactic information to be used in translation modeling. Furthermore, advances in the related field of computational linguistics, which made off-the-shelf taggers and parsers readily available, gave MT an additional boost. Yet there is still more progress to be made. For example, MT will be greatly enhanced when both syntax and semantics are on board; this remains a major challenge, though many advanced research groups are currently pursuing ways to meet it head-on. The next generation of MT will consist of a collection of hybrid systems. The outlook is also good for the mobile environment, as improvements in speech recognition and speech synthesis will enable speech-to-speech machine translation on hand-held devices. We review all of these developments and point out, in the final section, some of the most promising research avenues for the future of MT.

    Single-Microphone Speech Enhancement and Separation Using Deep Learning

    The cocktail party problem comprises the challenging task of understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the speech intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable, especially for applications involving mobile communication devices and hearing assistive devices. Due to the re-emergence of machine learning techniques, today known as deep learning, the challenges involved in building such algorithms might be overcome. In this PhD thesis, we study and develop deep learning-based techniques for two sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalizability of modern deep learning-based single-microphone speech enhancement algorithms. We show that the performance of such algorithms is closely linked to the training data, and that good generalizability can be achieved with carefully designed training data. Furthermore, we propose uPIT, a deep learning-based algorithm for single-microphone speech separation, and we report state-of-the-art results on a speaker-independent multi-talker speech separation task. Additionally, we show that uPIT works well for joint speech separation and enhancement without explicit prior knowledge about the noise type or number of speakers. Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error lead to enhanced speech signals that are essentially optimal in terms of STOI, a state-of-the-art speech intelligibility estimator. (PhD thesis, 233 pages.)
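    The uPIT criterion at the heart of the separation work can be illustrated with a short sketch. The Python/NumPy snippet below computes the utterance-level permutation invariant MSE: the loss is evaluated for every assignment of network outputs to reference speakers over the whole utterance and the smallest value is kept. The shapes and the choice of MSE on magnitude spectrograms are illustrative assumptions rather than the exact thesis setup.

```python
# Utterance-level permutation invariant training (uPIT) loss, illustrative sketch.
from itertools import permutations
import numpy as np

def upit_mse(estimates, references):
    """estimates, references: arrays of shape (num_speakers, frames, freq_bins)."""
    estimates, references = np.asarray(estimates), np.asarray(references)
    num_speakers = estimates.shape[0]
    best = np.inf
    for perm in permutations(range(num_speakers)):
        # Utterance-level MSE for this output-to-speaker assignment.
        loss = np.mean((estimates[list(perm)] - references) ** 2)
        best = min(best, loss)
    return best
```

    During training the gradient is taken only through the winning permutation, which keeps each output stream consistently assigned to one speaker across the whole utterance.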
