1,480 research outputs found
Text-independent speaker recognition
This research presents a new text-independent speaker recognition system in which multivariate tools such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are embedded after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments, with additive white Gaussian noise and convolutive noise added to the speech. Experiments were carried out to investigate the robustness of PCA and ICA using the designed approach. The application of ICA improved the performance of the speaker recognition model when compared to PCA. Experimental results show that the use of ICA enabled extraction of higher-order statistics, thereby capturing speaker-dependent statistical cues in a text-independent recognition system. The results show that ICA has better de-correlation and dimension reduction properties than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with different types of noise and stored in the database. Results also show that ICA outperforms PCA under adverse environments. This is verified by computing the recognition accuracy rates obtained when the designed system was tested under different train and test SNR conditions with additive white Gaussian noise, and under test delay conditions with an echo effect.
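The two contamination types named in the abstract can be illustrated with a minimal sketch. It assumes the standard SNR definition, SNR_dB = 10 log10(P_signal / P_noise), and models the echo as a single delayed, attenuated copy of the signal. The function names `add_awgn`, `add_echo` and `measured_snr_db` are hypothetical and are not taken from the paper, which does not describe its implementation.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return a copy of `signal` with white Gaussian noise at the target SNR (dB)."""
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def add_echo(signal, delay, gain):
    """Single-tap echo (an assumed model): y[n] = x[n] + gain * x[n - delay]."""
    return [
        x + (gain * signal[n - delay] if n >= delay else 0.0)
        for n, x in enumerate(signal)
    ]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Toy stand-in for speech: one second of a 440 Hz tone sampled at 8 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noisy = add_awgn(clean, snr_db=10, seed=0)
echoed = add_echo(clean, delay=400, gain=0.5)
```

Sweeping `snr_db` and `delay` over a grid of train and test values would reproduce the kind of mismatched-condition evaluation the abstract describes, although the actual noise levels and delays used by the authors are not given here.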
Recent Advances in Signal Processing
Signal processing is critical to the majority of new technological inventions and poses challenges in a variety of applications across science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories address image processing, speech processing, communication systems, time-series analysis, and educational packages, in that order. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.
Discriminative preprocessing of speech: towards improving biometric authentication
In the context of the SecurePhone project, a multimodal user authentication system was developed for implementation on a PDA. Extending this system, we investigate biometric feature enhancement and multi-feature fusion with the aim of improving user authentication accuracy.
In this dissertation, a general framework for feature enhancement is proposed which uses a multilayer perceptron (MLP) to achieve optimal speaker discrimination.
First, to train this MLP a subset of speakers (speaker basis) is used to represent the underlying characteristics of the given acoustic feature space.
Second, the size of the speaker basis is found to be among the crucial factors affecting the performance of a speaker recognition system.
Third, it is found that the selection of the speaker basis can also influence system performance. Based on this observation, an automatic speaker selection approach is proposed on the basis of the maximal average between-class variance. Tests in a variety of conditions, including clean and noisy as well as telephone speech, show that this approach can improve the performance of speaker recognition systems. This approach, which is applied here to feature enhancement for speaker recognition, can be expected to also be effective with other biometric modalities besides speech.
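The abstract names the selection criterion (maximal average between-class variance) but does not spell out the procedure, so the following is only one plausible reading, offered as a sketch. It assumes each speaker is summarised by the mean of their feature frames and ranks speakers by the average squared distance from their mean to every other speaker's mean. The function names and the toy data are hypothetical.

```python
def mean_vector(frames):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_speaker_basis(features_by_speaker, basis_size):
    """Pick the `basis_size` speakers whose means are, on average, farthest
    from the other speakers' means (assumed proxy for between-class variance)."""
    means = {s: mean_vector(fr) for s, fr in features_by_speaker.items()}

    def average_separation(speaker):
        others = [t for t in means if t != speaker]
        total = sum(squared_distance(means[speaker], means[t]) for t in others)
        return total / len(others)

    ranked = sorted(means, key=average_separation, reverse=True)
    return ranked[:basis_size]


# Toy 2-D "acoustic features" for three hypothetical speakers.
features = {
    "A": [[0.0, 0.0], [0.0, 0.0]],
    "B": [[1.0, 0.0], [1.0, 0.0]],
    "C": [[10.0, 0.0], [10.0, 0.0]],
}
basis = select_speaker_basis(features, basis_size=2)
```

In this toy case speaker C is the most separated and B the least, so the basis is C followed by A. The selected basis would then supply the training targets for the MLP, a step this sketch does not cover.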
Further, an alternative feature representation is proposed in this dissertation, which is derived from what we call speaker voice signatures (SVS). These are trajectories in a Kohonen self-organising map (SOM) which has been trained to represent the acoustic space. This feature representation is found to be somewhat complementary to the baseline feature set, suggesting that the two can be fused to achieve improved performance in speaker recognition.
Finally, this dissertation finishes with a number of potential extensions of the proposed approaches.
Keywords: feature enhancement, MLP, SOM, speaker basis selection, speaker recognition, biometric, authentication, verification
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid pace of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice to produce a robust authentication system.
A novel approach towards speaker identification is developed using wavelet analysis and multiple neural networks, including Probabilistic Neural Network (PNN), General Regressive Neural Network (GRNN) and Radial Basis Function Neural Network (RBF NN), with the AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to the classical Mel-frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
Another novel approach using vowel formant analysis is implemented using Linear Discriminant Analysis (LDA). Vowel formant based speaker identification is well suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage and time efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme does not require any training time other than creating a small database of vowel formants, it is faster as well. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, but the proposed score-based methodology stays almost linear.
Finally, a novel audio-visual fusion based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform the feature-level fusion in terms of accuracy and error resilience. The result is in line with the distinct nature of the two modalities which lose themselves when combined at the feature-level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice due to its low computational time and high recognition accuracy.
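The difference between score-level and decision-level fusion can be made concrete with a small sketch. It assumes each modality produces one similarity score per enrolled identity, that scores are min-max normalised before a weighted sum, and that decisions are plain accept/reject booleans. The function names, the equal weighting and the example scores are all hypothetical; the thesis abstract does not state its normalisation or weights.

```python
def min_max_normalize(scores):
    """Rescale a dict of scores to [0, 1]; all zeros if every score is equal."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def score_level_fusion(voice_scores, face_scores, voice_weight=0.5):
    """Weighted sum of normalised per-identity scores; returns (best, fused)."""
    voice = min_max_normalize(voice_scores)
    face = min_max_normalize(face_scores)
    fused = {
        name: voice_weight * voice[name] + (1.0 - voice_weight) * face[name]
        for name in voice
    }
    return max(fused, key=fused.get), fused


def decision_level_fusion(voice_accept, face_accept, rule="OR"):
    """Combine two accept/reject decisions with OR or AND voting."""
    if rule == "OR":
        return voice_accept or face_accept
    if rule == "AND":
        return voice_accept and face_accept
    raise ValueError(f"unknown voting rule: {rule!r}")


# Made-up similarity scores for three hypothetical enrolled identities.
voice_scores = {"alice": 0.9, "bob": 0.4, "carol": 0.1}
face_scores = {"alice": 0.2, "bob": 0.8, "carol": 0.3}
best, fused = score_level_fusion(voice_scores, face_scores)
```

Here the voice model alone would pick alice and the face model alone would pick bob; after fusion bob wins because he scores reasonably in both modalities. OR voting accepts when either modality accepts, which trades a higher false-accept risk for fewer false rejections than AND voting.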
Pattern mining approaches used in sensor-based biometric recognition: a review
Sensing technologies have driven significant interest in the use of biometrics for the recognition and assessment of individuals. Pattern mining techniques have become a critical step in the progress of sensor-based biometric systems that are capable of perceiving, recognizing and computing sensor data. They extract high-level information for pattern recognition from low-level sensor readings in order to construct an artificial substitute for human recognition. The design of a successful sensor-based biometric recognition system needs to address the different issues involved in processing variable data, namely acquisition of biometric data from a sensor, data pre-processing, feature extraction, recognition and/or classification, clustering and validation. A significant number of approaches from image processing, pattern identification and machine learning have been used to process sensor data. This paper aims to deliver a state-of-the-art summary and to present strategies for applying widely used pattern mining methods, in order to identify the challenges as well as future research directions of sensor-based biometric systems.