294 research outputs found
Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays
Accurate detection, localization and tracking of multiple moving speakers permits a wide spectrum of applications. Techniques are required that are versatile, robust to environmental variations, and not constraining for non-technical end-users. Based on distant recording of spontaneous multiparty conversations, this thesis focuses on the use of microphone arrays to address the question "Who spoke where and when?" The speed, versatility and robustness of the proposed techniques are tested on a variety of real indoor recordings, including multiple moving speakers as well as seated speakers in meetings. Optimized implementations are provided in most cases. We propose to discretize the physical space into a few sectors and, for each time frame, to determine which sectors contain active acoustic sources (Where? When?). A topological interpretation of beamforming is proposed, which makes it possible both to evaluate the average acoustic energy in a sector at negligible cost and to locate a speaker precisely within an active sector. One additional contribution that goes beyond the field of microphone arrays is a generic, automatic threshold selection method, which does not require any training data. On the speaker detection task, the new approach is dramatically superior to the more classical approach where a threshold is set on training data. We integrate the new approach into a system for multispeaker detection-localization. Another generic contribution is a principled, threshold-free framework for short-term clustering of multispeaker location estimates, which also makes it possible to detect where and when multiple trajectories intersect. On multi-party meeting recordings, using distant microphones only, short-term clustering yields a speaker segmentation performance similar to that of close-talking microphones.
The resulting short speech segments are then grouped into speaker clusters (Who?), through an extension of the Bayesian Information Criterion to merge multiple modalities. On meeting recordings, the speaker clustering performance is significantly improved by merging the classical mel-cepstrum information with the short-term speaker location information. Finally, a close analysis of the speaker clustering results suggests that future research should investigate the effect of human acoustic radiation characteristics on the overall transmission channel, when a speaker is a few meters away from a microphone.
A Spectrogram Model for Enhanced Source Localization and Noise-Robust ASR
This paper proposes a simple, computationally efficient 2-mixture model approach to discriminating between speech and background noise. It is directly derived from observations on real data, and can be used in a fully unsupervised manner, with the EM algorithm. A first application to sector-based, joint audio source localization and detection, using multiple microphones, confirms that the model can provide major enhancement. A second application to the single-channel speech recognition task in a noisy environment yields major improvement on stationary noise and promising results on non-stationary noise.
A Frequency-Domain Silence Noise Model
This paper proposes a simple, computationally efficient 2-mixture model approach to discriminating between speech and background noise. It is directly derived from observations on real data, and can be used in a fully unsupervised manner, with the EM algorithm. A first application to sector-based, joint audio source localization and detection, using multiple microphones, confirms that the model can provide major enhancement. A second application to the single-channel speech recognition task in a noisy environment yields major improvement on stationary noise and promising results on non-stationary noise.
Speech processing using digital MEMS microphones
The last few years have seen the start of a unique change in microphones for consumer
devices such as smartphones or tablets. Almost all analogue capacitive microphones
are being replaced by digital silicon microphones or MEMS microphones.
MEMS microphones perform differently to conventional analogue microphones. Their
greatest disadvantage is significantly increased self-noise or decreased SNR, while
their most significant benefits are ease of design and manufacturing and improved sensitivity
matching.
This thesis presents research on speech processing, comparing conventional analogue
microphones with the newly available digital MEMS microphones. Specifically, voice
activity detection, speaker diarisation (who spoke when), speech separation and speech
recognition are looked at in detail.
In order to carry out this research different microphone arrays were built using digital
MEMS microphones and corpora were recorded to test existing algorithms and devise
new ones. Some corpora that were created for the purpose of this research will be
released to the public in 2013.
It was found that the most commonly used VAD algorithm in current state-of-the-art
diarisation systems is not the best-performing one, i.e. MLP-based voice activity
detection consistently outperforms the more frequently used GMM-HMM-based VAD
schemes. In addition, an algorithm was derived that can determine the number of active
speakers in a meeting recording given audio data from a microphone array of known
geometry, leading to improved diarisation results.
Finally, speech separation experiments were carried out using different post-filtering
algorithms, matching or exceeding current state-of-the-art results.
The performance of the algorithms and methods presented in this thesis was verified
by comparing their output using speech recognition tools and simple MLLR adaptation
and the results are presented as word error rates, an easily comprehensible scale.
To summarise, using speech recognition and speech separation experiments, this thesis
demonstrates that the significantly reduced SNR of the MEMS microphone can be
compensated for with well-established adaptation techniques such as MLLR. MEMS
microphones do not affect voice activity detection and speaker diarisation performance.
Recent Advances in Signal Processing
Signal processing is critical to most new technological inventions and to challenges across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is aimed primarily at students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.
The role of zero crossings in speech recognition and processing
Imperial Users only
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010
Biometrics
Biometrics uses methods for unique recognition of humans based upon one or more intrinsic physical or behavioral traits. In computer science, particularly, biometrics is used as a form of identity access management and access control. It is also used to identify individuals in groups that are under surveillance. The book consists of 13 chapters, each focusing on a certain aspect of the problem. The book chapters are divided into three sections: physical biometrics, behavioral biometrics and medical biometrics. The key objective of the book is to provide a comprehensive reference and text on human authentication and people identity verification from physiological, behavioural and other points of view. It aims to publish new insights into current innovations in computer systems and technology for biometrics development and its applications. The book was reviewed by the editor, Dr. Jucheng Yang, and by many of the guest editors, including Dr. Girija Chetty, Dr. Norman Poh, Dr. Loris Nanni, Dr. Jianjiang Feng, Dr. Dongsun Park and Dr. Sook Yoon, who also made a significant contribution to the book.