69 research outputs found

    Studies on noise robust automatic speech recognition

    Get PDF
    Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK

    Deep Learning for Distant Speech Recognition

    Full text link
    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among the other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. The latter disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate on approaches for better exploiting speech contexts, proposing some original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key for counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called network of deep neural networks. The analysis of the original concepts were based on extensive experimental validations conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noisy conditions, and ASR tasks.Comment: PhD Thesis Unitn, 201

    雑音特性の変動を伴う多様な環境で実用可能な音声強調

    Get PDF
    筑波大学 (University of Tsukuba)201

    Evaluation of room acoustic qualities and defects by use of auralization

    Get PDF

    Speech processing using digital MEMS microphones

    Get PDF
    The last few years have seen the start of a unique change in microphones for consumer devices such as smartphones or tablets. Almost all analogue capacitive microphones are being replaced by digital silicon microphones or MEMS microphones. MEMS microphones perform differently to conventional analogue microphones. Their greatest disadvantage is significantly increased self-noise or decreased SNR, while their most significant benefits are ease of design and manufacturing and improved sensitivity matching. This thesis presents research on speech processing, comparing conventional analogue microphones with the newly available digital MEMS microphones. Specifically, voice activity detection, speaker diarisation (who spoke when), speech separation and speech recognition are looked at in detail. In order to carry out this research different microphone arrays were built using digital MEMS microphones and corpora were recorded to test existing algorithms and devise new ones. Some corpora that were created for the purpose of this research will be released to the public in 2013. It was found that the most commonly used VAD algorithm in current state-of-theart diarisation systems is not the best-performing one, i.e. MLP-based voice activity detection consistently outperforms the more frequently used GMM-HMM-based VAD schemes. In addition, an algorithm was derived that can determine the number of active speakers in a meeting recording given audio data from a microphone array of known geometry, leading to improved diarisation results. Finally, speech separation experiments were carried out using different post-filtering algorithms, matching or exceeding current state-of-the art results. The performance of the algorithms and methods presented in this thesis was verified by comparing their output using speech recognition tools and simple MLLR adaptation and the results are presented as word error rates, an easily comprehensible scale. To summarise, using speech recognition and speech separation experiments, this thesis demonstrates that the significantly reduced SNR of the MEMS microphone can be compensated for with well established adaptation techniques such as MLLR. MEMS microphones do not affect voice activity detection and speaker diarisation performance

    Evaluation of room acoustic qualities and defects by use of auralization

    Full text link
    corecore