
    Realistic multi-microphone data simulation for distant speech recognition

    The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. The reliability, flexibility and low computational cost of a data simulation process may ultimately allow researchers to train, tune and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data from the targeted environment. In the last decade, several simulated corpora have been released to the research community, including the datasets distributed in the context of projects and international challenges such as CHiME and REVERB. These efforts were extremely useful for deriving baselines and common evaluation frameworks for comparison purposes. At the same time, in many cases they highlighted the need for better coherence between real and simulated conditions. In this paper, we examine this issue and describe our approach to the generation of realistic corpora in a domestic context. Experimental validation, conducted in a multi-microphone scenario, shows that a comparable performance trend can be observed with both real and simulated data across different recognition frameworks, acoustic models, and multi-microphone processing techniques. Comment: Proc. of Interspeech 201
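
    The contamination recipe this abstract alludes to is, at its core, a per-microphone convolution of clean speech with a room impulse response plus scaled additive noise. The sketch below illustrates only that general idea under simple assumptions; the function and its inputs (contaminate, rirs, noises, snr_db) are illustrative names, not the paper's actual tooling.

```python
import numpy as np
from scipy.signal import fftconvolve

def contaminate(clean, rirs, noises, snr_db):
    """Illustrative distant-speech contamination: one reverberant,
    noisy channel per microphone (not the paper's actual pipeline)."""
    channels = []
    for rir, noise in zip(rirs, noises):
        reverberant = fftconvolve(clean, rir)[: len(clean)]  # add reverberation
        noise = noise[: len(reverberant)]
        # scale the noise so the channel reaches the requested SNR
        sig_pow = np.mean(reverberant ** 2)
        noise_pow = np.mean(noise ** 2) + 1e-12
        gain = np.sqrt(sig_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
        channels.append(reverberant + gain * noise)
    return np.stack(channels)  # shape: (num_mics, num_samples)
```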

    EXPERIMENTAL EVALUATION OF MODIFIED PHASE TRANSFORM FOR SOUND SOURCE DETECTION

    The detection of sound sources with microphone arrays can be enhanced by processing the individual microphone signals prior to the delay-and-sum operation. One method in particular, the Phase Transform (PHAT), has demonstrated improvement in sound source location images, especially in reverberant and noisy environments. Recent work proposed a modification to the PHAT that allows varying degrees of spectral whitening through a single parameter, β, and has shown positive improvement in target detection in simulation results. This work focuses on experimental evaluation of the modified SRP-PHAT algorithm. Performance results are computed from an actual experimental setup of an 8-element perimeter array with a receiver operating characteristic (ROC) analysis for detecting sound sources. The results verify the simulation results on PHAT-β improving target detection probabilities. The ROC analysis demonstrates the relationships between various target types (narrowband and broadband), room reverberation levels (high and low) and noise levels (different SNRs) with respect to the optimal β. Results from the experiments strongly agree with those of the simulations on the effect of PHAT-β in significantly improving detection performance for narrowband and broadband signals, especially at low SNR and in the presence of high levels of reverberation.
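
    For readers unfamiliar with the modified transform, the partial-whitening idea can be written as a generalized cross-correlation whose weighting is the cross-spectrum magnitude raised to the power β, with β = 0 giving the plain cross-correlation and β = 1 the full PHAT. The snippet below is a minimal sketch of that weighting for a single microphone pair; it is not the authors' SRP-PHAT implementation, and the function name and default values are assumptions.

```python
import numpy as np

def gcc_phat_beta(x1, x2, fs=16000, beta=0.7):
    """Generalized cross-correlation with partial spectral whitening:
    beta = 0 is the plain cross-correlation, beta = 1 the full PHAT."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    weight = np.abs(cross) ** beta + 1e-12        # partial whitening
    cc = np.fft.irfft(cross / weight, n=n)
    max_lag = len(x2) - 1
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))  # reorder lags
    delay = (np.argmax(np.abs(cc)) - max_lag) / fs
    return delay, cc  # estimated time delay (s), correlation function
```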

    Studies on noise robust automatic speech recognition

    Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches suggested for noise-robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

    IMPACT OF MICROPHONE POSITIONAL ERRORS ON SPEECH INTELLIGIBILITY

    The speech of a person speaking in a noisy environment can be enhanced through electronic beamforming using spatially distributed microphones. As this approach demands precise information about the microphone locations, its application is limited in places where microphones must be placed quickly or changed on a regular basis. A highly precise calibration or measurement process can be tedious and time consuming. In order to understand the tolerable limits on the calibration process, the impact of microphone position errors on intelligibility is examined. Analytical expressions are derived by modeling the microphone position errors as a zero-mean uniform distribution. Experiments and simulations were performed to show the relationship between the precision of the microphone location measurement and the loss in intelligibility. A variety of microphone array configurations and distracting sources (interfering speech and white noise) are considered. For speech near the threshold of intelligibility, the results show that microphone position errors with standard deviations less than 1.5 cm can limit losses in intelligibility to within 10% of the maximum (perfect microphone placement) for all the microphone distributions examined. Of the different array distributions tested, the linear array tends to be the most vulnerable, whereas the non-uniform 3D array is the most robust to positional errors.
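
    A rough way to reproduce this kind of analysis is a Monte Carlo study: perturb the nominal microphone coordinates with zero-mean uniform errors and measure how much the steering delays toward the talker change. The sketch below covers only that delay-error step under a simple free-field assumption; it is not the authors' intelligibility model, and all names are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def rms_delay_error(mic_pos, src_pos, err_halfwidth, trials=10000, seed=0):
    """RMS steering-delay error (seconds) caused by zero-mean uniform
    microphone position errors, assuming free-field propagation."""
    rng = np.random.default_rng(seed)
    mic_pos = np.asarray(mic_pos, dtype=float)
    src_pos = np.asarray(src_pos, dtype=float)
    nominal = np.linalg.norm(mic_pos - src_pos, axis=1) / SPEED_OF_SOUND
    errors = []
    for _ in range(trials):
        # each coordinate gets an independent uniform error in [-h, +h]
        perturbed = mic_pos + rng.uniform(-err_halfwidth, err_halfwidth, mic_pos.shape)
        actual = np.linalg.norm(perturbed - src_pos, axis=1) / SPEED_OF_SOUND
        errors.append(actual - nominal)
    return float(np.sqrt(np.mean(np.square(errors))))
```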

    Deep Learning for Distant Speech Recognition

    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among its achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. Such disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called a network of deep neural networks. The analysis of the proposed concepts was based on extensive experimental validation conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noisy conditions, and ASR tasks. Comment: PhD Thesis Unitn, 201

    Effects of errorless learning on the acquisition of velopharyngeal movement control

    Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session)
    The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal-speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). The nasality level of the participants' speech was measured by nasometer and reflected by nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners but in reversed order. Errors were defined as the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (17.7% vs. 50.7%) and a higher mean nasalance score (46.7% vs. 31.3%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America
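
    The error measure in this study is simply the proportion of produced speech whose nasalance falls below the current threshold, with thresholds stepping from 10% to 50% for errorless learners and presented in reverse for errorful learners. A minimal sketch of that bookkeeping, assuming nasalance is available frame by frame (an assumption, not the study's actual analysis code):

```python
import numpy as np

def error_proportion(nasalance_frames, threshold_pct):
    """Fraction of frames whose nasalance (%) falls below the target
    threshold -- the 'error' definition used in the study."""
    frames = np.asarray(nasalance_frames, dtype=float)
    return float(np.mean(frames < threshold_pct))

# Threshold schedules: errorless learners climb from 10% to 50%;
# errorful learners see the same targets in reverse order.
errorless_schedule = np.linspace(10, 50, 5)
errorful_schedule = errorless_schedule[::-1]
```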

    Acoustic indoor localization employing code division multiple access

    Thesis (Master)--Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2010. Includes bibliographical references (leaves 107-108). Text in English; abstract in Turkish and English.
    Indoor localization is a demand that gains prominence day by day. Although outdoor location systems are in extensive use, they cannot operate in indoor applications, so new investigations into accurate indoor localization have been carried out in the last decade. In this thesis, a new indoor location system that aims to locate an entity with an accuracy of about 2 cm using ordinary, inexpensive off-the-shelf devices is proposed, and an implementation is carried out to evaluate the system performance. To this end, time-of-arrival measurements are made on acoustic signals consisting of binary phase shift keying modulated Gold code sequences, transmitted with the direct sequence spread spectrum technique. Direct-sequence code division multiple access is applied to perform simultaneous, accurate distance measurements and provides immunity to noise and interference. Two methods are proposed for the location estimation. The first method takes the average of four location estimates obtained by the trilateration technique. In the second method, only a single robust position estimate is obtained from three distances, while the least reliable fourth distance measurement is not taken into account. The system performance is evaluated at positions on two height levels using two sets of variables determined from experimental results. The precision distributions in the work area and the precision-versus-accuracy plots depict the system performance for the different sets of variables. The proposed system provides location estimates with better than 2 cm accuracy at 99% precision. Finally, the graphical user interface created provides a user-friendly environment for adjusting the parameters.
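
    The first estimation method can be pictured as plain least-squares trilateration applied to every three-beacon subset of the measured distances, with the results averaged. The sketch below shows that idea for 2-D coordinates (so three beacons give a determined linear system); it is an assumption-laden illustration, not the thesis implementation, and the function names are invented.

```python
import numpy as np
from itertools import combinations

def trilaterate(beacons, dists):
    """Least-squares trilateration (2-D) from beacon positions and
    measured distances, using the standard linearization."""
    beacons = np.asarray(beacons, dtype=float)
    d = np.asarray(dists, dtype=float)
    p0, d0 = beacons[0], d[0]
    A = 2.0 * (beacons[1:] - p0)
    b = (d0 ** 2 - d[1:] ** 2
         + np.sum(beacons[1:] ** 2, axis=1) - np.sum(p0 ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

def average_of_subsets(beacons, dists):
    """Roughly method 1 from the abstract: average the estimates from
    every three-beacon subset (four subsets when four beacons are used)."""
    beacons = np.asarray(beacons, dtype=float)
    dists = np.asarray(dists, dtype=float)
    estimates = [trilaterate(beacons[list(idx)], dists[list(idx)])
                 for idx in combinations(range(len(beacons)), 3)]
    return np.mean(estimates, axis=0)
```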

    Accurate acoustic ranging system using android smartphones

    M.S. thesis by Mohammadbagher Fotouhi, Virginia Commonwealth University, 2017. Major Director: Dr. Ruixin Niu, Associate Professor, Department of Electrical and Computer Engineering.
    In this thesis, we present the design, implementation, and evaluation of an Android ranging system: a high-accuracy acoustic ranging system that allows two Android mobile phones to learn their physical distance from each other. In this system we propose a practical solution for accurate ranging based on acoustic communication between the speakers and microphones of two smartphones. Using audible-band acoustic signals with Wi-Fi assistance, without causing sound disturbance, makes the approach promising for large-scale deployment. Our method is a pure software solution that uses only the most basic commodity hardware: a speaker, a microphone, and Wi-Fi communication, so it is readily applicable to many commercial off-the-shelf mobile devices such as cell phones. The system is shaped by several design goals, including user privacy, decentralized administration, and low cost. Rather than relying on centralized management that tracks users' locations to help them find their distance, our system lets devices learn their distance from each other without advertising their location information to any central authority. Compared to alternatives that require special-purpose hardware or pre-existing precision location infrastructure, our system runs on most off-the-shelf components, so it is a commodity-based solution with wider applicability that is also cost effective. Currently, two smartphones estimate the distance between them through Wi-Fi and audio communication. The basic idea is to estimate the distance by estimating the traveling time of the audio signal from one phone to the other, since the speed of sound is known. Preliminary ranging results demonstrate that our algorithm achieves high accuracy and stable, reliable results for real-time smartphone-based indoor ranging.
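
    Stripped of the synchronization details, the core computation is: locate the known probe signal in the recording by cross-correlation, convert the peak position to an arrival time, and multiply the travel time by the speed of sound. The sketch below assumes the two phones' clocks are already aligned (the real system handles the offset through its Wi-Fi exchange); the function and parameter names are illustrative, not the thesis code.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly room temperature

def estimate_distance(recorded, probe, fs, t_emitted, t_record_start):
    """One-way time-of-arrival ranging sketch.

    recorded       : microphone capture on the receiving phone
    probe          : the known acoustic probe signal the sender played
    fs             : sampling rate (Hz)
    t_emitted      : time the sender started playback (shared, e.g. over Wi-Fi)
    t_record_start : time the receiver started recording
    Assumes both clocks are aligned; real systems remove the offset
    with a message exchange or a two-way ranging scheme.
    """
    # locate the probe in the recording by cross-correlation
    corr = np.correlate(recorded, probe, mode="valid")
    onset_sample = int(np.argmax(np.abs(corr)))
    t_arrival = t_record_start + onset_sample / fs
    return SPEED_OF_SOUND * (t_arrival - t_emitted)
```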

    Simulation tool implementing centralized and distributed algorithms for tracking acoustic targets

    The goal of this document is the implementation of a software tool for simulating the acoustic tracking problem over a wireless sensor network operating in a centralized or distributed manner. Its graphical user interface (GUI) allows the user to configure the parameters associated with the diffusion adaptive algorithms implemented in the simulation tool, in order to offer a visual representation of the behavior of a real sensor network working with those settings. For illustration we ran several simulations, which allowed us to visualize the performance of different network configurations. The results obtained with the implemented simulation tool show that it can be very helpful for studying the audio target tracking problem and, ultimately, for the design of sensor networks that can guarantee certain performance criteria. Moreover, we have developed the code for the implementation of a real acoustic-tracking sensor network working in a centralized manner, using Libelium's Waspmote™ sensor boards as the network nodes and Libelium's Meshlium-Xtreme™ as the central node.
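
    The diffusion adaptive algorithms mentioned here typically follow an adapt-then-combine pattern: each node runs a local LMS update and then averages its intermediate estimate with those of its neighbors. The sketch below is a generic ATC diffusion LMS loop included only to illustrate that pattern; it is not the tool's code, and all parameter names are assumptions.

```python
import numpy as np

def diffusion_lms_atc(node_inputs, node_targets, neighbors, mu=0.01, dim=2):
    """Adapt-then-combine (ATC) diffusion LMS over a sensor network.

    node_inputs  : list, per node, of (T, dim) regressor arrays
    node_targets : list, per node, of length-T measurement vectors
    neighbors    : list, per node, of neighbor indices (node included)
    Returns the (num_nodes, dim) array of final per-node estimates.
    """
    n_nodes = len(node_inputs)
    n_steps = len(node_targets[0])
    w = np.zeros((n_nodes, dim))
    for t in range(n_steps):
        psi = np.empty_like(w)
        for k in range(n_nodes):               # local adaptation step
            u = node_inputs[k][t]
            err = node_targets[k][t] - u @ w[k]
            psi[k] = w[k] + mu * err * u
        for k in range(n_nodes):               # neighborhood combination step
            w[k] = psi[neighbors[k]].mean(axis=0)
    return w
```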