Realistic multi-microphone data simulation for distant speech recognition
The availability of realistic simulated corpora is of key importance for the
future progress of distant speech recognition technology. The reliability,
flexibility and low computational cost of a data simulation process may
ultimately allow researchers to train, tune and test different techniques in a
variety of acoustic scenarios, avoiding the laborious effort of directly
recording real data from the targeted environment.
In the last decade, several simulated corpora have been released to the
research community, including the data-sets distributed in the context of
projects and international challenges, such as CHiME and REVERB. These efforts
were extremely useful to derive baselines and common evaluation frameworks for
comparison purposes. At the same time, in many cases they highlighted the need
for better coherence between real and simulated conditions.
In this paper, we examine this issue and we describe our approach to the
generation of realistic corpora in a domestic context. Experimental validation,
conducted in a multi-microphone scenario, shows that a comparable performance
trend can be observed with both real and simulated data across different
recognition frameworks, acoustic models, as well as multi-microphone processing
techniques.
Comment: Proc. of Interspeech 201
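The contamination idea described above can be sketched as a convolution of close-talk speech with a room impulse response, plus additive noise scaled to a target SNR. The impulse response and noise below are synthetic placeholders, not the corpus's measured responses or the authors' exact pipeline:

```python
import numpy as np

def contaminate(clean, rir, noise, snr_db):
    """Simulate a distant-talking recording: convolve close-talk speech
    with a room impulse response, then add noise scaled to a target SNR.
    (A generic sketch of the contamination idea, not the paper's pipeline.)"""
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2)
    # Gain that makes 10*log10(sig_pow / (gain**2 * noise_pow)) == snr_db.
    gain = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + gain * noise

# Toy example: white "speech" and an exponentially decaying "room tail".
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = np.exp(-np.arange(2000) / 300.0) * rng.standard_normal(2000)
noisy = contaminate(clean, rir, rng.standard_normal(16000), snr_db=10)
```

In a realistic setup the impulse responses would be measured in (or simulated for) the targeted environment, which is precisely the coherence issue the paper addresses.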
EXPERIMENTAL EVALUATION OF MODIFIED PHASE TRANSFORM FOR SOUND SOURCE DETECTION
The detection of sound sources with microphone arrays can be enhanced through processing individual microphone signals prior to the delay-and-sum operation. One method in particular, the phase transform (PHAT), has demonstrated improvement in sound source location images, especially in reverberant and noisy environments. Recent work proposed a modification to the PHAT that allows varying degrees of spectral whitening through a single parameter, β, and has shown positive improvement in target detection in simulation results. This work focuses on experimental evaluation of the modified SRP-PHAT algorithm. Performance results are computed from an actual experimental setup of an 8-element perimeter array, with a receiver operating characteristic (ROC) analysis for detecting sound sources. The results verified the simulation results of PHAT-β in improving target detection probabilities. The ROC analysis demonstrated the relationships between various target types (narrowband and broadband), room reverberation levels (high and low), and noise levels (different SNRs) with respect to the optimal β. The experimental results strongly agree with those of the simulations on the effect of the PHAT in significantly improving detection performance for narrowband and broadband signals, especially at low SNR and in the presence of high levels of reverberation.
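The parameterized whitening can be illustrated on a generalized cross-correlation: raising the cross-spectrum magnitude to an exponent between 0 and 1 interpolates between plain correlation and full PHAT weighting. The function below is a generic GCC sketch of that idea (the signals and the `beta` exponent handling are illustrative assumptions, not the authors' code):

```python
import numpy as np

def gcc_phat_beta(x1, x2, beta=1.0):
    """Lag (in samples) by which x2 is delayed relative to x1, via
    generalized cross-correlation with partial spectral whitening:
    beta=0 is plain cross-correlation, beta=1 is full PHAT weighting."""
    n = len(x1) + len(x2)
    cross = np.fft.rfft(x2, n=n) * np.conj(np.fft.rfft(x1, n=n))
    # Partial whitening: divide out |cross spectrum|**beta.
    cross /= np.maximum(np.abs(cross), 1e-12) ** beta
    cc = np.fft.irfft(cross, n=n)
    shift = int(np.argmax(np.abs(cc)))
    if shift > n // 2:          # negative lags wrap around the FFT length
        shift -= n
    return shift

# Toy check: the second channel is the first delayed by 5 samples.
rng = np.random.default_rng(1)
s = rng.standard_normal(256)
x1 = np.concatenate([s, np.zeros(5)])
x2 = np.concatenate([np.zeros(5), s])
print(gcc_phat_beta(x1, x2, beta=1.0))  # 5
```

In an SRP-PHAT detector this lag estimate (or the whitened correlation itself) would be evaluated over a grid of candidate source locations; only the whitening step is shown here.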
Studies on noise robust automatic speech recognition
Noise in everyday acoustic environments such as cars, traffic, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches suggested for noise-robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise-robust automatic speech recognition (course code T-61.6060) held at TKK.
IMPACT OF MICROPHONE POSITIONAL ERRORS ON SPEECH INTELLIGIBILITY
The speech of a person speaking in a noisy environment can be enhanced through electronic beamforming using spatially distributed microphones. As this approach demands precise information about the microphone locations, its application is limited in places where microphones must be placed quickly or changed on a regular basis. A highly precise calibration or measurement process can be tedious and time-consuming. In order to understand tolerable limits on the calibration process, the impact of microphone position errors on intelligibility is examined. Analytical expressions are derived by modeling the microphone position errors as a zero-mean uniform distribution. Experiments and simulations were performed to show the relationship between the precision of the microphone location measurement and the loss in intelligibility. A variety of microphone array configurations and distracting sources (interfering speech and white noise) are considered. For speech near the threshold of intelligibility, the results show that microphone position errors with standard deviations less than 1.5 cm limit losses in intelligibility to within 10% of the maximum (perfect microphone placement) for all the microphone distributions examined. Of the different array distributions tested, the linear array tends to be the most vulnerable, whereas the non-uniform 3D array showed robust performance against positional errors.
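A toy Monte-Carlo model hints at why centimetre-level errors are tolerable: a position error translates into a phase error at each microphone, which decoheres the delay-and-sum output. The single-tone simplification and the Gaussian error model below are assumptions for illustration (the paper models errors as zero-mean uniform and measures intelligibility, not coherent gain):

```python
import numpy as np

def coherent_gain(num_mics, sigma_m, freq_hz, c=343.0, trials=2000, rng=None):
    """Monte-Carlo estimate of the normalized delay-and-sum output for a
    single tone when microphone positions are perturbed by zero-mean
    Gaussian errors of standard deviation sigma_m metres."""
    rng = rng or np.random.default_rng(0)
    # A position error of dr metres along the look direction produces a
    # phase error of 2*pi*f*dr/c radians at frequency f.
    phase_err = 2 * np.pi * freq_hz / c * rng.normal(0.0, sigma_m, (trials, num_mics))
    # A perfectly steered beamformer sums unit phasors; errors decohere them.
    out = np.abs(np.exp(1j * phase_err).sum(axis=1)) / num_mics
    return out.mean()

print(coherent_gain(8, 0.0, 1000.0))    # exactly 1.0: perfect placement
print(coherent_gain(8, 0.015, 1000.0))  # slightly below 1 (~0.96-0.97)
```

At 1 kHz a 1.5 cm position error is only a small fraction of the 34 cm wavelength, so the coherent sum degrades gently, consistent with the tolerance the abstract reports.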
Deep Learning for Distant Speech Recognition
Deep learning is an emerging technology that is considered one of the most
promising directions for reaching higher levels of artificial intelligence.
Among the other achievements, building computers that understand speech
represents a crucial leap towards intelligent machines. Despite the great
efforts of the past decades, however, a natural and robust human-machine speech
interaction still appears to be out of reach, especially when users interact
with a distant microphone in noisy and reverberant environments. The latter
disturbances severely hamper the intelligibility of a speech signal, making
Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques,
architectures, and algorithms to improve the robustness of distant-talking
acoustic models. We first elaborate on methodologies for realistic data
contamination, with a particular emphasis on DNN training with simulated data.
We then investigate approaches for better exploiting speech contexts,
proposing some original methodologies for both feed-forward and recurrent
neural networks. Lastly, inspired by the idea that cooperation across different
DNNs could be the key for counteracting the harmful effects of noise and
reverberation, we propose a novel deep learning paradigm called network of deep
neural networks. The analysis of the original concepts was based on extensive
experimental validations conducted on both real and simulated data, considering
different corpora, microphone configurations, environments, noisy conditions,
and ASR tasks.
Comment: PhD Thesis Unitn, 201
Effects of errorless learning on the acquisition of velopharyngeal movement control
Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session).
The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal-speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). The nasality level of the participants' speech was measured by nasometer and reflected by nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners, but in reversed order. Errors were defined by the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (17.7% vs. 50.7% for errorful learners) and a higher mean nasalance score (46.7% vs. 31.3%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests.
Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America
Acoustic indoor localization employing code division multiple access
Thesis (Master)--Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2010. Includes bibliographical references (leaves: 107-108). Text in English; abstract in Turkish and English. xvi, 160 69 leaves.
Indoor localization is a demand that grows in prominence day by day. Although extensively used outdoor location systems have been proposed, they cannot operate in indoor applications; hence, new investigations into accurate indoor localization have been carried out in the last decade. In this thesis, a new indoor location system is proposed that aims to locate an entity with an accuracy of about 2 cm using ordinary, inexpensive off-the-shelf devices, and an implementation is used to evaluate the system's performance. Time-of-arrival measurements are made on acoustic signals: binary phase-shift-keying-modulated Gold code sequences using the direct-sequence spread-spectrum technique. Direct-sequence code-division multiple access enables simultaneous, accurate distance measurements and provides immunity to noise and interference. Two methods are proposed for location estimation. The first takes the average of four location estimates obtained by trilateration. In the second, a single robust position estimate is obtained from three distances, with the least reliable fourth distance measurement discarded. The system's performance is evaluated at positions on two height levels using two sets of variables determined from experimental results. Precision distributions over the work area and precision-versus-accuracy plots depict the system's performance for the different variable sets. The proposed system provides location estimates with better than 2 cm accuracy at 99% precision. Finally, the graphical user interface provides a user-friendly environment for adjusting the parameters.
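The trilateration step can be sketched as the standard linearized least-squares estimate of a position from ranges to known anchor locations (the anchor layout and helper name below are illustrative, not the thesis's implementation):

```python
import numpy as np

def trilaterate(anchors, dists):
    """Linear least-squares position estimate from ranges to known anchors.
    Subtracting the first range equation ||p - a_0||^2 = d_0^2 from the
    others removes the quadratic term in the unknown position p, leaving
    a linear system 2*(a_i - a_0) . p = d_0^2 - d_i^2 + ||a_i||^2 - ||a_0||^2."""
    anchors = np.asarray(anchors, float)
    dists = np.asarray(dists, float)
    a0, d0 = anchors[0], dists[0]
    A = 2 * (anchors[1:] - a0)
    b = (d0 ** 2 - dists[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Four anchors at the corners of a 4 m x 3 m area, exact distances.
anchors = [(0, 0), (4, 0), (0, 3), (4, 3)]
target = np.array([1.0, 2.0])
dists = [np.linalg.norm(target - np.array(a)) for a in anchors]
print(trilaterate(anchors, dists))  # ~ [1. 2.]
```

With noisy ranges the least-squares residuals also indicate which measurement is least reliable, which is the intuition behind the thesis's second method of discarding the worst of the four distances.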
Accurate acoustic ranging system using android smartphones
By Mohammadbagher Fotouhi, Master of Science
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at Virginia Commonwealth University
Virginia Commonwealth University 2017
Major Director: Dr. Ruixin Niu, Associate Professor, Department of Electrical and Computer Engineering
In this thesis, we present the design, implementation, and evaluation of an android ranging system, a high-accuracy acoustic-based ranging system which allows two android mobile phones to learn their physical distance from each other.
In this system we propose a practical solution for accurate ranging based on acoustic communication between the speakers and microphones of two smartphones. Using an audible-band acoustic signal with Wi-Fi assistance, without causing sound disturbance, is promising for large-scale deployment. Our method is a purely software-based solution that uses only the most basic set of commodity hardware: a speaker, a microphone, and Wi-Fi communication, so it is readily applicable to many commercial off-the-shelf mobile devices such as cell phones.
Our system is the result of several design goals, including user privacy, decentralized administration, and low cost. Rather than relying on any centralized management which tracks the user’s location to help them find their distance, our system helps devices learn their distance from each other without advertising their location information with any centralized management.
Compared to alternatives that require special-purpose hardware or a pre-existing precision location infrastructure, our system runs on most off-the-shelf components; as a commodity-based solution it has wider applicability and is cost-effective.
Currently, two smartphones are used to estimate the distance between them through Wi-Fi and audio communications. The basic idea is to estimate the distance between the two phones from the traveling time of an audio signal from one phone to the other, since the speed of sound is known. Preliminary ranging results demonstrate that our algorithm achieves high accuracy and produces stable, reliable results for real-time smartphone-based indoor ranging.
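The travel-time idea can be sketched as matched-filter detection of a known probe signal followed by distance = speed of sound × travel time. The probe waveform and the assumption of a known transmit instant are simplifications (the actual system relies on Wi-Fi assistance to coordinate timing, which this sketch omits):

```python
import numpy as np

C_SOUND = 343.0  # m/s, assumed speed of sound at room temperature

def detect_arrival(rx, probe):
    """Sample index where the known probe arrives in the recording,
    found as the matched-filter (cross-correlation) peak."""
    corr = np.correlate(rx, probe, mode="valid")
    return int(np.argmax(corr))

fs = 44100
t = np.arange(int(0.01 * fs)) / fs
probe = np.sin(2 * np.pi * np.linspace(2000, 6000, t.size) * t)  # short swept tone
flight = round(2.0 / C_SOUND * fs)            # samples of flight for a 2 m range
rx = np.concatenate([np.zeros(flight), probe, np.zeros(100)])
d = C_SOUND * detect_arrival(rx, probe) / fs  # distance = c * travel time
print(round(d, 3))  # 1.999 for a true distance of 2 m
```

At 44.1 kHz one sample of timing error corresponds to under 8 mm of range, which is why sample-accurate arrival detection can deliver centimetre-level ranging.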
Simulation tool implementing centralized and distributed algorithms for tracking acoustic targets
The goal of this document is the implementation of a software tool for the
simulation of the acoustic tracking problem over a wireless sensor network working
in a centralized or distributed manner. Its Graphical User Interface (GUI) allows
the user to configure the parameters associated with the diffusion adaptive algorithms
implemented in the simulation tool, in order to offer a visual representation of the
behavior of a real sensor network working with those settings. For illustration we
ran several simulations, which allowed us to visualize the performance of different
network configurations. The results obtained with the implemented simulation tool
show that it can be very helpful for studying the audio target-tracking problem and,
ultimately, for designing sensor networks that can guarantee certain performance criteria.
Moreover, we have developed the code for the implementation of a real acoustic-tracking
sensor network working in a centralized manner, using Libelium's Waspmote™ sensor
boards as the network nodes and Libelium's Meshlium-Xtreme™ as the central node.
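The diffusion adaptive algorithms the tool configures belong to the adapt-then-combine family: each node first runs a local stochastic-gradient update, then averages estimates with its neighbours. The sketch below is a minimal diffusion-LMS example (ring topology, uniform combination weights, and a linear-regression task are all illustrative assumptions, not the tool's implementation):

```python
import numpy as np

def diffusion_lms(data, neighbors, mu=0.05, dim=2):
    """Adapt-then-combine diffusion LMS: every node updates its estimate
    with a local LMS step, then replaces it by the mean of its neighbours'
    intermediate estimates (uniform combination weights)."""
    n_nodes = len(neighbors)
    w = np.zeros((n_nodes, dim))
    for u_all, d_all in data:                      # one time step
        psi = np.empty_like(w)
        for k in range(n_nodes):                   # adapt
            u, d = u_all[k], d_all[k]
            psi[k] = w[k] + mu * u * (d - u @ w[k])
        for k in range(n_nodes):                   # combine
            w[k] = psi[neighbors[k]].mean(axis=0)
    return w

# Toy network: 4 nodes in a ring, all observing the same 2-d parameter.
rng = np.random.default_rng(2)
w_true = np.array([1.0, -0.5])
neighbors = [[3, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 0]]  # self + ring
steps = []
for _ in range(2000):
    u = rng.standard_normal((4, 2))
    d = u @ w_true + 0.01 * rng.standard_normal(4)
    steps.append((u, d))
w = diffusion_lms(steps, neighbors)
print(np.round(w.mean(axis=0), 2))  # close to [ 1.  -0.5]
```

In the tracking setting the unknown would be a slowly moving target state rather than a fixed parameter, but the adapt-then-combine structure is the same one a centralized solution replaces with a single fusion node.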