
    Source Separation and DOA Estimation for Underdetermined Auditory Scene


    Vision-Guided Robot Hearing

    Natural human-robot interaction (HRI) in complex and unpredictable environments is important and has many potential applications. While vision-based HRI has been thoroughly investigated, robot hearing and audio-based HRI are emerging research topics in robotics. In typical real-world scenarios, humans are at some distance from the robot, so the sensory (microphone) data are strongly impaired by background noise, reverberation, and competing auditory sources. In this context, the detection and localization of speakers plays a key role and enables several tasks, such as improving the signal-to-noise ratio for speech recognition, speaker recognition, and speaker tracking. In this paper we address the problem of detecting and localizing people that are both seen and heard. We introduce a hybrid deterministic/probabilistic model. The deterministic component allows us to map 3D visual data onto a 1D auditory space. The probabilistic component of the model enables the visual features to guide the grouping of the auditory features in order to form audiovisual (AV) objects. The proposed model and the associated algorithms are implemented in real time (17 FPS) using a stereoscopic camera pair and two microphones embedded in the head of the humanoid robot NAO. We perform experiments with (i) synthetic data, (ii) publicly available data gathered with an audiovisual robotic head, and (iii) data acquired using the NAO robot. The results validate the approach and are an encouragement to investigate how vision and hearing could be further combined for robust HRI.
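    The abstract does not spell out the deterministic 3D-to-1D mapping; a common 1D auditory coordinate for a two-microphone head is the interaural time difference (ITD). The sketch below is a minimal free-field illustration of such a mapping, with purely illustrative microphone geometry, not the authors' implementation.

        import numpy as np

        def itd(point_3d, mic_left, mic_right, c=343.0):
            """Map a 3D point (e.g. a visually detected face) to a 1D ITD value in seconds."""
            d_left = np.linalg.norm(np.asarray(point_3d, float) - np.asarray(mic_left, float))
            d_right = np.linalg.norm(np.asarray(point_3d, float) - np.asarray(mic_right, float))
            return (d_left - d_right) / c  # positive: the sound reaches the right mic first

        # Illustrative geometry: microphones 10 cm apart on the robot's head.
        print(itd([1.0, 0.4, 0.0], mic_left=[-0.05, 0.0, 0.0], mic_right=[0.05, 0.0, 0.0]))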

    A survey on acoustic positioning systems for location-based services

    Positioning systems have become increasingly popular over the last decade for location-based services such as navigation and asset tracking and management. As opposed to outdoor positioning, where the global navigation satellite system has become the standard technology, there is no consensus yet for indoor environments despite the availability of different technologies, such as radio frequency, magnetic field, visible light communications, or acoustics. Among these options, acoustics has emerged as a promising route to high-accuracy, low-cost systems. Nevertheless, acoustic signals face very demanding propagation conditions, particularly in terms of multipath and the Doppler effect. Therefore, even though many acoustic positioning systems have been proposed in recent decades, the topic remains active and challenging. This article surveys the prototypes and commercial systems presented from their first appearance around the 1980s up to 2022. We classify these systems into groups depending on the observable they use to calculate the user position, such as time-of-flight, received signal strength, or the acoustic spectrum. Furthermore, we summarize the main properties of these systems in terms of accuracy, coverage area, and update rate, among others. Finally, we evaluate the limitations of these groups based on the link-budget approach, which gives an overview of a system's coverage from parameters such as source and noise level, detection threshold, attenuation, and processing gain.
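    As a worked illustration of the link-budget approach mentioned above, the sketch below checks whether a received acoustic signal clears a detection threshold. It assumes spherical spreading plus linear absorption, and all numeric values are illustrative assumptions rather than figures from the article.

        import math

        def detectable(source_level_db, distance_m, noise_level_db,
                       detection_threshold_db, processing_gain_db,
                       absorption_db_per_m=0.1):
            """Link budget: SNR = SL - TL - NL + PG; detected if SNR >= DT."""
            # Transmission loss: spherical spreading (20*log10 d) plus absorption.
            tl = 20.0 * math.log10(max(distance_m, 1e-6)) + absorption_db_per_m * distance_m
            snr = source_level_db - tl - noise_level_db + processing_gain_db
            return snr >= detection_threshold_db, snr

        # E.g. a 90 dB source heard 8 m away over 40 dB of noise, with 20 dB of
        # processing gain and a 15 dB detection threshold.
        print(detectable(90.0, 8.0, 40.0, 15.0, 20.0))  # (True, ~51.1 dB)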

    Acoustic Echo Estimation Using a Model-Based Approach with Application to Spatial Map Construction in Robotics


    Intelligent ultrasound hand gesture recognition system

    With the rapid development of technology, hand gesture recognition has become a hotspot in Human-Computer Interaction (HCI) systems. Ultrasound hand gesture recognition is an innovative method that has attracted ample interest due to its strong real-time performance, low cost, large field of view, and illumination independence. Well-investigated HCI applications include external digital pens, game controllers on smart mobile devices, and web-browser control on laptops. This thesis probes gesture recognition systems on multiple platforms to study how system performance behaves with various gesture features. The contributions of this thesis can be summarized as: smartphone acoustic-field and hand-model simulation, real-time gesture recognition on smart devices with a speed-categorization algorithm, fast-reaction gesture recognition based on temporal neural networks, and an angle-of-arrival-based gesture recognition system.

    First, a novel pressure-acoustic simulation model is developed to examine its potential for acoustic gesture recognition. The model provides a new system for acoustic verification that mimics real-world sound elements to replicate a sound-pressure environment as authentically as possible. This system is fine-tuned through sensitivity tests within the simulation and validated against real-world measurements. The study then constructs novel simulations for acoustic applications, informed by the verified acoustic-field distribution, to assess their effectiveness on specific devices. A further simulation is designed to understand the effects of sound-device placement and hand-reflected sound waves, and a feasibility test on phase-control modification reveals the practical applications and boundaries of this model.

    Mobility and system accuracy are two significant factors in gesture recognition performance. Since smartphones carry high-quality acoustic hardware, novel algorithms were developed to distinguish gestures using only the built-in speakers and microphones, yielding a portable system with high accuracy. The proposed system adopts the Short-Time Fourier Transform (STFT) and machine learning to capture hand movement and classifies gestures with a pretrained neural network. To differentiate gesture speeds, a dedicated neural network was designed and integrated into the classification algorithm. The final system achieves 96% accuracy across nine gestures and three speed levels, outperforming state-of-the-art systems in a comparative evaluation.

    Furthermore, a fast-reaction gesture recognition system based on temporal neural networks was designed. Traditional ultrasound gesture recognition adopts convolutional neural networks, which suffer from slow response times and discontinuous operation; moreover, overlapping intervals in network processing cause cross-frame failures that greatly degrade performance. To mitigate these problems, a novel fast-reaction system that slices signals into short time intervals was designed. It adopts a convolutional recurrent neural network (CRNN) that computes gesture features over a short window and combines those features over time, as sketched below.
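    A minimal sketch of the CRNN idea just described, not the thesis's implementation: short STFT slices are encoded frame by frame by a small CNN, and a recurrent layer accumulates the per-frame features over time so a decision is available after a short window. The layer sizes and six-class output are illustrative assumptions.

        import torch
        import torch.nn as nn

        class GestureCRNN(nn.Module):
            def __init__(self, n_classes=6, hidden=64):
                super().__init__()
                # Small CNN encoder applied to each spectrogram frame (frequency axis).
                self.encoder = nn.Sequential(
                    nn.Conv1d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
                    nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(8),
                    nn.Flatten(),                        # 32 channels * 8 bins = 256
                )
                # GRU combines per-frame features across time.
                self.rnn = nn.GRU(input_size=256, hidden_size=hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, spec):                     # spec: (batch, time, freq)
                b, t, f = spec.shape
                feats = self.encoder(spec.reshape(b * t, 1, f)).reshape(b, t, -1)
                out, _ = self.rnn(feats)                 # accumulate features over time
                return self.head(out[:, -1])             # class logits at the last frame

        # Dummy input: 4 clips of 20 STFT magnitude frames with 128 frequency bins.
        print(GestureCRNN()(torch.randn(4, 20, 128)).shape)  # torch.Size([4, 6])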
    The results showed that the reaction time was significantly reduced from 1 s to 0.2 s, with accuracy improving to 100% for six gestures.

    Lastly, an acoustic sensor array was built to investigate the angle information of performed gestures. The direction of a gesture is a significant feature for classification: the same gesture performed in different directions can represent different actions. Previous studies mainly focused on gesture types and analysis approaches (e.g., the Doppler effect and the channel impulse response), while gesture direction was not extensively studied. An acoustic gesture recognition system based on both speed information and gesture direction was developed, achieving 94.9% accuracy across ten gestures from two directions. The system was evaluated against several alternative neural-network structures, and the results confirmed that incorporating the additional angle information improved performance.

    In summary, the work presented in this thesis validates the feasibility of recognizing hand gestures via remote ultrasonic sensing across multiple platforms. The acoustic simulation explores the smartphone acoustic-field distribution and system response in the context of hand gesture recognition. The smartphone system demonstrates accurate recognition through ultrasound signals and analyzes classification speed. The fast-reaction system addresses the cross-frame issue with temporal neural networks, reducing response latency to 0.2 s. The speed- and angle-based system adds a further feature for gesture recognition. This work will accelerate the development of intelligent hand gesture recognition, enrich the available gesture features, and support further research into new gestures and application scenarios.
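    To illustrate the direction feature discussed above, here is a minimal two-microphone angle-of-arrival sketch that estimates the time difference of arrival (TDOA) by cross-correlation. The sample rate and 5 cm microphone spacing are illustrative assumptions, not values from the thesis.

        import numpy as np

        def estimate_angle(x_left, x_right, fs=48_000, mic_distance=0.05, c=343.0):
            """Broadside arrival angle (degrees) from cross-correlation TDOA."""
            # A positive peak lag means the right channel is delayed, i.e. the
            # sound reached the left microphone first (source toward the left).
            corr = np.correlate(x_right, x_left, mode="full")
            lag = int(np.argmax(corr)) - (len(x_left) - 1)
            sin_theta = np.clip((lag / fs) * c / mic_distance, -1.0, 1.0)
            return float(np.degrees(np.arcsin(sin_theta)))

        # Example: the same noise burst, delayed by 3 samples at the right mic.
        sig = np.random.default_rng(0).standard_normal(4800)
        print(estimate_angle(sig, np.roll(sig, 3)))  # ~25 degrees toward the left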

    Effects of errorless learning on the acquisition of velopharyngeal movement control

    Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (poster session). The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal-speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). The nasality level of the participants' speech was measured with a nasometer and reflected by nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners, but in reversed order. Errors were defined as the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (17.7% vs. 50.7%) and a higher mean nasalance score (46.7% vs. 31.3%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America.
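    The threshold schedule and error measure described above can be made concrete with a short sketch; the trial count and simulated nasalance scores below are illustrative assumptions, not the study's data.

        import numpy as np

        def error_rate(nasalance_pct, thresholds_pct):
            """Errors = proportion of productions whose nasalance falls below the target."""
            scores = np.asarray(nasalance_pct, dtype=float)
            return float(np.mean(scores < np.asarray(thresholds_pct, dtype=float)))

        errorless = np.linspace(10, 50, num=9)   # targets rise gradually from 10% to 50%
        errorful = errorless[::-1]               # the same targets in reversed order

        # Hypothetical nasalance scores (%) for nine practice trials.
        scores = np.random.default_rng(1).normal(loc=35.0, scale=10.0, size=9)
        print(error_rate(scores, errorless), error_rate(scores, errorful))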

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals, methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other applications able to operate in real-world environments, such as mobile communication services and smart homes.