2,550 research outputs found

    Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-free Microphones in a Car

    Speech enhancement in an automobile is a challenging problem because interference can come from engine noise, fans, music, wind, road noise, reverberation, echo, and passengers engaging in other conversations. Hands-free microphones make the situation worse because the strength of the desired speech signal decreases as the distance between the microphone and the talker increases. Automobile safety is improved when the driver can use a hands-free interface to phones and other devices instead of taking their eyes off the road. The demand for high-quality hands-free communication in the automobile requires the introduction of more powerful algorithms. This thesis shows that a unique combination of five algorithms can achieve superior speech enhancement for a hands-free system when compared to beamforming or spectral subtraction alone. Several different designs were analyzed and tested before converging on the configuration that achieved the best results. Beamforming, voice activity detection, spectral subtraction, perceptual nonlinear weighting, and talker isolation via pitch tracking all work together in a complementary, iterative manner to create a speech enhancement system capable of significantly enhancing real-world speech signals. The following conclusions are supported by the simulation results using data recorded in a car and are in strong agreement with theory. Adaptive beamforming, such as the Generalized Side-lobe Canceller (GSC), can be used effectively if the filters adapt only during silent data frames, because otherwise too much of the desired speech is cancelled. Spectral subtraction removes stationary noise, while perceptual weighting prevents the introduction of offensive audible noise artifacts. Talker isolation via pitch tracking can perform better when used after beamforming and spectral subtraction because of the higher accuracy obtained after initial noise removal. Iterating the algorithm once increases the accuracy of the Voice Activity Detection (VAD), which improves the overall performance of the algorithm. Placing the microphone(s) on the ceiling above the head and slightly forward of the desired talker appears to be the best location in an automobile based on the experiments performed in this thesis. Objective speech quality measures show that the algorithm removes a majority of the stationary noise in the hands-free environment of an automobile with relatively minimal speech distortion.
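The interplay between voice activity detection and spectral subtraction described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the magnitude spectra and the VAD decisions are already available, and the function names `estimate_noise` and `spectral_subtract` as well as the `floor` parameter are hypothetical. The plain spectral floor used here only stands in for the perceptual weighting, which the abstract does not specify.

```python
from typing import List, Sequence


def estimate_noise(frames: Sequence[Sequence[float]],
                   is_speech: Sequence[bool]) -> List[float]:
    """Average the magnitude spectra of the frames the VAD marked as silent."""
    silent = [f for f, speech in zip(frames, is_speech) if not speech]
    if not silent:
        raise ValueError("need at least one silent frame to estimate noise")
    n_bins = len(silent[0])
    return [sum(f[k] for f in silent) / len(silent) for k in range(n_bins)]


def spectral_subtract(frame: Sequence[float],
                      noise: Sequence[float],
                      floor: float = 0.05) -> List[float]:
    """Subtract the noise magnitude per bin.

    Each bin is clamped to `floor` times its input magnitude (an assumed
    value) so the result never goes negative, a crude way to limit
    audible artifacts.
    """
    return [max(m - n, floor * m) for m, n in zip(frame, noise)]


# Three frames with two frequency bins each; only the last one holds speech.
frames = [[1.0, 2.0], [3.0, 2.0], [10.0, 2.5]]
vad = [False, False, True]
noise = estimate_noise(frames, vad)
enhanced = [spectral_subtract(f, noise) for f in frames]
```

Because the noise estimate is built only from frames flagged as silent, a more accurate VAD gives a cleaner estimate, which is consistent with the abstract's remark that iterating once improves overall performance.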

    A robust sequential hypothesis testing method for brake squeal localisation

    This contribution deals with the in situ detection and localisation of brake squeal in an automobile. As brake squeal is emitted from regions known a priori, i.e., near the wheels, the localisation is treated as a hypothesis testing problem. Distributed microphone arrays, situated under the automobile, are used to capture the directional properties of the sound field generated by a squealing brake. The spatial characteristics of the sampled sound field are then used to formulate the hypothesis tests. However, in contrast to standard hypothesis testing approaches of this kind, the propagation environment is complex and time-varying. Coupled with inaccuracies in the knowledge of the sensor and source positions as well as sensor gain mismatches, this makes modelling the sound field difficult, and standard approaches fail in this case. A previously proposed approach implicitly tried to account for such incomplete system knowledge and was based on ad hoc likelihood formulations. The current paper builds upon this approach and proposes a second approach, based on more solid theoretical foundations, that can systematically account for the model uncertainties. Results from tests in a real setting show that the proposed approach is more consistent than the prior state-of-the-art. In both approaches, the tasks of detection and localisation are decoupled for complexity reasons. The localisation (hypothesis testing) is subject to a prior detection of brake squeal and identification of the squeal frequencies. The approaches used for the detection and identification of squeal frequencies are also presented. The paper also briefly addresses some practical issues related to array design and placement. © 2019 Author(s).

    Detection of Nonstationary Noise and Improved Voice Activity Detection in an Automotive Hands-free Environment

    Speech processing in the automotive environment is a challenging problem due to the presence of powerful and unpredictable nonstationary noise. This thesis addresses two detection problems involving both nonstationary noise signals and nonstationary desired signals. Two detectors are developed: one to detect passing vehicle noise in the presence of speech and one to detect speech in the presence of passing vehicle noise. The latter is then measured against a state-of-the-art voice activity detector used in telephony. The process of compiling a library of in-car recordings to facilitate this research is also detailed.

    VOICE USER INTERFACE BASED PERMISSION GRANT SYSTEM FOR VEHICLES

    A vehicle (e.g., an automobile, a motorcycle, a bus, a recreational vehicle (RV), a semi-trailer truck, a tractor or other type of farm equipment, a train, a plane, a helicopter, etc.) may include a so-called “head unit” that provides a voice user interface (VUI) enabling spoken human interaction with the head unit to respond to requests for permission (e.g., to access user personal data, to enable the usage of third-party services, etc.). For example, responsive to detecting that an action to be performed has not been granted permission, the head unit may produce (e.g., via one or more speakers) an audio prompt requesting the required permission. A user may answer the audio prompt with an audio input in the form of human speech, which the head unit may receive (e.g., via one or more microphones). The head unit may parse the audio input using speech recognition (e.g., a natural language understanding module) to identify a valid input (e.g., grant or deny permission to perform an action, request additional information, etc.) to which the audio input corresponds and, responsive to identifying a valid input, the head unit may perform the action (e.g., granting or denying permission to perform the action, providing additional information, etc.) associated with the valid input. In this way, the head unit may enable the user to control the granting of permissions via the VUI, which may be particularly beneficial in vehicle settings in which the user is operating the vehicle, as the hands-free, eyes-free user experience may reduce distractions to the user while operating the vehicle and thereby promote safety.
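The prompt-and-reply flow in this abstract can be sketched as a small decision function. This is only an illustration under stated assumptions: the transcript is taken as already produced by a speech recogniser, keyword matching stands in for the natural language understanding module, and every name here (`Reply`, `KEYWORDS`, `parse_reply`, `handle_request`) is hypothetical rather than taken from the source.

```python
import re
from enum import Enum
from typing import Optional, Set


class Reply(Enum):
    GRANT = "grant"
    DENY = "deny"
    MORE_INFO = "more_info"


# Hypothetical keyword table. DENY is checked first so that a reply such as
# "no, don't allow that" is never mistaken for a grant.
KEYWORDS = [
    (Reply.DENY, {"no", "deny", "never"}),
    (Reply.MORE_INFO, {"why", "what", "explain"}),
    (Reply.GRANT, {"yes", "allow", "grant", "okay"}),
]


def parse_reply(transcript: str) -> Optional[Reply]:
    """Map a recognised utterance to a valid input, or None if unrecognised."""
    tokens = set(re.findall(r"[a-z']+", transcript.lower()))
    for reply, words in KEYWORDS:
        if tokens & words:
            return reply
    return None


def handle_request(action: str, granted: Set[str], transcript: str) -> str:
    """Return what the head unit does next for `action`."""
    if action in granted:
        return "perform"            # permission was granted earlier
    reply = parse_reply(transcript)
    if reply is Reply.GRANT:
        granted.add(action)         # remember the grant for later requests
        return "perform"
    if reply is Reply.DENY:
        return "blocked"
    if reply is Reply.MORE_INFO:
        return "explain"            # provide additional information
    return "reprompt"               # no valid input identified
```

Calling `handle_request("read_contacts", set(), "Yes, go ahead")` follows the grant path, while an utterance that matches no keyword falls through to a reprompt. Whether a real head unit would reprompt is an assumption; the abstract only describes what happens once a valid input has been identified.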

    In Car Audio

    This chapter presents implementations of advanced in-car audio applications. The system is composed of three main applications concerning the in-car listening and communication experience. Starting from a high-level description of the algorithms, several implementations at different levels of hardware abstraction are presented, along with empirical results on both the design process followed and the performance achieved.

    A Defensive Driving Course for the Language Lab

    Integration of a voice recognition system in a social robot

    Human-Robot Interaction (HRI) is one of the main fields in the study and research of robotics. Within this field, dialog systems and interaction by voice play a very important role. When speaking about natural human-robot dialog, we assume that the robot has the capability to accurately recognize the utterance that the human wants to convey verbally, and even its semantic meaning, but this is not always achieved. In this paper we describe the steps and requirements that we went through in order to endow the personal social robot Maggie, developed at the University Carlos III of Madrid, with the capability of understanding the natural language spoken by any human. We have analyzed the different possibilities offered by current software/hardware alternatives by testing them in real environments. We have obtained accurate data on speech recognition capabilities in different environments, using the most modern audio acquisition systems and analyzing less typical parameters such as user age, sex, intonation, volume and language. Finally, we propose a new model to classify recognition results as accepted or rejected, based on a second ASR opinion. This new approach takes into account the pre-calculated success rate in noise intervals for each recognition framework, decreasing the false positive and false negative rates. The funds were provided by the Spanish Government through the project “Peer to Peer Robot-Human Interaction” (R2H), of MEC (Ministry of Science and Education), and the project “A new approach to social robotics” (AROS), of MICINN (Ministry of Science and Innovation). The research leading to these results has received funding from the RoboCity2030-II-CM project (S2009/DPI-1559), funded by Programas de Actividades I+D en la Comunidad de Madrid and cofunded by Structural Funds of the EU.