978 research outputs found

    Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-free Microphones in a Car

    Speech enhancement in an automobile is a challenging problem because interference can come from engine noise, fans, music, wind, road noise, reverberation, echo, and passengers engaging in other conversations. Hands-free microphones make the situation worse because the strength of the desired speech signal reduces with increased distance between the microphone and talker. Automobile safety is improved when the driver can use a hands-free interface to phones and other devices instead of taking his eyes off the road. The demand for high quality hands-free communication in the automobile requires the introduction of more powerful algorithms. This thesis shows that a unique combination of five algorithms can achieve superior speech enhancement for a hands-free system when compared to beamforming or spectral subtraction alone. Several different designs were analyzed and tested before converging on the configuration that achieved the best results. Beamforming, voice activity detection, spectral subtraction, perceptual nonlinear weighting, and talker isolation via pitch tracking all work together in a complementary iterative manner to create a speech enhancement system capable of significantly enhancing real world speech signals. The following conclusions are supported by the simulation results using data recorded in a car and are in strong agreement with theory. Adaptive beamforming, like the Generalized Side-lobe Canceller (GSC), can be effectively used if the filters only adapt during silent data frames because too much of the desired speech is cancelled otherwise. Spectral subtraction removes stationary noise while perceptual weighting prevents the introduction of offensive audible noise artifacts. Talker isolation via pitch tracking can perform better when used after beamforming and spectral subtraction because of the higher accuracy obtained after initial noise removal. 
Iterating the algorithm once increases the accuracy of the Voice Activity Detection (VAD), which improves the overall performance of the algorithm. Based on the experiments performed in this thesis, placing the microphone(s) on the ceiling, above the head and slightly forward of the desired talker, appears to be the best location in an automobile. Objective speech quality measures show that the algorithm removes most of the stationary noise in the hands-free environment of an automobile with relatively little speech distortion.
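
The interplay between voice activity detection and spectral subtraction described in this abstract can be illustrated with a minimal sketch. It assumes magnitude-domain subtraction with a spectral floor and a recursive noise estimate that is updated only on frames flagged as non-speech. The function names, the over-subtraction factor, the floor, and the smoothing constant are illustrative assumptions, not the thesis's actual design.

```python
def spectral_subtract(noisy_mag, noise_mag, over_subtraction=1.0, floor=0.05):
    """Subtract a noise magnitude estimate from one frame of magnitudes.

    Each bin is clamped to a fraction (`floor`) of the noisy magnitude so
    that no bin goes negative or collapses to zero, which is one common
    way to limit audible artifacts. All parameters are hypothetical.
    """
    return [
        max(y - over_subtraction * n, floor * y)
        for y, n in zip(noisy_mag, noise_mag)
    ]


def update_noise_estimate(noise_mag, frame_mag, is_speech, smoothing=0.9):
    """Recursively average the noise estimate, but only on non-speech frames.

    `is_speech` stands in for the output of a voice activity detector.
    """
    if is_speech:
        # Freeze the estimate so speech energy is not absorbed into it.
        return list(noise_mag)
    return [
        smoothing * n + (1.0 - smoothing) * f
        for n, f in zip(noise_mag, frame_mag)
    ]
```

Gating the update on the VAD decision mirrors the abstract's observation that adaptation should happen only during silent frames; a more accurate VAD therefore yields a cleaner noise estimate on the next pass.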


    The detection of sound sources with microphone arrays can be enhanced by processing the individual microphone signals prior to the delay-and-sum operation. One method in particular, the Phase Transform (PHAT), has demonstrated improvement in sound source location images, especially in reverberant and noisy environments. Recent work proposed a modification to the PHAT that allows varying degrees of spectral whitening through a single parameter, β, and has shown improved target detection in simulation. This work focuses on experimental evaluation of the modified SRP-PHAT algorithm. Performance results are computed from an experimental setup of an 8-element perimeter array, with a receiver operating characteristic (ROC) analysis for detecting sound sources. The results verified the simulation results for PHAT-β in improving target detection probabilities. The ROC analysis demonstrated the relationships between various target types (narrowband and broadband), room reverberation levels (high and low), and noise levels (different SNRs) with respect to the optimal β. The experimental results strongly agree with those of the simulations on the effect of PHAT in significantly improving detection performance for narrowband and broadband signals, especially at low SNR and in the presence of high levels of reverberation.
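
The idea of partial spectral whitening controlled by a single parameter can be sketched for one microphone pair. The sketch below assumes the common formulation in which the cross-spectrum is divided by its magnitude raised to a power `beta`, so that `beta = 1` gives the standard PHAT and `beta = 0` gives plain cross-correlation. The function names and the naive DFT are illustrative choices; this is not the evaluated SRP-PHAT implementation.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT matching `dft` above."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def gcc_phat_beta(x1, x2, beta=1.0, eps=1e-12):
    """Circular generalized cross-correlation with partial whitening.

    Assumed weighting: cross-spectrum / |cross-spectrum| ** beta.
    """
    spec1, spec2 = dft(x1), dft(x2)
    cross = [a.conjugate() * b for a, b in zip(spec1, spec2)]
    weighted = [c / (abs(c) + eps) ** beta for c in cross]
    return [v.real for v in idft(weighted)]


def estimate_delay(x1, x2, beta=1.0):
    """Delay of x2 relative to x1 in samples (negative if x2 leads)."""
    corr = gcc_phat_beta(x1, x2, beta)
    n = len(corr)
    lag = max(range(n), key=lambda i: corr[i])
    # Map circular lags in the upper half of the frame to negative delays.
    return lag if lag <= n // 2 else lag - n
```

In a steered-response-power setting, correlations of this kind would be evaluated for every microphone pair and summed at the lags implied by each candidate source position; intermediate values of `beta` trade the sharpness of full whitening against the noise amplification it causes in low-energy bins.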

    Optimized Acoustic Localization with SRP-PHAT for Monitoring in Distributed Sensor Networks

    Acoustic localization by means of sensor arrays has a variety of applications, from conference telephony to environmental monitoring. Many of these tasks are appealing for implementation on embedded systems; however, the large dataflows and computational complexity of multi-channel signal processing impede the development of such systems. This paper proposes a method of acoustic localization targeted at distributed systems, such as Wireless Sensor Networks (WSN). The method builds on an optimized localization algorithm, Steered Response Power with Phase Transform (SRP-PHAT), and simplifies it further by reducing the initial search region in which the sound source is contained. The sensor array is partitioned into sub-blocks, which may be implemented as independent nodes of a WSN. Two approaches to region reduction are considered: one is based on Direction of Arrival estimation and the other on multilateration. Both approaches are tested on real signals for speaker localization and industrial machinery monitoring applications. Experimental results indicate the method's effectiveness in both tasks.

    Mathematical modelling and optimization strategies for acoustic source localization in reverberant environments

    This thesis focuses on the use of modern optimization and audio processing techniques for the accurate and robust localization of people within a reverberant environment equipped with microphone arrays. Several aspects of sound localization have been studied, including modelling, algorithms, and the prior calibration that allows localization algorithms to be used even when the geometry of the sensors (microphones) is unknown a priori. Existing techniques required a large number of microphones to achieve high localization accuracy. In this thesis, however, a new method has been developed that improves localization accuracy by more than 30% with a reduced number of microphones. Reducing the number of microphones is important because it translates directly into a drastic decrease in cost and an increase in the versatility of the final system. In addition, an exhaustive study has been carried out of the phenomena that affect the signal acquisition and processing system, with the aim of improving the previously proposed model. This study deepens the understanding and modelling of PHAT filtering (widely used in acoustic localization) and of the aspects that make it especially suitable for localization. As a result of that study, and in collaboration with researchers from the IDIAP institute (Switzerland), a system has been developed for self-calibrating the microphone positions from the diffuse noise present in a silent room. This contribution is related to previous coherence-based methods. However, it is able to reduce noise by taking into account previously known physical parameters (the maximum distance between microphones). As a result, better accuracy is achieved with less computation time.
Knowledge of the effects of the PHAT filter has made it possible to create a new model that allows a sparse representation of the typical localization scenario. This type of representation has proven very convenient for localization, allowing a simple approach to the case of multiple simultaneous sources. The final contribution of this thesis is the characterization of TDOA (time difference of arrival) matrices. Such matrices are especially useful in audio but are not limited to it. Moreover, this study goes beyond sound localization, since it proposes methods for denoising TDOA measurements based on a low-rank matrix representation, which is useful not only in localization but also in techniques such as beamforming and self-calibration.
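
The low-rank structure of a TDOA matrix can be made concrete with a small sketch. A noise-free TDOA matrix has entries equal to differences of per-sensor arrival times, so it is skew-symmetric and has rank at most 2. The sketch below assumes a complete set of pairwise measurements and uses a least-squares projection onto that structure via row means; this is one simple way to exploit the structure and is not claimed to be the thesis's algorithm. The function names are hypothetical.

```python
def tdoa_matrix(arrival_times):
    """Build the ideal TDOA matrix M[i][j] = t_i - t_j from arrival times."""
    return [[ti - tj for tj in arrival_times] for ti in arrival_times]


def denoise_tdoa_matrix(measured):
    """Project a noisy, fully observed TDOA matrix onto the set of
    consistent TDOA matrices in the least-squares sense.

    Step 1 enforces skew-symmetry; step 2 recovers zero-mean arrival
    times as row means and rebuilds the matrix from their differences.
    """
    n = len(measured)
    skew = [
        [0.5 * (measured[i][j] - measured[j][i]) for j in range(n)]
        for i in range(n)
    ]
    row_means = [sum(row) / n for row in skew]
    return [[ri - rj for rj in row_means] for ri in row_means]
```

The output always satisfies the consistency relation M[i][j] + M[j][k] = M[i][k], which individual noisy pairwise estimates generally violate.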

    Robust indoor speaker recognition in a network of audio and video sensors

    Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than in comparable reported work, which is usually limited to round-table meetings. The system is relatively simple, consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single- and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements over the closest works are: 56% in sound source localisation computational cost relative to an audio-only system, 8% in speaker diarisation error rate relative to an audio-only speaker recognition unit, and 36% in the precision–recall metric relative to an audio–video dominant speaker recognition method.

    NASA patent abstracts bibliography: A continuing bibliography. Section 1: Abstracts (supplement 16)

    Abstracts are cited for 138 patents and patent applications introduced into the NASA scientific and technical information system during the period July 1979 through December 1979. Each entry consists of a citation, an abstract, and, in most cases, a key illustration selected from the patent or patent application.

    Acoustic source localisation and tracking using microphone arrays

    This thesis considers the domain of acoustic source localisation and tracking in an indoor environment. Acoustic tracking has applications in security, human-computer interaction, and the diarisation of meetings. Source localisation and tracking is typically a computationally expensive task, making it hard to process on-line, especially as the number of speakers to track increases. Much of the literature considers single-source localisation; however, a practical system must be able to cope with multiple speakers, possibly active simultaneously, without knowing beforehand how many speakers are present. Techniques are explored for reducing the computational requirements of an acoustic localisation system. Techniques to localise and track multiple active sources are also explored, and developed to be more computationally efficient than the current state-of-the-art algorithms, whilst being able to track more speakers. The first contribution is the modification of a recent single-speaker source localisation technique, which improves the localisation speed. This is achieved by formalising the implicit assumption of the modified algorithm that speaker height is uniformly distributed on the vertical axis. Estimating height information effectively reduces the search space for speakers who have previously been detected but who may have moved in the horizontal plane and are unlikely to have significantly changed height. This is developed to allow multiple non-simultaneously active sources to be located. This is applicable when the system is given information from a secondary source, such as a set of cameras, allowing the efficient identification of active speakers rather than just the locations of people in the environment. The next contribution of the thesis is the application of a particle swarm technique to significantly further decrease the computational cost of localising a single source in an indoor environment, compared to the state of the art.
Several variants of the particle swarm technique are explored, including novel variants designed specifically for localising acoustic sources. Each method is characterised in terms of its computational complexity as well as its average localisation error. The techniques' responses to acoustic noise are also considered, and they are found to be robust. A further contribution is made by using multi-optima swarm techniques to localise multiple simultaneously active sources. This makes use of techniques which extend the single-source particle swarm techniques to finding multiple optima of the acoustic objective function. Several techniques are investigated and their performance in terms of localisation accuracy and computational complexity is characterised. Consideration is also given to how these metrics change when an increasing number of active speakers are to be localised. Finally, the application of the multi-optima localisation methods as an input to a multi-target tracking system is presented. Tracking multiple speakers is a more complex task than tracking a single acoustic source, as observations of audio activity must be associated in some way with distinct speakers. The tracker used is known to be a relatively efficient technique, and the nature of the multi-optima output format is modified to allow the application of this technique to the task of speaker tracking.

    Passive acoustic method for aircraft localization

    The present thesis investigates a passive acoustic method to locate maneuvering aircraft. The method is based on the acoustical Doppler effect, as observed in the signals received by a mesh of spatially distributed microphones. A one-dimensional version of the ambiguity function allows for the calculation of the frequency stretch factor that occurs between the sound signals received by a pair of microphones. The mathematical expression for this frequency stretch is a function of the aircraft position and velocity, both of which are estimated by a genetic algorithm. The method requires only a minimum of seven microphones and prior knowledge of the aircraft position and velocity at a given time. The advantages of the method are that it is suitable for all kinds of aircraft, not only propeller-driven ones, and is not restricted to low heights above the ground. It could be applied, for instance, to supplement aircraft noise monitoring systems or to supervise activity at small airports. This doctoral research includes the theoretical background of the method as well as a detailed description of its implementation. To assess the performance of the method, results from computer simulations are discussed. First of all, noise propagation is considered in a lossless medium, so that only geometrical spreading influences the sound traveling from the source to the receivers. The accuracy of each step of the method has been evaluated and the results obtained reveal acceptable performance. Due to the large distances between the microphones and the aircraft in flight, atmospheric attenuation plays a major role. Therefore, computer simulations have also been carried out under the assumption of a homogeneous but lossy medium to evaluate the influence of atmospheric absorption on the aircraft location. Under these conditions, the performance of the method with respect to the microphone distribution is discussed.
Moreover, the location method has also been tested for possible inaccuracy in the microphone synchronization. Finally, an outdoor experimental validation of the acoustic method has been carried out with a radio-controlled airplane. The description of the experimental test is detailed in the present work, as well as the results obtained.
The thesis develops, implements, and validates an acoustic method for aircraft localization. The method is based on the Doppler effect perceived in the recordings of several microphones distributed around an airport. The one-dimensional version of the ambiguity function allows the computation of the compression or expansion factor that arises between the frequency records of a pair of microphones. This frequency factor can be expressed mathematically as a function of the position and velocity of the aircraft, which are estimated in this thesis using genetic algorithms. The method requires only seven microphones and prior knowledge of the aircraft position at a given time. The main advantages of the method are that it is valid for any type of aircraft, not only propeller-driven airplanes or helicopters, and that it is not restricted to low-altitude flights. Its application could be, for example, to complement an aircraft noise monitoring system or to supervise the activity of small airports that do not have radar systems. This research includes the theoretical development of the method as well as a detailed description of its implementation. To evaluate the effectiveness of the method, results obtained from several simulations are presented and analysed. As a first case, sound is considered to propagate in a lossless medium, that is, the sound propagating from the source to the receivers is affected only by geometrical attenuation. Under this simple propagation model, the accuracy of each step of the method has been analysed, and the results obtained show a good ... of the method.
Given that the distances between the microphones and the aircraft in flight are long, atmospheric attenuation also influences the propagation of the sound emitted by the aircraft. Therefore, the second set of simulations considers a homogeneous, lossy propagation medium, with the aim of evaluating the influence of atmospheric attenuation on the acoustic localization of the aircraft. Under these conditions, the effectiveness of the method as a function of the microphone distribution has also been analysed. In addition, the localization method has been tested under possible errors in the synchronization of the seven microphones. Finally, an experimental validation of the method has been carried out with a radio-controlled light aircraft at the Club Aeronàutic Egara. The description of this experimental test is detailed in the thesis, together with the results obtained, which demonstrate the satisfactory validity of the method.
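
The statement that the frequency stretch between two microphones is a function of aircraft position and velocity can be illustrated with the textbook Doppler relation for a moving source and stationary receivers. The sketch below is a simplified forward model under stated assumptions: a constant speed of sound of 343 m/s, no wind, and the source position taken at the instant of emission (propagation delay is ignored). The function names are hypothetical, and the thesis's ambiguity-function measurement and genetic-algorithm fit are not reproduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed constant for this sketch


def received_frequency_ratio(source_pos, source_vel, mic_pos, c=SPEED_OF_SOUND):
    """Ratio f_received / f_emitted at a stationary microphone."""
    offset = [s - m for s, m in zip(source_pos, mic_pos)]
    distance = math.sqrt(sum(d * d for d in offset))
    # Radial speed is positive when the source recedes from the microphone.
    radial_speed = sum(v * d for v, d in zip(source_vel, offset)) / distance
    return c / (c + radial_speed)


def stretch_factor(source_pos, source_vel, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Frequency stretch of the signal at mic_a relative to mic_b."""
    ratio_a = received_frequency_ratio(source_pos, source_vel, mic_a, c)
    ratio_b = received_frequency_ratio(source_pos, source_vel, mic_b, c)
    return ratio_a / ratio_b
```

An estimator in the spirit of the abstract would search over candidate positions and velocities for the values whose predicted stretch factors best match those measured across all microphone pairs.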