
    Object Tracking from Audio and Video data using Linear Prediction method

    Microphone arrays and camera-based video surveillance are widely used to detect and track a moving speaker. In this project, object tracking is approached through multimodal fusion, i.e., audio-visual perception. Source localisation can be performed with GCC-PHAT or GCC-ML time delay estimation. These methods rely on the spectral content of the speech signal, which can be affected by noise and reverberation. Video tracking can be performed with a Kalman filter or a particle filter. In this work, a linear prediction method is used instead for both audio and video tracking. For source localisation, linear prediction uses features related to the excitation source information of speech, which are less affected by noise. Time delays are estimated from this excitation source information, and the results are compared with the GCC-PHAT method. For video tracking, the dataset obtained from [20] is used, in which a single moving object is captured by a stationary camera. Object detection is performed with a projection histogram, followed by linear prediction for tracking, and the results are compared with the Kalman filter method.
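
The GCC-PHAT delay estimate that this abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not the project's code: the function names `dft` and `gcc_phat` are hypothetical, the naive O(N^2) transform stands in for a real FFT so that the sketch needs only the standard library, and the small constant in the weighting is an assumed guard against division by zero.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, max_lag):
    """Return the lag in samples by which `sig` trails `ref`."""
    # Zero-pad both signals to a common length so that the circular
    # correlation computed via the DFT does not wrap around.
    n = len(sig) + len(ref)
    sig_f = dft(list(sig) + [0.0] * (n - len(sig)))
    ref_f = dft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-power spectrum of the two channels.
    cross = [s * r.conjugate() for s, r in zip(sig_f, ref_f)]
    # PHAT weighting: discard the magnitude and keep only the phase.
    cross = [c / (abs(c) + 1e-12) for c in cross]
    # Back to the lag domain; negative lags sit at the end of the array.
    cc = [v.real for v in dft(cross, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cc[lag % n])
```

Calling `gcc_phat(mic_b, mic_a, max_lag)` on two short frames returns the delay of `mic_b` relative to `mic_a` in samples; dividing by the sampling rate gives the delay in seconds.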

    Mathematical modelling and optimization strategies for acoustic source localization in reverberant environments

    This thesis focuses on the use of modern optimization and audio processing techniques for the accurate and robust localization of people within a reverberant environment equipped with microphone arrays. Several aspects of sound localization have been studied, including modelling, algorithms, and the prior calibration that allows the localization algorithms to be used even when the geometry of the sensors (microphones) is unknown a priori. Existing techniques required a large number of microphones to achieve high localization accuracy. In this thesis, however, a new method has been developed that improves localization accuracy by more than 30% with a reduced number of microphones. Reducing the number of microphones is important because it translates directly into a drastic decrease in cost and an increase in the versatility of the final system. Additionally, an exhaustive study has been carried out of the phenomena that affect the signal acquisition and processing system, with the aim of improving the previously proposed model. This study deepens the understanding and modelling of PHAT filtering (widely used in acoustic localization) and of the aspects that make it especially well suited to localization. As a result of this study, and in collaboration with researchers from the IDIAP institute (Switzerland), a system has been developed for self-calibrating the microphone positions from the diffuse noise present in a silent room. This contribution is related to earlier coherence-based methods. However, it is able to reduce noise by taking into account previously known physical parameters (the maximum distance between the microphones). As a result, better accuracy is achieved with less computation time. 
Knowledge of the effects of the PHAT filter has made it possible to create a new model that allows a sparse representation of the typical localization scenario. This type of representation has proven very convenient for localization, allowing a simple approach to the case in which multiple simultaneous sources exist. The last contribution of this thesis is the characterization of TDOA (time difference of arrival) matrices. Matrices of this type are especially useful in audio but are not limited to it. Moreover, this study goes beyond sound localization, since it proposes methods for reducing the noise of TDOA measurements based on a low-rank matrix representation, which is useful not only in localization but also in techniques such as beamforming or self-calibration.
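
The low-rank structure of a TDOA matrix can be illustrated with a short sketch. A noise-free TDOA matrix has entries M[i][j] = t_i - t_j, so it is skew-symmetric and has rank at most 2. The function below projects a noisy matrix onto that family in the least-squares sense. It is an assumed, simplified illustration of the idea and not necessarily the thesis's algorithm; the name `denoise_tdoa_matrix` is hypothetical.

```python
def denoise_tdoa_matrix(m):
    """Least-squares projection of a noisy TDOA matrix onto M[i][j] = t[i] - t[j]."""
    n = len(m)
    # A valid TDOA matrix satisfies M[i][j] = -M[j][i], so only the
    # skew-symmetric part of the measurements carries usable information.
    skew = [[(m[i][j] - m[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    # The row means of the skew part are the least-squares arrival times,
    # expressed relative to the average arrival time over the array.
    t = [sum(row) / n for row in skew]
    # Rebuild a consistent matrix from the estimated arrival times.
    return [[t[i] - t[j] for j in range(n)] for i in range(n)]
```

The output always satisfies the closure relation M[i][k] = M[i][j] + M[j][k], which raw pairwise estimates generally violate.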

    HOMOGENEOUS AND HETEROGENEOUS SENSORS FOR COMBUSTION SYSTEMS

    Due to increasingly stringent emission regulations, it is important to develop clean combustors. Combustion behavior is very complex in almost all practical power plant systems. Measurement of temperature, pressure, local flow, and chemical composition inside the flame provides critical information for developing cleaner combustors. This would result in significant improvement in energy efficiency and reduced environmental impact. A high-density sensor network would assist in understanding the various processes occurring within the combustors. This dissertation focuses on how much additional information can be gathered from multiple sensors. Four different time delay estimation methods (cross correlation, phase transform, generalized cross correlation with maximum-likelihood estimation, and average square difference function) were examined using two sensors. The phase transform gave better results for calculating the time delay between a given pair of microphones. This has the potential to locate local noise generation sources within flows and flames. As a step towards the development of a sensor network, different sensors were examined. A micro-thermocouple, a microphone, and microphone probes were used to improve understanding of the flame, with detailed information on the various ongoing processes in a premixed swirl flame. High-frequency temperature and pressure measurements were used to identify the thermal and acoustic characteristics of the flame and combustor. The local distributions of fluctuating pressure and temperature were measured in different regions, in and around the flame. Pressure fluctuation showed significant variation in different directions for the combustive case relative to non-combustive flow. 
Also, a comparison of the pressure and temperature fluctuations revealed that maximum temperature fluctuations occur mostly near the visible flame boundary, while maximum pressure fluctuations occur further away from the flame. Acoustic data from the premixed swirl combustor showed that variation in the fuel-to-air ratio changes the spatial distribution of noise as measured by different sensors placed around the combustor. A comparison of different sensors showed that a single sensor does not capture all the information when the fuel-to-air ratio changes.
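
Of the four delay estimators named in this abstract, the average square difference function (ASDF) is the simplest to write down: it picks the lag that minimises the mean squared difference between the two sensor signals. The sketch below is an assumed, minimal version for integer lags; the name `asdf_delay` is hypothetical and the dissertation's own implementation is not given in the abstract.

```python
def asdf_delay(x, y, max_lag):
    """Return the lag in samples by which `y` trails `x`, by minimising the ASDF."""
    best_lag = 0
    best_cost = float("inf")
    for lag in range(-max_lag, max_lag + 1):
        # Index range over which both x[n] and y[n + lag] exist.
        start = max(0, -lag)
        stop = min(len(x), len(y) - lag)
        if stop <= start:
            continue  # no overlap at this lag
        # Mean squared difference over the overlapping samples.
        cost = sum((x[n] - y[n + lag]) ** 2 for n in range(start, stop))
        cost /= stop - start
        if cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag
```

Unlike the correlation-based estimators, this one needs no multiplications between channels beyond the squared difference, which is why it is sometimes preferred on constrained hardware.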

    Acoustic localization of people in reverberant environments using deep learning techniques

    Localizing people from acoustic information is increasingly important in real-world applications such as security, surveillance, and human-robot interaction. In many cases, people or objects must be located accurately from the sound they generate, especially in noisy and reverberant environments where traditional localization methods may fail, or in scenarios where video-based methods are not feasible because such sensors are unavailable or because of significant occlusions. For example, in security and surveillance, the ability to locate a sound source accurately can help identify possible threats or intruders. In healthcare settings, acoustic localization can be used to monitor the movements and activities of patients, especially those with mobility problems. In human-robot interaction, robots equipped with acoustic localization capabilities can better perceive and respond to their surroundings, enabling more natural and intuitive interactions with humans. The development of accurate and robust acoustic localization systems using advanced techniques such as deep learning is therefore of great practical importance. This doctoral thesis addresses the problem along three fundamental lines of research: (i) The design of an end-to-end system based on neural networks capable of improving on the localization rates of existing state-of-the-art systems. (ii) The design of a system capable of locating one or several simultaneous speakers in environments with different characteristics and different sensor array geometries without retraining. 
(iii) The design of systems capable of refining the acoustic power maps needed to locate acoustic sources, in order to achieve better subsequent localization. To assess the achievement of these objectives, several realistic databases with different characteristics were used, in which the people involved in the scenes can act without any kind of restriction. All the proposed systems were evaluated under the same conditions and outperform current state-of-the-art systems in terms of localization error.

    Multilevel B-Splines-Based Learning Approach for Sound Source Localization

    © 2001-2012 IEEE. In this paper, a new learning approach for sound source localization is presented that uses ad hoc distributed microphone networks, either synchronous or asynchronous, and is based on time difference of arrival (TDOA) estimation. It first proposes a new concept in which the coordinates of a sound source location are defined as functions of the TDOAs computed for each pair of microphone signals in the network. Then, given a set of pre-recorded sound measurements and their corresponding source locations, a multilevel B-splines-based learning model is trained with the known TDOAs as input and the known coordinates of the sound source locations as output. For a new acoustic source whose sound signals are recorded, the correspondingly computed TDOAs can be fed into the learned model to predict the location of the new source. The advantages of the proposed method are that it incorporates the acoustic characteristics of the targeted environment, and even the remaining uncertainty of the TDOA estimates, into the learning model before prediction, and that it is applicable to both synchronous and asynchronous distributed microphone sensor networks. The effectiveness of the proposed algorithm in terms of localization accuracy and computational cost, in comparison with state-of-the-art methods, was extensively validated in synthetic simulation experiments as well as in three real-life environments.

    Estimation of dominant sound source with three microphone array

    Several real-life applications require a system that reliably locates and tracks a single speaker. This can be achieved using visual or audio data. Processing an incoming signal to obtain the location of a source is known as Direction of Arrival (DOA) estimation. The basic setting in audio-based DOA estimation is a set of microphones situated in known locations. The signal is captured by each of the microphones, and the signals are analyzed by one of the following methods: a steered-beamformer-based method, a subspace-based method, or a time-delay-estimation-based method. The aim of this thesis is to review the different classes of existing methods for DOA estimation and to create an application for visualizing the dominant sound source direction around a three-microphone array in real time. In practice, the objective is to enhance a DOA estimation algorithm proposed by Nokia Research Center. As visualization of the dominant sound source creates a basis for many audio-related applications, a practical example of such an application is developed. The proposed algorithm is based on the time delay estimation method and utilizes cross correlation. Several enhancements to the initial algorithm are developed to improve its performance. The proposed algorithm is evaluated by comparing it with one of the most common methods, generalized cross correlation with phase transform (GCC PHAT). The evaluation includes testing all algorithms on three types of signals: a speech signal arriving from a stationary location, a speech signal arriving from a moving source, and a transient signal. Additionally, using the proposed algorithm, a computer application with a video tracker is developed. The results show that the initially proposed algorithm does not perform as well as GCC PHAT. The enhancements improve the algorithm's performance notably, although they do not bring its efficiency to the level of GCC PHAT when processing speech signals. 
In the case of transient signals, the enhanced algorithm was superior to GCC PHAT. The video tracker was able to successfully track the dominant sound source.
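
How pairwise time delays from three microphones at known positions yield a direction can be sketched under a far-field (plane-wave) assumption. This is an illustrative sketch only, not the Nokia Research Center algorithm or the thesis's enhancement of it. The function name `doa_from_tdoas`, the 343 m/s speed of sound, and the convention that `tau10` and `tau20` are the arrival times at microphones 1 and 2 minus the arrival time at microphone 0 are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def doa_from_tdoas(mics, tau10, tau20, c=SPEED_OF_SOUND):
    """Return the source bearing in degrees from two time delays.

    mics  -- three (x, y) microphone positions in metres
    tau10 -- arrival time at mic 1 minus arrival time at mic 0, in seconds
    tau20 -- arrival time at mic 2 minus arrival time at mic 0, in seconds
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    # For a plane wave from unit direction u (array towards source), the
    # delays satisfy (p_i - p_0) . u = -c * tau_i0, a 2x2 linear system.
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    b1, b2 = -c * tau10, -c * tau20
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones must not be collinear")
    # Cramer's rule for the two components of u.
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - b1 * a21) / det
    # Only the angle of u is used, so small norm errors from noise do not matter.
    return math.degrees(math.atan2(uy, ux))
```

With three non-collinear microphones the bearing is unambiguous over the full circle, which a single microphone pair cannot provide.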

    Informed Sound Source Localization for Hearing Aid Applications


    An Online Solution for Localisation, Tracking and Separation of Moving Speech Sources

    The problem of separating a time-varying number of speech sources in a room is difficult to solve. The challenge lies in estimating the number and the location of these speech sources. Furthermore, the tracked speech sources need to be separated. This thesis proposes a solution which utilises the Random Finite Set approach to estimate the number and location of these speech sources and subsequently separates the speech source mixture via time-frequency masking.
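
The separation step mentioned at the end of this abstract, time-frequency masking, can be sketched in its simplest binary form: each time-frequency bin of the mixture is assigned to whichever source is estimated to dominate it. The abstract does not say which mask the thesis uses, so this is an assumed illustration; `binary_masks` and `apply_mask` are hypothetical names, and the per-source magnitude estimates are taken as given.

```python
def binary_masks(source_mags):
    """Build one binary mask per source.

    source_mags[s][f][t] is the estimated magnitude of source s at
    frequency bin f and time frame t.
    """
    n_src = len(source_mags)
    n_freq = len(source_mags[0])
    n_time = len(source_mags[0][0])
    masks = [[[0.0] * n_time for _ in range(n_freq)] for _ in range(n_src)]
    for f in range(n_freq):
        for t in range(n_time):
            # The source with the largest estimated magnitude owns the bin.
            winner = max(range(n_src), key=lambda s: source_mags[s][f][t])
            masks[winner][f][t] = 1.0
    return masks


def apply_mask(mixture, mask):
    """Multiply a (possibly complex) mixture spectrogram by a mask, bin by bin."""
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)]
```

Each masked spectrogram would then be inverted back to a waveform to obtain the separated source.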