339 research outputs found

    A Geometric Deep Learning Approach to Sound Source Localization and Tracking

    The localization and tracking of sound sources using microphone arrays is a problem that, although it has attracted attention from the signal processing research community for decades, remains open. In recent years, deep learning models have surpassed the state of the art that had been established by classic signal processing techniques, but these models still struggle with rooms with strong reverberation or with tracking multiple sources that dynamically appear and disappear, especially when we cannot apply any criterion to classify or order them. In this thesis, we follow the ideas of the Geometric Deep Learning framework to propose new models and techniques that advance the state of the art in the aforementioned scenarios. As the input of our models, we use acoustic power maps computed using the SRP-PHAT algorithm, a classic signal processing technique that allows us to estimate the acoustic energy received from any direction in space and, therefore, to compute arbitrarily shaped power maps. In addition, we propose a new technique to analytically cancel a source from the generalized cross-correlations used to compute the SRP-PHAT maps. Based on previous narrowband cancellation techniques, we prove that we can project the cross-correlation functions of the signals captured by a microphone array into a space orthogonal to a given direction by just computing a linear combination of time-shifted versions of the original cross-correlations. 
The proposed cancellation technique can be used to design iterative multi-source localization systems where, after having found the strongest source in the generalized cross-correlation functions or in the SRP-PHAT maps, we can cancel it and find new sources that were previously masked by the first source. Before being able to train deep learning models we need data, which, in the case of following a supervised learning approach, means a dataset of multichannel recordings with the position of the sources accurately labeled. Although some datasets like this exist, they are not large enough to train a neural network and the acoustic environments they include are not diverse enough. To overcome this lack of real data, we present a technique to simulate acoustic scenes with one or several moving sound sources and, to be able to perform these simulations as they are needed during training, we present what is, to the best of our knowledge, the first free and open-source room acoustics simulation library with GPU acceleration. As we prove in this thesis, the presented library is more than two orders of magnitude faster than other state-of-the-art CPU libraries. The main idea of the Geometric Deep Learning philosophy is that the models should fit the symmetries (i.e. the invariances and equivariances) of the data and the problem we want to solve. For single-source direction of arrival estimation, the use of SRP-PHAT maps as inputs of our models makes the rotational equivariance of the problem clear and, after a first approach using 3D convolutional neural networks, we present a model using icosahedral convolutions that approximate the equivariance to the continuous group of spherical rotations by the equivariance to the discrete group of the 60 icosahedral symmetries. 
We prove that the SRP-PHAT maps are a much more robust input feature than the spectrograms typically used in many state-of-the-art models and that the use of the icosahedral convolutions, combined with a new soft-argmax function that obtains a regression output from the output of the convolutional neural network by interpreting it as a probability distribution and computing its expected value, allows us to dramatically reduce the number of trainable parameters of the models without losing accuracy in their estimations. When we want to track multiple moving sources and we cannot use any criterion to order or classify them, the problem becomes invariant to the permutations of the estimates, so we cannot directly compare them with the ground truth labels since we cannot expect them to be in the same order. Models of this kind have typically been trained using permutation invariant training strategies, but these strategies usually do not penalize identity switches, and the models trained with them do not keep the identity of every source consistent during the tracking. To solve this issue, we propose a new training strategy, which we call sliding permutation invariant training (sPIT), that is able to optimize all the features that we could expect from a multi-source tracking system: the precision of the direction of arrival estimates, the accuracy of the source detections, and the consistency of the assigned identities. Finally, we propose a new kind of recursive neural network that, instead of using vectors as its input and its state, uses sets of vectors and is invariant to the permutations of the elements of the input set and equivariant to the permutations of the elements of the state set. 
We show how this is the behavior that we should expect from a tracking model which takes as inputs the estimates of a multi-source localization model and compare these permutation-invariant recursive neural networks with the conventional gated recurrent units for sound source tracking applications.
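
The soft-argmax idea described in this abstract (treat the network output as a probability distribution and take its expected value) can be illustrated with a minimal sketch. The function name `soft_argmax`, the `temperature` parameter, and the flat list of grid coordinates are assumptions made for illustration; the thesis may define its function differently, for example directly on the icosahedral grid.

```python
import math


def soft_argmax(scores, coordinates, temperature=1.0):
    """Return the expected coordinate under the softmax distribution of `scores`.

    `scores` holds one value per grid point and `coordinates` holds the
    matching coordinate tuples. Both names are hypothetical.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    dims = len(coordinates[0])
    # Expected value of each coordinate dimension under the distribution.
    return tuple(
        sum(p * c[d] for p, c in zip(probabilities, coordinates))
        for d in range(dims)
    )


# Usage: a 3-point grid over (azimuth, elevation) in degrees.
grid = [(0.0, 10.0), (30.0, 10.0), (60.0, 10.0)]
estimate = soft_argmax([0.2, 2.5, 0.4], grid)
```

Unlike a hard argmax, this output is differentiable with respect to the scores and is not restricted to the grid points, which is what makes it usable as a regression head. Averaging angles directly, as done here, ignores wrap-around at 360 degrees; averaging Cartesian unit vectors would be one way to avoid that.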

    Sensor Signal and Information Processing II

    In the current age of information explosion, newly invented technological sensors and software are now tightly integrated with our everyday lives. Many sensor processing algorithms have incorporated some form of computational intelligence as part of their core framework in problem solving. These algorithms have the capacity to generalize and discover knowledge for themselves and learn new information whenever unseen data are captured. The primary aim of sensor processing is to develop techniques to interpret, understand, and act on information contained in the data. The interest of this book is in developing intelligent signal processing in order to pave the way for smart sensors. This involves mathematical advancement of nonlinear signal processing theory and its applications that extend far beyond traditional techniques. It bridges the boundary between theory and application, developing novel theoretically inspired methodologies targeting both longstanding and emergent signal processing applications. The topics range from phishing detection to integration of terrestrial laser scanning, and from fault diagnosis to bio-inspiring filtering. The book will appeal to established practitioners, along with researchers and students in the emerging field of smart sensor processing.

    Scattering Center Extraction and Recognition Based on ESPRIT Algorithm

    Inverse Synthetic Aperture Radar (ISAR) generates high-quality radar images even in low visibility, and it provides important physical features for space target recognition and location. This thesis focuses on rapid ISAR imaging, scattering center information extraction, and target classification. Based on the principle of Fourier imaging, the backscattered field of the radar target is obtained with the physical optics (PO) algorithm, and the relation between the scattered field and the objective function is derived. The incident parameters of the electromagnetic wave are set according to the resolution formula. An interpolation method is used to realize three-dimensional (3D) simulation of an aircraft target, and the results are compared with direct imaging results. The CLEAN algorithm extracts scattering center information effectively, but because of the limits imposed by the resolution parameters, traditional imaging cannot meet practical demands. Therefore, the super-resolution Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) algorithm is used to obtain spatial target location information. The signal subspace and noise subspace are orthogonal to each other. By combining the spatial smoothing method with the ESPRIT algorithm, the physical characteristics of the geometric target scattering centers are obtained accurately. In particular, the proposed method is validated on complex 3D aircraft targets, which shows that it is applicable to most scattering mechanisms. The distribution of scattering centers reflects the geometric information of the target. Therefore, the electromagnetic image to be recognized and the ESPRIT image are matched by the domain matching method, and the classification results under different radii are obtained. In addition, because neural networks can extract rich image features, the improved ALEX network is used to classify and recognize target data processed by ESPRIT. 
The results show that the ESPRIT algorithm can be used to expand existing datasets and prepare for future identification of targets in real environments. Finally, a visual classification system is constructed to display the results.
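
As a rough illustration of the rotational-invariance idea behind ESPRIT, the following sketch estimates the frequencies of complex exponentials in a one-dimensional signal. It is a textbook-style least-squares ESPRIT, not the thesis's scattering center extractor; the function name `esprit_frequencies` and the sliding-window data matrix (used here as a simple stand-in for spatial smoothing) are assumptions.

```python
import numpy as np


def esprit_frequencies(x, n_sources, window):
    """Estimate normalized frequencies (cycles per sample) of the complex
    exponentials in `x` with least-squares ESPRIT (illustrative sketch)."""
    x = np.asarray(x, dtype=complex)
    n_snapshots = len(x) - window + 1
    # Each column is one length-`window` sliding snapshot of the signal.
    data = np.stack([x[i:i + window] for i in range(n_snapshots)], axis=1)
    # Signal subspace: the leading left singular vectors of the data matrix.
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    signal = u[:, :n_sources]
    # Rotational invariance: signal[1:] is approximately signal[:-1] @ psi.
    psi, *_ = np.linalg.lstsq(signal[:-1], signal[1:], rcond=None)
    # The eigenvalues of psi are exp(j*2*pi*f) for each component frequency f.
    return np.sort(np.angle(np.linalg.eigvals(psi)) / (2 * np.pi))


# Usage: two noiseless complex exponentials at 0.10 and 0.25 cycles/sample.
n = np.arange(64)
x = np.exp(2j * np.pi * 0.10 * n) + 0.5 * np.exp(2j * np.pi * 0.25 * n)
frequencies = esprit_frequencies(x, n_sources=2, window=16)
```

Because the frequencies come from eigenvalues rather than from a search over a grid, the estimate is not limited by the FFT bin width, which is the sense in which subspace methods such as ESPRIT are called super-resolution.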

    Modelling, Simulation and Data Analysis in Acoustical Problems

    Modelling and simulation in acoustics is currently gaining importance. In fact, with the development and improvement of innovative computational techniques and with the growing need for predictive models, an impressive boost has been observed in several research and application areas, such as noise control, indoor acoustics, and industrial applications. This led us to propose a special issue on “Modelling, Simulation and Data Analysis in Acoustical Problems”, as we believe in the importance of these topics in modern acoustics studies. In total, 81 papers were submitted and 33 of them were published, with an acceptance rate of 37.5%. Judging by the number of papers submitted, this is a trending topic in the scientific and academic community, and this special issue aims to provide a future reference for the research that will be developed in the coming years.

    Efficient algorithms and data structures for compressive sensing

    Along with the ever-increasing number of sensors, which are also generating rapidly growing amounts of data, the traditional paradigm of sampling according to the Nyquist criterion is facing an equally increasing number of obstacles. The rather recent theory of Compressive Sensing (CS) promises to alleviate some of these drawbacks by generalizing the sampling and reconstruction schemes such that the acquired samples can contain more complex information about the signal than Nyquist samples. The proposed measurement process is more complex and the reconstruction algorithms necessarily need to be nonlinear. Additionally, the hardware design process needs to be revisited in order to account for this new acquisition scheme. Hence, one can identify a trade-off between the information that is contained in individual samples of a signal and the effort during development and operation of the sensing system. This thesis addresses the necessary steps to shift this trade-off in favor of CS. We do so by providing new results that make CS easier to deploy in practice while also maintaining the performance indicated by theoretical results. The sparsity order of a signal plays a central role in any CS system. Hence, we present a method to estimate this crucial quantity prior to recovery from a single snapshot. 
As we show, the proposed Sparsity Order Estimation method improves the reconstruction error compared to an unguided reconstruction. During the development of the theory we notice that the matrix-free view of the involved linear mappings offers many possibilities to render the reconstruction and modeling stages much more efficient. Hence, we present an open-source software architecture to construct these matrix-free representations and showcase its ease of use and performance when used for sparse recovery to detect defects from ultrasound data as well as to estimate scatterers in a radio channel using ultra-wideband impulse responses. For the former of these two applications, we present a complete reconstruction pipeline for the case where the ultrasound data is compressed by means of sub-sampling in the frequency domain. Here, we present the algorithms for the forward model and the reconstruction stage, and we give asymptotic bounds for the number of measurements and the expected reconstruction error. We show that our proposed system allows significant compression levels without substantially deteriorating the imaging quality. For the second application, we develop a sampling scheme to acquire the channel Impulse Response (IR) based on a Random Demodulator that captures enough information in the recorded samples to reliably estimate the IR when exploiting sparsity. Compared to the state of the art, this in turn improves the robustness to the effects of time-variant radar channels while also outperforming state-of-the-art methods based on Nyquist sampling in terms of reconstruction error. In order to circumvent the inherent model mismatch of early grid-based compressive sensing theory, we make use of the Atomic Norm Minimization framework and show how it can be used for the estimation of the signal covariance with R-dimensional parameters from multiple compressive snapshots. 
To this end, we derive a variant of the ADMM that can estimate this covariance in a very general setting and we show how to use this for direction finding with realistic antenna geometries. In this context we also present a method based on a stochastic gradient descent iteration scheme to find compression schemes that are well suited for parameter estimation, since the resulting sub-sampling has a uniform effect on the whole parameter space. Finally, we show numerically that the combination of these two approaches yields a well-performing grid-free CS pipeline.
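
The matrix-free view of linear mappings mentioned in this abstract can be illustrated with a small sketch: a circulant operator that applies its forward and adjoint products through the FFT instead of storing an n-by-n matrix. The class `CirculantOperator` and its method names are hypothetical and are not the API of the thesis's software.

```python
import numpy as np


class CirculantOperator:
    """Matrix-free circulant operator defined by its first column (sketch)."""

    def __init__(self, first_column):
        self.size = len(first_column)
        # The eigenvalues of a circulant matrix are the DFT of its first column.
        self.spectrum = np.fft.fft(first_column)

    def forward(self, x):
        # Computes C @ x in O(n log n) without forming C.
        return np.fft.ifft(self.spectrum * np.fft.fft(x))

    def adjoint(self, y):
        # Computes C^H @ y, as needed by iterative sparse recovery solvers.
        return np.fft.ifft(np.conj(self.spectrum) * np.fft.fft(y))

    def to_dense(self):
        # Explicit matrix, built only to check the fast products.
        n = self.size
        column = np.fft.ifft(self.spectrum)
        return np.array([[column[(i - j) % n] for j in range(n)] for i in range(n)])


# Usage: compare the fast products with the explicit matrix.
rng = np.random.default_rng(0)
c = rng.standard_normal(8) + 1j * rng.standard_normal(8)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
op = CirculantOperator(c)
dense = op.to_dense()
forward_matches = np.allclose(op.forward(x), dense @ x)
adjoint_matches = np.allclose(op.adjoint(x), dense.conj().T @ x)
```

Iterative reconstruction algorithms typically need only these two products, so storage drops from n squared entries to n and each product costs O(n log n) instead of O(n squared).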

    Orthogonal frequency division multiplexing multiple-input multiple-output automotive radar with novel signal processing algorithms

    Advanced driver assistance systems that actively assist the driver based on environment perception have achieved significant advances in recent years. Along with this development, autonomous driving has become a major research topic that ultimately aims at the development of fully automated, driverless vehicles. Since such applications rely on environment perception, their ever-increasing sophistication imposes growing demands on environmental sensors. Specifically, the need for reliable environment sensing necessitates the development of more sophisticated, high-performance radar sensors. A further vital challenge, in the form of increased radar interference, arises with the growing market penetration of vehicular radar technology. To address these challenges, novel approaches and radar concepts are required in many respects. As the modulation is one of the key factors determining radar performance, research on new modulation schemes for automotive radar becomes essential. A topic that has emerged in recent years is radar operating with digitally generated waveforms based on orthogonal frequency division multiplexing (OFDM). Initially, the use of OFDM for radar was motivated by the combination of radar with communication via modulation of the radar waveform with communication data. Some subsequent works studied the use of OFDM as a modulation scheme in many different radar applications, from adaptive radar processing to synthetic aperture radar. This suggests that the flexibility provided by OFDM-based digital generation of radar waveforms can potentially enable novel radar concepts that are well suited for future automotive radar systems. This thesis aims to explore the prospects of OFDM as a modulation scheme for high-performance, robust and adaptive automotive radar. To this end, novel signal processing algorithms and OFDM-based radar concepts are introduced in this work. 
The main focus of the thesis is on high-end automotive radar applications, while the applicability for real-time implementation is of primary concern. The first part of this thesis focuses on signal processing algorithms for distance-velocity estimation. As a foundation for the algorithms presented in this thesis, a novel and rigorous signal model for OFDM radar is introduced. Based on this signal model, the limitations of state-of-the-art OFDM radar signal processing are pointed out. To overcome these limitations, we propose two novel signal processing algorithms that build upon the conventional processing and extend it by more sophisticated modeling of the radar signal. The first method, named all-cell Doppler compensation (ACDC), overcomes the Doppler sensitivity problem of OFDM radar. The core idea of this algorithm is the scenario-independent correction of Doppler shifts for the entire measurement signal. Since the Doppler effect is a major concern for OFDM radar and influences the radar parametrization, its complete compensation opens new perspectives for OFDM radar. It not only achieves an improved, Doppler-independent performance, it also enables more favorable system parametrization. The second distance-velocity estimation algorithm introduced in this thesis addresses the issue of range and Doppler frequency migration due to the target’s motion during the measurement. For conventional radar signal processing, these migration effects set an upper limit on the simultaneously achievable distance and velocity resolution. The proposed method, named all-cell migration compensation (ACMC), extends the underlying OFDM radar signal model to account for the target motion. As a result, the effect of migration is compensated implicitly for the entire radar measurement, which leads to an improved distance and velocity resolution. 
Simulations show the effectiveness of the proposed algorithms in overcoming the two major limitations of the conventional OFDM radar signal processing. As multiple-input multiple-output (MIMO) radar is a well-established technology for improving the direction-of-arrival (DOA) estimation, the second part of this work studies the multiplexing methods for OFDM radar that enable simultaneous use of multiple transmit antennas for MIMO radar processing. After discussing the drawbacks of known multiplexing methods, we introduce two advanced multiplexing schemes for OFDM-MIMO radar based on non-equidistant interleaving of OFDM subcarriers. These multiplexing approaches exploit the multicarrier structure of OFDM for generation of orthogonal waveforms that enable a simultaneous operation of multiple MIMO channels occupying the same bandwidth. The primary advantage of these methods is that despite multiplexing they maintain all original radar parameters (resolution and unambiguous range in distance and velocity) for each individual MIMO channel. To obtain favorable interleaving patterns with low sidelobes, we propose an optimization approach based on genetic algorithms. Furthermore, to overcome the drawback of increased sidelobes due to subcarrier interleaving, we study the applicability of sparse processing methods for the distance-velocity estimation from measurements of non-equidistantly interleaved OFDM-MIMO radar. We introduce a novel sparsity based frequency estimation algorithm designed for this purpose. The third topic addressed in this work is the robustness of OFDM radar to interference from other radar sensors. In this part of the work we study the interference robustness of OFDM radar and propose novel interference mitigation techniques. The first interference suppression algorithm we introduce exploits the robustness of OFDM to narrowband interference by dropping subcarriers strongly corrupted by interference from evaluation. 
To avoid an increase of sidelobes due to missing subcarriers, their values are reconstructed from the neighboring ones based on linear prediction methods. As a further measure for increasing the interference robustness in a more universal manner, we propose the extension of OFDM radar with cognitive features. We introduce the general concept of cognitive radar that is capable of adapting to the current spectral situation to avoid interference. Our work focuses mainly on waveform adaptation techniques; we propose adaptation methods that allow dynamic interference avoidance without adversely affecting the estimation performance. The final part of this work focuses on a prototypical implementation of OFDM-MIMO radar. With the constructed prototype, the feasibility of OFDM for high-performance radar applications is demonstrated. Furthermore, based on this radar prototype, the algorithms presented in this thesis are validated experimentally. The measurements confirm the applicability of the proposed algorithms and concepts for real-world automotive radar implementations.
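
For orientation, the conventional OFDM radar distance-velocity processing that this thesis builds upon can be sketched as follows, assuming the common textbook formulation: divide the received symbols element-wise by the transmitted ones, then take an inverse FFT across subcarriers (distance) and an FFT across OFDM symbols (velocity). The function name, array layout, and the idealized single-target test signal are assumptions; the ACDC and ACMC methods from the abstract are not reproduced here.

```python
import numpy as np


def range_doppler_map(tx_symbols, rx_symbols):
    """Conventional OFDM radar processing (illustrative sketch).

    Both inputs have shape (n_subcarriers, n_ofdm_symbols).
    """
    # Dividing by the known transmit symbols removes the modulation data.
    channel = rx_symbols / tx_symbols
    # The delay appears as a linear phase across subcarriers: IFFT along axis 0.
    range_profiles = np.fft.ifft(channel, axis=0)
    # The Doppler shift appears as a linear phase across symbols: FFT along axis 1.
    return np.fft.fft(range_profiles, axis=1)


# Usage: one ideal target placed exactly on range bin 5 and Doppler bin 3.
n_sub, n_sym = 64, 32
rng = np.random.default_rng(0)
qpsk_index = rng.integers(0, 4, size=(n_sub, n_sym))
tx = np.exp(1j * (np.pi / 2 * qpsk_index + np.pi / 4))
k = np.arange(n_sub)[:, None]   # subcarrier index
m = np.arange(n_sym)[None, :]   # OFDM symbol index
target = np.exp(-2j * np.pi * k * 5 / n_sub) * np.exp(2j * np.pi * m * 3 / n_sym)
rd = range_doppler_map(tx, tx * target)
peak = np.unravel_index(np.argmax(np.abs(rd)), rd.shape)
```

In this idealized model the peak of `np.abs(rd)` falls on the bin pair that encodes the target's delay and Doppler shift. Real measurements also contain noise, Doppler-induced intercarrier interference, and migration effects, which are the limitations the abstract says its algorithms address.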

    High-resolution imaging methods in array signal processing
