209 research outputs found

    The Subwoofer Room Impulse Response (SUBRIR) Database

    Full text link

    Spatial Multizone Soundfield Reproduction Design

    No full text
    It is desirable for people sharing a physical space to access different multimedia information streams simultaneously. For a good user experience, the interference of the different streams should be held to a minimum. This is straightforward for the video component but currently difficult for the audio sound component. Spatial multizone soundfield reproduction, which aims to provide an individual sound environment to each of a set of listeners without the use of physical isolation or headphones, has drawn significant attention of researchers in recent years. The realization of multizone soundfield reproduction is a conceptually challenging problem as currently most of the soundfield reproduction techniques concentrate on a single zone. This thesis considers the theory and design of a multizone soundfield reproduction system using arrays of loudspeakers in given complex environments. We first introduce a novel method for spatial multizone soundfield reproduction based on describing the desired multizone soundfield as an orthogonal expansion of formulated basis functions over the desired reproduction region. This provides the theoretical basis of both 2-D (height invariant) and 3-D soundfield reproduction for this work. We then extend the reproduction of the multizone soundfield over the desired region to reverberant environments, which is based on the identification of the acoustic transfer function (ATF) from the loudspeaker over the desired reproduction region using sparse methods. The simulation results confirm that the method leads to a significantly reduced number of required microphones for an accurate multizone sound reproduction compared with the state of the art, while it also facilitates the reproduction over a wide frequency range. In addition, we focus on the improvements of the proposed multizone reproduction system with regard to practical implementation. The so-called 2.5D multizone oundfield reproduction is considered to accurately reproduce the desired multizone soundfield over a selected 2-D plane at the height approximately level with the listener’s ears using a single array of loudspeakers with 3-D reverberant settings. Then, we propose an adaptive reverberation cancelation method for the multizone soundfield reproduction within the desired region and simplify the prior soundfield measurement process. Simulation results suggest that the proposed method provides a faster convergence rate than the comparative approaches under the same hardware provision. Finally, we conduct the real-world implementation based on the proposed theoretical work. The experimental results show that we can achieve a very noticeable acoustic energy contrast between the signals recorded in the bright zone and the quiet zone, especially for the system implementation with reverberation equalization

    Object-based Modeling of Audio for Coding and Source Separation

    Get PDF
    This thesis studies several data decomposition algorithms for obtaining an object-based representation of an audio signal. The estimation of the representation parameters are coupled with audio-specific criteria, such as the spectral redundancy, sparsity, perceptual relevance and spatial position of sounds. The objective is to obtain an audio signal representation that is composed of meaningful entities called audio objects that reflect the properties of real-world sound objects and events. The estimation of the object-based model is based on magnitude spectrogram redundancy using non-negative matrix factorization with extensions to multichannel and complex-valued data. The benefits of working with object-based audio representations over the conventional time-frequency bin-wise processing are studied. The two main applications of the object-based audio representations proposed in this thesis are spatial audio coding and sound source separation from multichannel microphone array recordings. In the proposed spatial audio coding algorithm, the audio objects are estimated from the multichannel magnitude spectrogram. The audio objects are used for recovering the content of each original channel from a single downmixed signal, using time-frequency filtering. The perceptual relevance of modeling the audio signal is considered in the estimation of the parameters of the object-based model, and the sparsity of the model is utilized in encoding its parameters. Additionally, a quantization of the model parameters is proposed that reflects the perceptual relevance of each quantized element. The proposed object-based spatial audio coding algorithm is evaluated via listening tests and comparing the overall perceptual quality to conventional time-frequency block-wise methods at the same bitrates. The proposed approach is found to produce comparable coding efficiency while providing additional functionality via the object-based coding domain representation, such as the blind separation of the mixture of sound sources in the encoded channels. For the sound source separation from multichannel audio recorded by a microphone array, a method combining an object-based magnitude model and spatial covariance matrix estimation is considered. A direction of arrival-based model for the spatial covariance matrices of the sound sources is proposed. Unlike the conventional approaches, the estimation of the parameters of the proposed spatial covariance matrix model ensures a spatially coherent solution for the spatial parameterization of the sound sources. The separation quality is measured with objective criteria and the proposed method is shown to improve over the state-of-the-art sound source separation methods, with recordings done using a small microphone array

    Efficient Techniques for Wave-based Sound Propagation in Interactive Applications

    Get PDF
    Sound propagation techniques model the effect of the environment on sound waves and predict their behavior from point of emission at the source to the final point of arrival at the listener. Sound is a pressure wave produced by mechanical vibration of a surface that propagates through a medium such as air or water, and the problem of sound propagation can be formulated mathematically as a second-order partial differential equation called the wave equation. Accurate techniques based on solving the wave equation, also called the wave-based techniques, are too expensive computationally and memory-wise. Therefore, these techniques face many challenges in terms of their applicability in interactive applications including sound propagation in large environments, time-varying source and listener directivity, and high simulation cost for mid-frequencies. In this dissertation, we propose a set of efficient wave-based sound propagation techniques that solve these three challenges and enable the use of wave-based sound propagation in interactive applications. Firstly, we propose a novel equivalent source technique for interactive wave-based sound propagation in large scenes spanning hundreds of meters. It is based on the equivalent source theory used for solving radiation and scattering problems in acoustics and electromagnetics. Instead of using a volumetric or surface-based approach, this technique takes an object-centric approach to sound propagation. The proposed equivalent source technique generates realistic acoustic effects and takes orders of magnitude less runtime memory compared to prior wave-based techniques. Secondly, we present an efficient framework for handling time-varying source and listener directivity for interactive wave-based sound propagation. The source directivity is represented as a linear combination of elementary spherical harmonic sources. This spherical harmonic-based representation of source directivity can support analytical, data-driven, rotating or time-varying directivity function at runtime. Unlike previous approaches, the listener directivity approach can be used to compute spatial audio (3D audio) for a moving, rotating listener at interactive rates. Lastly, we propose an efficient GPU-based time-domain solver for the wave equation that enables wave simulation up to the mid-frequency range in tens of minutes on a desktop computer. It is demonstrated that by carefully mapping all the components of the wave simulator to match the parallel processing capabilities of the graphics processors, significant improvement in performance can be achieved compared to the CPU-based simulators, while maintaining numerical accuracy. We validate these techniques with offline numerical simulations and measured data recorded in an outdoor scene. We present results of preliminary user evaluations conducted to study the impact of these techniques on user's immersion in virtual environment. We have integrated these techniques with the Half-Life 2 game engine, Oculus Rift head-mounted display, and Xbox game controller to enable users to experience high-quality acoustics effects and spatial audio in the virtual environment.Doctor of Philosoph

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

    Get PDF
    International audienc

    Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction

    Get PDF
    Audio reproduction technologies have underwent several revolutions from a purely mechanical, to electromagnetic, and into a digital process. These changes have resulted in steady improvements in the objective qualities of sound capture/playback on increasingly portable devices. However, most mobile playback devices remove important spatial-directional components of externalized sound which are natural to the subjective experience of human hearing. Fortunately, the missing spatial-directional parts can be integrated back into audio through a combination of computational methods and physical knowledge of how sound scatters off of the listener's anthropometry in the sound-field. The former employs signal processing techniques for rendering the sound-field. The latter employs approximations of the sound-field through the measurement of so-called Head-Related Impulse Responses/Transfer Functions (HRIRs/HRTFs). This dissertation develops several numerical and machine learning algorithms for accelerating and personalizing spatial audio reproduction in light of available mobile computing power. First, spatial audio synthesis between a sound-source and sound-field requires fast convolution algorithms between the audio-stream and the HRIRs. We introduce a novel sparse decomposition algorithm for HRIRs based on non-negative matrix factorization that allows for faster time-domain convolution than frequency-domain fast-Fourier-transform variants. Second, the full sound-field over the spherical coordinate domain must be efficiently approximated from a finite collection of HRTFs. We develop a joint spatial-frequency covariance model for Gaussian process regression (GPR) and sparse-GPR methods that supports the fast interpolation and data fusion of HRTFs across multiple data-sets. Third, the direct measurement of HRTFs requires specialized equipment that is unsuited for widespread acquisition. We ``bootstrap'' the human ability to localize sound in listening tests with Gaussian process active-learning techniques over graphical user interfaces that allows the listener to infer his/her own HRTFs. Experiments are conducted on publicly available HRTF datasets and human listeners

    Efficient algorithms and data structures for compressive sensing

    Get PDF
    Wegen der kontinuierlich anwachsenden Anzahl von Sensoren, und den stetig wachsenden Datenmengen, die jene produzieren, stößt die konventielle Art Signale zu verarbeiten, beruhend auf dem Nyquist-Kriterium, auf immer mehr Hindernisse und Probleme. Die kürzlich entwickelte Theorie des Compressive Sensing (CS) formuliert das Versprechen einige dieser Hindernisse zu beseitigen, indem hier allgemeinere Signalaufnahme und -rekonstruktionsverfahren zum Einsatz kommen können. Dies erlaubt, dass hierbei einzelne Abtastwerte komplexer strukturierte Informationen über das Signal enthalten können als dies bei konventiellem Nyquistsampling der Fall ist. Gleichzeitig verändert sich die Signalrekonstruktion notwendigerweise zu einem nicht-linearen Vorgang und ebenso müssen viele Hardwarekonzepte für praktische Anwendungen neu überdacht werden. Das heißt, dass man zwischen der Menge an Information, die man über Signale gewinnen kann, und dem Aufwand für das Design und Betreiben eines Signalverarbeitungssystems abwägen kann und muss. Die hier vorgestellte Arbeit trägt dazu bei, dass bei diesem Abwägen CS mehr begünstigt werden kann, indem neue Resultate vorgestellt werden, die es erlauben, dass CS einfacher in der Praxis Anwendung finden kann, wobei die zu erwartende Leistungsfähigkeit des Systems theoretisch fundiert ist. Beispielsweise spielt das Konzept der Sparsity eine zentrale Rolle, weshalb diese Arbeit eine Methode präsentiert, womit der Grad der Sparsity eines Vektors mittels einer einzelnen Beobachtung geschätzt werden kann. Wir zeigen auf, dass dieser Ansatz für Sparsity Order Estimation zu einem niedrigeren Rekonstruktionsfehler führt, wenn man diesen mit einer Rekonstruktion vergleicht, welcher die Sparsity des Vektors unbekannt ist. Um die Modellierung von Signalen und deren Rekonstruktion effizienter zu gestalten, stellen wir das Konzept von der matrixfreien Darstellung linearer Operatoren vor. Für die einfachere Anwendung dieser Darstellung präsentieren wir eine freie Softwarearchitektur und demonstrieren deren Vorzüge, wenn sie für die Rekonstruktion in einem CS-System genutzt wird. Konkret wird der Nutzen dieser Bibliothek, einerseits für das Ermitteln von Defektpositionen in Prüfkörpern mittels Ultraschall, und andererseits für das Schätzen von Streuern in einem Funkkanal aus Ultrabreitbanddaten, demonstriert. Darüber hinaus stellen wir für die Verarbeitung der Ultraschalldaten eine Rekonstruktionspipeline vor, welche Daten verarbeitet, die im Frequenzbereich Unterabtastung erfahren haben. Wir beschreiben effiziente Algorithmen, die bei der Modellierung und der Rekonstruktion zum Einsatz kommen und wir leiten asymptotische Resultate für die benötigte Anzahl von Messwerten, sowie die zu erwartenden Lokalisierungsgenauigkeiten der Defekte her. Wir zeigen auf, dass das vorgestellte System starke Kompression zulässt, ohne die Bildgebung und Defektlokalisierung maßgeblich zu beeinträchtigen. Für die Lokalisierung von Streuern mittels Ultrabreitbandradaren stellen wir ein CS-System vor, welches auf einem Random Demodulators basiert. Im Vergleich zu existierenden Messverfahren ist die hieraus resultierende Schätzung der Kanalimpulsantwort robuster gegen die Effekte von zeitvarianten Funkkanälen. Um den inhärenten Modellfehler, den gitterbasiertes CS begehen muss, zu beseitigen, zeigen wir auf wie Atomic Norm Minimierung es erlaubt ohne die Einschränkung auf ein endliches und diskretes Gitter R-dimensionale spektrale Komponenten aus komprimierten Beobachtungen zu schätzen. Hierzu leiten wir eine R-dimensionale Variante des ADMM her, welcher dazu in der Lage ist die Signalkovarianz in diesem allgemeinen Szenario zu schätzen. Weiterhin zeigen wir, wie dieser Ansatz zur Richtungsschätzung mit realistischen Antennenarraygeometrien genutzt werden kann. In diesem Zusammenhang präsentieren wir auch eine Methode, welche mittels Stochastic gradient descent Messmatrizen ermitteln kann, die sich gut für Parameterschätzung eignen. Die hieraus resultierenden Kompressionsverfahren haben die Eigenschaft, dass die Schätzgenauigkeit über den gesamten Parameterraum ein möglichst uniformes Verhalten zeigt. Zuletzt zeigen wir auf, dass die Kombination des ADMM und des Stochastic Gradient descent das Design eines CS-Systems ermöglicht, welches in diesem gitterfreien Szenario wünschenswerte Eigenschaften hat.Along with the ever increasing number of sensors, which are also generating rapidly growing amounts of data, the traditional paradigm of sampling adhering the Nyquist criterion is facing an equally increasing number of obstacles. The rather recent theory of Compressive Sensing (CS) promises to alleviate some of these drawbacks by proposing to generalize the sampling and reconstruction schemes such that the acquired samples can contain more complex information about the signal than Nyquist samples. The proposed measurement process is more complex and the reconstruction algorithms necessarily need to be nonlinear. Additionally, the hardware design process needs to be revisited as well in order to account for this new acquisition scheme. Hence, one can identify a trade-off between information that is contained in individual samples of a signal and effort during development and operation of the sensing system. This thesis addresses the necessary steps to shift the mentioned trade-off more to the favor of CS. We do so by providing new results that make CS easier to deploy in practice while also maintaining the performance indicated by theoretical results. The sparsity order of a signal plays a central role in any CS system. Hence, we present a method to estimate this crucial quantity prior to recovery from a single snapshot. As we show, this proposed Sparsity Order Estimation method allows to improve the reconstruction error compared to an unguided reconstruction. During the development of the theory we notice that the matrix-free view on the involved linear mappings offers a lot of possibilities to render the reconstruction and modeling stage much more efficient. Hence, we present an open source software architecture to construct these matrix-free representations and showcase its ease of use and performance when used for sparse recovery to detect defects from ultrasound data as well as estimating scatterers in a radio channel using ultra-wideband impulse responses. For the former of these two applications, we present a complete reconstruction pipeline when the ultrasound data is compressed by means of sub-sampling in the frequency domain. Here, we present the algorithms for the forward model, the reconstruction stage and we give asymptotic bounds for the number of measurements and the expected reconstruction error. We show that our proposed system allows significant compression levels without substantially deteriorating the imaging quality. For the second application, we develop a sampling scheme to acquire the channel Impulse Response (IR) based on a Random Demodulator that allows to capture enough information in the recorded samples to reliably estimate the IR when exploiting sparsity. Compared to the state of the art, this in turn allows to improve the robustness to the effects of time-variant radar channels while also outperforming state of the art methods based on Nyquist sampling in terms of reconstruction error. In order to circumvent the inherent model mismatch of early grid-based compressive sensing theory, we make use of the Atomic Norm Minimization framework and show how it can be used for the estimation of the signal covariance with R-dimensional parameters from multiple compressive snapshots. To this end, we derive a variant of the ADMM that can estimate this covariance in a very general setting and we show how to use this for direction finding with realistic antenna geometries. In this context we also present a method based on a Stochastic gradient descent iteration scheme to find compression schemes that are well suited for parameter estimation, since the resulting sub-sampling has a uniform effect on the whole parameter space. Finally, we show numerically that the combination of these two approaches yields a well performing grid-free CS pipeline

    Listening to Distances and Hearing Shapes:Inverse Problems in Room Acoustics and Beyond

    Get PDF
    A central theme of this thesis is using echoes to achieve useful, interesting, and sometimes surprising results. One should have no doubts about the echoes' constructive potential; it is, after all, demonstrated masterfully by Nature. Just think about the bat's intriguing ability to navigate in unknown spaces and hunt for insects by listening to echoes of its calls, or about similar (albeit less well-known) abilities of toothed whales, some birds, shrews, and ultimately people. We show that, perhaps contrary to conventional wisdom, multipath propagation resulting from echoes is our friend. When we think about it the right way, it reveals essential geometric information about the sources--channel--receivers system. The key idea is to think of echoes as being more than just delayed and attenuated peaks in 1D impulse responses; they are actually additional sources with their corresponding 3D locations. This transformation allows us to forget about the abstract \emph{room}, and to replace it by more familiar \emph{point sets}. We can then engage the powerful machinery of Euclidean distance geometry. A problem that always arises is that we do not know \emph{a priori} the matching between the peaks and the points in space, and solving the inverse problem is achieved by \emph{echo sorting}---a tool we developed for learning correct labelings of echoes. This has applications beyond acoustics, whenever one deals with waves and reflections, or more generally, time-of-flight measurements. Equipped with this perspective, we first address the ``Can one hear the shape of a room?'' question, and we answer it with a qualified ``yes''. Even a single impulse response uniquely describes a convex polyhedral room, whereas a more practical algorithm to reconstruct the room's geometry uses only first-order echoes and a few microphones. Next, we show how different problems of localization benefit from echoes. The first one is multiple indoor sound source localization. Assuming the room is known, we show that discretizing the Helmholtz equation yields a system of sparse reconstruction problems linked by the common sparsity pattern. By exploiting the full bandwidth of the sources, we show that it is possible to localize multiple unknown sound sources using only a single microphone. We then look at indoor localization with known pulses from the geometric echo perspective introduced previously. Echo sorting enables localization in non-convex rooms without a line-of-sight path, and localization with a single omni-directional sensor, which is impossible without echoes. A closely related problem is microphone position calibration; we show that echoes can help even without assuming that the room is known. Using echoes, we can localize arbitrary numbers of microphones at unknown locations in an unknown room using only one source at an unknown location---for example a finger snap---and get the room's geometry as a byproduct. Our study of source localization outgrew the initial form factor when we looked at source localization with spherical microphone arrays. Spherical signals appear well beyond spherical microphone arrays; for example, any signal defined on Earth's surface lives on a sphere. This resulted in the first slight departure from the main theme: We develop the theory and algorithms for sampling sparse signals on the sphere using finite rate-of-innovation principles and apply it to various signal processing problems on the sphere
    corecore