6 research outputs found

    Raking the Cocktail Party

    We present the concept of an acoustic rake receiver, a microphone beamformer that uses echoes to improve noise and interference suppression. The rake idea is well known in wireless communications; it involves constructively combining different multipath components that arrive at the receiver antennas. Unlike spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts. Therefore, we focus on the spatial structure rather than the temporal one. Instead of explicitly estimating the channel, we create correspondences between early echoes in time and image sources in space. These multiple sources of the desired and the interfering signal offer additional spatial diversity that we can exploit in the beamformer design. We present several "intuitive" and optimal formulations of acoustic rake receivers, and show theoretically and numerically that the rake formulation of the maximum signal-to-interference-and-noise beamformer offers significant performance boosts in terms of noise and interference suppression. Beyond signal-to-noise ratio, we observe gains in terms of the perceptual evaluation of speech quality (PESQ) metric. We accompany the paper with the complete simulation and processing chain written in Python. The code and the sound samples are available online at http://lcav.github.io/AcousticRakeReceiver/. Comment: 12 pages, 11 figures. Accepted for publication in IEEE Journal on Selected Topics in Signal Processing (Special Issue on Spatial Audio)

    Raking echoes in the time domain

    The geometry of room acoustics is such that the reverberant signal can be seen as the same waveform emitted from multiple locations. In analogy with the rake receiver from wireless communications, we propose several beamforming strategies that exploit, rather than suppress, this additional spatio-temporal diversity. Unlike earlier work in the frequency domain, time-domain designs allow us to shape the impulse response of the beamformer. In particular, we can control perceptually relevant parameters, such as the amount of early echoes or the length of the beamformer response. Relying on knowledge of the image source positions, we derive different optimal beamformers. Leveraging perceptual cues, we show how to improve interference and noise reduction without degrading the perceptual quality. The designs are validated through simulation. Using early echoes is shown to strictly improve the signal-to-interference-and-noise ratio. Code and speech samples are available online at http://lcav.epfl.ch/Robin_Scheibler
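
The idea that reverberation looks like "the same waveform emitted from multiple locations" can be illustrated with first-order image sources. The sketch below is not taken from the listed papers: it assumes a rectangular (shoebox) room with one corner at the origin, ideal specular reflections, and a speed of sound of 343 m/s, and the function names are hypothetical.

```python
import numpy as np


def first_order_image_sources(source, room_dims):
    """Mirror a source across each wall of a shoebox room.

    Assumption: the room spans [0, L] along every axis, with L taken
    from room_dims. Returns one image per wall, ordered axis by axis
    (wall at 0 first, then wall at L).
    """
    source = np.asarray(source, dtype=float)
    room_dims = np.asarray(room_dims, dtype=float)
    images = []
    for axis in range(source.size):
        low = source.copy()
        low[axis] = -source[axis]  # reflection across the wall at 0
        high = source.copy()
        high[axis] = 2.0 * room_dims[axis] - source[axis]  # wall at L
        images.extend([low, high])
    return np.array(images)


def propagation_delays(points, mic, c=343.0):
    """Free-field travel time in seconds from each point to the microphone."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    mic = np.asarray(mic, dtype=float)
    return np.linalg.norm(points - mic, axis=1) / c
```

Each image source stands in for one early echo: its distance to a microphone gives the echo's delay, and a rake-style beamformer would treat these points as extra copies of the desired source. Higher-order echoes and wall absorption are left out of this sketch.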

    Space-time receivers for CDMA multipath signals

    No full text

    Rake, Peel, Sketch: The Signal Processing Pipeline Revisited

    The prototypical signal processing pipeline can be divided into four blocks: representation of the signal in a basis suitable for processing; enhancement of the meaningful part of the signal and noise reduction; estimation of important statistical properties of the signal; and adaptive processing to track and adapt to changes in the signal statistics. This thesis revisits each of these blocks and proposes new algorithms, borrowing ideas from information theory, theoretical computer science, or communications. First, we revisit the Walsh-Hadamard transform (WHT) for the case of a signal that is sparse in the transformed domain, namely one that has only K ≪ N non-zero coefficients. We show that an efficient algorithm exists that can compute these coefficients in O(K log_2(K) log_2(N/K)) time using only O(K log_2(N/K)) samples. This algorithm relies on a fast hashing procedure that computes small linear combinations of transformed domain coefficients. A bipartite graph is formed with linear combinations on one side, and non-zero coefficients on the other. A peeling decoder is then used to recover the non-zero coefficients one by one. A detailed analysis of the algorithm based on error correcting codes over the binary erasure channel is given. The second chapter is about beamforming. Inspired by the rake receiver from wireless communications, we recognize that echoes in a room are an important source of extra signal diversity. We extend several classic beamforming algorithms to take advantage of echoes and also propose new optimal formulations. We explore formulations both in time and frequency domains. We show theoretically and in numerical simulations that the signal-to-interference-and-noise ratio increases proportionally to the number of echoes used. Finally, beyond objective measures, we show that echoes also directly improve speech intelligibility as measured by the perceptual evaluation of speech quality (PESQ) metric.
Next, we attack the problem of direction of arrival of acoustic sources, to which we apply a robust finite rate of innovation reconstruction framework. FRIDA, the resulting algorithm, exploits wideband information coherently, works at very low signal-to-noise ratio, and can resolve very close sources. The algorithm can use either raw microphone signals or their cross-correlations. While the former lets us work with correlated sources, the latter creates a quadratic number of measurements, which allows many sources to be located with few microphones. Thorough experiments on simulated and recorded data show that FRIDA compares favorably with the state of the art. We continue by revisiting the classic recursive least squares (RLS) adaptive filter with ideas borrowed from recent results on sketching least squares problems. The exact update of RLS is replaced by a few steps of conjugate gradient descent. We then propose two different preconditioners, obtained by sketching the data, to accelerate the convergence of the gradient descent. Experiments on artificial as well as natural signals show that the proposed algorithm performs very close to RLS at a lower computational burden. The fifth and final chapter is dedicated to the software and hardware tools developed for this thesis. We describe the pyroomacoustics Python package that contains routines for the evaluation of audio processing algorithms and reference implementations of popular algorithms. We then give an overview of the microphone arrays developed

    Antenna arrays for the downlink of FDD wideband CDMA communication systems

    The main subject of this thesis is the investigation of antenna array techniques for improving the performance of the downlink of wideband code division multiple access (WCDMA) mobile communication systems. These communication systems operate in frequency division duplex (FDD) mode and the antenna arrays are employed in the base station. A number of diversity, beamforming and hybrid techniques are analysed and their bit error ratio (BER) versus signal-to-noise ratio (SNR) performance is calculated as a function of the eigenvalues of the mean channel correlation matrix, where this is applicable. Also, their BER versus SNR performance is evaluated by means of computer simulations in various channel environments and using different numbers of transmit antenna elements in the base station. The simulation results of the techniques, along with other characteristics, are compared to examine the relationship among their performance in various channel environments and investigate which technique is most suitable for each channel environment. Next, a combination of the channel correlation matrix eigenvalue decomposition and space-time processing is proposed as a possible open loop approach to the downlink data signal transmission. It decomposes the channel into M components in the form of eigenvectors (M is the number of transmit antennas in the base station), and attempts to minimise the transmit power that is needed to achieve a target BER at the mobile receiver by employing the optimum number of these eigenvectors. The lower transmit power and the directional transmission by means of eigenvectors are expected to lower interference levels to non-desired users (especially to those users who are not physically close to the direction(s) of transmission). Theoretical and simulation results suggest that this approach performs better than other presented open loop techniques, while the performance gain depends on M and the channel environment.
In simulations it is usually assumed that the base and mobile station have access to perfect estimates of all needed parameters (e.g. channel coefficients). However, practical systems rely on pilot and/or feedback signals to estimate these parameters, so the estimates are noisy. The impact of the noisy estimates on the performance of various techniques is investigated by computer simulations, and the results suggest that there is typically some performance loss. The loss depends on the parameter that is estimated from pilot signals, and may be a function of M, SNR and/or the channel environment. In certain beamforming techniques the base station operates the transmit antenna array in an open loop fashion by estimating the downlink weight vector from the directional information of the uplink channel. Nevertheless, in FDD systems this results in performance loss due to the separation between the uplink and downlink carrier frequencies ('FDD gap'). This loss is quantified and the results show that it is a function of M and the FDD gap. Also, a very simple technique for compensating this loss is proposed, and results obtained after its application suggest that it eliminates most of the loss. Comparison of the proposed technique with an existing compensation technique suggests that, even though the latter is more complex than the former, it yields very little additional improvement

    Listening to Distances and Hearing Shapes: Inverse Problems in Room Acoustics and Beyond

    A central theme of this thesis is using echoes to achieve useful, interesting, and sometimes surprising results. One should have no doubts about the echoes' constructive potential; it is, after all, demonstrated masterfully by Nature. Just think about the bat's intriguing ability to navigate in unknown spaces and hunt for insects by listening to echoes of its calls, or about similar (albeit less well-known) abilities of toothed whales, some birds, shrews, and ultimately people. We show that, perhaps contrary to conventional wisdom, multipath propagation resulting from echoes is our friend. When we think about it the right way, it reveals essential geometric information about the sources-channel-receivers system. The key idea is to think of echoes as being more than just delayed and attenuated peaks in 1D impulse responses; they are actually additional sources with their corresponding 3D locations. This transformation allows us to forget about the abstract room, and to replace it by more familiar point sets. We can then engage the powerful machinery of Euclidean distance geometry. A problem that always arises is that we do not know a priori the matching between the peaks and the points in space, and solving the inverse problem is achieved by echo sorting, a tool we developed for learning correct labelings of echoes. This has applications beyond acoustics, whenever one deals with waves and reflections, or more generally, time-of-flight measurements. Equipped with this perspective, we first address the "Can one hear the shape of a room?" question, and we answer it with a qualified "yes". Even a single impulse response uniquely describes a convex polyhedral room, whereas a more practical algorithm to reconstruct the room's geometry uses only first-order echoes and a few microphones. Next, we show how different problems of localization benefit from echoes. The first one is multiple indoor sound source localization.
Assuming the room is known, we show that discretizing the Helmholtz equation yields a system of sparse reconstruction problems linked by the common sparsity pattern. By exploiting the full bandwidth of the sources, we show that it is possible to localize multiple unknown sound sources using only a single microphone. We then look at indoor localization with known pulses from the geometric echo perspective introduced previously. Echo sorting enables localization in non-convex rooms without a line-of-sight path, and localization with a single omni-directional sensor, which is impossible without echoes. A closely related problem is microphone position calibration; we show that echoes can help even without assuming that the room is known. Using echoes, we can localize arbitrary numbers of microphones at unknown locations in an unknown room using only one source at an unknown location---for example a finger snap---and get the room's geometry as a byproduct. Our study of source localization outgrew the initial form factor when we looked at source localization with spherical microphone arrays. Spherical signals appear well beyond spherical microphone arrays; for example, any signal defined on Earth's surface lives on a sphere. This resulted in the first slight departure from the main theme: We develop the theory and algorithms for sampling sparse signals on the sphere using finite rate-of-innovation principles and apply it to various signal processing problems on the sphere