
    On Application of Non-Negative Matrix Factorization for Ad Hoc Microphone Array Calibration from Incomplete Noisy Distances

    We propose to use non-negative matrix factorization (NMF) to estimate the unknown pairwise distances and reconstruct a distance matrix for microphone array position calibration. We develop new multiplicative update rules for NMF with an incomplete input matrix that take into account the symmetry of the distance matrix. Additionally, we develop a convex matrix completion method related to an l2-regularized symmetric NMF. Thorough experiments demonstrate that the proposed methods lead to substantial improvement over state-of-the-art techniques in a wide range of signal-to-noise and unknown-distance ratios. The convex symmetric matrix completion method was found to be the most robust, at a lower computational cost.
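
    As a rough illustration, here is a minimal sketch of masked multiplicative NMF updates on a partially observed distance matrix. It follows the standard Lee-Seung rules restricted to the observed entries, not the paper's exact symmetric updates, and only encourages symmetry by re-symmetrizing the output; all names and parameters are illustrative.

```python
import numpy as np

def masked_nmf(D, M, rank=5, n_iter=500, eps=1e-9):
    """Approximate D ~= W @ H using only the observed entries of D.

    D : (n, n) matrix of (squared) pairwise distances, symmetric.
    M : (n, n) binary mask, 1 where the distance is known.
    """
    n = D.shape[0]
    rng = np.random.default_rng(0)
    W = rng.random((n, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        R = M * (W @ H)                          # reconstruction on known entries
        W *= ((M * D) @ H.T) / (R @ H.T + eps)   # masked Frobenius update for W
        R = M * (W @ H)
        H *= (W.T @ (M * D)) / (W.T @ R + eps)   # masked Frobenius update for H
    D_hat = W @ H
    return 0.5 * (D_hat + D_hat.T)               # re-symmetrize the estimate
```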

    Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees

    This paper addresses the problem of ad hoc microphone array calibration where only partial information about the distances between microphones is available. We construct a matrix consisting of the pairwise distances and propose to estimate the missing entries with a novel Euclidean distance matrix (EDM) completion algorithm that alternates between low-rank matrix completion and projection onto the Euclidean distance space. This approach confines the recovered matrix to the EDM cone at each iteration of the matrix completion algorithm. Theoretical guarantees on the calibration performance are obtained for random and locally structured missing entries as well as for measurement noise on the known distances. This study elucidates the links between the calibration error and the number of microphones, the noise level, and the ratio of missing distances. Thorough experiments on real data recordings and simulated setups demonstrate these theoretical insights. The proposed Euclidean distance matrix completion algorithm achieves a significant improvement over state-of-the-art techniques for ad hoc microphone array calibration.
    Comment: In press, available online, August 1, 2014. http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal Processing, 201
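
    A minimal sketch of the alternating scheme described above, assuming squared distances and a symmetric observation mask; for clarity it replaces the paper's full projection onto the EDM cone with elementary EDM properties (symmetry, zero diagonal, non-negativity):

```python
import numpy as np

def edm_complete(D_obs, M, dim=3, n_iter=200):
    """Alternate low-rank truncation with a simple EDM-style projection.

    D_obs : (n, n) observed squared distances (0 where unknown).
    M     : (n, n) symmetric binary mask of known entries.
    dim   : ambient dimension; a squared EDM has rank at most dim + 2.
    """
    D = D_obs.copy()
    k = dim + 2
    for _ in range(n_iter):
        # 1) low-rank step: keep the k leading singular components
        U, s, Vt = np.linalg.svd(D)
        D = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
        # 2) projection-style step: basic EDM properties
        D = 0.5 * (D + D.T)
        np.fill_diagonal(D, 0.0)
        D = np.maximum(D, 0.0)
        # 3) keep the (noisy) measured entries fixed
        D[M == 1] = D_obs[M == 1]
    return D
```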

    Listening to Distances and Hearing Shapes: Inverse Problems in Room Acoustics and Beyond

    A central theme of this thesis is using echoes to achieve useful, interesting, and sometimes surprising results. One should have no doubts about the echoes' constructive potential; it is, after all, demonstrated masterfully by Nature. Just think about the bat's intriguing ability to navigate in unknown spaces and hunt for insects by listening to echoes of its calls, or about similar (albeit less well-known) abilities of toothed whales, some birds, shrews, and ultimately people. We show that, perhaps contrary to conventional wisdom, multipath propagation resulting from echoes is our friend. When we think about it the right way, it reveals essential geometric information about the sources-channel-receivers system. The key idea is to think of echoes as being more than just delayed and attenuated peaks in 1D impulse responses; they are actually additional sources with their corresponding 3D locations. This transformation allows us to forget about the abstract room, and to replace it by more familiar point sets. We can then engage the powerful machinery of Euclidean distance geometry. A problem that always arises is that we do not know a priori the matching between the peaks and the points in space; solving the inverse problem is achieved by echo sorting, a tool we developed for learning correct labelings of echoes. This has applications beyond acoustics, whenever one deals with waves and reflections, or more generally, time-of-flight measurements. Equipped with this perspective, we first address the "Can one hear the shape of a room?" question, and we answer it with a qualified "yes". Even a single impulse response uniquely describes a convex polyhedral room, whereas a more practical algorithm to reconstruct the room's geometry uses only first-order echoes and a few microphones. Next, we show how different problems of localization benefit from echoes. The first one is multiple indoor sound source localization. Assuming the room is known, we show that discretizing the Helmholtz equation yields a system of sparse reconstruction problems linked by a common sparsity pattern. By exploiting the full bandwidth of the sources, we show that it is possible to localize multiple unknown sound sources using only a single microphone. We then look at indoor localization with known pulses from the geometric echo perspective introduced previously. Echo sorting enables localization in non-convex rooms without a line-of-sight path, and localization with a single omni-directional sensor, which is impossible without echoes. A closely related problem is microphone position calibration; we show that echoes can help even without assuming that the room is known. Using echoes, we can localize arbitrary numbers of microphones at unknown locations in an unknown room using only one source at an unknown location, for example a finger snap, and get the room's geometry as a byproduct. Our study of source localization outgrew the initial form factor when we looked at source localization with spherical microphone arrays. Spherical signals appear well beyond spherical microphone arrays; for example, any signal defined on Earth's surface lives on a sphere. This resulted in the first slight departure from the main theme: we develop the theory and algorithms for sampling sparse signals on the sphere using finite rate-of-innovation principles and apply them to various signal processing problems on the sphere.
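
    The Euclidean distance geometry machinery invoked above rests on classical tools such as multidimensional scaling; the textbook version below recovers point coordinates, up to a rigid motion, from a complete noiseless matrix of squared pairwise distances. It is a sketch for illustration, not one of the thesis's algorithms:

```python
import numpy as np

def classical_mds(D, dim=3):
    """Embed points from a complete squared Euclidean distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # double-centering matrix
    G = -0.5 * J @ D @ J                  # Gram matrix of centered points
    w, V = np.linalg.eigh(G)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]       # keep the top-dim eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```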

    Audio computing in the wild: frameworks for big data and small computers

    This dissertation presents machine learning algorithms that are designed to process as much data as needed while spending the least possible amount of resources, such as time, energy, and memory. Examples of such applications include, but are not limited to: a large-scale multimedia information retrieval system where both the queries and the items in the database are noisy signals; collaborative audio enhancement from hundreds of user-created clips of a music concert; an event detection system running on a small device that has to process various sensor signals in real time; a lightweight custom chipset for speech enhancement on hand-held devices; and an instant music analysis engine running in smartphone apps. In all of these applications, efficient machine learning algorithms must achieve not only good performance but also great resource-efficiency. We start from efficient dictionary-based single-channel source separation algorithms. Source-specific dictionaries of this kind can be trained using matrix factorization or topic modeling, and their elements form a representative set of spectra for the particular source. At test time, the system estimates the contribution of the participating dictionary items to an unknown mixture spectrum. In this way we can estimate the activation of each source separately, and then recover the source of interest from that particular source's reconstruction. There are some efficiency issues with this procedure. First, searching for the optimal dictionary size is time-consuming. Although for some very common types of sources, e.g. English speech, we know the optimal rank of the model by trial and error, it is hard to know in advance what the optimal number of dictionary elements is for the unknown sources, which are usually modeled at test time in semi-supervised separation scenarios. Moreover, for non-stationary unknown sources, it is better to maintain a dictionary that adapts its size and contents to changes in the source's nature. In this online semi-supervised separation scenario, a mechanism that can efficiently learn the optimal rank is helpful. To this end, a deflation method is proposed for modeling an unknown source with a nonnegative dictionary of optimal size. Since this has to be done at test time, the deflation method, which incrementally adds new dictionary items, is more efficient than the corresponding naïve approach of simply trying many different models. Another efficiency issue arises when we use a large dictionary for better separation. It is known that considering the manifold of the training data can enhance separation performance. This is because the usual manifold-ignorant convex combination models, such as those from low-rank matrix decomposition or topic modeling, tend to produce ambiguous regions in the source-specific subspace spanned by the dictionary items; in those ambiguous regions, for example, no original data samples can reside. Although source separation techniques that respect the data manifold can increase performance, they demand more memory and computation, because the models use larger dictionaries and involve sparse coding at test time.
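
    To make the dictionary-based separation procedure concrete, here is a hedged sketch of the supervised case: two pre-trained dictionaries stay fixed, the activations are estimated with the standard KL-divergence multiplicative update, and the target is recovered through a Wiener-style soft mask. The names and parameters are illustrative, not the dissertation's:

```python
import numpy as np

def separate(V_mix, W_target, W_other, n_iter=200, eps=1e-9):
    """Estimate the target source's magnitude spectrogram from a mixture.

    V_mix    : (freq, time) magnitude spectrogram of the mixture.
    W_target : (freq, k1) dictionary trained on the source of interest.
    W_other  : (freq, k2) dictionary trained on the interference.
    """
    W = np.hstack([W_target, W_other])
    k1 = W_target.shape[1]
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V_mix.shape[1]))
    denom = W.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):                       # KL update, W held fixed
        H *= (W.T @ (V_mix / (W @ H + eps))) / denom
    target = W_target @ H[:k1]
    other = W_other @ H[k1:]
    mask = target / (target + other + eps)        # Wiener-style soft mask
    return mask * V_mix
```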
    This limitation led to the development of hashing-based encoding of audio spectra, so that computationally heavy routines, such as nearest neighbor searches for sparse coding, can be performed in a cheaper bit-wise fashion. Matching audio signals can be challenging as well, especially if the signals are noisy and the matching task involves a large number of signals. In an information retrieval application, for example, a larger database leads to a longer response time. On top of that, if the signals are defective, we either have to perform enhancement or separation before matching, or we need a matching mechanism that is robust to all those different kinds of artifacts. Likewise, the noisy nature of the signals adds complexity to the system. This dissertation therefore also presents compact integer (and eventually binary) representations for such matching systems. One possible compact representation is a hashing-based matching method, where a particular kind of hash function preserves the similarity among the original signals in the hash code domain. We show that a variant of Winner Take All hashing provides Hamming distances from noise-robust binary features, and that matching with these hash codes works well for some keyword spotting tasks. Since landmark hashes (e.g. local maxima from non-maximum suppression on the magnitudes of a mel-scaled spectrogram) can also represent a time-frequency domain signal robustly and efficiently, a matrix decomposition algorithm is proposed that takes those irregular sparse matrices as input. Under the assumption that the number of landmarks is much smaller than the number of all time-frequency coefficients, the matching algorithm can be considered efficient if it operates entirely on the landmark representation. In contrast to the usual landmark matching schemes, where matching is defined rigidly, we view the audio matching problem as soft matching, where we find a constellation of landmarks similar to the query. To perform this soft matching, the landmark positions are smoothed by fixed-width Gaussian caps, with which the matching job reduces to calculating the amount of overlap between those Gaussians. The Gaussian-based density approximation is also useful when we perform decomposition on this landmark representation, because otherwise the landmarks are usually too sparse for an ordinary matrix factorization algorithm, which is designed for a dense input matrix. We also extend this concept to the matrix deconvolution problem, where the input landmark representation of a source is seen as a two-dimensional convolution between a source pattern and its corresponding sparse activations. If there is more than one source, as in a noisy signal, we can think of this problem as factor deconvolution, where the mixture is the combination of all the source-specific convolutions. The dissertation also covers Collaborative Audio Enhancement (CAE) algorithms that aim to recover the dominant source at a sound scene (e.g. the music signals of a concert rather than the noise from the crowd) from multiple low-quality recordings (e.g. YouTube video clips uploaded by the audience). CAE can be seen as crowdsourcing a recording job; it needs a substantial amount of denoising effort afterward, because the user-created recordings may have been contaminated with various artifacts.
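
    A minimal sketch of Winner Take All hashing as used above: each code symbol records which of the first K entries of a randomly permuted feature vector is the largest, so the code depends only on the ordering of values; matching then reduces to counting differing symbols. The sizes below are assumed for illustration:

```python
import numpy as np

def wta_hash(x, perms, K=4):
    """WTA hash: one symbol in 0..K-1 per stored random permutation."""
    return np.array([int(np.argmax(x[p[:K]])) for p in perms])

def symbol_distance(a, b):
    """Hamming-style distance: number of differing code symbols."""
    return int(np.sum(a != b))

# usage sketch: distortions that preserve value orderings leave the code intact
rng = np.random.default_rng(0)
dim, n_codes = 128, 64
perms = np.array([rng.permutation(dim) for _ in range(n_codes)])
x = rng.random(dim)
print(symbol_distance(wta_hash(x, perms), wta_hash(2.0 * x + 1.0, perms)))  # 0
```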
    In the sense that the recordings come from unsynchronized, heterogeneous sensors, we can also think of CAE as big ad-hoc sensor array processing. In CAE, each recording is assumed to be uniquely corrupted by a specific microphone frequency response, an aggressive audio coding algorithm, interference, band-pass filtering, clipping, etc. To consolidate all these recordings and produce an enhanced audio signal, Probabilistic Latent Component Sharing (PLCS) is proposed as a method of simultaneous probabilistic topic modeling on synchronized input signals. In PLCS, some of the parameters are fixed to be the same during and after the learning process to capture the common audio content, while the rest of the parameters model the unwanted recording-specific interference and artifacts. We can speed up PLCS by incorporating a hashing-based nearest neighbor search, so that at every EM iteration PLCS is applied only to the small number of recordings closest to the current source estimate. Experiments on a small simulated CAE setup show that the proposed PLCS can improve the sound quality of variously contaminated recordings, and the nearest neighbor search provides a sensible speed-up in larger-scale experiments (up to 1000 recordings). Finally, Bitwise Neural Networks (BNN) are discussed as an extremely optimized deep learning deployment option. In the proposed BNN, all the input, hidden, and output nodes are binary (+1 and -1), and so are all the weights and biases. Consequently, the operations on them at test time are defined with Boolean algebra, too. BNNs are spatially and computationally efficient in implementations, since (a) a real-valued sample or parameter is represented by a single bit, and (b) multiplication and addition correspond to bitwise XNOR and bit-counting, respectively. Therefore, BNNs can be used to implement a deep learning system in a resource-constrained environment, so that we can deploy deep learning on small devices without exhausting power, memory, CPU clocks, etc. The training procedure for BNNs is a straightforward extension of backpropagation, characterized by a quantization noise injection scheme and an initialization strategy that learns a weight-compressed real-valued network for initialization purposes only. Preliminary results on the MNIST dataset and on speech denoising demonstrate that this extension of backpropagation can successfully train BNNs with comparable performance while requiring vastly fewer computational resources.
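
    The arithmetic behind BNN inference can be sketched directly: with every quantity in {-1, +1}, a layer is a sign-thresholded dot product, which a bit-packed implementation would realize as XNOR plus bit-counting. The layer sizes and random parameters below are illustrative only:

```python
import numpy as np

def bnn_layer(x, W, b):
    """One bitwise layer over +1/-1 values.

    The +1/-1 dot product W @ x is what XNOR-plus-popcount computes on
    bit-packed data; we show the plain integer version for clarity and
    binarize the pre-activation with a sign threshold.
    """
    a = W @ x + b
    return np.where(a >= 0, 1, -1)

# usage sketch with random binary parameters
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=256)
W = rng.choice([-1, 1], size=(64, 256))
b = rng.choice([-1, 1], size=64)
h = bnn_layer(x, W, b)    # binary hidden activations, ready for the next layer
```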

    Mathematics and Digital Signal Processing

    Modern computer technology has opened up new opportunities for the development of digital signal processing methods. The applications of digital signal processing have expanded significantly and today include audio and speech processing, sonar, radar, and other sensor array processing, spectral density estimation, statistical signal processing, digital image processing, signal processing for telecommunications, control systems, biomedical engineering, and seismology, among others. This Special Issue aims at wide coverage of the problems of digital signal processing, from mathematical modeling to the implementation of problem-oriented systems. The basis of digital signal processing is digital filtering. Wavelet analysis implements multiscale signal processing and is used to solve applied problems of de-noising and compression. Processing of visual information, including image and video processing and pattern recognition, is actively used today in robotic systems and industrial process control. Improvements to digital signal processing circuits and new signal processing systems can enhance the technical characteristics of many digital devices. The development of new methods of artificial intelligence, including artificial neural networks and brain-computer interfaces, opens up new prospects for the creation of smart technology. This Special Issue contains the latest technological developments in mathematics and digital signal processing. The stated results are of interest to researchers in the field of applied mathematics and to developers of modern digital signal processing systems.

    Acoustic Echo Estimation using the model-based approach with Application to Spatial Map Construction in Robotics


    Inaudible acoustics: Techniques and applications

    This dissertation is focused on developing a sub-area of acoustics that we call inaudible acoustics. We have developed two core capabilities, (1) BackDoor and (2) Ripple, and demonstrated their use in various mobile and IoT applications. In BackDoor, we synthesize ultrasound signals that are inaudible to humans yet naturally recordable by all microphones. Importantly, the microphone does not require any modification, enabling billions of microphone-equipped devices, including phones, laptops, voice assistants, and IoT devices, to leverage the capability. Example applications include acoustic data beacons, acoustic watermarking, and spy-microphone jamming. In Ripple, we develop modulation and sensing techniques for vibratory signals that traverse solid surfaces, enabling a new form of secure proximal communication. Applications of the vibratory communication system include on-body communication through imperceptible physical vibrations and device-to-device secure data transfer through physical contact. Our prototypes include an inaudible jammer that secures private conversations from electronic eavesdropping, acoustic beacons for location-based information sharing, and vibratory communication in a smart ring that sends a password through a finger touch. Our research also uncovers new security threats to acoustic devices. While simple abuse of an inaudible jammer can disable hearing aids and cell phones, our work shows that voice interfaces, such as Amazon Echo, Google Home, and Siri, can be compromised through carefully designed inaudible voice commands. The contributions of this dissertation can be summarized in three primitives: (1) exploiting inherent hardware nonlinearity for sensing out-of-band signals, (2) developing a vibratory communication system for secure touch-based data exchange, and (3) structured information reconstruction from noisy acoustic signals. In developing these primitives, we draw from principles in wireless networking, digital communications, signal processing, and embedded design, and translate them into completely functional systems.
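
    The hardware-nonlinearity primitive can be demonstrated numerically: amplitude-modulate an inaudible carrier, pass it through a mildly nonlinear front end modeled as x + a*x^2, and a copy of the message reappears at baseband. The carrier and message frequencies below are assumed values for illustration, not the system's actual parameters:

```python
import numpy as np

fs = 192_000                        # sampling rate high enough for the carrier
t = np.arange(0, 0.05, 1 / fs)
f_c, f_m = 40_000, 1_000            # inaudible carrier, audible 1 kHz message
am = (1 + 0.5 * np.cos(2 * np.pi * f_m * t)) * np.cos(2 * np.pi * f_c * t)

# Model the microphone front end as x + a*x^2: the quadratic term of the
# AM signal contains a component at f_m, which survives the mic's low-pass
# filtering even though the transmitted signal has no energy there.
recorded = am + 0.1 * am ** 2
freqs = np.fft.rfftfreq(len(recorded), 1 / fs)
spec_lin = np.abs(np.fft.rfft(am))
spec_rec = np.abs(np.fft.rfft(recorded))
k = np.argmin(np.abs(freqs - f_m))
print("energy at 1 kHz, linear vs nonlinear:", spec_lin[k], spec_rec[k])
```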

    Intelligent Sensor Networks

    In the last decade, wireless and wired sensor networks have attracted much attention. However, most designs target general sensor network concerns, including the protocol stack (routing, MAC, etc.) and security. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts.