66 research outputs found

    Multichannel source separation and tracking with phase differences by random sample consensus

    Get PDF
    Blind audio source separation (BASS) is a fascinating problem that has been tackled from many different angles. The use case of interest in this thesis is that of multiple moving and simultaneously-active speakers in a reverberant room. This is a common situation, for example, in social gatherings. We human beings have the remarkable ability to focus attention on a particular speaker while effectively ignoring the rest. This is referred to as the ``cocktail party effect'' and has been the holy grail of source separation for many decades. Replicating this feat in real-time with a machine is the goal of BASS. Single-channel methods attempt to identify the individual speakers from a single recording. However, with the advent of hand-held consumer electronics, techniques based on microphone array processing are becoming increasingly popular. Multichannel methods record a sound field from various locations to incorporate spatial information. If the speakers move over time, we need an algorithm capable of tracking their positions in the room. For compact arrays with 1-10 cm of separation between the microphones, this can be accomplished by applying a temporal filter on estimates of the directions-of-arrival (DOA) of the speakers. In this thesis, we review recent work on BSS with inter-channel phase difference (IPD) features and provide extensions to the case of moving speakers. It is shown that IPD features compose a noisy circular-linear dataset. This data is clustered with the RANdom SAmple Consensus (RANSAC) algorithm in the presence of strong reverberation to simultaneously localize and separate speakers. The remarkable performance of RANSAC is due to its natural tendency to reject outliers. To handle the case of non-stationary speakers, a factorial wrapped Kalman filter (FWKF) and a factorial von Mises-Fisher particle filter (FvMFPF) are proposed that track source DOAs directly on the unit circle and unit sphere, respectively. These algorithms combine directional statistics, Bayesian filtering theory, and probabilistic data association techniques to track the speakers with mixtures of directional distributions

    Phase difference and tensor factorization models for audio source separation

    Get PDF
    Audio source separation is a well-known problem in the speech community. Many methods have been proposed to isolate speech signals from a multichannel mixture. In this thesis, we will explore a number of techniques involving interchannel phase difference (IPD) features within a tensor factorization framework. IPD features can be extracted on a time-frequency (TF) grid and are a function of the phase characteristics of the mixing process. Thus, the ultimate goal is to form a clustering of these features and produce TF masks that can be used to perform the separation. We discuss various non-tensor-based methods that are capable of modeling linear and nonlinear IPD trends. Then, we discuss generalizations to both nonnegative and complex tensor factorizations (NTF, CTF). We show that each method performs best in certain circumstances and we conclude by saying that more work is needed to devise a generally superior approach

    Enhancing 3D Visual Odometry with Single-Camera Stereo Omnidirectional Systems

    Full text link
    We explore low-cost solutions for efficiently improving the 3D pose estimation problem of a single camera moving in an unfamiliar environment. The visual odometry (VO) task -- as it is called when using computer vision to estimate egomotion -- is of particular interest to mobile robots as well as humans with visual impairments. The payload capacity of small robots like micro-aerial vehicles (drones) requires the use of portable perception equipment, which is constrained by size, weight, energy consumption, and processing power. Using a single camera as the passive sensor for the VO task satisfies these requirements, and it motivates the proposed solutions presented in this thesis. To deliver the portability goal with a single off-the-shelf camera, we have taken two approaches: The first one, and the most extensively studied here, revolves around an unorthodox camera-mirrors configuration (catadioptrics) achieving a stereo omnidirectional system (SOS). The second approach relies on expanding the visual features from the scene into higher dimensionalities to track the pose of a conventional camera in a photogrammetric fashion. The first goal has many interdependent challenges, which we address as part of this thesis: SOS design, projection model, adequate calibration procedure, and application to VO. We show several practical advantages for the single-camera SOS due to its complete 360-degree stereo views, that other conventional 3D sensors lack due to their limited field of view. Since our omnidirectional stereo (omnistereo) views are captured by a single camera, a truly instantaneous pair of panoramic images is possible for 3D perception tasks. Finally, we address the VO problem as a direct multichannel tracking approach, which increases the pose estimation accuracy of the baseline method (i.e., using only grayscale or color information) under the photometric error minimization as the heart of the “direct” tracking algorithm. Currently, this solution has been tested on standard monocular cameras, but it could also be applied to an SOS. We believe the challenges that we attempted to solve have not been considered previously with the level of detail needed for successfully performing VO with a single camera as the ultimate goal in both real-life and simulated scenes

    Localization using Distance Geometry : Minimal Solvers and Robust Methods for Sensor Network Self-Calibration

    Get PDF
    In this thesis, we focus on the problem of estimating receiver and sender node positions given some form of distance measurements between them. This kind of localization problem has several applications, e.g., global and indoor positioning, sensor network calibration, molecular conformations, data visualization, graph embedding, and robot kinematics. More concretely, this thesis makes contributions in three different areas.First, we present a method for simultaneously registering and merging maps. The merging problem occurs when multiple maps of an area have been constructed and need to be combined into a single representation. If there are no absolute references and the maps are in different coordinate systems, they also need to be registered. In the second part, we construct robust methods for sensor network self-calibration using both Time of Arrival (TOA) and Time Difference of Arrival (TDOA) measurements. One of the difficulties is that corrupt measurements, so-called outliers, are present and should be excluded from the model fitting. To achieve this, we use hypothesis-and-test frameworks together with minimal solvers, resulting in methods that are robust to noise, outliers, and missing data. Several new minimal solvers are introduced to accommodate a range of receiver and sender configurations in 2D and 3D space. These solvers are formulated as polynomial equation systems which are solvedusing methods from algebraic geometry.In the third part, we focus specifically on the problems of trilateration and multilateration, and we present a method that approximates the Maximum Likelihood (ML) estimator for different noise distributions. The proposed approach reduces to an eigendecomposition problem for which there are good solvers. This results in a method that is faster and more numerically stable than the state-of-the-art, while still being easy to implement. Furthermore, we present a robust trilateration method that incorporates a motion model. This enables the removal of outliers in the distance measurements at the same time as drift in the motion model is canceled
    corecore