
    Jointly Tracking and Separating Speech Sources Using Multiple Features and the Generalized Labeled Multi-Bernoulli Framework

    This paper proposes a novel joint multi-speaker tracking-and-separation method based on the generalized labeled multi-Bernoulli (GLMB) multi-target tracking filter, using sound mixtures recorded by microphones. Standard multi-speaker tracking algorithms usually track only speaker locations, and ambiguity arises when speakers are spatially close. The proposed multi-feature GLMB tracking filter treats the set of associated speaker feature vectors (location, pitch, and sound) as the multi-target multi-feature observation, and characterizes the evolving features with corresponding transition models and an overall likelihood function. It thereby jointly tracks and separates each multi-feature speaker and resolves the spatial ambiguity problem. Numerical evaluation verifies that the proposed method correctly tracks the locations of multiple speakers while simultaneously separating their speech signals.
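
    As a loose illustration of the multi-feature idea, the sketch below scores one observation (location, pitch, spectral feature) against hypothesized speaker states. The Gaussian feature models, their variances, and the conditional-independence assumption are illustrative stand-ins, not the paper's exact GLMB likelihood.

        import numpy as np

        def gaussian_loglik(z, mean, var):
            """Log-likelihood of observation z under an isotropic Gaussian."""
            d = np.atleast_1d(z) - np.atleast_1d(mean)
            return -0.5 * (d @ d) / var - 0.5 * d.size * np.log(2 * np.pi * var)

        def multifeature_loglik(obs, state, var_loc=0.1, var_pitch=25.0, var_spec=0.5):
            """Joint log-likelihood of one multi-feature observation given one
            hypothesized speaker state; features are treated as conditionally
            independent given the state (an illustrative assumption)."""
            return (gaussian_loglik(obs["loc"],   state["loc"],   var_loc)
                  + gaussian_loglik(obs["pitch"], state["pitch"], var_pitch)
                  + gaussian_loglik(obs["spec"],  state["spec"],  var_spec))

        # Two spatially close speakers that the pitch feature disambiguates:
        s1 = {"loc": np.array([1.0, 2.0]), "pitch": 120.0, "spec": np.zeros(4)}
        s2 = {"loc": np.array([1.1, 2.0]), "pitch": 210.0, "spec": np.ones(4)}
        z  = {"loc": np.array([1.05, 2.0]), "pitch": 122.0, "spec": np.zeros(4)}
        print(multifeature_loglik(z, s1), multifeature_loglik(z, s2))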

    Raking the Cocktail Party

    We present the concept of an acoustic rake receiver: a microphone beamformer that uses echoes to improve noise and interference suppression. The rake idea is well known in wireless communications; it involves constructively combining different multipath components that arrive at the receiver antennas. Unlike the spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts. Therefore, we focus on the spatial structure rather than the temporal one. Instead of explicitly estimating the channel, we create correspondences between early echoes in time and image sources in space. These multiple sources of the desired and the interfering signals offer additional spatial diversity that we can exploit in the beamformer design. We present several "intuitive" and optimal formulations of acoustic rake receivers, and show theoretically and numerically that the rake formulation of the maximum signal-to-interference-and-noise beamformer offers significant performance gains in terms of noise and interference suppression. Beyond signal-to-noise ratio, we observe gains in terms of the \emph{perceptual evaluation of speech quality} (PESQ) metric for speech quality. We accompany the paper with the complete simulation and processing chain written in Python. The code and sound samples are available online at \url{http://lcav.github.io/AcousticRakeReceiver/}.
    Comment: 12 pages, 11 figures. Accepted for publication in IEEE Journal on Selected Topics in Signal Processing (Special Issue on Spatial Audio).
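
    A minimal narrowband sketch of the rake max-SINR idea: sum the steering vectors of the desired source and its image sources, then whiten against the interferer's raked covariance. The geometry, single-echo image sources, and regularization constant are hypothetical choices, not the paper's exact formulations.

        import numpy as np

        C = 343.0  # speed of sound (m/s)

        def steering(mics, src, freq):
            """Narrowband spherical-wave steering vector from a point source."""
            r = np.linalg.norm(mics - src, axis=1)
            k = 2 * np.pi * freq / C
            return np.exp(-1j * k * r) / (4 * np.pi * r)

        def rake_vector(mics, sources, freq):
            """Sum the steering vectors of a source and its image sources."""
            return sum(steering(mics, s, freq) for s in sources)

        # Hypothetical geometry: 4 mics, a desired source with one early echo
        # (image source), and an interferer with one echo of its own.
        mics = np.array([[0, 0, 0], [0.05, 0, 0], [0.10, 0, 0], [0.15, 0, 0]], float)
        desired  = [np.array([2.0, 1.0, 0.0]), np.array([2.0, -1.0, 0.0])]
        interfer = [np.array([-1.5, 2.0, 0.0]), np.array([-1.5, -2.0, 0.0])]

        f = 1000.0
        a_s = rake_vector(mics, desired, f)
        a_i = rake_vector(mics, interfer, f)

        # Max-SINR weights: whiten against interference-plus-noise covariance,
        # then normalize so the raked desired signal passes undistorted.
        R = np.outer(a_i, a_i.conj()) + 1e-6 * np.eye(len(mics))
        w = np.linalg.solve(R, a_s)
        w /= w.conj() @ a_s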

    Compressive Matched-Field Processing

    Source localization by matched-field processing (MFP) generally involves solving a number of computationally intensive partial differential equations. This paper introduces a technique that mitigates this computational workload by "compressing" these computations. Drawing on key concepts from the recently developed field of compressed sensing, it shows how a low-dimensional proxy for the Green's function can be constructed by backpropagating a small set of random receiver vectors. The source can then be located by performing a number of "short" correlations between this proxy and the projection of the recorded acoustic data into the compressed space. Numerical experiments in a Pekeris ocean waveguide demonstrate that this compressed version of MFP is as effective as traditional MFP even when the compression is significant. The results are particularly promising in the broadband regime, where using as few as two random backpropagations per frequency performs almost as well as traditional broadband MFP, but with the added benefit of generic applicability: the computationally intensive backpropagations may be computed offline, independently of the received signals, and may be reused to locate any source within the search grid area.
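
    A minimal sketch of the compressed correlation step, with a random stand-in replica matrix in place of actual PDE solves; the problem sizes and the Gaussian projection are illustrative assumptions. With enough projections, the compressed ambiguity surface typically peaks at the same grid point as the full one.

        import numpy as np

        rng = np.random.default_rng(0)
        n_rx, n_grid, m = 64, 300, 48   # receivers, grid points, random projections

        # Stand-in replica matrix: column j plays the role of the Green's-function
        # vector for grid point j (in practice, each column requires a PDE solve).
        G = rng.standard_normal((n_rx, n_grid)) + 1j * rng.standard_normal((n_rx, n_grid))
        G /= np.linalg.norm(G, axis=0)

        true_j = 137
        noise = 0.05 * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))
        d = G[:, true_j] + noise   # recorded data vector

        # Offline: backpropagating m random receiver vectors amounts to forming
        # Phi @ G, a low-dimensional proxy independent of the received signals.
        Phi = rng.standard_normal((m, n_rx)) / np.sqrt(m)
        proxy = Phi @ G

        # Online: project the data and run "short" m-dimensional correlations.
        full = np.abs(d.conj() @ G)               # traditional MFP ambiguity
        comp = np.abs((Phi @ d).conj() @ proxy)   # compressed ambiguity
        print("full peak:", np.argmax(full), " compressed peak:", np.argmax(comp))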

    Improving acoustic vehicle classification by information fusion

    We present an information fusion approach for ground vehicle classification based on the emitted acoustic signal. Many acoustic factors can contribute to the classification accuracy of working ground vehicles, and classification relying on a single feature set may lose useful information if its underlying sound production model is not comprehensive. To improve classification accuracy, we consider an information fusion scheme in which various aspects of an acoustic signature are taken into account and emphasized separately by two different feature extraction methods. The first set of features aims to represent internal sound production: a number of harmonic components are extracted to characterize the factors related to the vehicle's resonance. The second set of features is extracted by a computationally efficient discriminatory analysis: a group of key frequency components is selected by mutual information, accounting for the sound production of the vehicle's exterior parts. In correspondence with this structure, we further put forward a modified Bayesian fusion algorithm, which takes advantage of matching each specific feature set with its favored classifier. To assess the proposed approach, experiments are carried out on a data set containing acoustic signals from different types of vehicles. Results indicate that the fusion approach can effectively increase classification accuracy compared with that achieved using each individual feature set alone, and the Bayesian decision-level fusion is found to outperform a feature-level fusion approach.
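
    A minimal sketch of decision-level fusion by the plain naive-Bayes product rule; the paper's modified algorithm additionally matches each feature set to its favored classifier, which is not reproduced here, and the class posteriors and priors below are hypothetical numbers.

        import numpy as np

        def bayes_fuse(post_a, post_b, prior):
            """Decision-level fusion by the naive-Bayes product rule:
            p(c | x_a, x_b) is proportional to post_a(c) * post_b(c) / prior(c)."""
            fused = post_a * post_b / prior
            return fused / fused.sum()

        # Hypothetical posteriors from two classifiers over three vehicle classes:
        # one fed harmonic features, the other MI-selected frequency components.
        prior  = np.array([1/3, 1/3, 1/3])
        post_h = np.array([0.55, 0.30, 0.15])   # harmonic-feature classifier
        post_f = np.array([0.40, 0.45, 0.15])   # key-frequency classifier
        print(bayes_fuse(post_h, post_f, prior))  # fused class posterior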

    Scan and paint: theory and practice of a sound field visualization method

    Sound visualization techniques have played a key role in the development of acoustics throughout history. The development of measurement apparatus and techniques for displaying sound and vibration phenomena has provided excellent tools for building understanding of specific problems. Traditional methods, such as step-by-step measurements or simultaneous multichannel systems, involve a strong tradeoff between time requirements, flexibility, and cost. However, if the sound field can be assumed time-stationary, scanning methods allow us to assess variations across space with a single transducer, as long as the position of the sensor is known. The proposed technique, Scan and Paint, is based on the acquisition of sound pressure and particle velocity by manually moving a P-U probe (a combined pressure and particle velocity sensor) across a sound field while filming the event with a camera. The sensor position is extracted by applying automatic color tracking to each frame of the recorded video. It is then possible to visualize sound variations across the space in terms of sound pressure, particle velocity, or acoustic intensity. In this paper we explore not only the theoretical foundations of the method but also its practical applications, such as scanning transfer-path analysis, source radiation characterization, operational deflection shapes, virtual phased arrays, material characterization, and acoustic intensity vector field mapping.
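
    A minimal sketch of the color-tracking step using OpenCV, assuming the probe carries a marker of a known HSV color range; the function name and thresholds are illustrative, not the authors' implementation.

        import cv2
        import numpy as np

        def track_probe(video_path, hsv_lo, hsv_hi):
            """Return per-frame (x, y) pixel positions of a color-marked probe."""
            cap = cv2.VideoCapture(video_path)
            positions = []
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
                mask = cv2.inRange(hsv, hsv_lo, hsv_hi)
                m = cv2.moments(mask)
                if m["m00"] > 0:  # marker visible: centroid of the color mask
                    positions.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
                else:
                    positions.append((np.nan, np.nan))
            cap.release()
            return np.array(positions)

        # Each tracked position is then paired with the synchronized P-U probe
        # samples for that frame, and a quantity such as sound pressure level
        # is painted onto the image plane at the tracked coordinates.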