
    Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

    This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus making it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods. Comment: 15 pages, 8 figures
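    The locally-linear regression idea can be sketched as: partition the feature space and fit one affine map per region. The toy Python sketch below is illustrative only (fixed anchor points and plain least squares, not the paper's Gaussian mixture of local experts; all data and names are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 2-D "binaural features" mapped to a 1-D direction (azimuth).
X = rng.uniform(-1, 1, size=(500, 2))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1]       # nonlinear ground-truth map

# Partition the feature space with fixed anchors, then fit one affine map
# per region -- a crude stand-in for a mixture of locally-linear experts.
anchors = rng.uniform(-1, 1, size=(8, 2))
labels = np.argmin(((X[:, None, :] - anchors[None]) ** 2).sum(-1), axis=1)

coefs = []
for k in range(len(anchors)):
    mask = labels == k
    Xk = np.c_[X[mask], np.ones(mask.sum())]  # affine model: [x, 1] @ [w, b]
    coefs.append(np.linalg.lstsq(Xk, y[mask], rcond=None)[0])

def predict(x):
    """Assign x to its nearest region and apply that region's affine map."""
    k = np.argmin(((x - anchors) ** 2).sum(-1))
    return np.r_[x, 1.0] @ coefs[k]
```

    Because each region is small, the affine maps jointly track the global nonlinearity while each individual fit stays a cheap least-squares problem.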

    Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

    This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, assuming known mixing filters. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, the CTF has far fewer taps; consequently it has fewer near-common zeros among channels and a lower computational complexity. The work proposes three speech-source recovery methods: i) multichannel inverse filtering, i.e. the multiple input/output inverse theorem (MINT), applied in the CTF domain; ii) for the multi-source case, a beamforming-like multichannel inverse filtering method applying single-source MINT with power minimization, which is suitable whenever the source CTFs are not all known; and iii) a constrained Lasso method, where the sources are recovered by minimizing the ℓ1-norm to impose their spectral sparsity, subject to the constraint that the ℓ2-norm fitting cost, between the microphone signals and the mixing model involving the unknown source signals, is less than a tolerance. The noise can be reduced by setting this tolerance according to the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods; a comparison between them, as well as with baseline methods, is presented. Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
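    The ℓ1-based recovery in iii) can be illustrated with a generic sparse-recovery sketch. The snippet below is illustrative only: it solves a plain penalized Lasso by iterative soft thresholding (ISTA) on synthetic data, not the paper's constrained CTF formulation:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 60, 120, 5                      # measurements, atoms, nonzeros
A = rng.standard_normal((m, n)) / np.sqrt(m)
s_true = np.zeros(n)
s_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ s_true + 0.01 * rng.standard_normal(m)

lam = 0.02                                # sparsity weight
L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the gradient
s = np.zeros(n)
for _ in range(500):
    g = A.T @ (A @ s - y)                 # gradient of the quadratic fit term
    z = s - g / L                         # gradient step
    s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
```

    The soft-thresholding step is what enforces spectral sparsity; in the constrained form of the paper, the tolerance on the ℓ2 fitting cost plays the role that lam plays here.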

    Development of a Multi-Projection Approach for Global Web Map Visualization

    The popularity of web mapping services such as Google and Bing Maps is growing. However, professional users experience several limitations while using these on-line mapping services. The first problem is the limited global coverage: the coverage ends at a latitude of 85° north and south. The second problem is the systematic distortion that increases with latitude. For example, in Google Maps Greenland appears to be larger than South America, whereas in reality Greenland is 8 times smaller. The third problem is the lack of mathematical rigour in the cartographic projections, because the Earth is treated as a sphere instead of an ellipsoid. Thus, a better web mapping system is needed for professional users and users interested in polar regions. This thesis presents a multi-projection approach for global web map visualization. The multi-projection approach minimizes the cartographic distortions by using different projections across the globe and for ranges of mapping detail levels.
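    A multi-projection scheme of this kind can be sketched by switching projections per latitude band. The snippet below uses the spherical Web Mercator formulas and a spherical north-polar stereographic projection; the 60° switch latitude and the specific projection pair are assumptions for illustration, not the thesis's actual design:

```python
import math

R = 6378137.0                       # sphere radius used by Web Mercator (m)

def web_mercator(lon_deg, lat_deg):
    """Spherical Web Mercator; only defined for |lat| <= ~85.05 deg."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))

def polar_stereographic_north(lon_deg, lat_deg):
    """Spherical polar stereographic centred on the north pole."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    rho = 2 * R * math.tan(math.pi / 4 - phi / 2)
    return rho * math.sin(lam), -rho * math.cos(lam)

def project(lon_deg, lat_deg, switch_lat=60.0):
    """Pick a projection per latitude band, as in a multi-projection scheme."""
    if lat_deg > switch_lat:
        return ("polar_stereo_n", *polar_stereographic_north(lon_deg, lat_deg))
    return ("web_mercator", *web_mercator(lon_deg, lat_deg))
```

    Unlike single-projection Web Mercator, the polar branch stays finite all the way to the pole, which is exactly the coverage gap the thesis targets.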

    Design, development and evaluation of the ruggedized edge computing node (RECON)

    The increased quality and quantity of sensors provide an ever-increasing capability to collect large quantities of high-quality data in the field. Research devoted to translating that data is progressing rapidly; however, translating field data into usable information can require high performance computing capabilities. While high performance computing (HPC) resources are available in centralized facilities, bandwidth, latency, security and other limitations inherent to edge locations in field sensor applications may prevent HPC resources from being used in the timely fashion necessary for potential United States Army Corps of Engineers (USACE) field applications. To address these limitations, design requirements for RECON are established, derived from a review of edge computing, in order to develop and evaluate a novel high-power, field-deployable HPC platform capable of operating in austere environments at the edge.

    Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

    We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. The paper has the following contributions. We use the direct-path relative transfer function (DP-RTF), an inter-channel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign DP-RTFs to audio-source directions. Towards this goal, we adopt a maximum-likelihood formulation and we propose to use an exponentiated gradient (EG) to efficiently update source-direction estimates starting from their currently available values. The problem of multiple speaker tracking is computationally intractable because the number of possible associations between observed source directions and physical speakers grows exponentially with time. We adopt a Bayesian framework and we propose a variational approximation of the posterior filtering distribution associated with multiple speaker tracking, as well as an efficient variational expectation-maximization (VEM) solver. The proposed online localization and tracking method is thoroughly evaluated using two datasets that contain recordings performed in real environments. Comment: IEEE Journal of Selected Topics in Signal Processing, 2019
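    The appeal of the exponentiated-gradient update is that it keeps its iterate on the probability simplex, which makes it convenient for assignment weights. A minimal sketch (toy linear objective, not the paper's DP-RTF likelihood):

```python
import numpy as np

def eg_step(w, grad, eta=0.5):
    """Exponentiated-gradient ascent step: multiplicative update followed by
    renormalization, so w stays a valid probability vector."""
    w_new = w * np.exp(eta * grad)
    return w_new / w_new.sum()

# Toy objective: maximize p @ w over the simplex; the gradient is just p,
# so EG should concentrate all mass on the largest entry of p.
p = np.array([0.2, 0.5, 0.3])
w = np.ones(3) / 3
for _ in range(200):
    w = eg_step(w, p)
```

    Starting "from the currently available values", as the abstract puts it, is natural here: each step multiplies the previous estimate rather than overwriting it.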

    A single server Markovian queuing system with limited buffer and reverse balking

    The phenomenon of balking is observed when a customer who has arrived at a queuing system decides not to join it. Reverse balking is a particular type of balking wherein the probability that a customer will balk goes down as the system size goes up, and vice versa. Such behavior can be observed in investment firms (insurance companies, mutual fund companies, banks, etc.): as the number of customers in the firm goes up, it creates trust among potential investors, and fewer customers balk as the number of customers grows. In this paper, we develop an M/M/1/k queuing system with reverse balking. The steady-state probabilities of the model are obtained and closed-form expressions for a number of performance measures are derived.
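    The steady-state probabilities of such a birth-death chain can also be computed numerically. The sketch below assumes, purely for illustration, that an arrival joins with probability (n+1)/k when n customers are present, so the effective arrival rate rises with occupancy (reverse balking); the paper's exact joining rule may differ:

```python
import numpy as np

lam, mu, k = 2.0, 3.0, 10          # arrival rate, service rate, buffer size

# Assumed reverse-balking rule (illustrative): join with probability (n+1)/k
# when n customers are present, so the effective arrival rate increases with n.
lam_n = [lam * (n + 1) / k for n in range(k)]   # rate of transition n -> n+1

# Birth-death steady state: P_n = P_0 * prod_{i<n} (lam_i / mu), normalized.
probs = [1.0]
for n in range(k):
    probs.append(probs[-1] * lam_n[n] / mu)
P = np.array(probs) / sum(probs)

L = (np.arange(k + 1) * P).sum()   # mean number in system
```

    With these example rates the per-step ratio lam_n/mu stays below 1, so the distribution still decreases with n; reverse balking flattens it relative to an ordinary M/M/1/k rather than inverting it.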

    Informed Source Separation from compressed mixtures using spatial Wiener filter and quantization noise estimation

    In a previous work, we proposed an Informed Source Separation system based on Wiener filtering for active listening of music from uncompressed (16-bit PCM) multichannel mix signals. In the present work, the system is improved to work with (MPEG-2 AAC) compressed mix signals: quantization noise is estimated from the AAC bitstream at the decoder and explicitly taken into account in the source separation process. Also, a direct MDCT-to-STFT transform is used to optimize the computational efficiency of the process in the STFT domain from AAC-decoded MDCT coefficients.
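    The core Wiener-filtering step can be illustrated per frequency bin: the gain is the ratio of source power to total (source plus noise) power. A toy single-frame sketch with synthetic PSDs (the actual system operates on multichannel MDCT/STFT coefficients with a quantization-noise PSD estimated from the AAC bitstream):

```python
import numpy as np

rng = np.random.default_rng(2)
F = 64                                      # frequency bins in one frame
src_psd = rng.uniform(0.5, 2.0, F)          # source power spectral density
noise_psd = np.full(F, 0.1)                 # e.g. estimated quantization noise
mix = rng.standard_normal(F) * np.sqrt(src_psd + noise_psd)  # toy mixture frame

gain = src_psd / (src_psd + noise_psd)      # Wiener gain per bin, in (0, 1)
est = gain * mix                            # filtered (attenuated) frame
```

    Accounting for the quantization noise explicitly, as the paper does, amounts to adding its estimated PSD to the denominator of this gain, so bins the codec degraded more are attenuated more.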

    Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

    In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for modeling the speech spectro-temporal content. The parameters of this supervised model are learned using the framework of variational autoencoders. The noisy recording environment is assumed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and we experimentally show that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF. Comment: 5 pages, 2 figures, audio examples and code available online at https://team.inria.fr/perception/icassp-2019-mvae
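    The unsupervised NMF noise model relies on multiplicative updates, which keep both factors non-negative by construction. A minimal Euclidean-distance NMF sketch on synthetic data (illustrative only, not the paper's VAE/Monte Carlo EM pipeline):

```python
import numpy as np

rng = np.random.default_rng(3)
F, N, K = 30, 40, 4                        # freq bins, frames, NMF components
V = rng.uniform(0.1, 1.0, (F, K)) @ rng.uniform(0.1, 1.0, (K, N))  # non-neg data

W = rng.uniform(0.1, 1.0, (F, K))          # spectral templates
H = rng.uniform(0.1, 1.0, (K, N))          # temporal activations
eps = 1e-9                                 # guards against division by zero
for _ in range(500):
    # Multiplicative updates: non-negative factors stay non-negative.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
```

    In the semi-supervised setting of the paper only the noise gets this treatment; the speech model is the pretrained variational autoencoder.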