
    Distributed and adaptive location identification system for mobile devices

    Indoor location identification and navigation need to be as simple, seamless, and ubiquitous as their outdoor GPS-based counterpart. It would be of great convenience to mobile users to be able to continue navigating seamlessly as they move from a GPS-clear outdoor environment into an indoor environment or a GPS-obstructed outdoor environment such as a tunnel or forest. Existing infrastructure-based indoor localization systems lack this capability and also face several critical technical challenges: increased cost of installation, centralization, lack of reliability, poor localization accuracy, poor adaptation to the dynamics of the surrounding environment, latency, system-level and computational complexity, repetitive labor-intensive parameter tuning, and user privacy. To this end, this paper presents a novel mechanism with the potential to overcome most (if not all) of the abovementioned challenges. The proposed mechanism is simple, distributed, adaptive, collaborative, and cost-effective. Based on the proposed algorithm, a mobile blind device can potentially utilize, as GPS-like reference nodes, either in-range location-aware compatible mobile devices or preinstalled low-cost infrastructure-less location-aware beacon nodes. The proposed approach is model-based and calibration-free: it uses the received signal strength to periodically and collaboratively measure and update the radio frequency characteristics of the operating environment in order to estimate the distances to the reference nodes. The blind device then uses trilateration to identify its own location, as in GPS-based systems. Simulation and empirical testing ascertained that the proposed approach can potentially be the core of future indoor and GPS-obstructed environments
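
The distance-then-trilateration idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's algorithm: it assumes a log-distance path-loss model, RSS = P0 - 10 n log10(d / d0), which the abstract does not name, and the function names (`estimate_exponent`, `rss_to_distance`, `trilaterate`) and all numeric values are hypothetical.

```python
import math

def estimate_exponent(rss_dbm, p0_dbm, known_distance, d0=1.0):
    """Path-loss exponent n from one link of known length (assumed model:
    RSS = P0 - 10*n*log10(d/d0)). Two reference nodes that know their own
    positions could use this to re-measure the environment periodically."""
    return (p0_dbm - rss_dbm) / (10.0 * math.log10(known_distance / d0))

def rss_to_distance(rss_dbm, p0_dbm, n, d0=1.0):
    """Invert the same assumed model to get a distance estimate."""
    return d0 * 10.0 ** ((p0_dbm - rss_dbm) / (10.0 * n))

def trilaterate(anchors, distances):
    """Least-squares 2-D position from three or more (x, y) anchors.

    Subtracting the first circle equation from the others gives a linear
    system a*x + b*y = c, solved here through its 2x2 normal equations."""
    (x0, y0), r0 = anchors[0], distances[0]
    saa = sab = sbb = sac = sbc = 0.0
    for (xi, yi), ri in zip(anchors[1:], distances[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    return ((sac * sbb - sab * sbc) / det, (saa * sbc - sab * sac) / det)

# Hypothetical scenario: three reference nodes and a blind device at (3, 4).
P0, N_TRUE = -40.0, 2.5
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)

def simulated_rss(d):
    """Noise-free RSS generated from the assumed model."""
    return P0 - 10.0 * N_TRUE * math.log10(d)

# Reference nodes 0 and 1 are 10 m apart, so their link reveals n.
n_hat = estimate_exponent(simulated_rss(10.0), P0, 10.0)
dists = [rss_to_distance(simulated_rss(math.dist(a, true_pos)), P0, n_hat)
         for a in anchors]
position = trilaterate(anchors, dists)
```

With noisy RSS readings the circles no longer intersect in one point, which is why the sketch solves a least-squares problem rather than intersecting circles exactly.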

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted by estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which each recording is treated as one block (batch processing). The proposed method is evaluated experimentally on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria, such as signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively. Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front-end, is evaluated. The results indicate that the proposed method is robust to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms. Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion

    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by convolutive transfer functions (CTFs) with far fewer coefficients. The CTFs suffer from the common zeros caused by the oversampled STFT. We propose to identify the CTFs based on the STFT with oversampled signals and critically sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common-zeros problem of the CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the $\ell_1$-norm of the source signal, with the relaxed $\ell_2$-norm fitting error between the microphone signals and the convolution of the estimated source signal and the CTFs used as a constraint. This method is advantageous in that the noise can be reduced by relaxing the $\ell_2$-norm to a tolerance corresponding to the noise power, and the tolerance can be set automatically. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise. Comment: 13 pages, 5 figures, 5 tables
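
The constrained inverse-filtering step described in this abstract can be written compactly as an optimization problem. The form below is a sketch under assumed notation, none of which comes from the abstract: $\mathbf{x}$ stacks the microphone STFT coefficients of one sub-band, $\mathbf{s}$ is the unknown source signal, $\mathbf{C}$ is the convolution matrix built from the identified CTFs, and $\epsilon$ is the tolerance tied to the noise power. The paper's exact formulation may differ, for example in whether the constraint uses the squared norm.

```latex
\hat{\mathbf{s}}
  = \operatorname*{arg\,min}_{\mathbf{s}} \; \lVert \mathbf{s} \rVert_1
  \quad \text{subject to} \quad
  \lVert \mathbf{x} - \mathbf{C}\,\mathbf{s} \rVert_2^2 \le \epsilon
```

Setting $\epsilon = 0$ would demand an exact fit and pass the noise through to the estimate; a tolerance near the noise power lets the sparsity term discard it.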

    Joint Tensor Factorization and Outlying Slab Suppression with Applications

    We consider factoring low-rank tensors in the presence of outlying slabs. This problem is important in practice, because data collected in many real-world applications, such as speech, fluorescence, and some social network data, fit this paradigm. Prior work tackles this problem by iteratively selecting a fixed number of slabs and fitting, a procedure which may not converge. We formulate this problem from a group-sparsity promoting point of view, and propose an alternating optimization framework to handle the corresponding $\ell_p$ ($0 < p \leq 1$) minimization-based low-rank tensor factorization problem. The proposed algorithm has a per-iteration complexity similar to that of the plain trilinear alternating least squares (TALS) algorithm. Convergence of the proposed algorithm is also easy to analyze under the framework of alternating optimization and its variants. In addition, regularization and constraints can be easily incorporated to make use of a priori information on the latent loading factors. Simulations and real data experiments on blind speech separation, fluorescence data analysis, and social network mining are used to showcase the effectiveness of the proposed algorithm.

    Robust Reduced-Rank Adaptive Processing Based on Parallel Subgradient Projection and Krylov Subspace Techniques

    In this paper, we propose a novel reduced-rank adaptive filtering algorithm by blending the idea of the Krylov subspace methods with the set-theoretic adaptive filtering framework. Unlike the existing Krylov-subspace-based reduced-rank methods, the proposed algorithm tracks the optimal point in the sense of minimizing the 'true' mean square error (MSE) in the Krylov subspace, even when the estimated statistics become erroneous (e.g., due to sudden changes of environments). Therefore, compared with those existing methods, the proposed algorithm is better suited to adaptive filtering applications. The algorithm is analyzed based on a modified version of the adaptive projected subgradient method (APSM). Numerical examples demonstrate that the proposed algorithm enjoys better tracking performance than the existing methods for the interference suppression problem in code-division multiple-access (CDMA) systems as well as for simple system identification problems. Comment: 10 figures. In IEEE Transactions on Signal Processing, 201

    Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

    This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, assuming known mixing filters. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, the CTF has far fewer taps; consequently, it has fewer near-common zeros among channels and lower computational complexity. The work proposes three speech-source recovery methods, namely: i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), is exploited in the CTF domain, and for the multi-source case, ii) a beamforming-like multichannel inverse filtering method applying single-source MINT and using power minimization, which is suitable whenever the source CTFs are not all known, and iii) a constrained Lasso method, where the sources are recovered by minimizing the $\ell_1$-norm to impose their spectral sparsity, with the constraint that the $\ell_2$-norm fitting cost, between the microphone signals and the mixing model involving the unknown source signals, is less than a tolerance. The noise can be reduced by setting the tolerance according to the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods. The comparison between them as well as with the baseline methods is presented. Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

    Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

    We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. The paper makes the following contributions. We use the direct-path relative transfer function (DP-RTF), an inter-channel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign DP-RTFs to audio-source directions. Towards this goal, we adopt a maximum-likelihood formulation and we propose to use an exponentiated gradient (EG) to efficiently update source-direction estimates starting from their currently available values. The problem of multiple speaker tracking is computationally intractable because the number of possible associations between observed source directions and physical speakers grows exponentially with time. We adopt a Bayesian framework and we propose a variational approximation of the posterior filtering distribution associated with multiple speaker tracking, as well as an efficient variational expectation-maximization (VEM) solver. The proposed online localization and tracking method is thoroughly evaluated using two datasets that contain recordings performed in real environments. Comment: IEEE Journal of Selected Topics in Signal Processing, 201
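
The exponentiated gradient (EG) update mentioned in this abstract is a generic technique for refining a set of non-negative weights that sum to one, starting from their current values. The sketch below shows the standard EG step only; it is not the paper's localization algorithm. The quadratic toy objective, the step size `eta`, and the three-element weight vector standing in for weights over candidate source directions are all assumptions made for illustration.

```python
import math

def eg_step(weights, grad, eta):
    """One exponentiated-gradient step on the probability simplex.

    Each weight is scaled by exp(-eta * gradient) and the result is
    renormalized, so the weights stay positive and sum to one."""
    scaled = [w * math.exp(-eta * g) for w, g in zip(weights, grad)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Toy objective (assumed): f(w) = sum_i (w_i - t_i)^2 with t on the simplex,
# so the minimizer over the simplex is t itself.
target = [0.7, 0.2, 0.1]
weights = [1.0 / 3.0] * 3          # start from the current (uniform) estimate
for _ in range(500):
    gradient = [2.0 * (w - t) for w, t in zip(weights, target)]
    weights = eg_step(weights, gradient, eta=0.5)
```

Because the update is multiplicative, no explicit projection onto the simplex is needed, which is what makes EG attractive for online updates of this kind.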

    Redundancy in block coded modulations for channel equalization based on spatial and temporal diversity

    Linear block codes in the complex field can be applied in spatial and/or temporal diversity receivers in order to develop high-performance schemes for (almost-) blind equalization in mobile communications. The proposed technique uses the structure of the encoded transmitted information (with redundancy) to achieve equalization schemes based on a deterministic criterion. Simulations show that the proposed technique is more efficient than other schemes that follow similar equalizer structures. The result is an algorithm that enables the design of channel equalizers in low Eb/N0 scenarios. Peer Reviewed. Postprint (published version)