39,159 research outputs found

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    Get PDF
    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime when recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on large datasets of CHiME-4 and on another dataset featuring moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) or perceptual evaluation of speech quality (PESQ), respectively). Moreover, word error rate (WER) achieved by a baseline automatic speech recognition system is evaluated, for which the enhancement method serves as a front-end solution. The results indicate that the proposed method is robust with respect to short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for the block length of 250 ms.Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion

    2D Reconstruction of Small Intestine's Interior Wall

    Full text link
    Examining and interpreting of a large number of wireless endoscopic images from the gastrointestinal tract is a tiresome task for physicians. A practical solution is to automatically construct a two dimensional representation of the gastrointestinal tract for easy inspection. However, little has been done on wireless endoscopic image stitching, let alone systematic investigation. The proposed new wireless endoscopic image stitching method consists of two main steps to improve the accuracy and efficiency of image registration. First, the keypoints are extracted by Principle Component Analysis and Scale Invariant Feature Transform (PCA-SIFT) algorithm and refined with Maximum Likelihood Estimation SAmple Consensus (MLESAC) outlier removal to find the most reliable keypoints. Second, the optimal transformation parameters obtained from first step are fed to the Normalised Mutual Information (NMI) algorithm as an initial solution. With modified Marquardt-Levenberg search strategy in a multiscale framework, the NMI can find the optimal transformation parameters in the shortest time. The proposed methodology has been tested on two different datasets - one with real wireless endoscopic images and another with images obtained from Micro-Ball (a new wireless cubic endoscopy system with six image sensors). The results have demonstrated the accuracy and robustness of the proposed methodology both visually and quantitatively.Comment: Journal draf

    DPC-Net: Deep Pose Correction for Visual Localization

    Full text link
    We present a novel method to fuse the power of deep networks with the computational efficiency of geometric and probabilistic localization algorithms. In contrast to other methods that completely replace a classical visual estimator with a deep network, we propose an approach that uses a convolutional neural network to learn difficult-to-model corrections to the estimator from ground-truth training data. To this end, we derive a novel loss function for learning SE(3) corrections based on a matrix Lie groups approach, with a natural formulation for balancing translation and rotation errors. We use this loss to train a Deep Pose Correction network (DPC-Net) that predicts corrections for a particular estimator, sensor and environment. Using the KITTI odometry dataset, we demonstrate significant improvements to the accuracy of a computationally-efficient sparse stereo visual odometry pipeline, that render it as accurate as a modern computationally-intensive dense estimator. Further, we show how DPC-Net can be used to mitigate the effect of poorly calibrated lens distortion parameters.Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane, Australia, May 21-25, 201
    • …
    corecore