
    Information-Theoretic Active Learning for Content-Based Image Retrieval

    We propose Information-Theoretic Active Learning (ITAL), a novel batch-mode active learning method for binary classification, and apply it to acquiring meaningful user feedback in the context of content-based image retrieval. Instead of combining different heuristics such as uncertainty, diversity, or density, our method is based on maximizing the mutual information between the predicted relevance of the images and the expected user feedback regarding the selected batch. We propose suitable approximations to this computationally demanding problem and also integrate an explicit model of user behavior that accounts for possible incorrect labels and unnameable instances. Furthermore, our approach takes into account not only the structure of the data but also the expected model output change caused by the user feedback. In contrast to other methods, ITAL turns out to be highly flexible and provides state-of-the-art performance across various datasets, such as MIRFLICKR and ImageNet. Comment: GCPR 2018 paper (14 pages text + 2 pages references + 6 pages appendix)

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted by estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which each recording is treated as one block (batch processing). The experimental evaluation of the proposed method is performed on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively). Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front-end solution, is evaluated. The results indicate that the proposed method is robust to short processed-block lengths. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms. Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion

    Dynamical and Stationary Properties of On-line Learning from Finite Training Sets

    The dynamical and stationary properties of on-line learning from finite training sets are analysed using the cavity method. For large input dimensions, we derive equations for the macroscopic parameters, namely, the student-teacher correlation, the student-student autocorrelation and the learning force fluctuation. This enables us to provide analytical solutions to Adaline learning as a benchmark. Theoretical predictions of training errors in transient and stationary states are obtained by a Monte Carlo sampling procedure. Generalization and training errors are found to agree with simulations. The physical origin of the critical learning rate is presented. Comparison with batch learning is discussed throughout the paper. Comment: 30 pages, 4 figures