56 research outputs found

    Speech dereverberation and speaker separation using microphone arrays in realistic environments

    Get PDF
    This thesis concentrates on comparing novel and existing dereverberation and speaker separation techniques using multiple corpora, including a new corpus collected using a microphone array. Many corpora currently used for these techniques are recorded using head-mounted microphones in anechoic chambers. This novel corpus contains recordings with noise and reverberation made in office and workshop environments. Novel algorithms present a different way of approximating the reverberation, producing results that are competitive with existing algorithms. Dereverberation is evaluated using seven correlation-based algorithms and applied to two different corpora. Three of these are novel algorithms (Hs NTF, Cauchy WPE and Cauchy MIMO WPE). Both non-learning and learning algorithms are tested, with the learning algorithms performing better. For single and multi-channel speaker separation, unsupervised non-negative matrix factorization (NMF) algorithms are compared using three cost functions combined with sparsity, convolution and direction of arrival. The results show that the choice of cost function is important for improving the separation result. Furthermore, six different supervised deep learning algorithms are applied to single channel speaker separation. Historic information improves the result. When comparing NMF to deep learning, NMF is able to converge faster to a solution and provides a better result for the corpora used in this thesis

    Single- and multi-microphone speech dereverberation using spectral enhancement

    Get PDF
    In speech communication systems, such as voice-controlled systems, hands-free mobile telephones, and hearing aids, the received microphone signals are degraded by room reverberation, background noise, and other interferences. This signal degradation may lead to total unintelligibility of the speech and decreases the performance of automatic speech recognition systems. In the context of this work reverberation is the process of multi-path propagation of an acoustic sound from its source to one or more microphones. The received microphone signal generally consists of a direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive after the early reverberation (commonly called late reverberation). Reverberant speech can be described as sounding distant with noticeable echo and colouration. These detrimental perceptual effects are primarily caused by late reverberation, and generally increase with increasing distance between the source and microphone. Conversely, early reverberations tend to improve the intelligibility of speech. In combination with the direct sound it is sometimes referred to as the early speech component. Reduction of the detrimental effects of reflections is evidently of considerable practical importance, and is the focus of this dissertation. More specifically the dissertation deals with dereverberation techniques, i.e., signal processing techniques to reduce the detrimental effects of reflections. In the dissertation, novel single- and multimicrophone speech dereverberation algorithms are developed that aim at the suppression of late reverberation, i.e., at estimation of the early speech component. This is done via so-called spectral enhancement techniques that require a specific measure of the late reverberant signal. This measure, called spectral variance, can be estimated directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation model and a limited amount of a priori knowledge about the acoustic channel(s) between the source and the microphone(s). In our work an existing single-channel statistical reverberation model serves as a starting point. The model is characterized by one parameter that depends on the acoustic characteristics of the environment. We show that the spectral variance estimator that is based on this model, can only be used when the source-microphone distance is larger than the so-called critical distance. This is, crudely speaking, the distance where the direct sound power is equal to the total reflective power. A generalization of the statistical reverberation model in which the direct sound is incorporated is developed. This model requires one additional parameter that is related to the ratio between the direct sound energy and the sound energy of all reflections. The generalized model is used to derive a novel spectral variance estimator. When the novel estimator is used for dereverberation rather than the existing estimator, and the source-microphone distance is smaller than the critical distance, the dereverberation performance is significantly increased. Single-microphone systems only exploit the temporal and spectral diversity of the received signal. Reverberation, of course, also induces spatial diversity. To additionally exploit this diversity, multiple microphones must be used, and their outputs must be combined by a suitable spatial processor such as the so-called delay and sum beamformer. It is not a priori evident whether spectral enhancement is best done before or after the spatial processor. For this reason we investigate both possibilities, as well as a merge of the spatial processor and the spectral enhancement technique. An advantage of the latter option is that the spectral variance estimator can be further improved. Our experiments show that the use of multiple microphones affords a significant improvement of the perceptual speech quality. The applicability of the theory developed in this dissertation is demonstrated using a hands-free communication system. Since hands-free systems are often used in a noisy and reverberant environment, the received microphone signal does not only contain the desired signal but also interferences such as room reverberation that is caused by the desired source, background noise, and a far-end echo signal that results from a sound that is produced by the loudspeaker. Usually an acoustic echo canceller is used to cancel the far-end echo. Additionally a post-processor is used to suppress background noise and residual echo, i.e., echo which could not be cancelled by the echo canceller. In this work a novel structure and post-processor for an acoustic echo canceller are developed. The post-processor suppresses late reverberation caused by the desired source, residual echo, and background noise. The late reverberation and late residual echo are estimated using the generalized statistical reverberation model. Experimental results convincingly demonstrate the benefits of the proposed system for suppressing late reverberation, residual echo and background noise. The proposed structure and post-processor have a low computational complexity, a highly modular structure, can be seamlessly integrated into existing hands-free communication systems, and affords a significant increase of the listening comfort and speech intelligibility

    SpectralDiff: A Generative Framework for Hyperspectral Image Classification with Diffusion Models

    Full text link
    Hyperspectral Image (HSI) classification is an important issue in remote sensing field with extensive applications in earth science. In recent years, a large number of deep learning-based HSI classification methods have been proposed. However, existing methods have limited ability to handle high-dimensional, highly redundant, and complex data, making it challenging to capture the spectral-spatial distributions of data and relationships between samples. To address this issue, we propose a generative framework for HSI classification with diffusion models (SpectralDiff) that effectively mines the distribution information of high-dimensional and highly redundant data by iteratively denoising and explicitly constructing the data generation process, thus better reflecting the relationships between samples. The framework consists of a spectral-spatial diffusion module, and an attention-based classification module. The spectral-spatial diffusion module adopts forward and reverse spectral-spatial diffusion processes to achieve adaptive construction of sample relationships without requiring prior knowledge of graphical structure or neighborhood information. It captures spectral-spatial distribution and contextual information of objects in HSI and mines unsupervised spectral-spatial diffusion features within the reverse diffusion process. Finally, these features are fed into the attention-based classification module for per-pixel classification. The diffusion features can facilitate cross-sample perception via reconstruction distribution, leading to improved classification performance. Experiments on three public HSI datasets demonstrate that the proposed method can achieve better performance than state-of-the-art methods. For the sake of reproducibility, the source code of SpectralDiff will be publicly available at https://github.com/chenning0115/SpectralDiff

    Mathematics and Digital Signal Processing

    Get PDF
    Modern computer technology has opened up new opportunities for the development of digital signal processing methods. The applications of digital signal processing have expanded significantly and today include audio and speech processing, sonar, radar, and other sensor array processing, spectral density estimation, statistical signal processing, digital image processing, signal processing for telecommunications, control systems, biomedical engineering, and seismology, among others. This Special Issue is aimed at wide coverage of the problems of digital signal processing, from mathematical modeling to the implementation of problem-oriented systems. The basis of digital signal processing is digital filtering. Wavelet analysis implements multiscale signal processing and is used to solve applied problems of de-noising and compression. Processing of visual information, including image and video processing and pattern recognition, is actively used in robotic systems and industrial processes control today. Improving digital signal processing circuits and developing new signal processing systems can improve the technical characteristics of many digital devices. The development of new methods of artificial intelligence, including artificial neural networks and brain-computer interfaces, opens up new prospects for the creation of smart technology. This Special Issue contains the latest technological developments in mathematics and digital signal processing. The stated results are of interest to researchers in the field of applied mathematics and developers of modern digital signal processing systems

    Audio source separation into the wild

    Get PDF
    International audienceThis review chapter is dedicated to multichannel audio source separation in real-life environment. We explore some of the major achievements in the field and discuss some of the remaining challenges. We will explore several important practical scenarios, e.g. moving sources and/or microphones, varying number of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. Several applications such as smart assistants, cellular phones, hearing aids and robots, will be discussed. Our perspectives on the future of the field will be given as concluding remarks of this chapter

    Review : Deep learning in electron microscopy

    Get PDF
    Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Following, we discuss hardware and software needed to get started with deep learning and interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
    corecore