
    Detection of dirt impairments from archived film sequences : survey and evaluations

    Film dirt is the most commonly encountered artifact in archive restoration applications. Since dirt usually appears as a temporally impulsive event, motion-compensated interframe processing is widely applied for its detection. However, motion-compensated prediction incurs a high degree of computational complexity and can be unreliable when motion estimation fails. Consequently, many techniques using spatial or spatiotemporal filtering without motion compensation have also been proposed as alternatives. A comprehensive survey and evaluation of existing methods is presented, in which both qualitative and quantitative performance is compared in terms of accuracy, robustness, and complexity. After analyzing these algorithms and identifying their limitations, we conclude with guidance on choosing among these algorithms and with promising directions for future research.
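
    The temporally impulsive nature of dirt noted above suggests a very simple baseline detector: flag a pixel when it differs from both its temporal neighbours (ideally motion-compensated) by a large amount of the same sign. Below is a minimal NumPy sketch of that idea; the function name, the frame inputs, and the fixed threshold are illustrative assumptions, not the survey's reference implementation, and practical detectors add motion compensation, spatial coherence checks, and adaptive thresholds on top of this rule.

```python
import numpy as np

def dirt_candidates(prev_frame, cur_frame, next_frame, threshold=25.0):
    """Flag pixels whose intensity differs from BOTH temporal neighbours by more
    than `threshold` with the same sign -- the temporally impulsive signature of
    film dirt. prev_frame/next_frame should ideally be motion-compensated."""
    d_prev = cur_frame.astype(np.float32) - prev_frame.astype(np.float32)
    d_next = cur_frame.astype(np.float32) - next_frame.astype(np.float32)
    same_sign = np.sign(d_prev) == np.sign(d_next)
    return same_sign & (np.abs(d_prev) > threshold) & (np.abs(d_next) > threshold)
```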

    On adaptive decision rules and decision parameter adaptation for automatic speech recognition

    Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variability in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational settings for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge, encoded in an existing collection of general models, with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and for a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
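
    As a concrete instance of the Bayesian adaptation framework described above, the mean vectors of the Gaussian components in an HMM can be re-estimated by interpolating the prior means with the sufficient statistics collected from the adaptation data. The NumPy sketch below shows this standard MAP mean update; the array shapes, function name, and fixed prior weight tau are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def map_adapt_means(mu_prior, gamma, frames, tau=10.0):
    """MAP re-estimation of Gaussian mean vectors.
    mu_prior: (M, D) prior means; gamma: (T, M) state/mixture occupancies from a
    forward-backward or Viterbi pass; frames: (T, D) adaptation features.
    tau controls how strongly the prior is trusted relative to the new data."""
    occ = gamma.sum(axis=0)          # (M,) soft occupation counts
    stats = gamma.T @ frames         # (M, D) first-order statistics
    return (tau * mu_prior + stats) / (tau + occ)[:, None]
```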

    Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

    An effective way to increase noise robustness in automatic speech recognition (ASR) systems is feature enhancement based on an analytical distortion model that describes the effects of noise on the speech features. One such distortion model, reported to achieve a good trade-off between accuracy and simplicity, is the masking model. Under this model, speech distortion caused by environmental noise is seen as a spectral mask and, as a result, noisy speech features can be either reliable (speech is not masked by noise) or unreliable (speech is masked). In this paper, we present a detailed overview of this model and its applications to noise robust ASR. First, using the masking model, we derive a spectral reconstruction technique aimed at enhancing the noisy speech features. Two problems must be solved in order to perform spectral reconstruction with the masking model: (1) mask estimation, i.e. determining the reliability of the noisy features, and (2) feature imputation, i.e. estimating the underlying speech for the unreliable features. Unlike missing-data imputation techniques in which the two problems are treated as independent, our technique addresses them jointly by exploiting a priori knowledge of the speech and noise sources in the form of a statistical model. Second, we propose an algorithm for estimating the noise model required by the feature enhancement technique. The proposed algorithm fits a Gaussian mixture model to the noise by iteratively maximising the likelihood of the noisy speech signal, so that noise can be estimated even during speech-dominated frames. A comprehensive set of experiments carried out on the Aurora-2 and Aurora-4 databases shows that the proposed method achieves significant improvements over the baseline system and other similar missing-data imputation techniques.
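
    A minimal sketch of the two steps named above, under simplifying assumptions: the mask is estimated with a plain SNR-style threshold against a given noise estimate, and unreliable bins are imputed from a diagonal-covariance GMM speech prior, bounded from above by the noisy observation as the masking model implies. The function names, the margin, and the threshold-based mask are illustrative; the paper's own mask estimation and imputation are statistical and solved jointly.

```python
import numpy as np

def estimate_mask(noisy_logmel, noise_logmel_est, margin=3.0):
    """Mark a time-frequency bin reliable when the observation exceeds the
    noise estimate by a margin (a simple SNR-style criterion)."""
    return noisy_logmel > noise_logmel_est + margin

def impute_bounded(noisy_logmel, mask, means, variances, weights):
    """Bounded imputation with a diagonal GMM speech prior: component posteriors
    are computed from the reliable bins only, and unreliable bins are replaced by
    the posterior-weighted clean-speech mean, clipped from above by the observation."""
    T, D = noisy_logmel.shape
    out = noisy_logmel.copy()
    for t in range(T):
        rel = mask[t]
        logp = np.log(weights + 1e-12) - 0.5 * np.sum(
            (noisy_logmel[t, rel] - means[:, rel]) ** 2 / variances[:, rel]
            + np.log(2 * np.pi * variances[:, rel]),
            axis=1,
        )
        post = np.exp(logp - logp.max())
        post /= post.sum()
        fill = post @ means                      # (D,) expected clean speech
        out[t, ~rel] = np.minimum(fill[~rel], noisy_logmel[t, ~rel])
    return out
```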

    Studies on noise robust automatic speech recognition

    Noise in everyday acoustic environments such as cars, traffic, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches to noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD), followed by post-filtering. The speaker is targeted by estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria, such as signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively. Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front-end, is evaluated. The results indicate that the proposed method is robust with respect to the short length of the processed block; significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms.
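
    The following is a rough sketch of the kind of per-block processing described above, under simplifying assumptions: a frame-level VAD (taken as given here, standing in for the DNN-based one) selects noise-only and speech-active frames within a block, the relative transfer function is approximated by the principal eigenvector of the noise-subtracted speech covariance, and an MVDR-style beamformer is applied per frequency bin. This is a generic illustration, not the authors' exact system, and it omits the post-filter.

```python
import numpy as np

def block_mvdr(block_stft, vad, eps=1e-6):
    """block_stft: (F, T, M) complex STFT of one block; vad: (T,) boolean mask.
    Assumes the block contains both speech-active and noise-only frames."""
    F, T, M = block_stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Y = block_stft[f]                                    # (T, M)
        Yn, Ys = Y[~vad], Y[vad]
        Phi_nn = Yn.T @ Yn.conj() / max(len(Yn), 1) + eps * np.eye(M)
        Phi_yy = Ys.T @ Ys.conj() / max(len(Ys), 1)
        Phi_xx = Phi_yy - Phi_nn                             # crude speech covariance
        _, V = np.linalg.eigh(Phi_xx)
        h = V[:, -1]
        h = h / h[0]                                         # RTF w.r.t. reference mic 0
        num = np.linalg.solve(Phi_nn, h)                     # Phi_nn^{-1} h
        w = num / (h.conj() @ num)                           # MVDR weights
        out[f] = Y @ w.conj()                                # w^H y per frame
    return out
```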

    Robust speech recognition under noisy environments.

    Lee Siu Wa. Thesis (M.Phil.), The Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 116-121). Abstracts in English and Chinese. Table of contents:
    Abstract (p.v)
    Chapter 1 Introduction (p.1): 1.1 An Overview on Automatic Speech Recognition (p.2); 1.2 Thesis Outline (p.6)
    Chapter 2 Baseline Speech Recognition System (p.8): 2.1 Baseline Speech Recognition Framework (p.8); 2.2 Acoustic Feature Extraction (p.11); 2.2.1 Speech Production and Source-Filter Model (p.12); 2.2.2 Review of Feature Representations (p.14); 2.2.3 Mel-frequency Cepstral Coefficients (p.20); 2.2.4 Energy and Dynamic Features (p.24); 2.3 Back-end Decoder (p.26); 2.4 English Digit String Corpus - AURORA2 (p.28); 2.5 Baseline Recognition Experiment (p.31)
    Chapter 3 A Simple Recognition Framework with Model Selection (p.34): 3.1 Mismatch between Training and Testing Conditions (p.34); 3.2 Matched Training and Testing Conditions (p.38); 3.2.1 Noise Type-Matching (p.38); 3.2.2 SNR-Matching (p.43); 3.2.3 Noise Type and SNR-Matching (p.44); 3.3 Recognition Framework with Model Selection (p.48)
    Chapter 4 Noise Spectral Estimation (p.53): 4.1 Introduction to Statistical Estimation Methods (p.53); 4.1.1 Conventional Estimation Methods (p.54); 4.1.2 Histogram Technique (p.55); 4.2 Quantile-based Noise Estimation (QBNE) (p.57); 4.2.1 Overview of QBNE (p.58); 4.2.2 Time-Frequency QBNE (T-F QBNE) (p.62); 4.2.3 Mainlobe-Resilient T-F QBNE (M-R T-F QBNE) (p.65); 4.3 Estimation Performance Analysis (p.72); 4.4 Recognition Experiment with Model Selection (p.74)
    Chapter 5 Feature Compensation: Algorithm and Experiment (p.81): 5.1 Feature Deviation from Clean Speech (p.81); 5.1.1 Deviation in MFCC Features (p.82); 5.1.2 Implications for Feature Compensation (p.84); 5.2 Overview of Conventional Compensation Methods (p.86); 5.3 Feature Compensation by In-phase Feature Induction (p.94); 5.3.1 Motivation (p.94); 5.3.2 Methodology (p.97); 5.4 Compensation Framework for Magnitude Spectrum and Segmental Energy (p.102); 5.5 Recognition Experiments (p.103)
    Chapter 6 Conclusions (p.112): 6.1 Summary and Discussions (p.112); 6.2 Future Directions (p.114)
    Bibliography (p.116)
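
    Chapter 4 of the thesis centres on quantile-based noise estimation (QBNE): in its basic form, the noise power in each frequency bin is taken as a chosen quantile of the short-time power spectrum over time, exploiting the fact that most bins are dominated by noise in most frames. A minimal sketch follows, with the quantile value and array layout as illustrative assumptions (the thesis develops time-frequency and mainlobe-resilient refinements of this idea).

```python
import numpy as np

def quantile_noise_estimate(power_spec, q=0.5):
    """Quantile-based noise estimation: per-frequency noise floor taken as the
    q-th quantile of the short-time power spectrum over time.
    power_spec: (T, F) array of power-spectrum frames; returns an (F,) estimate."""
    return np.quantile(power_spec, q, axis=0)
```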

    A Comprehensive Review of Deep Learning-based Single Image Super-resolution

    Image super-resolution (SR) is one of the vital image processing tasks in computer vision, improving the resolution of an image. In the last two decades, significant progress has been made in super-resolution, especially through deep learning methods. This article provides a detailed survey of recent progress in single-image super-resolution from the perspective of deep learning, while also covering the earlier classical methods. The survey classifies image SR methods into four categories: classical methods, supervised learning-based methods, unsupervised learning-based methods, and domain-specific SR methods. We also introduce the SR problem to provide intuition about image quality metrics, available reference datasets, and SR challenges. Deep learning-based SR approaches are evaluated on a reference dataset. Some of the reviewed state-of-the-art image SR methods include the enhanced deep SR network (EDSR), cycle-in-cycle GAN (CinCGAN), multiscale residual network (MSRN), meta residual dense network (Meta-RDN), recurrent back-projection network (RBPN), second-order attention network (SAN), SR feedback network (SRFBN), and the wavelet-based residual attention network (WRAN). Finally, the survey concludes with future directions and trends in SR, and with open problems to be addressed by researchers.
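
    Many of the networks listed above are assembled from simple residual blocks. As one concrete example, EDSR removes batch normalisation from the standard residual block and scales the residual branch before the skip connection; the PyTorch sketch below illustrates that design, with the channel count and scaling factor as illustrative defaults rather than the configurations reported in the survey.

```python
import torch.nn as nn

class EDSRResBlock(nn.Module):
    """EDSR-style residual block: two 3x3 convolutions with no batch norm,
    a ReLU in between, and residual scaling before the skip connection."""
    def __init__(self, channels=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.res_scale * self.body(x)
```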