
    Detection of dirt impairments from archived film sequences : survey and evaluations

    Film dirt is the most commonly encountered artifact in archive restoration applications. Since dirt usually appears as a temporally impulsive event, motion-compensated interframe processing is widely applied for its detection. However, motion-compensated prediction incurs a high degree of computational complexity and can be unreliable when motion estimation fails. Consequently, many techniques using spatial or spatiotemporal filtering without motion compensation have also been proposed as alternatives. A comprehensive survey and evaluation of existing methods is presented, in which both qualitative and quantitative performance is compared in terms of accuracy, robustness, and complexity. After analyzing these algorithms and identifying their limitations, we conclude with guidance on choosing among these algorithms and with promising directions for future research.
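
    The temporally impulsive nature of dirt noted above suggests a very simple baseline detector: flag a pixel when it differs from both its temporal neighbours (ideally motion-compensated) by a large amount of the same sign. Below is a minimal NumPy sketch of that idea; the function name, the frame inputs, and the fixed threshold are illustrative assumptions, not the survey's reference implementation, and practical detectors add motion compensation, spatial coherence checks, and adaptive thresholds on top of this rule.

```python
import numpy as np

def dirt_candidates(prev_frame, cur_frame, next_frame, threshold=25.0):
    """Flag pixels whose intensity differs from BOTH temporal neighbours by more
    than `threshold` with the same sign -- the temporally impulsive signature of
    film dirt. prev_frame/next_frame should ideally be motion-compensated."""
    d_prev = cur_frame.astype(np.float32) - prev_frame.astype(np.float32)
    d_next = cur_frame.astype(np.float32) - next_frame.astype(np.float32)
    same_sign = np.sign(d_prev) == np.sign(d_next)
    return same_sign & (np.abs(d_prev) > threshold) & (np.abs(d_next) > threshold)
```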

    On adaptive decision rules and decision parameter adaptation for automatic speech recognition

    Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variability in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational settings for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge, encoded in an existing collection of general models, with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and for a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
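
    As a concrete instance of the Bayesian adaptation framework described above, the mean vectors of the Gaussian components in an HMM can be re-estimated by interpolating the prior means with the sufficient statistics collected from the adaptation data. The NumPy sketch below shows this standard MAP mean update; the array shapes, function name, and fixed prior weight tau are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def map_adapt_means(mu_prior, gamma, frames, tau=10.0):
    """MAP re-estimation of Gaussian mean vectors.
    mu_prior: (M, D) prior means; gamma: (T, M) state/mixture occupancies from a
    forward-backward or Viterbi pass; frames: (T, D) adaptation features.
    tau controls how strongly the prior is trusted relative to the new data."""
    occ = gamma.sum(axis=0)          # (M,) soft occupation counts
    stats = gamma.T @ frames         # (M, D) first-order statistics
    return (tau * mu_prior + stats) / (tau + occ)[:, None]
```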

    Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

    An effective way to increase noise robustness in automatic speech recognition (ASR) systems is feature enhancement based on an analytical distortion model that describes the effects of noise on the speech features. One such distortion model, reported to achieve a good trade-off between accuracy and simplicity, is the masking model. Under this model, speech distortion caused by environmental noise is seen as a spectral mask and, as a result, noisy speech features can be either reliable (speech is not masked by noise) or unreliable (speech is masked). In this paper, we present a detailed overview of this model and its applications to noise robust ASR. First, using the masking model, we derive a spectral reconstruction technique aimed at enhancing the noisy speech features. Two problems must be solved in order to perform spectral reconstruction with the masking model: (1) mask estimation, i.e. determining the reliability of the noisy features, and (2) feature imputation, i.e. estimating the underlying speech for the unreliable features. Unlike missing-data imputation techniques in which the two problems are treated as independent, our technique addresses them jointly by exploiting a priori knowledge of the speech and noise sources in the form of a statistical model. Second, we propose an algorithm for estimating the noise model required by the feature enhancement technique. The proposed algorithm fits a Gaussian mixture model to the noise by iteratively maximising the likelihood of the noisy speech signal, so that noise can be estimated even during speech-dominated frames. A comprehensive set of experiments carried out on the Aurora-2 and Aurora-4 databases shows that the proposed method achieves significant improvements over the baseline system and other similar missing-data imputation techniques.
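
    A minimal sketch of the two steps named above, under simplifying assumptions: the mask is estimated with a plain SNR-style threshold against a given noise estimate, and unreliable bins are imputed from a diagonal-covariance GMM speech prior, bounded from above by the noisy observation as the masking model implies. The function names, the margin, and the threshold-based mask are illustrative; the paper's own mask estimation and imputation are statistical and solved jointly.

```python
import numpy as np

def estimate_mask(noisy_logmel, noise_logmel_est, margin=3.0):
    """Mark a time-frequency bin reliable when the observation exceeds the
    noise estimate by a margin (a simple SNR-style criterion)."""
    return noisy_logmel > noise_logmel_est + margin

def impute_bounded(noisy_logmel, mask, means, variances, weights):
    """Bounded imputation with a diagonal GMM speech prior: component posteriors
    are computed from the reliable bins only, and unreliable bins are replaced by
    the posterior-weighted clean-speech mean, clipped from above by the observation."""
    T, D = noisy_logmel.shape
    out = noisy_logmel.copy()
    for t in range(T):
        rel = mask[t]
        logp = np.log(weights + 1e-12) - 0.5 * np.sum(
            (noisy_logmel[t, rel] - means[:, rel]) ** 2 / variances[:, rel]
            + np.log(2 * np.pi * variances[:, rel]),
            axis=1,
        )
        post = np.exp(logp - logp.max())
        post /= post.sum()
        fill = post @ means                      # (D,) expected clean speech
        out[t, ~rel] = np.minimum(fill[~rel], noisy_logmel[t, ~rel])
    return out
```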

    Studies on noise robust automatic speech recognition

    Noise in everyday acoustic environments such as cars, traffic, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches to noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD), followed by post-filtering. The speaker is targeted by estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria, such as signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively. Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front-end, is evaluated. The results indicate that the proposed method is robust with respect to the short length of the processed block; significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms.
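
    The following is a rough sketch of the kind of per-block processing described above, under simplifying assumptions: a frame-level VAD (taken as given here, standing in for the DNN-based one) selects noise-only and speech-active frames within a block, the relative transfer function is approximated by the principal eigenvector of the noise-subtracted speech covariance, and an MVDR-style beamformer is applied per frequency bin. This is a generic illustration, not the authors' exact system, and it omits the post-filter.

```python
import numpy as np

def block_mvdr(block_stft, vad, eps=1e-6):
    """block_stft: (F, T, M) complex STFT of one block; vad: (T,) boolean mask.
    Assumes the block contains both speech-active and noise-only frames."""
    F, T, M = block_stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Y = block_stft[f]                                    # (T, M)
        Yn, Ys = Y[~vad], Y[vad]
        Phi_nn = Yn.T @ Yn.conj() / max(len(Yn), 1) + eps * np.eye(M)
        Phi_yy = Ys.T @ Ys.conj() / max(len(Ys), 1)
        Phi_xx = Phi_yy - Phi_nn                             # crude speech covariance
        _, V = np.linalg.eigh(Phi_xx)
        h = V[:, -1]
        h = h / h[0]                                         # RTF w.r.t. reference mic 0
        num = np.linalg.solve(Phi_nn, h)                     # Phi_nn^{-1} h
        w = num / (h.conj() @ num)                           # MVDR weights
        out[f] = Y @ w.conj()                                # w^H y per frame
    return out
```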

    Robust speech recognition under noisy environments.

    Lee Siu Wa. Thesis (M.Phil.), The Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 116-121). Abstracts in English and Chinese. Table of contents:
    Abstract (p.v)
    Chapter 1 Introduction (p.1): 1.1 An Overview on Automatic Speech Recognition (p.2); 1.2 Thesis Outline (p.6)
    Chapter 2 Baseline Speech Recognition System (p.8): 2.1 Baseline Speech Recognition Framework (p.8); 2.2 Acoustic Feature Extraction (p.11); 2.2.1 Speech Production and Source-Filter Model (p.12); 2.2.2 Review of Feature Representations (p.14); 2.2.3 Mel-frequency Cepstral Coefficients (p.20); 2.2.4 Energy and Dynamic Features (p.24); 2.3 Back-end Decoder (p.26); 2.4 English Digit String Corpus - AURORA2 (p.28); 2.5 Baseline Recognition Experiment (p.31)
    Chapter 3 A Simple Recognition Framework with Model Selection (p.34): 3.1 Mismatch between Training and Testing Conditions (p.34); 3.2 Matched Training and Testing Conditions (p.38); 3.2.1 Noise Type-Matching (p.38); 3.2.2 SNR-Matching (p.43); 3.2.3 Noise Type and SNR-Matching (p.44); 3.3 Recognition Framework with Model Selection (p.48)
    Chapter 4 Noise Spectral Estimation (p.53): 4.1 Introduction to Statistical Estimation Methods (p.53); 4.1.1 Conventional Estimation Methods (p.54); 4.1.2 Histogram Technique (p.55); 4.2 Quantile-based Noise Estimation (QBNE) (p.57); 4.2.1 Overview of QBNE (p.58); 4.2.2 Time-Frequency QBNE (T-F QBNE) (p.62); 4.2.3 Mainlobe-Resilient T-F QBNE (M-R T-F QBNE) (p.65); 4.3 Estimation Performance Analysis (p.72); 4.4 Recognition Experiment with Model Selection (p.74)
    Chapter 5 Feature Compensation: Algorithm and Experiment (p.81): 5.1 Feature Deviation from Clean Speech (p.81); 5.1.1 Deviation in MFCC Features (p.82); 5.1.2 Implications for Feature Compensation (p.84); 5.2 Overview of Conventional Compensation Methods (p.86); 5.3 Feature Compensation by In-phase Feature Induction (p.94); 5.3.1 Motivation (p.94); 5.3.2 Methodology (p.97); 5.4 Compensation Framework for Magnitude Spectrum and Segmental Energy (p.102); 5.5 Recognition Experiments (p.103)
    Chapter 6 Conclusions (p.112): 6.1 Summary and Discussions (p.112); 6.2 Future Directions (p.114)
    Bibliography (p.116)
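
    Chapter 4 of the thesis centres on quantile-based noise estimation (QBNE): in its basic form, the noise power in each frequency bin is taken as a chosen quantile of the short-time power spectrum over time, exploiting the fact that most bins are dominated by noise in most frames. A minimal sketch follows, with the quantile value and array layout as illustrative assumptions (the thesis develops time-frequency and mainlobe-resilient refinements of this idea).

```python
import numpy as np

def quantile_noise_estimate(power_spec, q=0.5):
    """Quantile-based noise estimation: per-frequency noise floor taken as the
    q-th quantile of the short-time power spectrum over time.
    power_spec: (T, F) array of power-spectrum frames; returns an (F,) estimate."""
    return np.quantile(power_spec, q, axis=0)
```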

    A Comprehensive Review of Deep Learning-based Single Image Super-resolution

    Image super-resolution (SR) is one of the vital image processing tasks in computer vision, improving the resolution of an image. In the last two decades, significant progress has been made in super-resolution, especially through deep learning methods. This article provides a detailed survey of recent progress in single-image super-resolution from the perspective of deep learning, while also covering the earlier classical methods. The survey classifies image SR methods into four categories: classical methods, supervised learning-based methods, unsupervised learning-based methods, and domain-specific SR methods. We also introduce the SR problem to provide intuition about image quality metrics, available reference datasets, and SR challenges. Deep learning-based SR approaches are evaluated on a reference dataset. Some of the reviewed state-of-the-art image SR methods include the enhanced deep SR network (EDSR), cycle-in-cycle GAN (CinCGAN), multiscale residual network (MSRN), meta residual dense network (Meta-RDN), recurrent back-projection network (RBPN), second-order attention network (SAN), SR feedback network (SRFBN), and the wavelet-based residual attention network (WRAN). Finally, the survey concludes with future directions and trends in SR, and with open problems to be addressed by researchers.
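
    Many of the networks listed above are assembled from simple residual blocks. As one concrete example, EDSR removes batch normalisation from the standard residual block and scales the residual branch before the skip connection; the PyTorch sketch below illustrates that design, with the channel count and scaling factor as illustrative defaults rather than the configurations reported in the survey.

```python
import torch.nn as nn

class EDSRResBlock(nn.Module):
    """EDSR-style residual block: two 3x3 convolutions with no batch norm,
    a ReLU in between, and residual scaling before the skip connection."""
    def __init__(self, channels=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.res_scale * self.body(x)
```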