1,699 research outputs found

    Robust ASR using Support Vector Machines

    The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives, due to their max-margin training paradigm, have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in moderate to severe noise conditions, also outperforming the results of the hybrid system.
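
The duration-normalisation problem mentioned in this abstract can be illustrated with plain Dynamic Time Warping, on which DTAK is based. The sketch below is only classical DTW with a Euclidean local distance, not the DTAK kernel from Shimodaira et al.; the function name `dtw_distance` and the choice of local distance are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    `a` and `b` are lists of equal-dimension frames (lists of floats).
    The sequences may have different lengths; DTW aligns them in time.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (an assumed choice)
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Allowed steps: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two realisations of the "same" unit with different durations
short = [[0.0], [1.0], [2.0]]
long_ = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(short, long_))
```

Because the second sequence only repeats a frame, the warping path absorbs the extra duration and the accumulated cost is zero, which is the property that makes DTW-style alignment attractive for variable-length speech units.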

    Online Bearing Remaining Useful Life Prediction Based on a Novel Degradation Indicator and Convolutional Neural Networks

    In industrial applications, nearly half of all motor failures are caused by the degradation of rolling element bearings (REBs). Therefore, accurately estimating the remaining useful life (RUL) of REBs is of crucial importance to ensure the reliability and safety of mechanical systems. To tackle this challenge, model-based approaches are often limited by the complexity of mathematical modeling. Conventional data-driven approaches, on the other hand, require massive efforts to extract the degradation features and construct a health index. In this paper, a novel online data-driven framework is proposed to exploit deep convolutional neural networks (CNNs) in predicting the RUL of bearings. More concretely, the raw vibrations of training bearings are first processed using the Hilbert-Huang transform (HHT) and a novel nonlinear degradation indicator is constructed as the label for learning. The CNN is then employed to identify the hidden pattern between the extracted degradation indicator and the vibration of training bearings, which makes it possible to estimate the degradation of the test bearings automatically. Finally, the RULs of the test bearings are predicted using an ε-support vector regression model. The superior performance of the proposed RUL estimation framework, compared with the state-of-the-art approaches, is demonstrated through the experimental results. The generality of the proposed CNN model is also validated by transferring it to bearings undergoing different operating conditions.

    EMG Signal Noise Removal Using Neural Networks

    An application of an auditory periphery model in speaker identification

    The number of applications of automatic Speaker Identification (SID) is growing due to the advanced technologies for secure access and authentication in services and devices. In a 2016 study, the Cascade of Asymmetric Resonators with Fast Acting Compression (CAR-FAC) cochlear model achieved the best performance among seven recent cochlear models in fitting a set of human auditory physiological data. Motivated by the performance of the CAR-FAC, I apply this cochlear model in an SID task for the first time, with the aim of producing performance similar to that of the human auditory system. This thesis investigates the potential of the CAR-FAC model in an SID task. I investigate the capability of the CAR-FAC in text-dependent and text-independent SID tasks. This thesis also investigates the contributions of different parameters, nonlinearities, and stages of the CAR-FAC that enhance SID accuracy. The performance of the CAR-FAC is compared with another recent cochlear model called the Auditory Nerve (AN) model. In addition, three FFT-based auditory features, namely Mel Frequency Cepstral Coefficient (MFCC), Frequency Domain Linear Prediction (FDLP), and Gammatone Frequency Cepstral Coefficient (GFCC), are also included to compare their performance with cochlear features. This comparison allows me to investigate a better front-end for a noise-robust SID system. Three different statistical classifiers, a Gaussian Mixture Model with Universal Background Model (GMM-UBM), a Support Vector Machine (SVM), and an I-vector, were used to evaluate the performance. These statistical classifiers allow me to investigate nonlinearities in the cochlear front-ends. The performance is evaluated under clean and noisy conditions for a wide range of noise levels. Techniques to improve the performance of a cochlear algorithm are also investigated in this thesis. It was found that the application of a cube root and DCT on cochlear output enhances the SID accuracy substantially.
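
The cube-root-plus-DCT post-processing named at the end of this abstract can be sketched as follows. The abstract does not give the exact pipeline, so everything here is an assumption: the input is taken to be a list of non-negative per-channel energies for one frame, the DCT is an unnormalised DCT-II, and the function names are hypothetical.

```python
import math


def dct_ii(x):
    """Unnormalised DCT-II of a list of floats."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def cube_root_dct(channel_energies, n_coeffs):
    """Cube-root compression followed by a DCT, keeping the first coefficients.

    Assumes non-negative inputs; a negative value raised to 1/3 would
    produce a complex number in Python.
    """
    compressed = [e ** (1.0 / 3.0) for e in channel_energies]
    return dct_ii(compressed)[:n_coeffs]


# One hypothetical frame with four cochlear channels of equal energy
print(cube_root_dct([8.0, 8.0, 8.0, 8.0], n_coeffs=2))
```

For a flat frame like this one, all of the energy lands in the zeroth coefficient and the higher coefficients are numerically zero, which illustrates the decorrelating role the DCT plays in cepstral-style features.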

    On generalized adaptive neural filter

    Linear filters have historically been the most useful tools for suppressing noise in signal processing. It has been shown that the optimal filter which minimizes the mean square error (MSE) between the filter output and the desired output is a linear filter, provided that the noise is additive white Gaussian noise (AWGN). However, in most signal processing applications, the noise in the channel through which a signal is transmitted is not AWGN; it is not stationary, and it may have unknown characteristics. To overcome the shortcomings of linear filters, nonlinear filters ranging from median filters to stack filters have been developed. They have been successfully used in a number of applications, such as enhancing the signal-to-noise ratio of telecommunication receivers, modeling the human vocal tract to synthesize speech in speech processing, and separating the maternal and fetal electrocardiogram signals to diagnose prenatal ailments. In particular, stack filters have been shown to provide robust noise suppression and are easily implementable in hardware, but configuring an optimal stack filter remains a challenge. This dissertation takes on this challenge by extending stack filters to a new class of nonlinear adaptive filters called generalized adaptive neural filters (GANFs). The objective of this work is to investigate their performance in terms of the mean absolute error criterion, to evaluate and predict the generalization of various discriminant functions employed for GANFs, and to address issues regarding their applications and implementation. It is shown that GANFs not only extend the class of stack filters, but also perform better at suppressing noise that is not additive white Gaussian.
Several results are drawn from the theoretical and experimental work: stack filters can be adaptively configured by neural networks; GANFs encompass a large class of nonlinear sliding-window filters which include stack filters; the mean absolute error (MAE) of the optimal GANF is upper-bounded by that of the optimal stack filter; a suitable class of discriminant functions can be determined before a training scheme is executed; VC dimension (VCdim) theory can be applied to determine the number of training samples; the algorithm presented for configuring GANFs is effective and robust.
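
The stack-filter structure that GANFs extend can be illustrated with the median filter, a classical member of the stack-filter class. The sketch below decomposes an integer signal into binary threshold levels, applies a majority (positive Boolean) function at each level, and sums the binary outputs. It does not implement a GANF; the function name, the odd window length, and the edge-replication padding are assumptions.

```python
def median_by_threshold_decomposition(signal, window=3):
    """Median filter for non-negative integer signals, computed as a stack filter.

    Assumes an odd `window` and a non-empty list of non-negative integers.
    """
    half = window // 2
    top = max(signal)
    # Replicate the end samples so every output has a full window (assumed choice)
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    out = []
    for i in range(len(signal)):
        win = padded[i:i + window]
        total = 0
        for level in range(1, top + 1):
            # Binary signal at this threshold level
            bits = [1 if v >= level else 0 for v in win]
            # Majority vote is the Boolean function that yields the median
            total += 1 if sum(bits) > half else 0
        out.append(total)
    return out


noisy = [1, 5, 1, 2, 2]
print(median_by_threshold_decomposition(noisy))
```

The impulse of height 5 is removed, and the result matches a direct sort-based median over the same windows. According to the abstract, GANFs generalise this scheme by configuring the per-level decision with neural networks rather than a fixed Boolean function.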

    State-Space Inference and Learning with Gaussian Processes

    State-space inference and learning with Gaussian processes (GPs) is an unsolved problem. We propose a new, general methodology for inference and learning in nonlinear state-space models that are described probabilistically by non-parametric GP models. We apply the expectation maximization algorithm to iterate between inference in the latent state-space and learning the parameters of the underlying GP dynamics model. Copyright 2010 by the authors

    Defocused Image Restoration with Local Polynomial Regression and IWF

    Adaptive filtering of evoked potentials with radial-basis-function neural network prefilter

    Evoked potentials (EPs) are time-varying signals typically buried in relatively large background noise. To extract the EP more effectively from noise, we had previously developed an approach using an adaptive signal enhancer (ASE) (Chen et al., 1995). ASE requires a proper reference input signal for its optimal performance. Ensemble and moving-window averages were formerly used with good results. In this paper, we present a new method to provide even more effective reference inputs for the ASE. Specifically, a Gaussian radial basis function neural network (RBFNN) was used to preprocess raw EP signals before serving as the reference input. Since the RBFNN has built-in nonlinear activation functions that enable it to closely fit any function mapping, the output of RBFNN can effectively track the signal variations of EP. Results confirmed the superior performance of ASE with RBFNN over the previous method.

    Enhancement of a Text-Independent Speaker Verification System by using Feature Combination and Parallel-Structure Classifiers

    Speaker Verification (SV) systems mainly involve two individual stages: feature extraction and classification. In this paper, we explore these two modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for performing robust speaker verification. The acoustic parameters used in the proposed system are: Mel Frequency Cepstral Coefficients (MFCC), their first and second derivatives (Deltas and Delta-Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP), and Relative Spectral Transform - Perceptual Linear Predictive (RASTA-PLP). In this paper, a complete comparison of different combinations of the previous features is discussed. On the other hand, the major weakness of a conventional Support Vector Machine (SVM) classifier is the use of generic traditional kernel functions to compute the distances among data points. However, the kernel function of an SVM has great influence on its performance. In this work, we propose the combination of two SVM-based classifiers with different kernel functions, a Linear kernel and a Gaussian Radial Basis Function (RBF) kernel, with a Logistic Regression (LR) classifier. The combination is carried out by means of a parallel structure approach, in which different voting rules to take the final decision are considered. Results show that significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers, both with clean speech and in the presence of noise. Finally, to further enhance the system in noisy environments, the inclusion of the multiband noise removal technique as a preprocessing stage is proposed.
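
The parallel-structure combination described in this abstract can be sketched as a voting step over the accept/reject decisions of the individual classifiers. The abstract does not list which voting rules were evaluated, so the three rules below (`majority`, `unanimous`, `any`) and the function name `combine_decisions` are assumptions for illustration only.

```python
def combine_decisions(decisions, rule="majority"):
    """Fuse binary accept/reject decisions from classifiers run in parallel.

    `decisions` holds one truthy/falsy value per classifier, for example the
    outputs of a linear-kernel SVM, an RBF-kernel SVM, and logistic regression.
    """
    votes = [bool(d) for d in decisions]
    if not votes:
        raise ValueError("at least one decision is required")
    if rule == "majority":
        # Accept when strictly more than half of the classifiers accept
        return sum(votes) * 2 > len(votes)
    if rule == "unanimous":
        # Accept only when every classifier accepts
        return all(votes)
    if rule == "any":
        # Accept when at least one classifier accepts
        return any(votes)
    raise ValueError("unknown rule: " + rule)


# Hypothetical decisions from three classifiers for one verification trial
trial = [True, True, False]
print(combine_decisions(trial, "majority"), combine_decisions(trial, "unanimous"))
```

The stricter the rule, the fewer false acceptances and the more false rejections one would expect, which is why comparing several rules is a natural design step for a verification system.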