
    Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces

    This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which offer significant theoretical benefits as representations that are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters with corresponding maximum likelihood recognition accuracy over the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five different phonetic classes within TIMIT, and a composite classifier using both cepstral and phase space features is developed. Results indicate that although the accuracy of the phase space approach by itself is currently below that of baseline cepstral methods, a combined approach is capable of increasing speaker-independent phoneme accuracy.

    Statistical Models of Reconstructed Phase Spaces for Signal Classification

    This paper introduces a novel approach to the analysis and classification of time series signals using statistical models of reconstructed phase spaces. With sufficient dimension, such reconstructed phase spaces are, with probability one, guaranteed to be topologically equivalent to the state dynamics of the generating system, and, therefore, may contain information that is absent in analysis and classification methods rooted in linear assumptions. Parametric and nonparametric distributions are introduced as statistical representations over the multidimensional reconstructed phase space, with classification accomplished through methods such as Bayes maximum likelihood and artificial neural networks (ANNs). The technique is demonstrated on heart arrhythmia classification and speech recognition. This new approach is shown to be a viable and effective alternative to traditional signal classification approaches, particularly for signals with strong nonlinear characteristics.
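
    As a rough illustration of what a reconstructed phase space is, the sketch below builds time-delay vectors from a scalar series. It is a generic delay embedding, not the authors' implementation; the function name `delay_embed` and the parameter names `dim` and `lag` are assumptions chosen for this example, and the statistical modeling and classification steps described in the abstract are not shown.

```python
def delay_embed(signal, dim, lag):
    """Map a scalar series to delay vectors (x[n], x[n-lag], ..., x[n-(dim-1)*lag]).

    `dim` is the embedding dimension and `lag` the time delay in samples.
    Both names are hypothetical, chosen for this sketch only.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    # The first index for which all delayed samples exist.
    start = (dim - 1) * lag
    return [
        tuple(signal[n - k * lag] for k in range(dim))
        for n in range(start, len(signal))
    ]


# Each tuple is one point in the reconstructed phase space.
points = delay_embed([0, 1, 2, 3, 4, 5], dim=3, lag=2)
```

    A classifier in the spirit of the abstract would then fit one distribution per class over such points and pick the class with the highest likelihood; that step is omitted here.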

    Sub-Banded Reconstructed Phase Spaces for Speech Recognition

    A novel method combining filter banks and reconstructed phase spaces is proposed for the modeling and classification of speech. Reconstructed phase spaces, which are based on dynamical systems theory, have advantages over spectral-based analysis methods in that they can capture nonlinear or higher-order statistics. Recent work has shown that the natural measure of a reconstructed phase space can be used for modeling and classification of phonemes. In this work, sub-banding of speech, which has been examined for recognition of noise-corrupted speech, is studied in combination with phase space reconstruction. This sub-banding, which is motivated by empirical psychoacoustical studies, is shown to dramatically improve the phoneme classification accuracy of reconstructed phase space-based approaches. Experiments that examine the performance of fused sub-banded reconstructed phase spaces for phoneme classification are presented. Comparisons against a cepstral-based classifier show that the proposed approach is competitive with state-of-the-art methods for modeling and classification of phonemes. Combination of cepstral-based features and the sub-band RPS features shows improvement over a cepstral-only baseline.

    Nonlinear Dynamic Invariants for Continuous Speech Recognition

    In this work, nonlinear acoustic information is combined with traditional linear acoustic information in order to produce a noise-robust set of features for speech recognition. Classical acoustic modeling techniques for speech recognition have relied on a standard assumption of linear acoustics where signal processing is primarily performed in the signal's frequency domain. While these conventional techniques have demonstrated good performance under controlled conditions, the performance of these systems suffers significant degradations when the acoustic data is contaminated with previously unseen noise. The objective of this thesis was to determine whether nonlinear dynamic invariants are able to boost speech recognition performance when combined with traditional acoustic features. Several sets of experiments are used to evaluate both clean and noisy speech data. The invariants resulted in a maximum relative increase of 11.1% for the clean evaluation set. However, an average relative decrease of 7.6% was observed for the noise-contaminated evaluation sets. The fact that recognition performance decreased with the use of dynamic invariants suggests that additional research is required for robust filtering of phase spaces constructed from noisy time series.

    A Subband-Based SVM Front-End for Robust ASR

    This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) front-end that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. The proposed front-end is compared with state-of-the-art ASR front-ends in terms of robustness to additive noise and linear filtering. Experiments performed on the TIMIT phoneme classification task demonstrate the benefits of the proposed subband-based SVM front-end: it outperforms the standard cepstral front-end in the presence of noise and linear filtering for signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed front-end with a conventional front-end such as MFCC yields further improvements over the individual front-ends across the full range of noise levels.

    Rotor Bar Fault Monitoring Method Based on Analysis of Air-Gap Torques of Induction Motors

    A robust method to monitor the operating conditions of induction motors is presented. This method utilizes the data analysis of the air-gap torque profile in conjunction with a Bayesian classifier to determine the operating condition of an induction motor as either healthy or faulty. This method is trained offline with datasets generated either from an induction motor modeled by a time-stepping finite-element (TSFE) method or experimental data. This method can effectively monitor the operating conditions of induction motors that are different in frame/class, ratings, or design from the motor used in the training stage. Such differences can include the level of load torque and operating frequency. This is due to a novel air-gap torque normalization method introduced here, which leads to a motor fault classification process independent of these parameters and with no need for prior information about the motor being monitored. The experimental results given in this paper validate the robustness and efficacy of this method. Additionally, this method relies exclusively on data analysis of motor terminal operating voltages and currents, without relying on complex motor modeling or internal performance parameters not readily available.

    Wavelet-based techniques for speech recognition

    In this thesis, new wavelet-based techniques have been developed for the extraction of features from speech signals for the purpose of automatic speech recognition (ASR). One of the advantages of the wavelet transform over the short-time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary, the wavelet transform is a better choice for time-frequency transformation of these signals. In addition, it has compactly supported basis functions, thereby reducing the amount of computation as opposed to the STFT, where an overlapping window is needed. [Continues.]