
    AR and ARMA system identification techniques under heavy noisy conditions and their applications to speech analysis

    System identification in noisy environments is of fundamental importance in numerous fields, such as communication, control, and signal processing. System identification is the task of estimating and validating the parameters of a system from its output observations, a task that becomes very difficult when the system output is heavily corrupted by noise. The major objective of this research is to develop novel system identification techniques for accurate estimation of the parameters of minimum-phase autoregressive (AR) and autoregressive moving-average (ARMA) systems in practical situations where the system input is not accessible and only noise-corrupted observations are available. Unlike conventional system identification methods, in which only white-noise excitation is considered, both white-noise and periodic impulse-train excitations are taken into account in the developed methodologies, with the aim of applying them directly to speech analysis. A new ARMA correlation model is developed, based on which a two-stage correlation-domain ARMA system identification method is proposed. In the first stage, the new model, in conjunction with a residue-based least-squares (RBLS) model-fitting optimization algorithm, is used to estimate the AR parameters. In the second stage, the moving-average (MA) parameters are estimated from the residual signal obtained by filtering the observed data with the estimated AR parameters. To overcome the adverse effect of noise on the MA part, a noise-compensation scheme using an inverse autocorrelation function (IACF) of the residual signal is also proposed. Cepstrum analysis has been popular in speech and biomedical signal processing. In this thesis, several cepstral-domain techniques are developed to identify AR and ARMA systems in noisy conditions.
First, a ramp-cepstrum model for the one-sided autocorrelation function (ACF) of AR and ARMA signals is proposed, which is then used for the estimation of the parameters of AR or ARMA systems via the RBLS algorithm. It is shown that, for the estimation of the MA parameters of ARMA systems, either a direct ramp-cepstrum model-fitting approach or a noise-compensation-based approach can be adopted. Considering that, for real signals, the discrete cosine transform is computationally more attractive than the Fourier transform (FT), a ramp cosine cepstrum model is also proposed for the identification of AR and ARMA systems. In order to overcome the limitations of the conventional low-order Yule-Walker methods, a noise-compensated quadratic eigenvalue method utilizing the low-order lags of the ACF is proposed for estimating the AR parameters of the ARMA system along with the noise variance. For the estimation of the MA parameters, a new noise-compensation method is employed, in which a spectral factorization of the resulting noise-compensated ACF of the residual signal is used. To study the effectiveness of the proposed identification techniques, extensive simulations are carried out on synthetic AR and ARMA systems of various orders under heavily noisy conditions. The results demonstrate the significant superiority of the proposed techniques over several existing methods, even at very low SNR. Simulation results on the identification of human vocal-tract systems using natural speech signals are also provided, showing the superior performance of the new techniques. As an illustration of the application of the proposed AR and ARMA system identification techniques to speech analysis, noise-robust schemes for the estimation of formant frequencies are developed.
Synthetic and natural phonemes, including some naturally spoken sentences recorded in noisy environments, are tested using the new formant estimation schemes. The experimental results demonstrate performance superior to that of some state-of-the-art methods at low SNR.
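To make the baseline concrete, the conventional starting point that this abstract improves upon can be sketched in a few lines: estimating AR parameters from output-only observations by solving the Yule-Walker equations on the sample ACF. This is an illustrative numpy sketch only, not the thesis's RBLS or noise-compensated method, and no observation noise is added here.

```python
# Illustrative sketch: conventional Yule-Walker AR estimation from the
# output signal alone (the classical baseline; not the RBLS method).
import numpy as np

def estimate_ar_yule_walker(x, order):
    """Solve the Yule-Walker equations using the biased sample ACF."""
    n = len(x)
    x = x - np.mean(x)
    acf = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = r, with R[i, j] = acf[|i - j|].
    R = np.array([[acf[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, acf[1:order + 1])  # x[t] ~ a1*x[t-1] + a2*x[t-2] + e[t]

rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.75])      # stable, minimum-phase AR(2) system
e = rng.standard_normal(20000)       # white-noise excitation
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = true_a[0] * x[t - 1] + true_a[1] * x[t - 2] + e[t]

a_hat = estimate_ar_yule_walker(x, 2)
```

Under heavy additive observation noise, the low-order ACF lags used by this estimator are biased, which is exactly the failure mode the thesis's noise-compensated schemes address.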

    DwinFormer: Dual Window Transformers for End-to-End Monocular Depth Estimation

    Depth estimation from a single image is of paramount importance in the realm of computer vision, with a multitude of applications. Conventional methods suffer from a trade-off between consistency and fine-grained detail, owing to their limited local receptive field, which restricts their practicality. This lack of long-range dependency stems from the convolutional-neural-network part of the architecture. In this paper, a dual window transformer-based network, namely DwinFormer, is proposed, which utilizes both local and global features for end-to-end monocular depth estimation. The DwinFormer consists of dual window self-attention and cross-attention transformers, Dwin-SAT and Dwin-CAT, respectively. The Dwin-SAT seamlessly extracts intricate, locally aware features while concurrently capturing global context. It harnesses the power of local and global window attention to adeptly capture both short-range and long-range dependencies, obviating the need for complex and computationally expensive operations such as attention masking or window shifting. Moreover, Dwin-SAT introduces inductive biases that provide desirable properties, such as translational equivariance and less dependence on large-scale data. Furthermore, conventional decoding methods often rely on skip connections, which may result in semantic discrepancies and a lack of global context when fusing encoder and decoder features. In contrast, the Dwin-CAT employs both local and global window cross-attention to seamlessly fuse encoder and decoder features with both fine-grained local and contextually aware global information, effectively bridging the semantic gap. Empirical evidence obtained through extensive experimentation on the NYU-Depth-V2 and KITTI datasets demonstrates the superiority of the proposed method, which consistently outperforms existing approaches across both indoor and outdoor environments.
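The local-window attention idea underlying window-based transformers can be illustrated compactly: the sequence is split into non-overlapping windows and scaled dot-product attention is computed independently within each window. This is a minimal numpy sketch of that generic mechanism, not the actual Dwin-SAT implementation; all weights and shapes below are made up for illustration.

```python
# Minimal sketch of non-overlapping local window self-attention
# (generic mechanism only; not the DwinFormer architecture itself).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, window, wq, wk, wv):
    """x: (seq_len, dim); attention is computed independently per window."""
    seq_len, dim = x.shape
    assert seq_len % window == 0
    xw = x.reshape(seq_len // window, window, dim)  # (num_windows, window, dim)
    q, k, v = xw @ wq, xw @ wk, xw @ wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dim))
    return (attn @ v).reshape(seq_len, dim)

rng = np.random.default_rng(1)
dim = 8
x = rng.standard_normal((16, dim))                       # toy token sequence
wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out = window_self_attention(x, window=4, wq=wq, wk=wk, wv=wv)
```

Because each window attends only within itself, the cost is linear in the number of windows; the abstract's "global window" branch would add a second attention path over coarse global tokens.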

    Quantum Convolutional Neural Networks with Interaction Layers for Classification of Classical Data

    Quantum Machine Learning (QML) has come into the limelight due to the exceptional computational abilities of quantum computers. With the promise of near error-free quantum computers in the not-so-distant future, it is important that the effect of multi-qubit interactions on quantum neural networks be studied extensively. This paper introduces a quantum convolutional network with novel interaction layers that exploit three-qubit interactions, increasing the network's expressibility and entangling capability, for classifying both image and one-dimensional data. The proposed approach is tested on three publicly available datasets, namely the MNIST, Fashion MNIST, and Iris datasets, for binary and multiclass classification, and is found to surpass the performance of existing state-of-the-art methods.
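A three-qubit interaction of the kind such layers rely on can be simulated directly on a statevector. The sketch below applies a parameterized diagonal interaction exp(-i·theta·Z⊗Z⊗Z) after a single-qubit rotation; it illustrates the generic gate family only and is not the paper's circuit (the specific gates and layout are assumptions).

```python
# Toy statevector simulation of a parameterized three-qubit interaction
# exp(-i*theta*Z x Z x Z). Illustrative only; not the paper's ansatz.
import numpy as np

def zzz_interaction(state, theta):
    """Apply exp(-i*theta*ZZZ) to a 3-qubit statevector (length 8).
    ZZZ is diagonal: its eigenvalue on basis state |b> is (-1)^popcount(b)."""
    parities = np.array([(-1) ** bin(i).count("1") for i in range(8)])
    return np.exp(-1j * theta * parities) * state

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

state = np.zeros(8, dtype=complex)
state[0] = 1.0                              # start in |000>
u = np.kron(ry(np.pi / 2), np.eye(4))       # RY(pi/2) on the first qubit
state = zzz_interaction(u @ state, theta=0.3)
```

The diagonal interaction changes relative phases (hence entangling capability once combined with non-diagonal gates) while leaving measurement probabilities of this particular state unchanged.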

    A Wavelet-Domain Local Dominant Feature Selection Scheme for Face Recognition

    In this paper, a multi-resolution feature extraction algorithm for face recognition is proposed based on the two-dimensional discrete wavelet transform (2D-DWT), which efficiently exploits the local spatial variations in a face image. For the purpose of feature extraction, instead of considering the entire face image, an entropy-based local band selection criterion is developed, which selects highly informative horizontal segments from the face image. In order to capture the local spatial variations within these highly informative horizontal bands precisely, each horizontal band is segmented into several small spatial modules. Dominant wavelet coefficients corresponding to each local region residing inside those horizontal bands are selected as features. For the selection of the dominant coefficients, a histogram-based threshold criterion is proposed, which not only drastically reduces the feature dimension but also provides high within-class compactness and high between-class separability. A principal component analysis is performed to further reduce the dimensionality of the feature space. Extensive experimentation is carried out on standard face databases, and a very high degree of recognition accuracy is achieved by the proposed method in comparison with that obtained by some of the existing methods.
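The pipeline described above (entropy-based band selection, then a wavelet transform, then thresholding of dominant coefficients) can be sketched end to end. The sketch below uses a hand-rolled one-level 2D Haar DWT and random data as a stand-in for a face image; band sizes, bin counts, and the percentile threshold are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of entropy-based band selection + dominant-coefficient
# thresholding, using a one-level 2D Haar DWT (illustrative parameters).
import numpy as np

def haar2d(img):
    """One-level 2D Haar DWT -> (LL, LH, HL, HH) subbands."""
    a = (img[::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[::2, :] - img[1::2, :]) / 2.0   # row details
    ll, lh = (a[:, ::2] + a[:, 1::2]) / 2.0, (a[:, ::2] - a[:, 1::2]) / 2.0
    hl, hh = (d[:, ::2] + d[:, 1::2]) / 2.0, (d[:, ::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def band_entropy(band, bins=16):
    """Shannon entropy of a band's intensity histogram, in bits."""
    hist, _ = np.histogram(band, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(2)
face = rng.random((32, 32))                  # stand-in for a face image
bands = face.reshape(4, 8, 32)               # four horizontal bands
entropies = np.array([band_entropy(b) for b in bands])
best = bands[np.argmax(entropies)]           # most informative band
ll, lh, hl, hh = haar2d(best)
# Keep only "dominant" coefficients above a histogram-derived threshold.
thr = np.percentile(np.abs(ll), 75)
features = np.abs(ll)[np.abs(ll) >= thr]
```

In practice a wavelet library (e.g. PyWavelets) and a learned or histogram-shape-based threshold would replace the fixed percentile used here.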

    VMD-RiM: Rician Modeling of Temporal Feature Variation Extracted From Variational Mode Decomposed EEG Signal for Automatic Sleep Apnea Detection

    The electroencephalogram (EEG) has recently received special attention in the detection of sleep apnea, as it is directly related to neural activity. However, apnea detection through visual monitoring of the EEG signal by an expert is expensive, difficult, and susceptible to human error. To counter this problem, an automatic apnea detection scheme using a single-lead EEG signal is proposed in this paper, which can differentiate between apnea patients and healthy subjects and also classify apnea and non-apnea frames in the data of an apnea patient. Each sub-frame of a given frame of EEG data is first decomposed into band-limited intrinsic mode functions (BLIMFs) using the variational mode decomposition (VMD). The advantage of using VMD is that it yields compact BLIMFs with adaptive center frequencies, which makes it possible to capture the local information corresponding to varying neural activity. Furthermore, by extracting features from each BLIMF, a temporal within-frame feature-variation pattern is obtained for each mode. We propose to fit the resulting pattern with the Rician model (RiM) and to utilize the fitted model parameters as features. The use of such VMD-RiM features not only offers better feature quality but also ensures a very low feature dimension. To evaluate the performance of the proposed method, a K-nearest-neighbor classifier is used and various cross-validation schemes are carried out. Detailed experimentation is performed on several apnea and healthy subjects with various apnea-hypopnea indices from three publicly available datasets, and it is found that the proposed method achieves superior classification performance compared with existing methods in terms of sensitivity, specificity, and accuracy.
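The Rician-model-fitting step can be illustrated with a small numpy-only example: Rician samples arise as the magnitude of a complex Gaussian with a nonzero mean, and the two model parameters (nu, sigma) can be recovered by maximum likelihood. The coarse grid search below is a deliberately simple stand-in for whatever fitting procedure the paper actually uses, and all numbers are illustrative.

```python
# Numpy-only sketch of fitting a Rician model to magnitude data via a
# coarse maximum-likelihood grid search (illustrative; not the paper's
# fitting procedure).
import numpy as np

def rician_pdf(x, nu, sigma):
    """Rician density: (x/s^2) exp(-(x^2+nu^2)/(2 s^2)) I0(x*nu/s^2)."""
    s2 = sigma ** 2
    return (x / s2) * np.exp(-(x ** 2 + nu ** 2) / (2 * s2)) * np.i0(x * nu / s2)

def fit_rician(samples, grid=np.linspace(0.2, 3.0, 29)):
    """Maximum-likelihood fit over a coarse (nu, sigma) grid (step 0.1)."""
    best, best_ll = None, -np.inf
    for nu in grid:
        for sigma in grid:
            ll = np.sum(np.log(rician_pdf(samples, nu, sigma) + 1e-300))
            if ll > best_ll:
                best, best_ll = (nu, sigma), ll
    return best

rng = np.random.default_rng(3)
true_nu, true_sigma = 1.5, 0.5
# Rician data = magnitude of a complex Gaussian with mean true_nu.
z = (true_nu + true_sigma * rng.standard_normal(4000)
     + 1j * true_sigma * rng.standard_normal(4000))
samples = np.abs(z)
nu_hat, sigma_hat = fit_rician(samples)
```

In a real pipeline the "samples" would be the within-frame feature-variation values of each BLIMF, and the fitted (nu, sigma) pair would serve as the two-dimensional feature for that mode.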