9,029 research outputs found

    Empirical mode decomposition-based facial pose estimation inside video sequences

    We describe a new pose-estimation algorithm that integrates the strengths of both empirical mode decomposition (EMD) and mutual information. While mutual information is exploited to measure the similarity between facial images and thereby estimate poses, EMD is exploited to decompose the input facial images into a number of intrinsic mode function (IMF) components, which redistribute the effects of noise, expression changes, and illumination variations such that, when the input facial image is described by the selected IMF components, these negative effects are minimized. Extensive experiments were carried out in comparison with existing representative techniques, and the results show that the proposed algorithm achieves better pose-estimation performance with robustness to noise corruption, illumination variation, and facial expressions.
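    As a rough illustration of the idea described above (not the authors' implementation), the sketch below flattens a face image, decomposes it with 1D EMD via the PyEMD package, rebuilds it from a selected subset of IMFs, and scores candidate poses with a histogram-based mutual information measure; the IMF indices, bin count, and pose-template dictionary are illustrative assumptions.

```python
# Sketch of the general idea (not the paper's implementation): decompose a
# flattened face image with 1D EMD, keep a subset of IMFs, and compare the
# reconstruction against pose templates via histogram-based mutual information.
# Assumes the PyEMD package; image handling and IMF selection are illustrative.
import numpy as np
from PyEMD import EMD  # pip install EMD-signal

def selected_imf_reconstruction(image, keep=(1, 2, 3)):
    """Flatten the image, run 1D EMD, and rebuild it from selected IMFs."""
    signal = image.astype(float).ravel()
    imfs = EMD().emd(signal)                 # shape: (n_imfs, n_samples)
    keep = [k for k in keep if k < len(imfs)]
    return imfs[keep].sum(axis=0).reshape(image.shape)

def mutual_information(a, b, bins=32):
    """Histogram-based mutual information between two equally sized images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def estimate_pose(probe, pose_templates):
    """Return the pose label whose template maximizes mutual information."""
    probe_r = selected_imf_reconstruction(probe)
    scores = {pose: mutual_information(probe_r, selected_imf_reconstruction(t))
              for pose, t in pose_templates.items()}
    return max(scores, key=scores.get)
```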

    Performance evaluation of the Hilbert–Huang transform for respiratory sound analysis and its application to continuous adventitious sound characterization

    © 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). The use of the Hilbert–Huang transform in the analysis of biomedical signals has increased during the past few years, but its use for respiratory sound (RS) analysis is still limited. The technique comprises two steps: empirical mode decomposition (EMD) and instantaneous frequency (IF) estimation. Although the mode mixing (MM) problem of EMD has been widely discussed, the technique continues to be used in many RS analysis algorithms. In this study, we analyzed the MM effect in RS signals recorded from 30 asthmatic patients, and studied the performance of ensemble EMD (EEMD) and noise-assisted multivariate EMD (NA-MEMD) as means of preventing this effect. We propose quantitative parameters for measuring the extent of MM, its reduction, and the residual noise level of each method. These parameters showed that EEMD is a good solution for MM, outperforming NA-MEMD. After testing different IF estimators, we propose Kay's method to calculate an EEMD-Kay-based Hilbert spectrum that offers high energy concentration and high time and frequency resolution. We also propose an algorithm for the automatic characterization of continuous adventitious sounds (CAS). The tests performed showed that the proposed EEMD-Kay-based Hilbert spectrum makes it possible to determine CAS more precisely than other conventional time-frequency techniques.
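    A minimal sketch of the EEMD-plus-IF pipeline the abstract outlines, assuming the PyEMD package for EEMD and substituting a plain phase-difference estimator for Kay's weighted IF estimator; the sampling rate, ensemble size, and noise width are illustrative.

```python
# Hedged sketch of an EEMD-based Hilbert spectrum pipeline: ensemble EMD to
# limit mode mixing, then per-IMF instantaneous amplitude and frequency from
# the analytic signal. A plain phase-difference IF estimator stands in for
# Kay's weighted estimator; fs and the EEMD settings are illustrative.
import numpy as np
from scipy.signal import hilbert
from PyEMD import EEMD  # pip install EMD-signal

def eemd_hilbert_spectrum(x, fs, trials=50, noise_width=0.2):
    eemd = EEMD(trials=trials, noise_width=noise_width)
    imfs = eemd.eemd(x)                      # (n_imfs, n_samples)
    analytic = hilbert(imfs, axis=1)
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic), axis=1)
    # Phase-difference instantaneous frequency in Hz (one sample shorter).
    inst_freq = np.diff(phase, axis=1) * fs / (2.0 * np.pi)
    return amplitude[:, 1:], inst_freq

if __name__ == "__main__":
    fs = 5000.0                              # Hz, illustrative RS sampling rate
    t = np.arange(0, 1.0, 1.0 / fs)
    # Toy wheeze-like tone plus broadband noise standing in for an RS record.
    x = np.sin(2 * np.pi * 400 * t) + 0.3 * np.random.randn(t.size)
    amp, freq = eemd_hilbert_spectrum(x, fs)
    print(amp.shape, freq.shape)
```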

    Machine Learning Mitigants for Speech Based Cyber Risk

    Statistical analysis of speech is an emerging area of machine learning. In this paper, we tackle the biometric challenge of Automatic Speaker Verification (ASV): differentiating between samples generated by two distinct populations of utterances, those of an authentic human voice and those generated by a synthetic one. Approaching this problem from a statistical perspective requires the definition of a decision rule and a learning procedure to identify the optimal classifier. Classical state-of-the-art countermeasures rely on strong assumptions, such as stationarity or local stationarity of speech, that may not hold in practice. We therefore explore a robust non-linear and non-stationary signal decomposition method, the Empirical Mode Decomposition, combined in a novel fashion with Mel-Frequency Cepstral Coefficients and a refined classifier, the multi-kernel Support Vector Machine. We undertake extensive real-data case studies covering multiple ASV systems and different datasets, including the ASVSpoof 2019 challenge database. The results clearly demonstrate the advantage of our feature extraction and classification approach over existing conventional methods in reducing the threat of cyber-attacks perpetrated by synthetic voice replication seeking unauthorised access.
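    A hedged sketch of how EMD-derived MFCC features and a two-kernel SVM might be wired together, assuming PyEMD, librosa, and scikit-learn; the precomputed summed kernel stands in for a full multi-kernel SVM, and the kernel weights, IMF count, and feature sizes are illustrative.

```python
# Hedged sketch: MFCCs computed on the first few EMD IMFs of each utterance,
# then a simple two-kernel (RBF + linear) SVM via scikit-learn's precomputed
# kernel interface as a stand-in for a full multi-kernel SVM. PyEMD and
# librosa are assumed; kernel weights and feature sizes are illustrative.
import numpy as np
import librosa
from PyEMD import EMD
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def emd_mfcc_features(y, sr, n_imfs=3, n_mfcc=13):
    """Mean MFCC vector of the first few IMFs, concatenated."""
    imfs = EMD().emd(y)[:n_imfs]
    feats = [librosa.feature.mfcc(y=imf, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
             for imf in imfs]
    return np.concatenate(feats)

def combined_kernel(Xa, Xb, w_rbf=0.7, w_lin=0.3, gamma=0.05):
    """Weighted sum of an RBF and a linear kernel."""
    return w_rbf * rbf_kernel(Xa, Xb, gamma=gamma) + w_lin * linear_kernel(Xa, Xb)

def train_spoof_detector(X_train, y_train):
    """y_train: 1 for genuine speech, 0 for synthetic/spoofed speech."""
    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(combined_kernel(X_train, X_train), y_train)
    return clf

def predict(clf, X_train, X_test):
    # Precomputed kernels need the test-vs-train Gram matrix.
    return clf.predict(combined_kernel(X_test, X_train))
```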

    Noise Variance Estimation In Signal Processing

    We present a new method for estimating noise variance that is applicable to both 1D and 2D signal processing. The essence of the method is estimation of the scatter of normally distributed data with a high level of outliers; it is applicable to data in which the majority of data points contain no signal. The method is based on the shortest half sample. The mean of the shortest half sample (shorth) and the location of the least median of squares are among the most robust measures of the location of the mode, and the length of the shortest half sample has been used as a measure of the scatter of uncontaminated data. We show that computing the lengths of several sub-samples of varying sizes provides the information needed to estimate both the scatter and the number of uncontaminated data points in a sample. We derive a system of equations, for the Gaussian distribution, that is solved for the data scatter and the number of uncontaminated data points; the data scatter then gives the noise variance. The method can be extended to other distributions.
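    A single-scale sketch of the shortest-half-sample idea behind the method: for Gaussian noise, the shortest interval containing half the sorted samples has length of roughly 2 × 0.6745 × σ, so that length yields a robust σ (and hence variance) estimate; the paper's system of equations over sub-samples of several sizes is not reproduced here.

```python
# Hedged single-scale sketch of the shortest-half-sample (shorth) idea: for
# Gaussian noise, the shortest interval containing half the sorted samples has
# length of about 2 * 0.6745 * sigma, so that length gives a robust sigma (and
# variance) estimate even when many samples carry signal ("outliers"). The
# full multi-size system of equations from the abstract is not reproduced.
import numpy as np

def shortest_half_length(x):
    """Length of the shortest interval containing ceil(n/2) sorted samples."""
    s = np.sort(np.asarray(x, dtype=float))
    n = s.size
    h = (n + 1) // 2                      # half-sample size
    return np.min(s[h - 1:] - s[:n - h + 1])

def noise_sigma_shorth(x):
    """Robust noise standard deviation from the shortest half sample."""
    # 0.6745 is the Gaussian quartile: the shortest 50% interval spans +/- 0.6745*sigma.
    return shortest_half_length(x) / (2.0 * 0.6745)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(0.0, 2.0, size=5000)       # pure noise, sigma = 2
    signal = rng.uniform(20, 50, size=500)        # signal-bearing points ("outliers")
    data = np.concatenate([noise, signal])
    # Prints a value somewhat above the true sigma = 2: the single-scale shorth
    # is biased by contamination, which the multi-size sub-sample scheme in the
    # abstract is designed to correct.
    print(noise_sigma_shorth(data), noise_sigma_shorth(data) ** 2)
```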

    Adaptive wavelet thresholding with robust hybrid features for text-independent speaker identification system

    The robustness of a speaker identification system over an additive noise channel is crucial for real-world applications. In speaker identification (SID) systems, the features extracted from each speech frame are an essential factor in building a reliable identification system. The system works well in clean environments, but in noisy environments additive noise degrades its performance. To mitigate the problem of additive noise and achieve high identification accuracy, an algorithm for feature extraction based on speech enhancement and combined features is proposed. In this paper, a wavelet thresholding pre-processing stage and feature warping (FW) are used with two combined features, power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC), to improve the robustness of the identification system against different types of additive noise. A Universal Background Model Gaussian Mixture Model (UBM-GMM) is used for feature matching between the claimed and actual speakers. The results show that the proposed feature extraction algorithm improves identification performance compared with conventional features over most noise types and different SNR levels.
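    A simplified sketch of the pre-processing and matching stages, assuming PyWavelets, librosa, and scikit-learn; soft wavelet thresholding with the universal threshold and MFCC features stand in for the paper's specific front-end (feature warping with PNCC/GFCC), and the per-speaker GMMs here are a reduced stand-in for a full UBM-GMM.

```python
# Hedged sketch of the pre-processing and matching stages: soft wavelet
# thresholding to denoise the speech, then per-speaker GMMs scored against a
# background model (a simplified stand-in for a full UBM-GMM with PNCC/GFCC
# features). The wavelet family, universal threshold, and MFCC front-end are
# illustrative choices, not the paper's exact configuration.
import numpy as np
import pywt
import librosa
from sklearn.mixture import GaussianMixture

def wavelet_denoise(x, wavelet="db8", level=4):
    """Soft-threshold the detail coefficients with the universal threshold."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745     # noise scale from finest level
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))        # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(x)]

def frame_features(y, sr, n_mfcc=20):
    """Frame-level features after denoising (MFCCs stand in for PNCC/GFCC)."""
    return librosa.feature.mfcc(y=wavelet_denoise(y), sr=sr, n_mfcc=n_mfcc).T

def score_claim(test_feats, speaker_gmm, background_gmm):
    """Average log-likelihood ratio of the claimed speaker vs. the background."""
    return speaker_gmm.score(test_feats) - background_gmm.score(test_feats)

# Fitting the models (features: arrays of shape (n_frames, n_mfcc)):
# background_gmm = GaussianMixture(n_components=64).fit(all_speakers_feats)
# speaker_gmm = GaussianMixture(n_components=64).fit(claimed_speaker_feats)
```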