313 research outputs found

    Synthetic speech detection and audio steganography in VoIP scenarios

    Distinguishing synthetic from human voice relies on the techniques of current biometric voice recognition systems, which prevent a person's voice from being confused with someone else's, whether the intent is good or bad. Steganography makes it possible to hide a message inside a file of no particular value (usually an audio, video, or image file) in such a way that it does not raise the suspicion of any external observer. This article suggests two methods, applicable in a hypothetical VoIP scenario, which allow us to distinguish synthetic speech from a human voice and to insert a text message into the Comfort Noise generated in the pauses of a voice conversation. The first method builds on earlier studies of the Modulation Features related to the temporal analysis of speech signals, while the second proposes a technique derived from Direct Sequence Spread Spectrum, which consists of distributing the energy of the signal to be hidden over a wider transmission band. Due to space limits, this paper is only an extended abstract. The full version will contain further details on our research.
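The spread-spectrum idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the helper names (`embed`, `extract`, `pn_sequence`), the key-seeded pseudo-noise generator, the Gaussian stand-in for comfort noise, and the values of `chips_per_bit` and `strength` are all assumptions chosen for illustration. Each message bit is spread over many ±1 chips, added to the noise at low amplitude, and recovered by correlating with the same chip sequence.

```python
import random

def text_to_bits(text):
    """UTF-8 bytes of `text` as a list of bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in text.encode("utf-8") for i in range(7, -1, -1)]

def bits_to_text(bits):
    """Inverse of text_to_bits (length must be a multiple of 8)."""
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

def pn_sequence(key, length):
    """Hypothetical pseudo-noise chip sequence of +/-1 values seeded by a shared key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(noise, bits, key, chips_per_bit, strength):
    """Add each bit, spread over `chips_per_bit` chips, to the host noise samples."""
    pn = pn_sequence(key, len(bits) * chips_per_bit)
    out = list(noise)
    for i, chip in enumerate(pn):
        symbol = 1.0 if bits[i // chips_per_bit] else -1.0
        out[i] += strength * symbol * chip
    return out

def extract(signal, n_bits, key, chips_per_bit):
    """Recover bits from the sign of the correlation with the same chip sequence."""
    pn = pn_sequence(key, n_bits * chips_per_bit)
    bits = []
    for b in range(n_bits):
        start = b * chips_per_bit
        corr = sum(signal[start + j] * pn[start + j] for j in range(chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits
```

A longer chip sequence per bit allows a lower embedding strength for the same detection reliability, at the cost of a lower bit rate; that trade-off is the reason for spreading the hidden signal over a wider band.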

    Glottal Source Cepstrum Coefficients Applied to NIST SRE 2010

    This paper presents a novel feature set for speaker recognition based on glottal estimate information. An iterative algorithm is used to derive the vocal tract and glottal source estimates from the speech signal. To test the importance of glottal source information in speaker characterization, the novel feature set has been tested in the 2010 NIST Speaker Recognition Evaluation (NIST SRE10). The proposed system uses glottal estimate parameter templates and classical cepstral information to build a model for each speaker involved in the recognition process. The ALIZE [1] open-source software has been used to create the GMM models for both background and target speakers. Compared to using mel-frequency cepstrum coefficients (MFCC), the misclassification rate for the NIST SRE 2010 fell from 29.43% to 27.15% when glottal source features are used.

    Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition

    The shifted delta cepstrum (SDC) is a widely used feature extraction method for language recognition (LRE). With a high context width due to the incorporation of multiple frames, SDC outperforms traditional delta and acceleration feature vectors. However, it also introduces correlation into the concatenated feature vector, which increases redundancy and may degrade the performance of backend classifiers. In this paper, we first propose a time-frequency cepstral (TFC) feature vector, which is obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements in a zigzag scan order. Beyond this, we increase discriminability through a heteroscedastic linear discriminant analysis (HLDA) on the full cepstrum matrix. By utilizing block diagonal matrix constraints, the large HLDA problem is then reduced to several smaller HLDA problems, creating a block diagonal HLDA (BDHLDA) algorithm which has much lower computational complexity. The BDHLDA method is finally extended to the GMM domain, using the simpler TFC features during re-estimation to provide significantly improved computation speed. Experiments on the NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA results in lower equal error rate (EER) and minimum average cost (Cavg) than either the TFC or SDC approaches.
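The TFC construction described in this abstract (a temporal DCT over the cepstrum matrix followed by a zigzag selection) can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the unnormalized DCT-II, the JPEG-style zigzag order, the matrix layout `cepstrum_matrix[c][t]`, and the parameter `n_keep` are all choices made here, and the abstract does not specify any of them.

```python
import math

def dct_ii(values):
    """Unnormalized DCT-II of a 1-D sequence (assumed DCT variant)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (t + 0.5) * k / n) for t, v in enumerate(values))
        for k in range(n)
    ]

def zigzag_indices(n_rows, n_cols):
    """(row, col) pairs visited anti-diagonal by anti-diagonal, alternating direction."""
    order = []
    for d in range(n_rows + n_cols - 1):
        diag = [(r, d - r) for r in range(n_rows) if 0 <= d - r < n_cols]
        order.extend(diag if d % 2 else reversed(diag))
    return order

def tfc_features(cepstrum_matrix, n_keep):
    """Keep the first `n_keep` zigzag-ordered elements of the temporally transformed matrix.

    cepstrum_matrix[c][t] holds cepstral coefficient c at frame t of the context window.
    """
    transformed = [dct_ii(row) for row in cepstrum_matrix]  # DCT along the time axis
    order = zigzag_indices(len(transformed), len(transformed[0]))
    return [transformed[r][c] for r, c in order[:n_keep]]
```

The zigzag scan favors elements with low cepstral index and low temporal frequency, so a short prefix of the scan keeps the smooth, slowly varying part of the matrix.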

    Cepstral methods for image feature extraction

    Ankara: The Department of Electrical and Electronics Engineering and the Institute of Engineering and Sciences of Bilkent University, 2010. Thesis (Master's) -- Bilkent University, 2010. Includes bibliographical references leaves 49-57. Image feature extraction is one of the most vital tasks in computer vision and pattern recognition applications due to its importance in the preparation of data extracted from images. In this thesis, 2D cepstrum based methods (2D mel- and Mellin-cepstrum) are proposed for image feature extraction. The proposed feature extraction schemes are used in face recognition and target detection applications. The cepstral features are invariant to amplitude and translation changes. In addition, the features extracted using the 2D Mellin-cepstrum method are rotation invariant. Due to these merits, the proposed techniques can be used in various feature extraction problems. The feature matrices extracted using the cepstral methods are classified by the Common Matrix Approach (CMA) and a multi-class Support Vector Machine (SVM). Experimental results show that the success rates obtained using cepstral feature extraction algorithms are higher than the rates obtained using standard baselines (PCA, Fourier-Mellin Transform, Fourier LDA approach). Moreover, it is observed that the features extracted by cepstral methods are computationally more efficient than the standard baselines. In the target detection task, the proposed feature extraction methods are used in the detection and discrimination stages of a typical Automatic Target Recognition (ATR) system. The feature matrices obtained from the cepstral techniques are applied to the SVM classifier. The simulation results show that 2D cepstral feature extraction techniques can be used for target detection in SAR images. Çakır, Serdar. M.S.

    Audio segmentation-by-classification approach based on factor analysis in broadcast news domain

    This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates for within-class variability by using class-dependent factor loading matrices and obtains the scores by computing the log-likelihood ratio of the class model to a non-class model over fixed-length windows. Afterwards, these scores are smoothed by means of different back-end systems to yield longer contiguous segments of the same class. Unlike previous solutions, our proposal does not make use of specific acoustic features and does not need a hierarchical structure. The proposed method is applied to segment and classify audio from TV shows into five different acoustic classes: speech, music, speech with music, speech with noise, and others. The technique is compared to a hierarchical system with specific acoustic features and achieves a significant error reduction.

    Efficient Spectral Power Estimation on an Arbitrary Frequency Scale

    The Fast Fourier Transform is a very efficient algorithm for Fourier spectrum estimation, but it is limited to a linear frequency scale, which may not suit every system. For example, audio and speech analysis needs a logarithmic frequency scale because of the characteristics of the human ear. Fast Fourier Transform algorithms cannot deliver such results efficiently, so modified techniques have to be used in this case. This text introduces a simple technique that uses the Goertzel algorithm to evaluate power spectra on an arbitrary frequency scale. Because of its simplicity, the algorithm suffers from imperfections, which are discussed and partially solved in this paper. Implementation in real systems and the impact of quantization errors proved to be critical and have to be dealt with in special cases. A simple method for dealing with the quantization error is also introduced. Finally, the proposed method is compared with other methods in terms of computational demands and potential speed.
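As a rough illustration of evaluating power at arbitrary frequencies with the Goertzel recurrence, the sketch below computes a spectrum on a logarithmic grid. It uses the textbook floating-point recurrence only; the function names, the geometric frequency grid, and the parameters `f_min`, `f_max`, and `n_bins` are assumptions made here, and the paper's fixes for the algorithm's imperfections and quantization errors are not reproduced.

```python
import math

def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Squared magnitude of sum(x[n] * exp(-j*omega*n)) at one arbitrary frequency."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recurrence: s[n] = x[n] + 2*cos(omega)*s[n-1] - s[n-2]
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def log_spaced_power_spectrum(samples, sample_rate_hz, f_min, f_max, n_bins):
    """Power at `n_bins` geometrically spaced frequencies from f_min to f_max."""
    ratio = (f_max / f_min) ** (1.0 / (n_bins - 1))
    freqs = [f_min * ratio ** k for k in range(n_bins)]
    powers = [goertzel_power(samples, f, sample_rate_hz) for f in freqs]
    return freqs, powers
```

Each frequency costs one pass over the samples, so the total work grows with the number of requested bins rather than with the transform length, which is what makes a sparse, non-uniform grid attractive.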

    Mel-cepstral feature extraction methods for image representation

    An image feature extraction method based on the two-dimensional (2-D) mel cepstrum is introduced. The concept of the one-dimensional mel cepstrum, which is widely used in speech recognition, is extended to 2-D in this article. The feature matrices resulting from the 2-D mel-cepstral analysis are applied to a support-vector-machine classifier with multi-class support to test the performance of the mel-cepstrum feature matrix. The AR, ORL, and Yale face databases are used in experimental studies, which indicate that recognition rates obtained by the 2-D mel-cepstrum method are superior to the recognition rates obtained using 2-D principal-component analysis and ordinary image-matrix-based face recognition. Experimental results show that 2-D mel-cepstral analysis can also be used in other image feature extraction problems. © 2010 SPIE