
    Intelligent Local Face Recognition


    The VidTIMIT Database

    This communication describes the multi-modal VidTIMIT database, which is useful for research on mono- and multi-modal speech recognition and person authentication. It comprises video and corresponding audio recordings of 43 volunteers reciting short sentences selected from the NTIMIT corpus.

    Improvement Of Face Recognition Using Principal Component Analysis And Moment Invariant

    Face recognition attracts many researchers and has made significant progress in recent years. Face recognition is a biometric, like fingerprint and iris scans, and plays an important role in real-world applications such as commerce and law enforcement; hence the importance of tackling this kind of research. In this research, we propose a method that integrates Principal Component Analysis (PCA) and moment invariants with face colour in grayscale to recognize face images in various poses. The PCA method is used to analyze the face image because it is well suited to this kind of face image analysis, and it is employed to extract global information. If the feature vector of a face image matches that of a face in the database, its owner is recognized. If the vector is not matched, the original face image is reconsidered using moment invariants and grayscale face-colour extraction, and the face is rematched. In this way, unrecognized faces are reconsidered and some are recognized accurately, increasing the number of recognized faces and improving recognition accuracy. We applied our method to the Olivetti Research Laboratory (ORL) database issued by AT&T, which contains 40 different faces with 10 images each. Our experiment uses the holdout method to measure recognition accuracy: we divided about 2/3 of the data (280 faces) for training and about 1/3 (120 faces) for testing. The results show a recognition accuracy of 94% when applying PCA alone, and 96% after reconsidering the unrecognized patterns by handling pose-varied faces and face-colour extraction. Our proposed method improves recognition accuracy with the additional extracted features (PCA + face colour in grayscale) while taking the total processing time into consideration.
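
    As a hedged illustration of the two-stage idea described above, the sketch below pairs an eigenface-style PCA match with a fallback on Hu's moment invariants; the distance threshold, helper names, and the use of scikit-learn and OpenCV are assumptions for illustration, not the authors' implementation.

        # Hypothetical sketch of the two-stage matching idea: PCA (eigenfaces)
        # first, with Hu moment invariants as a fallback for unmatched faces.
        # Assumes grayscale face images as flattened NumPy arrays; the threshold
        # and helper names are illustrative, not taken from the paper.
        import numpy as np
        import cv2
        from sklearn.decomposition import PCA

        def train_pca(train_images, n_components=50):
            # train_images: (n_samples, height*width) array of grayscale faces
            pca = PCA(n_components=n_components)
            projections = pca.fit_transform(train_images)
            return pca, projections

        def hu_moments(image_2d):
            # Hu's seven moment invariants of a grayscale image (OpenCV)
            return cv2.HuMoments(cv2.moments(image_2d)).flatten()

        def recognize(query, train_images, labels, pca, projections,
                      image_shape, pca_threshold=2500.0):
            # Stage 1: nearest neighbour in PCA (eigenface) space
            q = pca.transform(query.reshape(1, -1))
            dists = np.linalg.norm(projections - q, axis=1)
            best = np.argmin(dists)
            if dists[best] < pca_threshold:
                return labels[best]
            # Stage 2: reconsider the rejected face with moment invariants
            q_hu = hu_moments(query.reshape(image_shape))
            hu_all = np.array([hu_moments(im.reshape(image_shape))
                               for im in train_images])
            return labels[np.argmin(np.linalg.norm(hu_all - q_hu, axis=1))]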

    Human abnormal behavior impact on speaker verification systems

    Human behavior plays a major role in improving human-machine communication. Performance is inevitably affected by abnormal behavior, since systems are trained on normal utterances. Abnormal behavior is often associated with a change in the human emotional state. Different emotional states cause physiological changes in the human body that affect the vocal tract; we perceive fear, anger, or even happiness as deviations from normal behavior. The whole spectrum of human-machine applications is susceptible to behavioral changes. Abnormal behavior is a major factor, especially for security applications such as verification systems. Face, fingerprint, iris, and speaker verification are among the most common approaches to biometric authentication today. This paper discusses normal and abnormal human behavior and its impact on the accuracy and effectiveness of automatic speaker verification (ASV). The inputs to the support vector machine classifier are Mel-frequency cepstral coefficients and their dynamic changes. For this purpose, the Berlin Database of Emotional Speech was used. The research shows that abnormal behavior has a major impact on verification accuracy, with the equal error rate increasing to 37%. This paper also describes a new design and application of an ASV system that is much more resistant to rejecting a target user exhibiting abnormal behavior.
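
    A minimal sketch of the described front end and scoring, assuming librosa for the MFCCs and their dynamic (delta) features and scikit-learn for the support vector machine; the file paths, labels, and averaging over frames are placeholder assumptions rather than the paper's exact setup.

        # Illustrative pipeline: MFCCs plus their dynamic (delta) changes feed
        # a support vector machine, and the equal error rate (EER) is computed
        # from verification scores. librosa and scikit-learn are assumed;
        # data paths and labels are placeholders.
        import numpy as np
        import librosa
        from sklearn.svm import SVC
        from sklearn.metrics import roc_curve

        def utterance_features(path, n_mfcc=13):
            y, sr = librosa.load(path, sr=16000)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            delta = librosa.feature.delta(mfcc)            # first-order dynamics
            delta2 = librosa.feature.delta(mfcc, order=2)  # second-order dynamics
            feats = np.vstack([mfcc, delta, delta2])
            return feats.mean(axis=1)  # one fixed-length vector per utterance

        def equal_error_rate(y_true, scores):
            # EER: the point where false accept rate equals false reject rate
            fpr, tpr, _ = roc_curve(y_true, scores)
            fnr = 1.0 - tpr
            idx = np.nanargmin(np.abs(fnr - fpr))
            return (fpr[idx] + fnr[idx]) / 2.0

        # Placeholder training data: 1 = target speaker, 0 = impostor
        # X = np.array([utterance_features(p) for p in wav_paths])
        # clf = SVC(probability=True).fit(X, y)
        # eer = equal_error_rate(y_test, clf.predict_proba(X_test)[:, 1])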

    Audio-visual speech recognition with background music using single-channel source separation

    In this paper, we consider audio-visual speech recognition with background music. The proposed algorithm integrates audio-visual speech recognition and single-channel source separation (SCSS), and we apply it to recognize speech that is mixed with music signals. First, an SCSS algorithm based on nonnegative matrix factorization (NMF) and spectral masks separates the audio speech signal from the background music in the magnitude spectral domain. After the speech audio is separated from the music, regular audio-visual speech recognition (AVSR) is performed using multi-stream hidden Markov models. By employing the two approaches together, we aim to improve recognition accuracy both by processing the audio signal with SCSS and by supporting the recognition task with visual information. Experimental results show that combining audio-visual speech recognition with source separation yields remarkable improvements in the accuracy of the speech recognition system.
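
    The following sketch illustrates NMF-based single-channel separation with a soft spectral mask, roughly as described; the multiplicative updates are the standard Lee-Seung rules for the Euclidean cost, and librosa, the dictionary sizes, and iteration counts are assumptions, not the paper's implementation.

        # Sketch of NMF-based single-channel separation with a spectral mask,
        # assuming librosa for the STFT and pre-recorded speech/music training
        # clips. Dictionary size k and iteration count are illustrative.
        import numpy as np
        import librosa

        EPS = 1e-10

        def nmf(V, k, n_iter=200, W=None):
            # Factorize V (freq x time) as W (freq x k) @ H (k x time).
            # If W is given it stays fixed and only H is updated.
            rng = np.random.default_rng(0)
            fixed_W = W is not None
            if W is None:
                W = rng.random((V.shape[0], k)) + EPS
            H = rng.random((W.shape[1], V.shape[1])) + EPS
            for _ in range(n_iter):
                H *= (W.T @ V) / (W.T @ W @ H + EPS)
                if not fixed_W:
                    W *= (V @ H.T) / (W @ H @ H.T + EPS)
            return W, H

        def separate_speech(mix, speech_train, music_train, k=40):
            S_mix = librosa.stft(mix)
            V_mix = np.abs(S_mix)
            # Learn magnitude-spectral dictionaries for each source
            W_s, _ = nmf(np.abs(librosa.stft(speech_train)), k)
            W_m, _ = nmf(np.abs(librosa.stft(music_train)), k)
            # Decompose the mixture over the concatenated dictionaries
            W = np.concatenate([W_s, W_m], axis=1)
            _, H = nmf(V_mix, 2 * k, W=W)
            speech_mag = W_s @ H[:k]
            music_mag = W_m @ H[k:]
            mask = speech_mag / (speech_mag + music_mag + EPS)  # soft mask
            return librosa.istft(mask * S_mix)  # reuse the mixture phase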

    Multimodal Authentication using Asynchronous HMMs


    Using multiple visual tandem streams in audio-visual speech recognition

    The so-called "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition with a novel setup that uses multiple classifiers to obtain multiple visual tandem features. We adopt the multi-stream hidden Markov model approach, in which visual tandem features from two different classifiers are treated as additional streams in the model. Our experiments show that using multiple visual tandem features improves recognition accuracy in various noise conditions. In addition, to handle asynchrony between the audio and visual observations, we employ coupled hidden Markov models and obtain improved performance compared to the synchronous model.
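
    The sketch below shows how tandem features are commonly constructed: frame-level classifier posteriors are log-compressed and decorrelated with PCA before being used as HMM observation streams; the choice of classifiers, feature dimensions, and placeholders here are assumptions, not the authors' configuration.

        # Illustrative construction of "tandem" features: per-frame classifier
        # posterior probabilities are log-compressed and decorrelated with PCA,
        # then used as observation streams for an HMM. Two different classifiers
        # give two visual tandem streams; classifiers and dimensions are
        # placeholder assumptions, not the paper's setup.
        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.svm import SVC
        from sklearn.decomposition import PCA

        def fit_tandem_transform(clf, X_train, y_train, n_components=15):
            # Train the frame classifier, then fit PCA on its log posteriors.
            clf.fit(X_train, y_train)
            log_post = np.log(clf.predict_proba(X_train) + 1e-8)
            return PCA(n_components=n_components).fit(log_post)

        def tandem_features(clf, pca, frames):
            # frames: (n_frames, feat_dim) visual features for one utterance
            return pca.transform(np.log(clf.predict_proba(frames) + 1e-8))

        # Placeholder usage: two classifiers yield two visual tandem streams,
        # each later modelled as an extra stream in a multi-stream (or coupled)
        # HMM alongside the audio features.
        # mlp = MLPClassifier(hidden_layer_sizes=(128,))
        # svm = SVC(probability=True)
        # pca_mlp = fit_tandem_transform(mlp, X_train, y_train)
        # pca_svm = fit_tandem_transform(svm, X_train, y_train)
        # stream1 = tandem_features(mlp, pca_mlp, utterance_frames)
        # stream2 = tandem_features(svm, pca_svm, utterance_frames)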

    Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges

    Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. Applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometric Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. Fusions of modalities such as hand gestures and face, or lip and hand position, are the sensory combinations mainly used in the development of multimodal systems for the hearing impaired. This paper gives an overview of the multimodal systems available in the literature on hearing-impaired studies, and it also discusses some studies related to hearing-impaired acoustic analysis. It is observed that far fewer algorithms have been developed for hearing-impaired AVSR than for normal hearing. The study of audio-visual speech recognition systems for the hearing impaired is therefore in high demand, particularly for people trying to communicate in natively spoken languages. This paper also highlights state-of-the-art techniques in AVSR and the challenges researchers face in developing AVSR systems.