705 research outputs found

    Deep Learning for Real-time Information Hiding and Forensics

    Speech Processing in Computer Vision Applications

    Deep learning has recently proven to be a viable tool for determining features in the field of speech analysis. Deep learning methods such as convolutional neural networks facilitate the expansion of specific feature information in waveforms, allowing networks to create more feature-dense representations of data. Our work addresses two problems using deep learning methods: speaker identification and re-creating a face from a speaker's voice. In this work, we first review the fundamental background of speech processing and its related applications. We then introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method converts a speaker's audio sample into an image of their predicted face. The framework is composed of several chained networks, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain features can be mapped to the face, that DNNs can generate a speaker's face from their voice, and that a GUI can be used to display a speaker recognition network's data.

    Biometric information analyses using computer vision techniques.

    Biometric information analysis is derived from the analysis of a series of physical and biological characteristics of a person. It is widely regarded as the most fundamental task in the realms of computer vision and machine learning. With the growing power of computer vision techniques, biometric information analysis has received increasing attention in the past decades. Biometric information can be analyzed from many sources, including the iris, retina, voice, fingerprint, facial image, or even the way one walks. Facial images and gait, because of their easy availability, are two preferred sources for biometric information analysis. In this thesis, we investigated the most recent computer vision techniques and proposed various state-of-the-art models to solve four principal problems in biometric information analysis: age estimation, age progression, face retrieval, and gait recognition. For age estimation, the modeling has always been a challenge. Existing works model age estimation as either a classification or a regression problem. However, these two types of models are not able to reveal the intrinsic nature of human age. To this end, we proposed a novel hierarchical framework and an ordinal metric learning based method. In the hierarchical framework, a random forest based clustering method is introduced to find an optimal age grouping protocol. In the ordinal metric learning approach, age estimation is solved by learning a subspace in which the ordinal structure of the data is preserved. Both have achieved state-of-the-art performance. For face retrieval, specifically under a cross-age setting, we first proposed a novel task: given two images, find the target image that has the same identity as the first input and the same age as the second input. To tackle this task, we proposed a joint manifold learning method that can disentangle identity from age information. Together with two independent similarity measurements, the retrieval can be easily performed. For age progression, we also proposed a novel task that has not been considered before: fusing the identity of one image with the age of another image. By proposing a novel framework based on generative adversarial networks, our model is able to generate close-to-realistic images. Lastly, although gait recognition is an ideal long-distance biometric task that compensates for the shortcomings of facial images, existing works are not able to handle large-scale data with various view angles. We proposed a generative model to address this problem and achieved promising results. Moreover, our model is able to generate evidence for forensic use.