
    Speech Processing in Computer Vision Applications

    Deep learning has recently proven to be a viable tool for feature extraction in speech analysis. Methods such as Convolutional Neural Networks expand specific feature information in waveforms, allowing networks to build more feature-dense representations of the data. Our work addresses two problems using deep learning methods: re-creating a face from a speaker's voice, and speaker identification. We first review the fundamental background in speech processing and its related applications. We then introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method converts a speaker's audio sample into an image of their predicted face. The framework is composed of several chained networks, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain voice features can be mapped to the face, that DNNs can generate a speaker's face from their voice, and that a GUI can be used to display a speaker recognition network's data.

    Hybrid Deep Learning Framework for Reduction of Mixed Noise via Low Rank Noise Estimation

    In this paper, a hybrid deep learning framework (EN-CNN) is presented for image noise reduction where the noise originates from heterogeneous sources. More specifically, EN-CNN is applied to benchmark natural images affected by a mixture of additive white Gaussian noise (AWGN) and impulsive noise (IN). Reducing mixed noise (AWGN and IN) is more involved than removing a single type of noise, because different noise statistics are present simultaneously. Although various deep learning approaches and classical state-of-the-art approaches such as WNNM have been used effectively to suppress AWGN alone, the same techniques are not suitable for mixed noise. In this context, EN-CNN can not only infer the changed noise statistics but can also effectively eliminate residual noise. First, EN-CNN applies classical neighborhood filtering to reduce IN, followed by non-local low-rank estimation to estimate the characteristics of the noise that remains. This step yields a pre-processed image together with its residual noise statistics. Second, a convolutional neural network (CNN) is applied to the pre-processed image, based on the noise statistics inferred in the first step. This two-pronged strategy, in conjunction with the deep learning mechanism, effectively handles mixed noise suppression. As a result, the suggested framework yields promising results compared with various state-of-the-art approaches.
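The mixed-noise setting and the first pre-processing stage described in this abstract can be illustrated with a minimal NumPy sketch. It is not the EN-CNN implementation: a plain 3x3 median filter stands in for the neighborhood filtering step, the residual noise level is measured against the known clean image rather than by non-local low-rank estimation, and the CNN stage is omitted. All function names and parameter values (`add_mixed_noise`, `median_filter3`, `sigma`, `impulse_ratio`) are assumptions chosen for illustration.

```python
import numpy as np


def add_mixed_noise(img, sigma=0.1, impulse_ratio=0.2, rng=None):
    """Corrupt an image in [0, 1] with AWGN plus salt-and-pepper impulses.

    sigma and impulse_ratio are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Additive white Gaussian noise on every pixel.
    noisy = img + rng.normal(0.0, sigma, img.shape)
    # Impulsive noise: a random subset of pixels is forced to 0 or 1.
    mask = rng.random(img.shape) < impulse_ratio
    salt = rng.random(img.shape) < 0.5
    noisy[mask & salt] = 1.0
    noisy[mask & ~salt] = 0.0
    return np.clip(noisy, 0.0, 1.0)


def median_filter3(img):
    """3x3 median filter, used here as a stand-in neighborhood filter."""
    padded = np.pad(img, 1, mode="reflect")
    h, w = img.shape
    # Stack the nine shifted views that make up each pixel's 3x3 window.
    windows = np.stack(
        [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    )
    return np.median(windows, axis=0)


def mse(a, b):
    """Mean squared error between two images of the same shape."""
    return float(np.mean((a - b) ** 2))


# Synthetic test image: a horizontal intensity ramp.
clean = np.tile(np.linspace(0.2, 0.8, 64), (64, 1))
noisy = add_mixed_noise(clean)
prefiltered = median_filter3(noisy)

# Oracle estimate of the residual noise level (uses the clean image,
# which a real method would not have access to).
residual_sigma = float(np.std(prefiltered - clean))

print("MSE noisy:      ", mse(noisy, clean))
print("MSE prefiltered:", mse(prefiltered, clean))
print("Residual sigma: ", residual_sigma)
```

In a two-stage scheme of the kind the abstract describes, the pre-filtered image and an estimate of its residual noise level would then be passed to a learned denoiser.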