Speech Processing in Computer Vision Applications
Deep learning has recently proven to be a viable tool for extracting features in speech analysis. Methods such as Convolutional Neural Networks expand specific feature information in waveforms, allowing networks to build more feature-dense representations of the data. Our work addresses two problems using deep learning methods: speaker identification and re-creating a face from a speaker's voice. We first review the fundamental background in speech processing and its related applications. We then introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method converts a speaker's audio sample into an image of their predicted face. The framework is composed of several chained networks, each performing an essential step in the conversion: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain features can map to the face, that DNNs can generate a speaker's face from their voice, and that a GUI can be used in conjunction to display a speaker recognition network's data.
Acoustic model adaptation from raw waveforms with SincNet
Raw waveform acoustic modelling has recently gained interest due to neural
networks' ability to learn feature extraction, and the potential for finding
better representations for a given scenario than hand-crafted features. SincNet
has been proposed to reduce the number of parameters required in raw-waveform
modelling, by restricting the filter functions, rather than having to learn
every tap of each filter. We study the adaptation of the SincNet filter
parameters from adults' to children's speech, and show that the
parameterisation of the SincNet layer is well suited for adaptation in
practice: we can efficiently adapt with a very small number of parameters,
producing error rates comparable to techniques using orders of magnitude more
parameters.
Comment: Accepted to IEEE ASRU 201
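The parameter saving described in this abstract can be illustrated with a small sketch. It is not the authors' code: it assumes the usual sinc-based band-pass construction, in which each filter is the difference of two windowed ideal low-pass filters and is fully determined by a low and a high cutoff frequency. The function names, the cutoff values, the kernel length of 101 taps, and the 16 kHz sample rate are all hypothetical choices made for illustration.

```python
import numpy as np


def sinc_bandpass(f_low, f_high, kernel_size=101, sample_rate=16000):
    """Build one band-pass FIR kernel from just two cutoff frequencies (Hz).

    Illustrative assumption: the filter is the difference of two ideal
    low-pass filters, smoothed with a Hamming window.
    """
    # Sample positions centred on zero so the kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Cutoffs expressed as a fraction of the sample rate.
    f1 = f_low / sample_rate
    f2 = f_high / sample_rate
    # np.sinc(x) is sin(pi x) / (pi x), so 2 f sinc(2 f n) is an ideal
    # low-pass filter with cutoff f.
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return kernel * np.hamming(kernel_size)


def sinc_filterbank(cutoffs, kernel_size=101, sample_rate=16000):
    """Stack one kernel per (low, high) cutoff pair."""
    return np.stack(
        [sinc_bandpass(lo, hi, kernel_size, sample_rate) for lo, hi in cutoffs]
    )


# Hypothetical cutoff pairs in Hz; in a trained layer these would be learned.
cutoffs = [(100.0, 400.0), (400.0, 1200.0), (1200.0, 3000.0)]
bank = sinc_filterbank(cutoffs)

# Two values per filter in the sinc parameterisation, versus one value per
# tap if every filter coefficient were learned freely.
n_sinc_params = 2 * len(cutoffs)
n_free_params = bank.size
```

Under these assumptions, adapting the layer to a new speaker group would mean updating only the cutoff pairs and regenerating the kernels, rather than re-estimating every tap. Practical implementations typically learn the cutoffs by gradient descent inside a neural network framework; the NumPy version above only shows how the kernels follow from the cutoffs.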