Improvement of speech recognition by nonlinear noise reduction
The success of nonlinear noise reduction applied to a single-channel recording of human voice is measured in terms of the recognition rate of a commercial speech recognition program, in comparison to the optimal linear filter. The overall performance of the nonlinear method is shown to be superior. We hence demonstrate that an algorithm which has its roots in the theory of nonlinear deterministic dynamics has large potential in a realistic application.
Comment: see urbanowicz.org.p
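One well-known algorithm in this family is Schreiber's "simple" nonlinear noise reduction, which averages over neighbourhoods in a reconstructed phase space. The sketch below illustrates that general approach on a toy signal; it is an assumption about the family of methods, not necessarily the exact algorithm of the paper.

```python
import numpy as np

def simple_nl_filter(x, m=5, eps=0.3):
    """Schreiber-style simple nonlinear noise reduction (a sketch):
    embed the signal in delay coordinates, then replace the middle
    coordinate of each delay vector by its average over all
    phase-space neighbours closer than eps."""
    n = len(x) - m + 1
    emb = np.stack([x[i:i + n] for i in range(m)], axis=1)  # n delay vectors
    mid = m // 2
    y = x.copy()
    for i in range(n):
        d = np.max(np.abs(emb - emb[i]), axis=1)  # sup-norm distance to all vectors
        y[i + mid] = emb[d < eps, mid].mean()     # average over the neighbourhood
    return y

# Toy demonstration on a noisy sine wave.
rng = np.random.default_rng(1)
clean = np.sin(np.linspace(0, 8 * np.pi, 500))
noisy = clean + 0.1 * rng.standard_normal(500)
denoised = simple_nl_filter(noisy)
```

Unlike a fixed linear filter, the averaging set adapts to where the signal sits on its attractor, which is why such methods can outperform the optimal linear filter on deterministic signals.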
Statistical Models of Reconstructed Phase Spaces for Signal Classification
This paper introduces a novel approach to the analysis and classification of time series signals using statistical models of reconstructed phase spaces. With sufficient dimension, such reconstructed phase spaces are, with probability one, guaranteed to be topologically equivalent to the state dynamics of the generating system, and therefore may contain information that is absent in analysis and classification methods rooted in linear assumptions. Parametric and nonparametric distributions are introduced as statistical representations over the multidimensional reconstructed phase space, with classification accomplished through methods such as Bayes maximum likelihood and artificial neural networks (ANNs). The technique is demonstrated on heart arrhythmia classification and speech recognition. This new approach is shown to be a viable and effective alternative to traditional signal classification approaches, particularly for signals with strong nonlinear characteristics.
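The phase-space reconstruction underlying this approach is a time-delay embedding; a minimal sketch of embedding a signal and fitting a single-Gaussian class model over the embedded points (the dimension, delay, and Gaussian choice here are illustrative assumptions):

```python
import numpy as np

def delay_embed(x, m, tau):
    """Time-delay embedding: each row is one point in the reconstructed
    phase space (topologically equivalent to the generating system's
    state dynamics for sufficiently large m)."""
    n = len(x) - (m - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(m)], axis=1)

# Toy signal standing in for one class of time series.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 500)) + 0.05 * rng.standard_normal(500)

pts = delay_embed(x, m=3, tau=5)           # points in a 3-D phase space
mu, cov = pts.mean(axis=0), np.cov(pts.T)  # single-Gaussian class model
```

Classification then scores a test signal's embedded points under each class's fitted density (e.g. Gaussian or GMM log-likelihood) and picks the class with the highest total likelihood.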
SSA of biomedical signals: A linear invariant systems approach
Singular spectrum analysis (SSA) is considered from a linear invariant systems perspective. In this terminology, the extracted components are considered as outputs of a linear invariant system corresponding to finite impulse response (FIR) filters. The number of filters is determined by the embedding dimension. We propose to explicitly define the frequency response of each filter responsible for the selection of informative components. We also introduce a subspace distance measure for clustering subspace models. We illustrate the methodology by analyzing electroencephalograms (EEG).
FCT - PhD scholarships (SFRH/BD/28404/2006, SFRH/BD/48775/2008)
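A minimal sketch of basic SSA making the FIR-filter view concrete: the series is embedded into a trajectory matrix, decomposed by SVD, and each rank-1 term is mapped back to a series; each left singular vector acts as the tap vector of a length-L FIR filter whose frequency response can be inspected directly (window length and test signal below are illustrative choices).

```python
import numpy as np

def ssa_components(x, L):
    """Basic SSA: embed the series into an L x K trajectory matrix,
    take its SVD, and map each rank-1 term back to a length-N series
    by anti-diagonal averaging."""
    N = len(x)
    K = N - L + 1
    X = np.stack([x[i:i + K] for i in range(L)])   # trajectory (Hankel) matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for k in range(len(s)):
        Xk = s[k] * np.outer(U[:, k], Vt[k])       # rank-1 elementary matrix
        # Anti-diagonal averaging back to a length-N series
        comps.append(np.array([Xk[::-1].diagonal(t - L + 1).mean()
                               for t in range(N)]))
    return np.array(comps), U

x = (np.sin(np.linspace(0, 4 * np.pi, 200))        # slow component
     + 0.2 * np.sin(np.linspace(0, 40 * np.pi, 200)))  # fast component
comps, U = ssa_components(x, L=30)
response = np.abs(np.fft.rfft(U[:, 0], 256))  # frequency response of 1st filter
```

The components sum exactly back to the original series, and examining each filter's frequency response is precisely the component-selection step the abstract proposes to make explicit.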
FaceFilter: Audio-visual speech separation using still images
The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movement in video clips or pre-enrolled speaker information as an auxiliary conditional feature, we use a single face image of the target speaker. In this task, the conditional feature is obtained from facial appearance in a cross-modal biometric task, where audio and visual identity representations are shared in a latent space. Identities learnt from facial images force the network to isolate the matched speaker and extract that voice from the mixed speech. This solves the permutation problem caused by swapped channel outputs, which frequently occurs in speech separation tasks. The proposed method is far more practical than video-based speech separation, since user profile images are readily available on many platforms. Also, unlike speaker-aware separation methods, it is applicable to separation with unseen speakers who have never been enrolled before. We show strong qualitative and quantitative results on challenging real-world examples.
Comment: Under submission as a conference paper. Video examples: https://youtu.be/ku9xoLh62
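The core idea of conditioning a separation mask on an identity embedding can be sketched as follows; the shapes, the random weights, and the single linear layer here are hypothetical stand-ins for the paper's learned network, shown only to make the conditioning mechanism concrete.

```python
import numpy as np

def conditional_mask(mix_spec, id_emb, W, b):
    """Illustrative conditioning step (hypothetical shapes and weights,
    not the paper's actual architecture): tile the target speaker's
    identity embedding across time, concatenate it with each mixture
    spectrogram frame, and predict a soft mask for that speaker."""
    T = mix_spec.shape[0]
    cond = np.concatenate([mix_spec, np.tile(id_emb, (T, 1))], axis=1)
    return 1.0 / (1.0 + np.exp(-(cond @ W + b)))  # sigmoid mask in (0, 1)

F, D, T = 257, 128, 50                     # freq bins, embedding dim, frames
rng = np.random.default_rng(0)
mix = np.abs(rng.standard_normal((T, F)))  # mixture magnitude spectrogram
mask = conditional_mask(mix, rng.standard_normal(D),
                        0.01 * rng.standard_normal((F + D, F)), np.zeros(F))
target_est = mask * mix                    # masked estimate of the target speaker
```

Because the mask is tied to one specific identity embedding, the network has a single well-defined output per speaker, which is how identity conditioning sidesteps the permutation problem.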
Speech Processing in Computer Vision Applications
Deep learning has recently proven to be a viable asset in determining features in the field of speech analysis. Deep learning methods like convolutional neural networks facilitate the expansion of specific feature information in waveforms, allowing networks to create more feature-dense representations of data. Our work attempts to address the problems of re-creating a face given a speaker's voice and of speaker identification using deep learning methods. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method can convert a speaker audio sample into an image of their predicted face. This framework is composed of several networks chained together, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain features can map to the face, that with a speaker's voice DNNs can create their face, and that a GUI can be used in conjunction to display a speaker recognition network's data.
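The chained structure described above can be sketched as a simple function composition; the three stages below are hypothetical random linear maps standing in for the learned audio-embedding, encoding, and face-generation networks, and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three chained networks named above;
# the real networks are learned, these are random maps for illustration.
def audio_embed(wave):      # waveform -> fixed-size voice embedding
    return np.tanh(wave[:512] @ (0.05 * rng.standard_normal((512, 128))))

def encode(voice_emb):      # voice embedding -> face latent code
    return np.tanh(voice_emb @ (0.1 * rng.standard_normal((128, 64))))

def generate_face(latent):  # latent code -> tiny grayscale face image
    return (latent @ (0.1 * rng.standard_normal((64, 32 * 32)))).reshape(32, 32)

# One pass through the chain: audio sample in, predicted face image out.
face = generate_face(encode(audio_embed(rng.standard_normal(16000))))
```

The point of the composition is that each stage has a single well-defined interface (waveform, embedding, latent code), so the networks can be trained and swapped independently.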