Speaker recognition by means of restricted Boltzmann machine adaptation
Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising universal model training and model adaptation. Taking advantage of the RBM unsupervised learning algorithm, a global model is trained on all available background data. This general speaker-independent model, referred to as URBM, is then adapted to the data of a specific speaker to build a speaker-dependent model. To show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has also been used to produce a vector-based representation for speakers. This vector-based representation, similar to the i-vector, can be further used for speaker recognition with either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database.
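The cosine scoring step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each speaker and each test utterance has already been reduced to a fixed-length vector; the function names and the threshold value are hypothetical and are not taken from the paper.

```python
import numpy as np

def cosine_score(enrol_vec, test_vec):
    """Cosine similarity between an enrolment vector and a test vector."""
    enrol = np.asarray(enrol_vec, dtype=float)
    test = np.asarray(test_vec, dtype=float)
    denom = np.linalg.norm(enrol) * np.linalg.norm(test)
    if denom == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return float(enrol @ test / denom)

def accept(enrol_vec, test_vec, threshold=0.5):
    """Accept the trial if the score reaches a threshold (value is illustrative)."""
    return cosine_score(enrol_vec, test_vec) >= threshold
```

In practice the threshold would be tuned on development data; the value above is only a placeholder.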
Reducing Audible Spectral Discontinuities
In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering consonant contexts that have a similar effect on the surrounding vowels, on the basis of the best-performing distance measure. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities.
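The abstract does not say which spectral distance measures were compared. As one possible example of such a measure, the sketch below computes a root-mean-square log-spectral distance between the last frame of one diphone and the first frame of the next. The function name, the Hann window, and the `eps` floor are assumptions made for illustration only.

```python
import numpy as np

def log_spectral_distance(frame_left, frame_right, eps=1e-10):
    """RMS difference (in dB) between the power spectra of two equal-length frames."""
    left = np.asarray(frame_left, dtype=float)
    right = np.asarray(frame_right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("frames must have the same length")
    window = np.hanning(left.size)  # taper the frames to reduce spectral leakage
    power_left = np.abs(np.fft.rfft(left * window)) ** 2
    power_right = np.abs(np.fft.rfft(right * window)) ** 2
    # eps keeps the logarithm finite when a bin has zero power
    diff_db = 10.0 * np.log10((power_left + eps) / (power_right + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

A larger value at a diphone boundary would indicate a larger spectral mismatch under this particular measure; whether it predicts audibility is exactly the kind of question the listening experiment addresses.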
Improving the Speech Intelligibility By Cochlear Implant Users
In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations in a quiet background for most patients with profound hearing loss in both ears. However, CI users still have serious problems understanding speech in noisy and reverberant environments. Bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes further raise the difficulty of hearing in noisy conditions for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors: the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in a six-talker babble background. The results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion in improving speech intelligibility for CI users, motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of the conversation. Research has shown that when adults are familiar with someone’s voice, they can process and understand what the person is saying more accurately and even more quickly. This effect, known as the “familiar talker advantage,” motivated us to examine its impact on CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.
On transforming spectral peaks in voice conversion
This paper explores the benefits of transforming spectral peaks in voice conversion. First, in examining classic GMM-based transformation with cepstral coefficients, we show that the lack of transformed data variance ("over-smoothing") can be related to the choice of spectral parameterization. Consequently, we propose an alternative parameterization using spectral peaks. The peaks are transformed using HMMs with Gaussian state distributions. Two learning variants and a post-processing step that treats peak evolution in time are also examined. In comparing the different transformation approaches, spectral peaks are shown to offer higher interspeaker feature correlation and yield higher transformed data variance than their cepstral coefficient counterparts.
C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework
Sparse modeling is a powerful framework for data analysis and processing.
Traditionally, encoding in this framework is performed by solving an
L1-regularized linear regression problem, commonly referred to as Lasso or
Basis Pursuit. In this work we combine the sparsity-inducing property of the
Lasso model at the individual feature level, with the block-sparsity property
of the Group Lasso model, where sparse groups of features are jointly encoded,
obtaining a hierarchically structured sparsity pattern. This results in the
Hierarchical Lasso (HiLasso), which shows important practical modeling
advantages. We then extend this approach to the collaborative case, where a set
of simultaneously coded signals share the same sparsity pattern at the higher
(group) level, but not necessarily at the lower (inside the group) level,
obtaining the collaborative HiLasso model (C-HiLasso). Such signals then share
the same active groups, or classes, but not necessarily the same active set.
This model is very well suited for applications such as source identification
and separation. An efficient optimization procedure, which guarantees
convergence to the global optimum, is developed for these new models. The
underlying presentation of the new framework and optimization approach is
complemented with experimental examples and theoretical results regarding
recovery guarantees for the proposed models.
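To make the combination of the two penalties concrete, the sketch below applies the proximal operator of a penalty of the form lam1 * ||a||_1 + lam2 * sum_G ||a_G||_2 for non-overlapping groups: an element-wise soft threshold followed by a group-wise shrinkage. This is a standard building block for such penalties and is shown only as an illustration; it is not claimed to be the optimization procedure developed in the paper, and the names `hilasso_prox`, `lam1`, and `lam2` are hypothetical.

```python
import numpy as np

def soft_threshold(a, lam1):
    """Element-wise shrinkage: promotes sparsity inside each group (L1 term)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam1, 0.0)

def hilasso_prox(a, groups, lam1, lam2):
    """Proximal step for lam1*||a||_1 + lam2*sum_G ||a_G||_2.

    `groups` is a list of index lists that must not overlap.
    """
    out = soft_threshold(np.asarray(a, dtype=float), lam1)
    for idx in groups:
        norm = np.linalg.norm(out[idx])
        # Group-wise shrinkage: the whole group is zeroed if its norm is <= lam2
        scale = max(0.0, 1.0 - lam2 / norm) if norm > 0.0 else 0.0
        out[idx] = scale * out[idx]
    return out

# Example: the second group is switched off entirely, while the first
# group stays active with shrunken coefficients.
z = hilasso_prox([3.0, 4.0, 0.1, -0.1], groups=[[0, 1], [2, 3]], lam1=0.5, lam2=1.0)
```

The two-stage shrinkage mirrors the hierarchy described in the abstract: the group term decides which groups are active, and the L1 term decides which coefficients inside an active group remain nonzero.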