Application of time-scale and frequency-scale modification algorithms to voice-gender conversion
Voice conversion is an active new branch of speech processing that deals with the transformation of natural speech, focusing on changing the characteristics of the speaker’s voice. As a result of voice conversion, the speaker’s identity can be changed to make the converted speech sound as if it were uttered by a different speaker, or certain characteristics of the voice can be modified while maintaining the speaker’s identity. Voice-gender conversion (VGC) is a subset of voice conversion and focuses on the transformation of gender-specific voice characteristics. As a result of voice-gender conversion, male speech is converted into female-sounding speech and vice versa. A major application of voice conversion is speaker normalisation, in which a given voice is converted to a normalised voice. This allows speech recognition and speech compression methods to perform better, as their effective signal space is reduced significantly. In speech compression applications, the reduction of the signal space enhances efficiency and achieves higher compression rates. Another application is voice transformation to accommodate hearing impairments. A more straightforward application is the use of voice-gender conversion to disguise voices for the protection of individuals, e.g. witnesses, or to deter nuisance calls. The system presented in this thesis achieves voice-gender transformation by independently frequency-scaling the excitation and the formant spectrum of the speech signal in order to model the different voice-gender features from the voice-production perspective. The novelty of this research is the linearization of the non-linear relationship between the male and female formant spectrum. The algorithm used to achieve frequency-scaling is a time-scale modification (TSM) algorithm called adaptive overlap-and-add (AOLA), which is a recently developed method to efficiently change the duration of time-based signals.
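The link between time-scale modification and frequency scaling mentioned in this abstract can be illustrated with a minimal sketch. It uses plain (non-adaptive) overlap-add, not the AOLA algorithm of the thesis, and the function names `ola_time_stretch`, `resample_linear` and `frequency_scale`, as well as the frame length and Hann window, are assumptions made for illustration only. The idea shown: stretch the signal in time by a factor, then resample it back to its original length, so that all frequencies are scaled by roughly that factor while the duration stays the same.

```python
import math


def hann(n):
    # Periodic Hann window of length n (illustrative choice of window).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(x, rate, frame_len=256):
    """Plain overlap-add TSM: output is roughly `rate` times as long as x.

    Frames are read every `synthesis_hop / rate` samples and written
    every `synthesis_hop` samples. No waveform alignment is performed,
    so this is NOT the adaptive (AOLA) variant.
    """
    synthesis_hop = frame_len // 2
    analysis_hop = synthesis_hop / rate
    window = hann(frame_len)
    n_frames = max(1, int((len(x) - frame_len) / analysis_hop) + 1)
    out_len = (n_frames - 1) * synthesis_hop + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for m in range(n_frames):
        a = int(round(m * analysis_hop))  # read position in the input
        s = m * synthesis_hop             # write position in the output
        for i in range(frame_len):
            if a + i < len(x):
                y[s + i] += window[i] * x[a + i]
                norm[s + i] += window[i]
    # Divide by the summed window so overlapping frames keep unit gain.
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]


def resample_linear(x, n_out):
    """Resample x to n_out samples by linear interpolation."""
    if n_out <= 1 or len(x) < 2:
        return list(x[:n_out])
    step = (len(x) - 1) / (n_out - 1)
    out = []
    for k in range(n_out):
        pos = k * step
        i = min(int(pos), len(x) - 2)
        frac = pos - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
    return out


def frequency_scale(x, factor, frame_len=256):
    """Scale frequencies by about `factor` while keeping the duration."""
    stretched = ola_time_stretch(x, factor, frame_len)
    return resample_linear(stretched, len(x))
```

Plain overlap-add tends to produce audible phase discontinuities on voiced speech, which is the kind of artefact that alignment-based variants are designed to reduce; the sketch is only meant to show the stretch-then-resample principle.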
Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection
Background: Voice disorders affect patients profoundly, and acoustic tools can potentially measure voice function objectively. Disordered sustained vowels exhibit wide-ranging phenomena, from nearly periodic to highly complex, aperiodic vibrations, and increased "breathiness". Modelling and surrogate data studies have shown significant nonlinear and non-Gaussian random properties in these sounds. Nonetheless, existing tools are limited to analysing voices displaying near periodicity, and do not account for this inherent biophysical nonlinearity and non-Gaussian randomness, often using linear signal processing methods insensitive to these properties. They do not directly measure the two main biophysical symptoms of disorder: complex nonlinear aperiodicity, and turbulent, aeroacoustic, non-Gaussian randomness. Often these tools cannot be applied to more severe disordered voices, limiting their clinical usefulness.

Methods: This paper introduces two new tools to speech analysis: recurrence and fractal scaling, which overcome the range limitations of existing tools by addressing directly these two symptoms of disorder, together reproducing a "hoarseness" diagram. A simple bootstrapped classifier then uses these two features to distinguish normal from disordered voices.

Results: On a large database of subjects with a wide variety of voice disorders, these new techniques can distinguish normal from disordered cases, using quadratic discriminant analysis, to overall correct classification performance of 91.8% plus or minus 2.0%. The true positive classification performance is 95.4% plus or minus 3.2%, and the true negative performance is 91.5% plus or minus 2.3% (95% confidence). This is shown to outperform all combinations of the most popular classical tools.

Conclusions: Given the very large number of arbitrary parameters and computational complexity of existing techniques, these new techniques are far simpler and yet achieve clinically useful classification performance using only a basic classification technique. They do so by exploiting the inherent nonlinearity and turbulent randomness in disordered voice signals. They are widely applicable to the whole range of disordered voice phenomena by design. These new measures could therefore be used for a variety of practical clinical purposes.

Modern Methods of Time-Frequency Warping of Sound Signals
This thesis deals with the representation of non-stationary harmonic signals with time-varying components. It focuses primarily on the Harmonic Transform and its variant with subquadratic computational complexity, the Fast Harmonic Transform. Two algorithms using the Fast Harmonic Transform are presented. The first uses the gathered log-spectrum to estimate the change in fundamental frequency; the second uses an analysis-by-synthesis approach. Both algorithms are applied to a speech segment to compare their outputs. Finally, the analysis-by-synthesis algorithm is applied to several real sound signals to measure the improvement in the representation of frequency-modulated signals achieved by the Harmonic Transform.
Joint Tensor Factorization and Outlying Slab Suppression with Applications
We consider factoring low-rank tensors in the presence of outlying slabs.
This problem is important in practice, because data collected in many
real-world applications, such as speech, fluorescence, and some social network
data, fit this paradigm. Prior work tackles this problem by iteratively
selecting a fixed number of slabs and fitting, a procedure which may not
converge. We formulate this problem from a group-sparsity promoting point of
view, and propose an alternating optimization framework to handle the
corresponding minimization-based low-rank tensor
factorization problem. The proposed algorithm has a per-iteration
complexity similar to that of the plain trilinear alternating least squares (TALS) algorithm.
Convergence of the proposed algorithm is also easy to analyze under the
framework of alternating optimization and its variants. In addition,
regularization and constraints can be easily incorporated to make use of
a priori information on the latent loading factors. Simulations and real
data experiments on blind speech separation, fluorescence data analysis, and
social network mining are used to showcase the effectiveness of the proposed
algorithm.
Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations
We present algorithms for the type-IV discrete cosine transform (DCT-IV) and
discrete sine transform (DST-IV), as well as for the modified discrete cosine
transform (MDCT) and its inverse, that achieve a lower count of real
multiplications and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation count is reduced
from ~2NlogN to ~(17/9)NlogN for a power-of-two transform size N, and the exact
count is strictly lowered for all N > 4. These results are derived by
considering the DCT to be a special case of a DFT of length 8N, with certain
symmetries, and then pruning redundant operations from a recent improved fast
Fourier transform algorithm (based on a recursive rescaling of the
conjugate-pair split radix algorithm). The improved algorithms for DST-IV and
MDCT follow immediately from the improved count for the DCT-IV. Comment: 11 pages
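For orientation, the transform this abstract refers to can be written down directly from its textbook definition, X[k] = sum over n of x[n] cos((pi/N)(n + 1/2)(k + 1/2)). The sketch below is a direct O(N^2) evaluation of that definition, not the reduced-operation fast algorithm described in the paper; the function names `dct_iv` and `idct_iv` are chosen here for illustration. It also shows a well-known property of the unnormalised DCT-IV: applying it twice returns the input scaled by N/2, so the inverse is the same transform multiplied by 2/N.

```python
import math


def dct_iv(x):
    """Direct O(N^2) type-IV DCT, unnormalised.

    X[k] = sum_n x[n] * cos(pi / N * (n + 1/2) * (k + 1/2))
    This is the plain definition, not a fast algorithm.
    """
    n_pts = len(x)
    return [
        sum(
            x[n] * math.cos(math.pi / n_pts * (n + 0.5) * (k + 0.5))
            for n in range(n_pts)
        )
        for k in range(n_pts)
    ]


def idct_iv(coeffs):
    """Inverse DCT-IV: the same transform scaled by 2 / N."""
    n_pts = len(coeffs)
    return [2.0 / n_pts * v for v in dct_iv(coeffs)]
```

A slow reference of this kind could serve as a check on the numerical output of any fast implementation, since both should agree up to rounding error.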
A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation
We introduce a simple and linear SNR (strictly speaking, periodic-to-random
power ratio) estimator (0 dB to 80 dB without additional
calibration/linearization) for providing reliable descriptions of aperiodicity
in speech corpora. The main idea of this method is to estimate the background
random noise level without directly extracting the background noise. The
proposed method is applicable to a wide variety of time windowing functions
with very low sidelobe levels. The estimate combines the frequency derivative
and the time-frequency derivative of the mapping from filter center frequency
to the output instantaneous frequency. This procedure can replace the
periodicity detection and aperiodicity estimation subsystems of the recently
introduced open-source vocoder, YANG vocoder. Source code of a MATLAB
implementation of this method will also be open-sourced. Comment: 8 pages, 9 figures. Submitted and accepted in Interspeech201