Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Transformed Domain LPC Model
In speech system identification, the linear predictive coding (LPC) model is often employed because it is a simple yet powerful representation of the speech production model. However, the accuracy of the LPC model depends on the number and quality of the past speech samples fed into it, which becomes a problem when past samples are scarce or corrupted by noise. In this paper, a fuzzy system is integrated into the LPC model using the recursive least-squares approach, where the fuzzy parameters are used to characterize the given speech samples. This transformed-domain LPC model is called the FRLS-LPC model; its performance depends on the fuzzy rules and membership functions defined by the user. In simulations, the FRLS-LPC model is shown to outperform the LPC model. With limited past speech samples, the synthetic speech produced by the FRLS-LPC model is better than that produced by the LPC model in terms of prediction error. Furthermore, with corrupted past speech samples, the FRLS-LPC model is able to provide better reconstructed speech, while the LPC model fails to do so.
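As a point of reference for the baseline that the abstract compares against, the following is a minimal sketch of conventional LPC coefficient estimation using the autocorrelation method with the Levinson-Durbin recursion. It is not the FRLS-LPC method from the paper; the function names `autocorrelation` and `lpc_coefficients` are hypothetical, and the sign convention (prediction s[n] ≈ Σ a[k]·s[n-k]) is an assumption.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(signal, order):
    """Estimate predictor coefficients a[1..order] so that
    s[n] is approximated by sum_k a[k] * s[n - k].

    Returns (coefficients, residual_energy). Illustrative helper,
    not the paper's implementation.
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break  # silent frame: nothing left to predict
        # Reflection coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        # Update lower-order coefficients, then append the new one
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


# Usage: a decaying exponential behaves like a first-order
# autoregressive signal, so the second coefficient is close to zero.
samples = [0.9 ** n for n in range(200)]
coeffs, residual = lpc_coefficients(samples, order=2)
```

With few or noisy samples the autocorrelation estimates degrade, which is the weakness of plain LPC that the abstract describes.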
A non-linear VAD for noisy environments
This paper deals with non-linear transformations for improving the
performance of an entropy-based voice activity detector (VAD). The idea to use
a non-linear transformation has already been applied in the field of speech
linear prediction, or linear predictive coding (LPC), based on source separation
techniques, where a score function is added to classical equations in order to
take into account the true distribution of the signal. We explore the possibility
of estimating the entropy of frames after calculating their score function, instead
of using the original frames. We observe that if the signal is clean, the estimated
entropy is essentially the same; if the signal is noisy, however, the frames
transformed using the score function may give an entropy that differs between
voiced and unvoiced frames. Experimental evidence is given
to show that this fact enables voice activity detection under high noise, where
the simple entropy method fails.
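The abstract does not say which entropy estimate is used. As a minimal sketch, assuming a spectral-entropy feature (the entropy of the normalized power spectrum of each frame), the per-frame computation could look like the following. The function names are hypothetical, and the score-function transformation from the paper is not reproduced here.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real frame via a direct DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized power spectrum.

    Low values indicate energy concentrated in a few bins (as in
    voiced speech); high values indicate a flat, noise-like spectrum.
    """
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0:
        return 0.0  # silent frame
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Usage: a pure tone concentrates energy in one bin, while an
# impulse spreads energy evenly over all bins.
n = 64
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
impulse = [1.0] + [0.0] * (n - 1)
tone_entropy = spectral_entropy(tone)
impulse_entropy = spectral_entropy(impulse)
```

A detector built on this feature would compare each frame's entropy against a threshold; under heavy noise the raw feature loses contrast, which is the failure case the abstract addresses.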
Exploring Non-linear Transformations for an Entropy-based Voice Activity Detector
In this paper we explore the use of non-linear transformations in
order to improve the performance of an entropy-based voice activity detector
(VAD). The idea of using a non-linear transformation comes from
previous work in the field of speech linear prediction (LPC) based on source
separation techniques, where the score function was added to the classical
equations in order to take into account the real distribution of the signal. We
explore the possibility of estimating the entropy of frames after calculating their
score function, instead of using the original frames. We observe that if the signal is
clean, the estimated entropy is essentially the same; but if the signal is noisy,
the transformed frames (with the score function) can give a different entropy for
voiced frames than for unvoiced ones. Experimental results show that this
fact makes it possible to detect voice activity under high noise, where the simple entropy
method fails.
Voice morphing using the generative topographic mapping
In this paper we address the problem of Voice Morphing. We attempt to transform the spectral characteristics of a source speaker's speech signal so that the listener would believe that the speech was uttered by a target speaker. The voice morphing system transforms the spectral envelope as represented by a Linear Prediction model. The transformation is achieved by codebook mapping using the Generative Topographic Mapping, a non-linear, latent variable, parametrically constrained, Gaussian Mixture Model
Analysis of a Modern Voice Morphing Approach using Gaussian Mixture Models for Laryngectomees
This paper proposes a voice morphing system for people who have undergone
laryngectomy, which is the surgical removal of all or part of the larynx (the
voice box), performed particularly in cases of laryngeal cancer. A basic
method of achieving voice morphing is to extract the source speaker's vocal
coefficients and then convert them into the target speaker's vocal
parameters. In this paper, we deploy Gaussian Mixture Models (GMM) for mapping
the coefficients from source to target. However, the
conventional GMM-based mapping approach results in
over-smoothing of the converted voice. We therefore propose a
GMM-based method for voice morphing and conversion that
overcomes the over-smoothing of the conventional method. It uses
glottal waveform separation and prediction of excitations;
the results show that over-smoothing is eliminated and that
the transformed vocal tract parameters match those of the target. Moreover, the
synthesized speech thus obtained is found to be of sufficiently high quality.
The proposed GMM approach is
evaluated using various subjective and objective
measures. Finally, an application of voice morphing for laryngectomees that
deploys this approach is recommended.
Comment: 6 pages, 4 figures, 4 tables; International Journal of Computer
Applications Volume 49, Number 21, July 201
Frequency Domain Methods for Coding the Linear Predictive Residual of Speech Signals
The most frequently used speech coding paradigm is ACELP, known for encoding speech with high quality while consuming little bandwidth. ACELP performs linear prediction filtering in order to remove the effect of the spectral envelope from the signal. The noise-like excitation is then encoded using algebraic codebooks. The search of this codebook, however, cannot be performed optimally with conventional encoders due to the correlation between the samples. Because of this, more complex algorithms are required in order to maintain the quality. Four different transformation algorithms have been implemented (DCT, DFT, eigenvalue decomposition and Vandermonde decomposition) in order to decorrelate the samples of the innovative excitation in ACELP. These transformations have been integrated into the ACELP of the EVS codec. The transformed innovative excitation is coded using the envelope-based arithmetic coder. Objective and subjective tests have been carried out to evaluate the quality of the encoding, the degree of decorrelation achieved by the transformations and the computational complexity of the algorithms.
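To illustrate one of the four transforms named in the abstract, the following is a minimal sketch of an orthonormal DCT-II applied to a short vector. It only shows the general property that makes such transforms useful for decorrelation (energy is preserved and, for smooth inputs, concentrated in a few coefficients); it is not the EVS codec implementation, and the function name `dct_ii` is hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        # Scaling that makes the transform matrix orthonormal
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(
            scale
            * sum(
                x[t] * math.cos(math.pi * (t + 0.5) * k / n)
                for t in range(n)
            )
        )
    return out


def energy(values):
    """Sum of squares of a sequence."""
    return sum(v * v for v in values)


# Usage: a fully correlated (constant) vector collapses onto the
# first coefficient; an arbitrary vector keeps its total energy.
flat = dct_ii([1.0] * 8)
x = [0.5, 1.0, 1.5, 1.2, 0.9, 0.3, -0.2, -0.6]
coeffs = dct_ii(x)
```

Because the transform is orthonormal, quantization error introduced in the transform domain has the same energy after the inverse transform, which is one reason such transforms are convenient in coders.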
Mechanical and durability performance of lightweight concrete brick with palm oil fuel ash (POFA)
Lightweight building materials such as precast roof and wall panels have been widely used in the construction industry, because lightweight materials can benefit the economy and society by lowering manufacturing, transportation and handling costs. One of the most preferred lightweight materials is expanded polystyrene (EPS). EPS consists of 98% air and 2% polystyrene, so its density is very low, which helps reduce the mass of building materials. Numerous studies have shown that EPS contributes significantly to reducing brick density. EPS has been used as an aggregate replacement in concrete. However, the presence of EPS in concrete reduces its strength. For this reason, researchers have extended their work to improving the strength of EPS concrete and brick by adding pozzolanic materials such as fly ash, rice husk ash and silica fume [1-4]. The ability of these pozzolanic materials to enhance the strength of brick and concrete has been proven.
Numerical simulation analysis on water jet pressure distribution at various nozzle aperture
A low-velocity water jet is required by a small-scale Unmanned Underwater Vehicle (UUV) to control its position, either to remain static or to perform slow and steady locomotion. However, the water jet performance is influenced by the size of the nozzle aperture. By studying the pressure distribution around the nozzle area, the water jet velocity can be determined and characterized. In this study, the ejection pressure was fixed at 23.37 Pa, corresponding to constant actuation. Simulations were conducted using ANSYS Fluent software. The results show that the water jet velocity and dynamic pressure are higher for larger nozzle aperture sizes at constant pressure. The total pressure and dynamic pressure had the lowest pressure drop at a certain nozzle aperture size but became constant when the nozzle was wider. This finding is useful in designing UUVs powered by a contractile water jet thruster.
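The link between pressure and jet velocity mentioned above can be sketched with the standard dynamic-pressure relation q = ½ρv². The example below assumes a water density of 1000 kg/m³ and, as an idealization not stated in the abstract, that the full 23.37 Pa ejection pressure is converted into dynamic pressure, so the result is only an upper-bound estimate and not a value from the study. The function name is hypothetical.

```python
import math

WATER_DENSITY = 1000.0  # kg/m^3, assumed value for fresh water


def jet_velocity_from_dynamic_pressure(q, rho=WATER_DENSITY):
    """Velocity (m/s) implied by dynamic pressure q (Pa): v = sqrt(2q / rho)."""
    if q < 0 or rho <= 0:
        raise ValueError("pressure must be non-negative and density positive")
    return math.sqrt(2.0 * q / rho)


def dynamic_pressure(v, rho=WATER_DENSITY):
    """Dynamic pressure (Pa) for velocity v (m/s): q = 0.5 * rho * v^2."""
    return 0.5 * rho * v * v


# Idealized estimate for the ejection pressure quoted in the abstract;
# real nozzle losses would lower the velocity (roughly 0.22 m/s here).
v_ideal = jet_velocity_from_dynamic_pressure(23.37)
```

In a CFD study the local dynamic pressure would come from the simulated field rather than from the inlet value, so the same relation would be applied point by point.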
A Timing Model for Fast French
Models of speech timing are of both fundamental and applied interest. At the fundamental level, the prediction of time periods occupied by syllables and segments is required for general models of speech prosody and segmental structure. At the applied level, complete models of timing are an essential component of any speech synthesis system.
Previous research has established that a large number of factors influence various levels of speech timing. Statistical analysis and modelling can identify order of importance and mutual influences between such factors. In the present study, a three-tiered model was created by a modified step-wise statistical procedure. It predicts the temporal structure of French, as produced by a single, highly fluent speaker at a fast speech rate (100 phonologically balanced sentences, hand-scored in the acoustic signal). The first tier models segmental influences due to phoneme type and contextual interactions between phoneme types. The second tier models syllable-level influences of lexical vs. grammatical status of the containing word, presence of schwa and the position within the word. The third tier models utterance-final lengthening.
The complete segmental-syllabic model correlated with the original corpus of 1204 syllables at an overall r = 0.846. Residuals were normally distributed. An examination of subsets of the data set revealed some variation in the closeness of fit of the model.
The results are considered to be useful for an initial timing model, particularly in a speech synthesis context. However, further research is required to extend the model to other speech rates and to examine inter-speaker variability in greater detail