Voice source characterization for prosodic and spectral manipulation
The objective of this dissertation is to study and develop techniques to decompose the speech signal into its two main
components: voice source and vocal tract. Our main efforts are on the glottal pulse analysis and characterization. We want to
explore the utility of this model in different areas of speech processing: speech synthesis, voice conversion or emotion detection
among others. Thus, we will study different techniques for prosodic and spectral manipulation. One of our requirements is that
the methods should be robust enough to work with the large databases typical of speech synthesis. We use a speech production
model in which the glottal flow produced by the vibrating vocal folds goes through the vocal (and nasal) tract cavities and is
radiated by the lips. Removing the effect of the vocal tract from the speech signal to obtain the glottal pulse is known as inverse
filtering. We use a parametric model of the glottal pulse directly in the source-filter decomposition phase.
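As a rough illustration of inverse filtering in general, the sketch below estimates an all-pole vocal tract filter by plain linear prediction (autocorrelation method with the Levinson-Durbin recursion) and applies its inverse to recover the excitation. This is not the joint parametric source-filter decomposition described in the abstract. The function names, the two-pole filter, and the impulse-train excitation standing in for the glottal source are all assumptions made for the example.

```python
import math

def synthesize(excitation, a):
    """All-pole filter: y[n] = x[n] + sum_k a[k] * y[n-1-k]."""
    y = []
    for n, x in enumerate(excitation):
        y.append(x + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y

def autocorrelation(x, max_lag):
    """Biased autocorrelation sums for lags 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns predictor coefficients for lags 1..order."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def inverse_filter(x, a):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-1-k], i.e. filtering by A(z)."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

# Hypothetical "vocal tract": one resonance with pole radius 0.9 at angle 0.2*pi.
true_a = [2 * 0.9 * math.cos(0.2 * math.pi), -0.81]
# Hypothetical excitation: an impulse every 80 samples (stand-in for glottal pulses).
excitation = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]
speech = synthesize(excitation, true_a)

est_a = levinson_durbin(autocorrelation(speech, 2), 2)
residual = inverse_filter(speech, est_a)
```

With a real glottal excitation the source spectrum is not flat, so plain linear prediction absorbs part of the source into the filter estimate; using a glottal pulse model inside the decomposition, as the abstract describes, is one way to address that.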
In order to validate the accuracy of the parametrization algorithm, we designed a synthetic corpus using LF glottal parameters
reported in the literature, complemented with our own results from the vowel database. The results show that our method gives
satisfactory results in a wide range of glottal configurations and at different levels of SNR. Our method using the whitened
residual compared favorably to this reference, achieving high quality ratings (Good-Excellent). Our full parametrized system
scored lower than the other two, ranking in third place, but still higher than the acceptance threshold (Fair-Good).
Next we proposed two methods for prosody modification, one for each of the residual representations explained above. The first
method used our full parametrization system and frame interpolation to perform the desired changes in pitch and duration. The
second method used resampling on the residual waveform and a frame selection technique to generate a new sequence of
frames to be synthesized. The results showed that both methods are rated similarly (Fair-Good) and that more work is needed in
order to achieve quality levels similar to the reference methods.
As part of this dissertation, we have studied the application of our models in three different areas: voice conversion, voice quality
analysis and emotion recognition. We have included our speech production model in a reference voice conversion system, to
evaluate the impact of our parametrization in this task. The results showed that the evaluators preferred our method over the
original one, rating it with a higher score on the MOS scale. To study voice quality, we recorded a small database consisting of
isolated, sustained Spanish vowels in four different phonations (modal, rough, creaky and falsetto), which were later also used in
our study of voice quality. Comparing the results with those reported in the literature, we found them to generally agree with
previous findings. Some differences existed, but they could be attributed to the difficulties in comparing voice qualities produced
by different speakers. At the same time we conducted experiments in the field of voice quality identification, with very good
results. We have also evaluated the performance of an automatic emotion classifier based on GMM using glottal measures. For
each emotion, we have trained a specific model using different features, comparing our parametrization to a baseline system
using spectral and prosodic characteristics. The results of the test were very satisfactory, showing a relative error reduction of
more than 20% with respect to the baseline system. The detection accuracy for the individual emotions was also high, improving on
the results of previously reported works using the same database. Overall, we can conclude that the glottal source parameters
extracted using our algorithm have a positive impact in the field of automatic emotion classification.
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.
Models and Analysis of Vocal Emissions for Biomedical Applications
The proceedings of the MAVEBA Workshop, held on a biennial basis, collect the scientific papers presented as both oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies.
The retrieval of surface parameters from satellite borne infrared radiometers for the study of climate
This thesis concerns the development and application of new infrared remote sensing techniques for measurement of climate-related variables. The nature of the climate system is discussed, and the need for global monitoring is noted, together with the suitability of satellite-based remote sensing for the task. Current applications of data from satellite-borne infrared radiometers are discussed, together with the attendant problems, particularly that of correction for the effects of the atmosphere on remotely-sensed thermal infrared temperatures. In addition, the monitoring of proxy indicators of climatic change, such as the areas of closed lakes, by remote sensing is seen as having great potential, despite the limited research to date. The problem of accurate measurement of lake areas by the necessarily coarse resolution instruments which are capable of providing the required repeat coverage is addressed. An initial case study shows that lakes of the order of a few hundred km² can be measured to an accuracy of 1% with 1 km resolution data from the Advanced Very High Resolution Radiometer (AVHRR). A further study of a climatically-sensitive closed lake in Ethiopia demonstrates a qualitative relationship between the measured area cycle and climate records. It is noted that the accurate remote sensing of lake surface temperatures and tropical ocean surface temperatures, both important parameters for climate research, is difficult due to the problem of atmospheric correction. A new correction algorithm is developed which offers an improvement by a factor of ~2 over conventional algorithms when applied to AVHRR data. Useful byproducts of the algorithm are accurate atmospheric transmittance and total water vapour. Further developments of the techniques devised are suggested with a view to maximising the exploitation of both new and existing global datasets in order to provide the necessary long time series of accurate measurements required for climate research.
SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music
Available at: http://www.cise.ufl.edu/~acamacho/publications/dissertation.pdf A Sawtooth Waveform Inspired Pitch Estimator (SWIPE) has been developed for processing speech and music. SWIPE is shown to outperform existing algorithms on several publicly available speech/musical-instruments databases and a disordered speech database. SWIPE estimates the pitch as the fundamental frequency of the sawtooth waveform whose spectrum best matches the spectrum of the input signal. A decaying cosine kernel provides an extension to older frequency-based, sieve-type estimation algorithms by providing smooth peaks with decaying amplitudes to correlate with the harmonics of the signal. An improvement on the algorithm is achieved by using only the first and prime harmonics, which significantly reduces subharmonic errors commonly found in other pitch estimation algorithms. UCR::Vicerrectoría de Investigación::Unidades de Investigación::Ingeniería::Centro de Investigaciones en Tecnologías de Información y Comunicación (CITIC)
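The idea of scoring pitch candidates against the first and prime harmonics can be sketched in a few lines. The code below is a heavily simplified illustration and not the published SWIPE algorithm: it samples a cosine-shaped kernel only at harmonic peaks (+1) and at the midpoints between harmonics (-1), weights each term by 1/sqrt(k), and omits the ERB frequency scale, kernel normalization, and multiple window sizes. All function names and the test signal are assumptions for the example.

```python
import cmath
import math

def hann_windowed(x):
    """Apply a periodic Hann window to the signal."""
    n_samples = len(x)
    return [(0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples)) * v for n, v in enumerate(x)]

def sqrt_magnitude(windowed, fs, f):
    """Square root of the DFT magnitude of the windowed signal at frequency f (Hz)."""
    acc = sum(v * cmath.exp(-2j * math.pi * f * n / fs) for n, v in enumerate(windowed))
    return math.sqrt(abs(acc))

def first_and_primes(limit):
    """Harmonic numbers used for scoring: 1 plus all primes <= limit."""
    return [1] + [k for k in range(2, limit + 1)
                  if all(k % d for d in range(2, int(k ** 0.5) + 1))]

def pitch_score(windowed, fs, f0):
    """Reward energy at harmonics k*f0, penalize energy halfway between them."""
    total = 0.0
    for k in first_and_primes(int((fs / 2) / f0)):
        peak = sqrt_magnitude(windowed, fs, k * f0)
        valley = 0.5 * (sqrt_magnitude(windowed, fs, (k - 0.5) * f0)
                        + sqrt_magnitude(windowed, fs, (k + 0.5) * f0))
        total += (peak - valley) / math.sqrt(k)  # decaying weight per harmonic
    return total

def estimate_pitch(x, fs, candidates):
    """Return the candidate f0 (Hz) with the highest score."""
    windowed = hann_windowed(x)
    return max(candidates, key=lambda f0: pitch_score(windowed, fs, f0))

# Hypothetical test signal: band-limited sawtooth-like tone at 200 Hz, 0.1 s at 8 kHz.
fs, f0_true = 8000, 200
saw = [sum(math.sin(2 * math.pi * k * f0_true * n / fs) / k for k in range(1, 20))
       for n in range(800)]
estimate = estimate_pitch(saw, fs, range(100, 401, 10))
```

In this toy setup the subharmonic candidate at 100 Hz collects almost no support, because its odd harmonics fall where the signal has no energy and its even non-prime harmonics are not counted, which is the effect the abstract attributes to restricting the kernel to the first and prime harmonics.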
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 4th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2005, held 29-31 October 2005, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.
ORGAN MOTION AND IMAGE GUIDANCE IN RADIATION THERAPY
Organ motion and inaccurate patient positioning may compromise radiation therapy outcome. Image guidance allows more accurate study and control of organ motion, which could reduce the irradiation of healthy tissue and permit dose escalation to the target volume to achieve better treatment results. The studies on organ motion and image guidance were divided into four parts. First, the interfractional setup uncertainties from day-to-day treatment and the intrafractional internal organ motion within the daily treatment were studied for five different anatomic sites with a Helical TomoTherapy unit. The pre-treatment megavoltage computed tomography (MVCT) provided the real-time tumor and organ shift coordinates and can be used to improve the accuracy of patient positioning. The interfractional systematic and random errors were analyzed, and suggested margins for head and neck (HN), brain, prostate, abdomen and lung were derived. Second, lung stereotactic body radiation therapy using the MIDCO BodyLoc whole body stereotactic localizer combined with TomoTherapy MVCT image guidance was investigated for possible reduction of target and organ motion. The comparison of 3D displacement with and without BodyLoc immobilization showed that suppression of internal organ motion was improved by using BodyLoc in this study. Third, respiration-related tumor motion was studied with four-dimensional computed tomography (4DCT). Deformable registration between different breathing phases was performed to estimate the motion trajectory of the lung tumor. The optimization minimizes the mean squared difference in intensity and is implemented with a multi-resolution gradient descent procedure. Fourth, lung tumor mobility and dosimetric benefits were compared for the different PTVs obtained from 3DCT and 4DCT. The results illustrated that the PTV3D not only included excess normal tissue but also might result in missed target tissue. The normal tissue complication probability (NTCP) from the 4D plan was statistically significantly smaller than that from the 3D plan for both the ipsilateral lung and the heart.
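The registration objective mentioned above (mean squared intensity difference minimized by gradient descent) can be illustrated with a deliberately small toy. The sketch below recovers a single rigid shift between two 1D Gaussian intensity profiles. It is not deformable and not multi-resolution, so it shows only the cost function and the descent step, not the method used in the thesis. The profile shape, step size, and function names are assumptions for the example.

```python
import math

def profile(x, center=50.0, width=5.0):
    """Hypothetical 1D intensity profile (a Gaussian 'tumor' cross-section)."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def profile_slope(x, center=50.0, width=5.0):
    """Analytic derivative of the profile with respect to x."""
    return -(x - center) / width ** 2 * profile(x, center, width)

def register_shift(true_shift, steps=100, rate=100.0):
    """Estimate the shift t that aligns a displaced profile with the reference.

    The moving image is m(u) = profile(u - true_shift). The cost is
    MSD(t) = mean_x (fixed(x) - m(x + t))^2, minimized by gradient descent on t.
    """
    xs = range(100)
    fixed = [profile(x) for x in xs]  # reference breathing phase
    t = 0.0
    for _ in range(steps):
        grad = sum(-2.0 * (f - profile(x + t - true_shift))
                   * profile_slope(x + t - true_shift)
                   for x, f in zip(xs, fixed)) / len(fixed)
        t -= rate * grad  # descend along the MSD gradient
    return t

estimated_shift = register_shift(3.0)
```

A deformable method would replace the single parameter t with a displacement field and typically run the same kind of update from coarse to fine image resolutions, so that large motions are captured before fine detail is matched.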