
    Voice source characterization for prosodic and spectral manipulation

    The objective of this dissertation is to study and develop techniques to decompose the speech signal into its two main components: voice source and vocal tract. Our main efforts focus on glottal pulse analysis and characterization. We want to explore the utility of this model in different areas of speech processing: speech synthesis, voice conversion and emotion detection, among others. Thus, we will study different techniques for prosodic and spectral manipulation. One of our requirements is that the methods should be robust enough to work with the large databases typical of speech synthesis. We use a speech production model in which the glottal flow produced by the vibrating vocal folds passes through the vocal (and nasal) tract cavities and is radiated by the lips. Removing the effect of the vocal tract from the speech signal to obtain the glottal pulse is known as inverse filtering. We use a parametric model of the glottal pulse directly in the source-filter decomposition phase. In order to validate the accuracy of the parametrization algorithm, we designed a synthetic corpus using LF glottal parameters reported in the literature, complemented with our own results from the vowel database. The results show that our method gives satisfactory results in a wide range of glottal configurations and at different levels of SNR. Our method using the whitened residual compared favorably to this reference, achieving high quality ratings (Good-Excellent). Our fully parametrized system scored lower than the other two, ranking in third place, but still higher than the acceptance threshold (Fair-Good). Next, we proposed two methods for prosody modification, one for each of the residual representations explained above. The first method used our full parametrization system and frame interpolation to perform the desired changes in pitch and duration. 
The second method used resampling on the residual waveform and a frame selection technique to generate a new sequence of frames to be synthesized. The results showed that both methods are rated similarly (Fair-Good) and that more work is needed in order to achieve quality levels similar to the reference methods. As part of this dissertation, we have studied the application of our models in three different areas: voice conversion, voice quality analysis and emotion recognition. We have included our speech production model in a reference voice conversion system, to evaluate the impact of our parametrization in this task. The results showed that the evaluators preferred our method over the original one, rating it with a higher score on the MOS scale. To study voice quality, we recorded a small database consisting of isolated, sustained Spanish vowels in four different phonations (modal, rough, creaky and falsetto); these recordings were later also used in our study of voice quality. Comparing the results with those reported in the literature, we found them to generally agree with previous findings. Some differences existed, but they could be attributed to the difficulties in comparing voice qualities produced by different speakers. At the same time, we conducted experiments in the field of voice quality identification, with very good results. We have also evaluated the performance of an automatic emotion classifier based on GMM using glottal measures. For each emotion, we have trained a specific model using different features, comparing our parametrization to a baseline system using spectral and prosodic characteristics. The results of the test were very satisfactory, showing a relative error reduction of more than 20% with respect to the baseline system. The detection accuracy for the individual emotions was also high, improving on the results of previously reported works using the same database. 
Overall, we can conclude that the glottal source parameters extracted using our algorithm have a positive impact in the field of automatic emotion classification.
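
The inverse filtering step named in this abstract can be illustrated with plain linear prediction. The sketch below is only a generic baseline, assuming an all-pole vocal tract estimated with the autocorrelation method; it is not the dissertation's approach, which fits a parametric glottal model inside the decomposition itself. The function names `lpc_error_filter` and `inverse_filter` are hypothetical, and the residual is only a rough proxy for the excitation.

```python
import numpy as np


def lpc_error_filter(frame, order):
    """Autocorrelation-method LPC (an assumed, generic estimator).

    Returns the prediction-error filter A(z) = [1, -a1, ..., -ap].
    """
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -a))


def inverse_filter(frame, order):
    """Remove the estimated all-pole envelope; return (residual, A)."""
    A = lpc_error_filter(frame, order)
    # FIR filtering with A(z), truncated to the frame length.
    residual = np.convolve(frame, A)[: len(frame)]
    return residual, A
```

For real speech, the frame would typically be pre-emphasised and windowed first, and the order chosen from the sampling rate; those choices are left out of this sketch.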

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, which is held on a biennial basis, collect the scientific papers presented as both oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies.

    The retrieval of surface parameters from satellite borne infrared radiometers for the study of climate

    This thesis concerns the development and application of new infrared remote sensing techniques for measurement of climate-related variables. The nature of the climate system is discussed, and the need for global monitoring is noted, together with the suitability of satellite-based remote sensing for the task. Current applications of data from satellite-borne infrared radiometers are discussed, together with the attendant problems, particularly that of correction for the effects of the atmosphere on remotely-sensed thermal infrared temperatures. In addition, the monitoring of proxy indicators of climatic change, such as the areas of closed lakes, by remote sensing is seen as having great potential, despite the limited research to date. The problem of accurate measurement of lake areas by the necessarily coarse resolution instruments which are capable of providing the required repeat coverage is addressed. An initial case study shows that lakes on the order of a few hundred km² can be measured to an accuracy of 1% with 1 km resolution data from the Advanced Very High Resolution Radiometer (AVHRR). A further study of a climatically-sensitive closed lake in Ethiopia demonstrates a qualitative relationship between the measured area cycle and climate records. It is noted that the accurate remote sensing of lake surface temperatures and tropical ocean surface temperatures, both important parameters for climate research, is difficult due to the problem of atmospheric correction. A new correction algorithm is developed which offers an improvement of a factor of ~2 over conventional algorithms when applied to AVHRR data. Useful byproducts of the algorithm are accurate atmospheric transmittance and total water vapour. Further developments of the techniques devised are suggested with a view to maximising the exploitation of both new and existing global datasets in order to provide the necessary long time series of accurate measurements required for climate research.

    SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music

    Available at: http://www.cise.ufl.edu/~acamacho/publications/dissertation.pdf. A Sawtooth Waveform Inspired Pitch Estimator (SWIPE) has been developed for processing speech and music. SWIPE is shown to outperform existing algorithms on several publicly available speech/musical-instruments databases and a disordered speech database. SWIPE estimates the pitch as the fundamental frequency of the sawtooth waveform whose spectrum best matches the spectrum of the input signal. A decaying cosine kernel provides an extension to older frequency-based, sieve-type estimation algorithms by providing smooth peaks with decaying amplitudes to correlate with the harmonics of the signal. An improvement on the algorithm is achieved by using only the first and prime harmonics, which significantly reduces subharmonic errors commonly found in other pitch estimation algorithms. UCR::Vicerrectoría de Investigación::Unidades de Investigación::Ingeniería::Centro de Investigaciones en Tecnologías de Información y Comunicación (CITIC)
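
As a rough illustration of the matching idea in this abstract, the Python sketch below scores each pitch candidate by correlating a harmonic kernel with the square root of the magnitude spectrum. The kernel has cosine lobes at the first and prime harmonics and decays as 1/sqrt(f). This is a simplified, single-window approximation and not the dissertation's implementation; the function names, the single FFT window, the linear candidate grid and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np


def _is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def harmonic_kernel(freqs, f0):
    """Decaying cosine kernel using only harmonic 1 and the prime harmonics."""
    q = freqs / f0                      # frequency in units of the candidate
    cos_q = np.cos(2 * np.pi * q)
    kernel = np.zeros_like(freqs)
    for k in range(1, int(q[-1]) + 1):
        if k != 1 and not _is_prime(k):
            continue
        a = np.abs(q - k)
        kernel += np.where(a < 0.25, cos_q, 0.0)                      # peak
        kernel += np.where((a >= 0.25) & (a < 0.75), 0.5 * cos_q, 0.0)  # valley
    kernel /= np.sqrt(freqs)            # amplitude decay with frequency
    return kernel / np.linalg.norm(kernel[kernel > 0])


def estimate_pitch(signal, sample_rate, candidates):
    """Return (best candidate in Hz, score per candidate)."""
    windowed = signal * np.hanning(len(signal))
    n_fft = 1 << (2 * len(signal) - 1).bit_length()   # zero-padded FFT size
    spectrum = np.abs(np.fft.rfft(windowed, n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sample_rate)
    freqs, loudness = freqs[1:], np.sqrt(spectrum[1:])  # drop the DC bin
    loudness = loudness / np.linalg.norm(loudness)
    scores = np.array([np.dot(harmonic_kernel(freqs, f), loudness)
                       for f in candidates])
    return float(candidates[int(np.argmax(scores))]), scores


def bandlimited_sawtooth(f0, sample_rate, duration):
    """Test signal: harmonics of f0 with 1/k amplitudes, below Nyquist."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    n_harm = int((sample_rate / 2) // f0)
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))
```

Calling `estimate_pitch(bandlimited_sawtooth(220.0, 16000, 0.2), 16000, np.arange(80.0, 500.0, 1.0))` exercises the sketch on a synthetic tone. Because the kernel skips non-prime harmonics, a candidate at half the true pitch aligns with only one harmonic of the signal, which is the intuition behind the reduction in subharmonic errors.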

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 4th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2005, held 29-31 October 2005, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    ORGAN MOTION AND IMAGE GUIDANCE IN RADIATION THERAPY

    Organ motion and inaccurate patient positioning may compromise radiation therapy outcome. Image guidance allows a more accurate study of organ motion and motion control, which could lead to the reduction of irradiated healthy tissues and possible dose escalation to the target volume to achieve better treatment results. The studies on organ motion and image guidance were divided into the following four sections. First, the interfractional setup uncertainties from day-to-day treatment and the intrafractional internal organ motion within the daily treatment were studied for five different anatomic sites with a Helical TomoTherapy unit. The pre-treatment megavoltage computed tomography (MVCT) provided the real-time tumor and organ shift coordinates, and can be used to improve the accuracy of patient positioning. The interfractional systematic errors and random errors were analyzed and the suggested margins for HN, brain, prostate, abdomen and lung were derived. Second, lung stereotactic body radiation therapy using the MIDCO BodyLoc whole body stereotactic localizer combined with TomoTherapy MVCT image guidance was investigated for possible target and organ motion reduction. The comparison of 3D displacement with and without BodyLoc immobilization showed that suppression of internal organ motion was improved by using BodyLoc in this study. Third, respiration-related tumor motion was studied with four-dimensional computed tomography (4DCT). Deformable registration between different breathing phases was performed to estimate the motion trajectory of the lung tumor. Optimization is performed by minimizing the mean squared difference in intensity, and is implemented with a multi-resolution gradient descent procedure. Fourth, lung tumor mobility and dosimetric benefits were compared for the different PTVs obtained from 3DCT and 4DCT. 
The results illustrated that the PTV3D not only included excess normal tissue but also might result in missed target tissue. The normal tissue complication probability (NTCP) from the 4D plan was statistically significantly smaller than that from the 3D plan for both the ipsilateral lung and the heart.