
    Effects of Palatal Expansion on Speech Production

    Introduction: Rapid palatal expanders (RPEs) are a commonly used orthodontic adjunct for the treatment of posterior crossbites. RPEs are cemented to bilateral posterior teeth across the palate and thus may interfere with proper tongue movement and linguopalatal contact. The purpose of this study was to identify the specific effects RPEs have on speech sound production in child and early adolescent orthodontic patients. Materials and Methods: RPEs were treatment planned for patients seeking orthodontics at Marquette University. Speech recordings were made using a phonetically balanced reading passage (“The Caterpillar”) at 3 time points: 1) before RPE placement; 2) immediately after cementation; and 3) 10-14 days post appliance delivery. Measures of vocal tract resonance (formant center frequencies) were obtained for vowels, and measures of noise distribution (spectral moments) were obtained for consonants. A two-way repeated-measures ANOVA was used along with post-hoc tests for statistical analysis. Results: For the vowel /i/, the first formant increased and the second formant decreased, indicating a more inferior and posterior tongue position. For /e/, only the second formant decreased, indicating a more posterior tongue position. The formants did not return to baseline within the two-week study period. For the consonants /s/, /ʃ/, /t/, and /k/, a significant shift from high to low frequencies indicated distortion upon appliance placement. Of these, only /t/ fully returned to baseline during the study period. Conclusion: Numerous phonemes were distorted upon RPE placement, indicating altered speech sound production. For most phonemes, speech takes longer than two weeks to return to baseline, if it returns at all. Clinically, the results of this study will help with pre-treatment and interdisciplinary counseling for orthodontic patients receiving palatal expanders.
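
    As a concrete illustration of the consonant measure used above: the spectral moments treat the power spectrum of a windowed segment as a probability distribution over frequency, from which a mean (centroid), variance, skewness, and kurtosis are computed. The sketch below is a minimal, generic implementation; the function name and windowing choices are illustrative, not taken from the study.

```python
# Hedged sketch: first four spectral moments of a consonant segment,
# treating the power spectrum as a probability mass over frequency.
# `x` is a mono numpy array, `fs` its sample rate.
import numpy as np

def spectral_moments(x, fs):
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = spectrum / spectrum.sum()                           # normalize to sum to 1
    centroid = np.sum(freqs * p)                            # M1: spectral mean (Hz)
    variance = np.sum((freqs - centroid) ** 2 * p)          # M2: spread (Hz^2)
    sd = np.sqrt(variance)
    skewness = np.sum(((freqs - centroid) / sd) ** 3 * p)   # M3: tilt
    kurtosis = np.sum(((freqs - centroid) / sd) ** 4 * p)   # M4: peakedness
    return centroid, variance, skewness, kurtosis
```

    A shift of spectral energy from high to low frequencies after appliance placement, as reported here, would appear directly as a drop in the first moment between recording sessions.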

    Interaction between source and filter in vowel identification

    Speech sounds can be modeled as the product of two components: the source and the filter. The effects of filter manipulations on speech perception have been studied extensively, while the effects of source manipulations have been largely overlooked. This study assessed the impact of source manipulations on vowel identification. To this end, two source manipulations were conducted prior to filtering. First, several harmonics of the source sawtooth wave that were located near formant peaks were mistuned, either towards or away from the peaks. Mistuning towards formant peaks was expected to facilitate vowel identification by conveying the position of the formant more clearly; mistuning away was expected to hinder performance. Consistent with this hypothesis, a significant effect of mistuning was observed. However, follow-up analyses revealed that this manipulation only had an effect in conditions where harmonics were mistuned away from formant peaks by a large degree (5%). The second manipulation consisted of adding noise to the source signal to “fill in” the acoustic spectrum. Because the addition of noise occurred before filtering, the spectral shape of the noise component was identical to that of the harmonic portion of the tone and was expected to help convey formant peaks, especially when they were not well conveyed by harmonic information. The results reveal that the addition of noise had no effect on vowel identification. Possible stimulus-based explanations for the failure to observe some of the hypothesized effects are discussed.
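
    The first manipulation can be sketched as follows: build a sawtooth-like harmonic source, mistune the harmonics nearest a formant peak by a few percent, then apply a vocal-tract filter. This is a hedged illustration, not the study's stimulus-generation code; all parameter values (f0 = 100 Hz, a single 500 Hz formant, 5% mistuning) are assumptions for the example.

```python
# Hedged sketch of the source manipulation: harmonics nearest a formant
# peak are mistuned away from it before a second-order formant resonator
# is applied (source-filter synthesis).
import numpy as np
from scipy.signal import lfilter

fs, f0, dur = 16000, 100.0, 0.5
formant, bw = 500.0, 80.0            # formant center frequency and bandwidth (Hz)
t = np.arange(int(fs * dur)) / fs

source = np.zeros_like(t)
for k in range(1, 60):               # harmonic series with 1/k amplitudes (sawtooth-like)
    fk = k * f0
    if abs(fk - formant) < 1.5 * f0:                 # harmonics nearest the peak...
        fk += 0.05 * fk * np.sign(fk - formant)      # ...pushed 5% away from it
    source += np.sin(2 * np.pi * fk * t) / k

# Vocal-tract "filter": a single digital resonator at the formant frequency.
r = np.exp(-np.pi * bw / fs)
theta = 2 * np.pi * formant / fs
vowel = lfilter([1 - r], [1, -2 * r * np.cos(theta), r * r], source)
```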

    Locating Discontinuities in Synthetic Speech using a Perceptually Orientated Approach

    A significant problem with unit-selection-based speech synthesis is the listener's perception of sound discontinuities at the points where speech waveforms are joined. This work demonstrates the application of three different perceptually motivated time-frequency representations, and associated measures, to the identification of such discontinuities.
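
    A crude baseline for this task (not one of the perceptual measures investigated in the work) is to flag frames where a time-frequency representation changes abruptly. The sketch below uses plain spectral flux on a log-magnitude spectrogram; frame/hop sizes and the z-score threshold are illustrative assumptions.

```python
# Baseline discontinuity locator: outlying frame-to-frame change in a
# log-magnitude spectrogram (spectral flux), flagged by z-score.
import numpy as np

def spectral_flux(x, frame=512, hop=128):
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] * np.hanning(frame)
                       for i in range(n)])
    logmag = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
    return np.linalg.norm(np.diff(logmag, axis=0), axis=1)  # change per frame step

def candidate_discontinuities(x, z=3.0):
    flux = spectral_flux(x)
    score = (flux - flux.mean()) / flux.std()
    return np.where(score > z)[0]     # frame indices with outlying spectral change
```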

    Robust equalization of multichannel acoustic systems

    In most real-world acoustical scenarios, speech signals captured by distant microphones from a source are reverberated due to multipath propagation, and the reverberation may impair speech intelligibility. Speech dereverberation can be achieved by equalizing the channels from the source to the microphones. Equalization systems can be computed using estimates of multichannel acoustic impulse responses. However, the estimates obtained from system identification always include errors; the fact that an equalization system is able to equalize the estimated multichannel acoustic system does not mean that it is able to equalize the true system. The objective of this thesis is to propose and investigate robust equalization methods for multichannel acoustic systems in the presence of system identification errors. Equalization systems can be computed using the multiple-input/output inverse theorem or the multichannel least-squares method. However, equalization systems obtained from these methods are very sensitive to system identification errors. A study of the multichannel least-squares method with respect to two classes of characteristic channel zeros is conducted. Accordingly, a relaxed multichannel least-squares method is proposed. Channel shortening in connection with the multiple-input/output inverse theorem and the relaxed multichannel least-squares method is discussed. Two algorithms taking into account the system identification errors are developed. Firstly, an optimally-stopped weighted conjugate gradient algorithm is proposed: a conjugate gradient iterative method is employed to compute the equalization system, and the iteration process is stopped optimally with respect to system identification errors. Secondly, a system-identification-error-robust equalization method exploring the use of error models is presented, which incorporates system identification error models in the weighted multichannel least-squares formulation.
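
    The plain (non-robust) multichannel least-squares design referenced above can be written compactly: stack the convolution (Sylvester) matrices of the estimated channel impulse responses and solve for inverse filters whose summed output approximates a delayed unit impulse. The sketch below is the textbook formulation under assumed equal-length channels, not the thesis's robust variants.

```python
# Hedged sketch of multichannel least-squares (MCLS) equalizer design.
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, Lg):
    """(len(h)+Lg-1) x Lg convolution matrix of impulse response h."""
    col = np.concatenate([h, np.zeros(Lg - 1)])
    row = np.zeros(Lg)
    row[0] = h[0]
    return toeplitz(col, row)

def mcls_equalizer(hs, Lg, delay=0):
    """Least-squares inverse filters g_m for channels hs (equal-length 1-D arrays)."""
    H = np.hstack([conv_matrix(h, Lg) for h in hs])  # multichannel system matrix
    d = np.zeros(H.shape[0])
    d[delay] = 1.0                                   # target: delayed unit impulse
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return np.split(g, len(hs))                      # per-channel equalizers
```

    With two or more channels sharing no common zeros, sufficiently long filters make the solution exact (the multiple-input/output inverse theorem); with identification errors, the filters equalize only the estimated system, which is precisely the sensitivity the thesis addresses.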

    Perceptual thresholds for the effects of room modes as a function of modal decay

    Room modes cause audible artefacts in listening environments. Modal control approaches have emerged in the scientific literature over the years, and their performance is often measured by criteria that may be perceptually unfounded. Previous research has identified modal decay as a key perceptual factor in detecting modal effects. In this work, perceptual thresholds for the effects of modes as a function of modal decay have been measured in the region between 32 Hz and 250 Hz. A test methodology has been developed to include modal interaction and temporal masking from musical events, which are important aspects in recreating an ecologically valid test regime. This method has been deployed in addition to the artificial test stimuli traditionally used in psychometric studies, which provide unmasked, absolute thresholds. For artificial stimuli, thresholds decrease monotonically from 0.9 seconds at 32 Hz to 0.17 seconds at 200 Hz, with a knee at 63 Hz. For music stimuli, thresholds decrease monotonically from 0.51 seconds at 63 Hz to 0.12 seconds at 250 Hz. Perceptual thresholds are shown to depend on frequency and, to a much lesser extent, on level. The results presented here define absolute and practical thresholds, which are useful as perceptually relevant optimization targets for modal control methods.
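
    For illustration only (this is not the published test method), a single room mode at a given decay time can be approximated as an exponentially decaying sinusoid whose envelope falls by 60 dB over that time; sample rate, duration, and the 60 dB convention are assumptions.

```python
# Illustrative modal stimulus: decaying sinusoid with a specified decay time.
import numpy as np

def modal_tone(freq, decay_time, fs=48000, dur=2.0):
    t = np.arange(int(fs * dur)) / fs
    tau = decay_time / (3.0 * np.log(10.0))   # 60 dB decay: exp(-T/tau) = 1e-3
    return np.exp(-t / tau) * np.sin(2 * np.pi * freq * t)

mode_at_threshold = modal_tone(32.0, 0.9)     # 32 Hz mode at its reported threshold
```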

    Understanding the role of phase function in translucent appearance

    Multiple scattering contributes critically to the characteristic translucent appearance of food, liquids, skin, and crystals, but little is known about how it is perceived by human observers. This article explores the perception of translucency by studying the image effects of variations in one factor of multiple scattering: the phase function. We consider an expanded space of phase functions created by linear combinations of Henyey-Greenstein and von Mises-Fisher lobes, and we study this physical parameter space using computational data analysis and psychophysics. Our study identifies a two-dimensional embedding of the physical scattering parameters in a perceptually meaningful appearance space. Through our analysis of this space, we find uniform parameterizations of its two axes by analytical expressions of moments of the phase function, and provide an intuitive characterization of the visual effects that can be achieved at different parts of it. We show that our expansion of the space of phase functions enlarges the range of achievable translucent appearance compared to traditional single-parameter phase function models. Our findings highlight the important role the phase function can have in controlling translucent appearance, and provide tools for manipulating its effect in material design applications.

    Funding: National Institutes of Health (U.S.) (Award R01-EY019262-02); National Institutes of Health (U.S.) (Award R21-EY019741-02)
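
    For reference, both lobe families combined in this work have simple closed forms: the Henyey-Greenstein phase function p(cos θ) = (1 − g²) / (4π (1 + g² − 2g cos θ)^{3/2}) and the von Mises-Fisher density κ e^{κ cos θ} / (4π sinh κ). The sketch below evaluates a convex combination of the two; parameter values are illustrative, not the paper's fitted settings.

```python
# Hedged sketch of the expanded phase-function family: a convex combination
# of Henyey-Greenstein and von Mises-Fisher lobes, each normalized to
# integrate to 1 over the sphere.
import numpy as np

def henyey_greenstein(cos_theta, g):
    return (1.0 - g**2) / (4.0 * np.pi * (1.0 + g**2 - 2.0 * g * cos_theta) ** 1.5)

def von_mises_fisher(cos_theta, kappa):
    return kappa * np.exp(kappa * cos_theta) / (4.0 * np.pi * np.sinh(kappa))

def mixed_phase(cos_theta, g=0.8, kappa=20.0, w=0.5):
    """Convex combination of the two lobes; w in [0, 1]."""
    return w * henyey_greenstein(cos_theta, g) + (1.0 - w) * von_mises_fisher(cos_theta, kappa)
```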

    Spectral discontinuity in concatenative speech synthesis – perception, join costs and feature transformations

    This thesis explores the problem of determining an objective measure to represent human perception of spectral discontinuity in concatenative speech synthesis. Such measures are used as join costs to quantify the compatibility of speech units for concatenation in unit selection synthesis. No previous study has reported a spectral measure that satisfactorily correlates with human perception of discontinuity. An analysis of the limitations of existing measures, and of our understanding of the human auditory system, was used to guide the strategies adopted to advance a solution to this problem. A listening experiment was conducted using a database of concatenated speech, with results indicating the perceived continuity of each concatenation. The results of this experiment were used to correlate proposed measures of spectral continuity with the perceptual results. A number of standard speech parametrisations and distance measures were tested as measures of spectral continuity and analysed to identify their limitations. Time-frequency resolution was found to limit the performance of standard speech parametrisations. As a solution to this problem, measures of continuity based on the wavelet transform were proposed and tested, as wavelets offer superior time-frequency resolution to standard spectral measures. A further limitation of standard speech parametrisations is that they are typically computed from the magnitude spectrum. However, the auditory system combines information relating to the magnitude spectrum, the phase spectrum, and spectral dynamics; the potential of phase and spectral dynamics as measures of spectral continuity was therefore investigated. One widely adopted approach to detecting discontinuities is to compute the Euclidean distance between feature vectors about the join in concatenated speech. The detection of an auditory event, such as the detection of a discontinuity, involves processing high up the auditory pathway in the central auditory system, and the basic Euclidean distance cannot model such behaviour. A study was therefore conducted to investigate feature transformations with sufficient processing complexity to mimic high-level auditory processing, with neural networks and principal component analysis investigated as feature transformations. Wavelet-based measures were found to outperform all measures of continuity based on standard speech parametrisations. Measures based on phase and spectral dynamics were found to correlate with human perception of discontinuity in the test database, although neither contributed a significant increase in performance when combined with standard measures of continuity. Neural network feature transformations were found to significantly outperform all other measures tested in this study, producing correlations with perceptual results in excess of 90%.
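
    The baseline join cost referenced above has a one-line form: the Euclidean distance between the feature vectors on either side of the join. The sketch below also shows a linearly transformed variant, as a crude stand-in for the learned feature mappings the thesis found to perform best; array shapes and names are assumptions.

```python
# Baseline Euclidean join cost and a linearly transformed variant.
# `feats_a`/`feats_b` are (frames x coefficients) arrays of any spectral
# parametrisation (e.g. MFCCs) for the units before and after the join.
import numpy as np

def euclidean_join_cost(feats_a, feats_b):
    """Distance between the last frame before the join and the first after it."""
    return float(np.linalg.norm(feats_a[-1] - feats_b[0]))

def transformed_join_cost(feats_a, feats_b, W):
    """Same cost after a linear map W (e.g. a PCA basis fitted to perceptual data)."""
    return float(np.linalg.norm(W @ feats_a[-1] - W @ feats_b[0]))
```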