
    Uses of the pitch-scaled harmonic filter in speech processing

    The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their periodic and aperiodic constituents, during periods of phonation. In this paper, the use of the PSHF for speech analysis and processing tasks is described. The periodic component can be used as an estimate of the part attributable to voicing, and the aperiodic component can act as an estimate of that attributable to turbulence noise, i.e., from fricative, aspiration and plosive sources. Here we present the algorithm for separating the periodic and aperiodic components from the pitch-scaled Fourier transform of a short section of speech, and show how to derive signals suitable for time-series analysis and for spectral analysis. These components can then be processed in a manner appropriate to their source type, for instance, extracting zeros as well as poles from the aperiodic spectral envelope. A summary of tests on synthetic speech-like signals demonstrates the robustness of the PSHF's performance to perturbations from additive noise, jitter and shimmer. Examples are given of speech analysed in various ways: power spectrum, short-time power and short-time harmonics-to-noise ratio, linear prediction and mel-frequency cepstral coefficients. Besides being valuable for speech production and perception studies, the latter two analyses show potential for incorporation into speech coding and speech recognition systems. Further uses of the PSHF are revealing normally-obscured acoustic features, exploring interactions of turbulence-noise sources with voicing, and pre-processing speech to enhance subsequent operations
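
As a rough illustration of the pitch-scaling idea in the abstract above, the sketch below assumes the analysis frame spans a whole number of pitch periods, so that harmonics of the fundamental land on every `periods_per_frame`-th DFT bin. Those bins are taken as the periodic estimate and the remaining bins as the aperiodic estimate. The function and parameter names are introduced for this example only. It uses a rectangular window and a naive DFT for brevity and leaves out the windowing and power-correction details of the published method, so it is a simplified sketch rather than the authors' implementation.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def split_periodic_aperiodic(frame, periods_per_frame):
    """Split a frame into periodic and aperiodic estimates.

    Assumes len(frame) is exactly periods_per_frame pitch periods, so the
    harmonics fall on bins 0, b, 2b, ... where b = periods_per_frame.
    """
    spectrum = dft(frame)
    periodic = [c if k % periods_per_frame == 0 else 0
                for k, c in enumerate(spectrum)]
    aperiodic = [0 if k % periods_per_frame == 0 else c
                 for k, c in enumerate(spectrum)]
    return idft(periodic), idft(aperiodic)


def harmonics_to_noise_db(periodic, aperiodic):
    """Ratio of periodic to aperiodic energy in dB (inf if no aperiodic energy)."""
    p = sum(v * v for v in periodic)
    a = sum(v * v for v in aperiodic)
    return math.inf if a == 0 else 10 * math.log10(p / a)
```

Because the two estimates come from disjoint sets of bins, they sum back to the original frame, which makes the decomposition easy to sanity-check on synthetic signals.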

    Learning to Produce Speech with an Altered Vocal Tract: The Role of Auditory Feedback

    Modifying the vocal tract alters a speaker’s previously learned acoustic–articulatory relationship. This study investigated the contribution of auditory feedback to the process of adapting to vocal-tract modifications. Subjects said the word /tɑs/ while wearing a dental prosthesis that extended the length of their maxillary incisor teeth. The prosthesis affected /s/ productions and the subjects were asked to learn to produce “normal” /s/’s. They alternately received normal auditory feedback and noise that masked their natural feedback during productions. Acoustic analysis of the speakers’ /s/ productions showed that the distribution of energy across the spectra moved toward that of normal, unperturbed production with increased experience with the prosthesis. However, the acoustic analysis did not show any significant differences in learning dependent on auditory feedback. By contrast, when naive listeners were asked to rate the quality of the speakers’ utterances, productions made when auditory feedback was available were evaluated to be closer to the subjects’ normal productions than when feedback was masked. The perceptual analysis showed that speakers were able to use auditory information to partially compensate for the vocal-tract modification. Furthermore, utterances produced during the masked conditions also improved over a session, demonstrating that the compensatory articulations were learned and available after auditory feedback was removed.

    Speech Communication

    Contains research objectives and reports on three research projects. U.S. Air Force (Air Force Cambridge Research Center, Air Research and Development Command) under Contract AF19(604)-6102; National Science Foundatio

    An acoustic-phonetic approach in automatic Arabic speech recognition

    In a large vocabulary speech recognition system the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large vocabulary isolated word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can uniquely be represented at this level by using eight broad phonetic classes. When introducing detailed vowel identification the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the above mentioned three procedures. Simple phonological constraints have been used to improve the accuracy of the segmentation process. 
The resultant sequence of labels is used for lexical access to retrieve the word or a small set of words sharing the same broad phonetic labelling. When there is more than one candidate word, a verification procedure is used to choose the most likely one.
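
The lexical-access step described above can be sketched as an index keyed by broad phonetic labels. The class inventory, the phoneme symbols and the three-word toy lexicon below are hypothetical and chosen only for illustration; the abstract does not list the classes actually used, and this sketch maps every vowel to a single class rather than keeping vowels in phonemic form.

```python
from collections import defaultdict

# Hypothetical phoneme-to-broad-class mapping (not the paper's inventory).
BROAD_CLASS = {
    "p": "STOP", "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP",
    "f": "FRIC", "s": "FRIC", "z": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "l": "LIQUID", "r": "LIQUID",
    "a": "VOWEL", "i": "VOWEL", "u": "VOWEL",
}


def broad_label(phonemes):
    """Map a phoneme sequence to its tuple of broad phonetic classes."""
    return tuple(BROAD_CLASS[p] for p in phonemes)


def build_index(lexicon):
    """Group words that share the same broad phonetic labelling."""
    index = defaultdict(list)
    for word, phonemes in lexicon.items():
        index[broad_label(phonemes)].append(word)
    return index


def lookup(index, phonemes):
    """Return the cohort of words matching the broad labelling of the input."""
    return list(index.get(broad_label(phonemes), []))


# Toy lexicon in phonemic form (illustrative entries only).
TOY_LEXICON = {
    "kitab": ["k", "i", "t", "a", "b"],
    "katab": ["k", "a", "t", "a", "b"],
    "badal": ["b", "a", "d", "a", "l"],
}
```

With this toy data, `kitab` and `katab` fall into the same cohort, which is the situation where a verification step (or detailed vowel identification) would be needed to pick a single word.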

    Speech Communication

    Contains reports on five research projects. C.J. Lebel Fellowship; National Institutes of Health (Grant 5 T32 NS07040); National Institutes of Health (Grant 5 R01 NS04332); National Science Foundation (Grant 1ST 80-17599); U.S. Navy - Naval Electronic Systems Command Contract (N00039-85-C-0254); U.S. Navy - Naval Electronic Systems Command Contract (N00039-85-C-0341); U.S. Navy - Naval Electronic Systems Command Contract (N00039-85-C-0290

    The fractal characterisation of phonetic elements of human speech

    The use of fractal techniques and fractal dimensions as a means of speech characterisation and speech recognition is a relatively new concept and as such very few papers have addressed the possibilities of its use and associated advantages and disadvantages over conventional methods. This thesis demonstrates that fractal techniques can effectively be used as a method of broad recognition of phonetic elements in human speech. Three distinct fractal methods have been used to associate fractal dimensions with speech: the Box Counting method, the Divider or Richardson method and the Minkowski-Bouligand disc method. Speech has been recorded by myself and another male and female speaker to provide a database of phonetic recordings that could be experimented on. The three fractal techniques were emulated by means of software programs written in a high level language
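
Of the three methods named above, box counting is the simplest to sketch: cover the point set with square boxes of decreasing size, count the occupied boxes, and take the slope of log(count) against log(1/size). The code below is a generic illustration of that idea on 2-D point sets, with function names and box sizes chosen for this example; it is not the thesis software, and how a speech waveform is turned into a point set (for example, normalised time against normalised amplitude) is left as an assumption.

```python
import math


def box_count(points, box_size):
    """Number of distinct square boxes of side box_size containing a point."""
    return len({(math.floor(x / box_size), math.floor(y / box_size))
                for x, y in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log(count) versus log(1 / box_size)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

A densely sampled straight line should give an estimate close to 1 and a filled square close to 2; a sampled waveform would be expected to fall somewhere in between, depending on how irregular it is.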

    Gestural Phasing as an Explanation for Vowel Devoicing in Turkish

    Recent work in phonetics has suggested that vowel devoicing or schwa deletion, observed in various languages, is a gradient process. This study provides evidence for the previously undocumented process of high vowel devoicing in Turkish. The prosodic and segmental factors investigated were rate, stress, preceding environment, following environment, vowel type, and syllable type. The factors are described, evaluated and ranked according to the results of a multiple regression (Variable Rule) analysis. Where applicable, results are contrasted with findings for other languages, e.g., Japanese and Korean. Furthermore, VOT (voice onset time) measurements of the three voiceless stops [p t k] were obtained, as well as duration measurements of vowels in open and closed syllables where vowels are significantly longer in Turkish. Generally, most devoicing occurred when the vowel was shorter (i.e., as a result of faster rates of speech, lack of stress, in closed syllables, etc.). These findings accord well with predictions made by a model assuming gradual gestural overlap of adjacent consonantal and vocalic gestures. An attempt is made to explain the findings in terms of differences in phasing between articulatory gestures.

    The Fricative Sound Source Spectrum Derived From a Vocal Tract Analog.

    The applications of speech synthesis for computer voice response and speech analysis present the need for highly intelligible and natural synthesized speech. In order to improve the synthesis of fricative and related sounds, the use of simple models for the source spectrum of fricative sounds is investigated. The investigation is based on the use of a vocal tract analog and experimental measurements. Measurements of the sound pressure spectra of fricative consonants are made. Simple sound pressure measurements and measurements based on the technique for measuring intensity are utilized. The fricatives studied are /f/, /th/, /s/, /sh/, and /h/. Fricative sound source spectra are determined by applying an inverse filter to the measured fricative sound pressure spectra. The inverse filtering function is derived from a vocal tract analog. The resulting fricative source spectra are fit to a truncated Fourier series. The results show that structure is evident in all the source spectra except /f/. The presence of structure was related to turbulent flows. The structure of turbulent flows is relevant since fricative sound production is induced by turbulence. The structure of turbulent flows with Reynolds number near the critical Reynolds number is dependent on the initial conditions, the boundary conditions, and on the nonlinearity of the Navier Stokes equations. These three factors are tied together by bifurcation theory which is used to explain the structure present in the fricative source spectra. Also, the possibility that the structure is a by-product of the vocal tract analog is allowed. In any case, the structure evident in the source spectra indicates the use of simple models for the source spectra of fricative sounds is in error or the vocal tract analog requires revision. The fricative source spectra determined in this study can be used in future speech synthesizers. 
The same procedure can also be used to analyse the speech of speech-impaired subjects.