49 research outputs found

    Uses of the pitch-scaled harmonic filter in speech processing

    The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their periodic and aperiodic constituents, during periods of phonation. In this paper, the use of the PSHF for speech analysis and processing tasks is described. The periodic component can be used as an estimate of the part attributable to voicing, and the aperiodic component can act as an estimate of that attributable to turbulence noise, i.e., from fricative, aspiration and plosive sources. Here we present the algorithm for separating the periodic and aperiodic components from the pitch-scaled Fourier transform of a short section of speech, and show how to derive signals suitable for time-series analysis and for spectral analysis. These components can then be processed in a manner appropriate to their source type, for instance, extracting zeros as well as poles from the aperiodic spectral envelope. A summary of tests on synthetic speech-like signals demonstrates the robustness of the PSHF's performance to perturbations from additive noise, jitter and shimmer. Examples are given of speech analysed in various ways: power spectrum, short-time power and short-time harmonics-to-noise ratio, linear prediction and mel-frequency cepstral coefficients. Besides being valuable for speech production and perception studies, the latter two analyses show potential for incorporation into speech coding and speech recognition systems. Further uses of the PSHF are revealing normally-obscured acoustic features, exploring interactions of turbulence-noise sources with voicing, and pre-processing speech to enhance subsequent operations
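
The separation step described in this abstract can be illustrated with a minimal sketch. It assumes the pitch period is already known and that the analysis frame spans a whole number of pitch periods, so the harmonics of the fundamental fall exactly on every `periods_per_frame`-th DFT bin. The function names, the rectangular window and the plain DFT are illustrative choices, not taken from the paper; the published algorithm includes further details (such as windowing and pitch refinement) that are not reproduced here.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(n^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def pitch_scaled_split(frame, periods_per_frame):
    """Split a frame covering exactly `periods_per_frame` pitch periods.

    Bins whose index is a multiple of `periods_per_frame` (including DC,
    an assumption of this sketch) are treated as the periodic estimate;
    all remaining bins form the aperiodic estimate.
    """
    spec = dft(frame)
    harmonic = [c if k % periods_per_frame == 0 else 0j
                for k, c in enumerate(spec)]
    residual = [0j if k % periods_per_frame == 0 else c
                for k, c in enumerate(spec)]
    return idft(harmonic), idft(residual)


# Synthetic test frame: two harmonics of a 20-sample period plus weak noise.
PERIOD = 20           # pitch period in samples (assumed known)
PERIODS_PER_FRAME = 4  # frame length = 4 pitch periods (illustrative)
rng = random.Random(0)
voiced = [math.sin(2 * math.pi * t / PERIOD)
          + 0.5 * math.sin(4 * math.pi * t / PERIOD)
          for t in range(PERIOD * PERIODS_PER_FRAME)]
noise = [rng.gauss(0.0, 0.1) for _ in voiced]
frame = [v + n for v, n in zip(voiced, noise)]

periodic, aperiodic = pitch_scaled_split(frame, PERIODS_PER_FRAME)
```

Because the two bin sets are complementary, the two outputs always sum back to the input frame; noise that happens to fall on the harmonic bins stays in the periodic estimate, which is one reason a real implementation needs more care than this sketch.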

    Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech

    Modelling the Noise Source in Voiced Fricatives

    The noise source in voiced fricatives has not received as much attention as that of unvoiced fricatives, in part because the voiced case, with two sound sources, is undoubtedly more complicated, and the unvoiced case cannot be considered to be solved. In this paper results from previous studies are considered together with data from three subjects to correct this imbalance. The classic model of voiced fricatives includes two sources of sound: a periodic volume-velocity source located at the glottal end of the tract, and a noise source located in the vicinity of the primary tract constriction. The amplitude of the noise source has long been assumed to be modulated by the voicing, although this effect is sometimes neglected. The noise source before modulation is presumed to be similar to that used in models of unvoiced fricatives: it consists of white or broadband noise, and its strength depends primarily on the pressure drop across the constriction. This latter characteristic means that in general it is weaker than the noise source in unvoiced fricatives, that is, it produces less noise because the pressure drop across it is lower. This has been attributed to the need to maintain a significant transglottal pressure drop in order to maintain voicing, which therefore reduces the pressure differential that can be maintained across the constriction. It has been noted, however, that different speakers use different strategies with regard to glottis-constriction coordination, and so the picture is somewhat more complex. Apart from coordination issues, characterization of the noise source is more complex in certain other respects. First, the geometry of the vocal tract downstream of the constriction has a significant effect on the noise source spectrum, in particular by offering an obstacle to the emerging jet at which noise is generated (Shadle, 1990). 
Within a voiced-voiceless pair, one could assume that the geometry and therefore the parameters controlling the noise source spectrum are the same. Some work has been done on characterizing the dependence of spectral amplitude and spectral tilt on pressure drop and constriction area for [f,s,]. Second, while variation in the flowrate through a constant-area constriction can be predicted to change spectral amplitude and tilt of the noise source, it is not clear how such modulation would be timed with respect to glottal vibration. Acoustic variations generated at the glottis will travel at the speed of sound to the constriction; hydrodynamic variations, which may be of similar strength, will convect at a slower rate that depends on vocal tract area and is therefore much more difficult to predict. There are therefore two distinct problems in characterizing the noise source in voiced fricatives: understanding the nature of the glottis-constriction coordination, and describing the effect of the modulation imposed by voicing. We focus on the latter in this paper by describing the results of an F0-synchronous analysis of a voiced fricative, and comparing them with the results of mechanical model studies.
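
The timing question raised in this abstract can be made concrete with a small sketch: noise whose amplitude is modulated at the voicing rate, with the modulation lagged by the time a disturbance takes to travel from the glottis to the constriction. Every number below (distance, sound speed, convection speed, F0) and the raised-cosine envelope are illustrative assumptions, not values or a model taken from the paper, and the function names are hypothetical.

```python
import math
import random


def travel_delay(distance_m, speed_m_per_s):
    """Time for a disturbance to cover distance_m at a constant speed."""
    return distance_m / speed_m_per_s


def modulation_envelope(t, f0_hz, depth, delay_s):
    """Raised-cosine envelope at the voicing rate, lagged by delay_s."""
    return 1.0 + depth * math.cos(2.0 * math.pi * f0_hz * (t - delay_s))


def modulated_noise(n_samples, fs_hz, f0_hz, depth, delay_s, seed=0):
    """White Gaussian noise scaled sample by sample by the envelope."""
    rng = random.Random(seed)
    return [modulation_envelope(i / fs_hz, f0_hz, depth, delay_s)
            * rng.gauss(0.0, 1.0)
            for i in range(n_samples)]


# Illustrative values only (assumptions, not measurements).
GLOTTIS_TO_CONSTRICTION_M = 0.15
SOUND_SPEED_M_PER_S = 350.0
CONVECTION_SPEED_M_PER_S = 10.0
F0_HZ = 120.0

acoustic_delay = travel_delay(GLOTTIS_TO_CONSTRICTION_M, SOUND_SPEED_M_PER_S)
convective_delay = travel_delay(GLOTTIS_TO_CONSTRICTION_M,
                                CONVECTION_SPEED_M_PER_S)

for label, delay in (("acoustic", acoustic_delay),
                     ("convective", convective_delay)):
    phase_deg = (360.0 * F0_HZ * delay) % 360.0
    print(f"{label}: delay {delay * 1e3:.2f} ms, phase lag {phase_deg:.1f} deg")

source = modulated_noise(1600, 16000.0, F0_HZ, 0.5, convective_delay)
```

Under these assumed values the convective delay exceeds a full pitch period, which illustrates why the modulation phase is hard to predict when the convection speed itself depends on the tract area.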

    An articulatory-acoustic-aerodynamic analysis of [s] in VCV sequences

    Previous studies of the effect of vowel context on fricatives show seeming contradictions in the case of /s/: acoustic analysis shows the greatest context effect, while aerodynamic analysis shows relatively little effect, for the same subject. In this study, aerodynamic, acoustic, and articulatory data for the same subject producing /s, z/ in a variety of contexts were compared systematically. The strong acoustic effect of the /u - u/ context exists with /z/ as well as /s/, and appears to arise from a whistle-like source mechanism caused by lip rounding; the main tongue constriction does not appear to be immune to vowel context. Our interpretation of aerodynamic data as constrictions in series can be generalized to include the influence of lip rounding, thus: for this speaker and for these speaker-like sequences, the area of the vocal tract constriction for /s/ is independent of the vowel context but the overall aerodynamic effect does vary with lip rounding. Our aerodynamic and acoustic data seem to be consistent; both support the view that some rounding extends into the /s/ fricative

    Pitch-synchronous Decomposition of Mixed-source Speech Signals

    As part of a study of turbulence-noise sources in speech production, a method has been developed for decomposing an acoustic signal into harmonic (voiced) and anharmonic (unvoiced) components, based on a hoarseness metric (Muta et al., 1988, J. Acoust. Soc. Am. 84, pp.1292-1301). Their pitch-synchronous harmonic filter (PSHF) has been extended (to EPSHF) to yield time histories of both harmonic and anharmonic components. Our corpus includes many examples of turbulence noise, including aspiration, voiced and unvoiced fricatives, and a variety of voice qualities (e.g. breathy, whispered). The EPSHF algorithm plausibly decomposed breathy vowels, but the harmonic component of voiced fricatives still contained significant noise, similar in shape to (though weaker than) the ensemble-averaged anharmonic spectrum. In general the algorithm performed best on sustained sounds. Tracking errors at rapid transitions, and due to jitter and shimmer, were spuriously attributed to the anharmonic component. However, the extracted anharmonic component clearly exhibited modulation in voiced fricatives. While such modulation has been previously reported (and also in hoarse voice), it was verified by tests on synthetic signals, where constant and modulated noise signals were extracted successfully. The results suggest that the EPSHF will continue to enable exploration of the interaction of phonation and turbulence noise
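
One way a synthetic-signal check of this kind could be set up is sketched below; it is not the authors' procedure. Constant and modulated noise are generated with a known pitch period, the mean power is folded over the pitch period, and the strength of the first harmonic of that power profile is used as a rough modulation index. The function names, the cosine envelope and the index definition are all assumptions made for illustration.

```python
import cmath
import math
import random


def phase_power_profile(signal, period):
    """Mean squared amplitude at each sample position within the period."""
    n_periods = len(signal) // period
    return [sum(signal[j * period + p] ** 2 for j in range(n_periods))
            / n_periods
            for p in range(period)]


def modulation_index(signal, period):
    """First-harmonic swing of the folded power profile over its mean."""
    profile = phase_power_profile(signal, period)
    mean_power = sum(profile) / period
    first = sum(v * cmath.exp(-2j * math.pi * p / period)
                for p, v in enumerate(profile)) / period
    return 2.0 * abs(first) / mean_power


def synthetic_noise(n_periods, period, depth, seed=0):
    """Gaussian noise with a cosine amplitude envelope locked to the period."""
    rng = random.Random(seed)
    return [(1.0 + depth * math.cos(2.0 * math.pi * (i % period) / period))
            * rng.gauss(0.0, 1.0)
            for i in range(n_periods * period)]


constant = synthetic_noise(400, 20, depth=0.0, seed=1)
modulated = synthetic_noise(400, 20, depth=0.5, seed=1)
print("constant noise index: ", round(modulation_index(constant, 20), 3))
print("modulated noise index:", round(modulation_index(modulated, 20), 3))
```

The index for the modulated signal should come out clearly larger than for the constant one; the same measure could then be applied to an extracted anharmonic component, provided the pitch period is tracked reliably.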

    Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data

    We would like to develop a more realistic production model of unvoiced speech sounds, namely fricatives, plosives and aspiration noise. All three involve turbulence noise generation, with place-dependent source characteristics that vary with time (rapidly, in plosives). In this study, we aimed to produce, using an aero-acoustic model of the vocal-tract filter and source, voiced as well as unvoiced fricatives that provide a good match to analyses of speech recordings. The vocal-tract transfer function (VTTF) was computed by the vocal-tract acoustics program, VOAC [Davies, McGowan and Shadle. Vocal Fold Physiology: Frontiers in Basic Science, ed. Titze, Singular Pub., CA, 93-142, 1993], using geometrical data, in the form of cross-sectional area and hydraulic radius functions, along the length of the tract. VOAC incorporates the effects of net flow into the transmission of plane waves through a tubular representation of the tract, and relaxes assumptions of rigid walls and isentropic propagation. The geometry functions were derived from multiple-slice, dynamic, magnetic resonance images (MRI) [Mohammad. PhD thesis, Dept. ECS, U. Southampton, UK, 1999; Shadle, Mohammad, Carter, and Jackson. Proc. ICPhS, S.F. CA, 1:623-626, 1999], using a method of converting from the pixel outlines that was improved over earlier efforts on vowels. A coloured noise source signal was combined with the VTTF and radiation characteristic to synthesize the unvoiced fricative [s]. For its voiced counterpart [z], many researchers have noted that the noise source appears to be modulated by voicing. Furthermore, the phase of the modulation has been shown to be perceptually significant. Based on our analysis [Jackson and Shadle. Proc. IEEE-ICASSP, Istanbul, 2000] of recordings by the same subject, the frication source of [z] was varied periodically according to fluctuations in the flow velocity at the constriction exit, and the modulation phase was governed by the convection time for the flow perturbation to travel from the constriction to the obstacle. The synthesized fricatives were compared to the speech recordings in a simple listening test, and comparisons of the predicted and measured time series suggested that the model, which brings together physical, aerodynamic and acoustic information, can replicate characteristics of real speech, such as the modulation in voiced fricatives [http://www.isis.ecs.soton.ac.uk/research/projects/nephthys/].