6 research outputs found

    Detection of stop landmarks using Gaussian mixture modeling of speech spectrum

    Perception of speech under adverse listening conditions may be improved by processing it to incorporate properties of clear speech. This requires automated detection of stop landmarks and enhancement of bursts and transition segments. A technique is presented for accurate detection of stop landmarks in continuous speech, based on parameters derived from Gaussian mixture modeling of the log magnitude spectrum, a voicing onset-offset detector, and a spectral flatness measure. Applying the technique to sentences from the TIMIT database resulted in burst detection rates of 98, 97, 95, 90, and 73% at temporal accuracies of 30, 20, 15, 10, and 5 ms, respectively.
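
The abstract names a spectral flatness measure but does not give its formula. A minimal sketch, assuming the common definition (ratio of the geometric mean to the arithmetic mean of the power spectrum), is shown below. The function name `spectral_flatness` and the `eps` floor are illustrative assumptions, not taken from the paper.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum.

    Assumption: this is the textbook definition; the paper may use a variant.
    Values near 1 indicate a flat, noise-like spectrum (as in a stop burst);
    values near 0 indicate a peaky, tonal or voiced spectrum.
    """
    # eps keeps log() defined when a bin has zero power (illustrative choice).
    values = [p + eps for p in power_spectrum]
    log_mean = sum(math.log(v) for v in values) / len(values)
    geometric_mean = math.exp(log_mean)
    arithmetic_mean = sum(values) / len(values)
    return geometric_mean / arithmetic_mean


# Toy spectra, not data from the paper.
flat = spectral_flatness([1.0] * 8)                # equal power in every bin
peaked = spectral_flatness([100.0] + [0.01] * 7)   # one dominant bin
```

On these toy inputs the flat spectrum scores close to 1 and the peaked spectrum scores far lower, which is the contrast a burst detector could exploit.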

    Automated detection of transition segments for intensity and time-scale modification for speech intelligibility enhancement

    Spectral transition segments serve as landmarks for the perception of consonants. In the "clear speech" mode adopted by speakers to improve intelligibility in difficult communication environments, transition segments have increased duration and intensity. Modifying conversational speech to have the acoustic properties of clear speech has been reported to improve its intelligibility. This paper presents an automated method for locating spectral transition segments in speech and for producing natural-quality resynthesized speech in which those segments are modified in intensity and time scale. The boundaries of spectral transition segments are located using an index derived from the rate of variation of energy and centroid frequency in five non-overlapping spectral bands. Time-scale modification is performed using harmonic plus noise model (HNM) based analysis-synthesis. The overall speech duration is kept unaltered by appropriately compressing the steady-state segments. Transition segments are intensity scaled by 6 dB. The effectiveness of the method was evaluated by conducting listening tests on normal-hearing subjects using VCV syllables as the test material.
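
The two quantities the abstract relies on, per-band energy with centroid frequency and a 6 dB intensity scaling, can be sketched as follows. This is an illustration under assumptions: the band limits, the function names, and the fallback for an empty band are hypothetical, and the paper's actual transition index (the rate of variation of these features across frames) is not specified in the abstract, so it is not reproduced here.

```python
def db_to_amplitude_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def band_energy_and_centroid(freqs, power, lo, hi):
    """Energy and energy-weighted mean frequency of bins with lo <= f < hi.

    freqs and power are parallel lists for one analysis frame.
    """
    pairs = [(f, p) for f, p in zip(freqs, power) if lo <= f < hi]
    energy = sum(p for _, p in pairs)
    if energy == 0.0:
        # Assumption: report the band centre when the band carries no energy.
        return 0.0, (lo + hi) / 2.0
    centroid = sum(f * p for f, p in pairs) / energy
    return energy, centroid


# Toy frame, not data from the paper; the 0-2000 Hz band is hypothetical.
freqs = [0.0, 500.0, 1000.0, 1500.0]
power = [1.0, 1.0, 2.0, 0.0]
energy, centroid = band_energy_and_centroid(freqs, power, 0.0, 2000.0)

# A 6 dB boost roughly doubles the waveform amplitude.
gain = db_to_amplitude_gain(6.0)
```

Tracking how `energy` and `centroid` change from frame to frame in each of the five bands would give the raw material for a transition index of the kind described, and multiplying the samples of a detected segment by `gain` applies the 6 dB scaling.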