16 research outputs found

    Physiological and psychoacoustical correlates of perceiving natural and modified speech

    Psychoacoustical evaluation of the pitch-synchronous overlap-and-add speech-waveform manipulation technique using single-formant stimuli

    This article presents two experiments dealing with a psychoacoustical evaluation of the pitch-synchronous overlap-and-add (PSOLA) technique. This technique has been developed for modification of duration and fundamental frequency of speech and is based on simple waveform manipulations. Both experiments were aimed at deriving the sensitivity of the auditory system to the basic distortions introduced by PSOLA. In experiment I, manipulation of fundamental frequency was applied to synthetic single-formant stimuli under minimal stimulus uncertainty, level roving, and formant-frequency roving. In experiment II, the influence of the positioning of the so-called "pitch markers" was studied. Depending on the formant and fundamental frequency, experimental data could be described reasonably well by either a spectral intensity-discrimination model or a temporal model based on detecting changes in modulation of the output of a single auditory filter. Generally, the results were in line with psychoacoustical theory on the auditory processing of resolved and unresolved harmonics.

    Calibration of the TDT equipment and the Beyer DT990 headphones

    Psychoacoustical evaluation of PSOLA. II. Double-formant stimuli and the role of vocal perturbation

    This article presents the results of listening experiments and psychoacoustical modeling aimed at evaluating the pitch-synchronous overlap-and-add (PSOLA) technique. This technique can be used for simultaneous modification of pitch and duration of natural speech, using simple and efficient time-domain operations on the speech waveform. The first set of experiments tested the ability of subjects to discriminate double-formant stimuli, modified in fundamental frequency using PSOLA, from unmodified stimuli. Of the potential auditory discrimination cues induced by PSOLA, cues from the first formant were found to generally dominate discrimination performance. In the second set of experiments, the influence of vocal perturbation, i.e., jitter and shimmer, on discriminability of PSOLA-modified single-formant stimuli was determined. The data show that discriminability deteriorates at most modestly in the presence of jitter and shimmer. With the exception of a few conditions, the trends in these data could be replicated by using either a modulation-discrimination or an intensity-discrimination model, depending on the formant frequency. As a baseline experiment, detection thresholds for jitter and shimmer were measured. Thresholds for jitter could be replicated by using either the modulation-discrimination or the intensity-discrimination model, depending on the (mean) fundamental frequency of the stimuli. The thresholds for shimmer could be accurately predicted for stimuli with a 250-Hz fundamental, but less accurately in the case of a 100-Hz fundamental.
