509 research outputs found

    Real-Time Contrast Enhancement to Improve Speech Recognition

    An algorithm that operates in real time to enhance the salient features of speech is described, and its efficacy is evaluated. The Contrast Enhancement (CE) algorithm implements dynamic compressive gain and lateral inhibitory sidebands across channels in a modified winner-take-all circuit, which together produce a form of suppression that sharpens the dynamic spectrum. Normal-hearing listeners identified spectrally smeared consonants (VCVs) and vowels (hVds) in quiet and in noise. Identification of both consonants and vowels was improved by the processing, especially in noise. The amount of improvement did not depend on the degree of spectral smearing or on talker characteristics. For consonants, when results were analyzed according to phonetic feature, the most consistent improvement was for place of articulation. This is encouraging for hearing aid applications because confusions between consonants differing in place are a persistent problem for listeners with sensorineural hearing loss.
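The abstract above names two ingredients, lateral inhibition across channels and compressive gain, without giving their equations. The sketch below is only a toy illustration of how those two ideas can sharpen a smeared spectral peak; it is not the CE algorithm. The function names, the nearest-neighbour sidebands, the `inhibition` weight, and the power-law `exponent` are all assumptions introduced here.

```python
def lateral_inhibition(channel_levels, inhibition=0.5):
    """Subtract a fraction of the mean of the adjacent channels from each channel.

    Hypothetical toy model: sidebands are just the nearest neighbours,
    and negative results are clipped to zero.
    """
    n = len(channel_levels)
    sharpened = []
    for i, level in enumerate(channel_levels):
        neighbours = []
        if i > 0:
            neighbours.append(channel_levels[i - 1])
        if i < n - 1:
            neighbours.append(channel_levels[i + 1])
        side = sum(neighbours) / len(neighbours) if neighbours else 0.0
        sharpened.append(max(0.0, level - inhibition * side))
    return sharpened


def compress(channel_levels, exponent=0.5):
    """Static power-law compression (an assumed stand-in for dynamic gain)."""
    return [level ** exponent for level in channel_levels]


# A smeared spectral peak across five channels (made-up linear levels).
smeared = [1.0, 2.0, 4.0, 2.0, 1.0]
sharpened = lateral_inhibition(smeared)
compressed = compress(sharpened)
```

With these made-up levels, the ratio of the peak channel to its neighbours rises after inhibition, which is the sense in which suppression sharpens a spectrum in this toy setting.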

    Can phonation types be reliably measured from sound spectra? Some data from Wa and Burmese

    This paper assesses the value of measuring aspects of unmodified acoustic recordings of speech in two languages, Burmese (Tibeto-Burman) and Wa (Mon-Khmer), in relation to the glottal source, or phonation type. This method faces the problem of how to ensure that what is measured is indeed attributable to the glottal source and not to supralaryngeal acoustic shaping, or vowel quality. The methods adopted include analysis of the relative prominence of H1 and H2, formant amplitude and spectral tilt. The findings are that in Wa, H2, F1 and F2 are all more energetic than H1 to a greater degree in creaky phonation than in breathy, though this is due in part to the significantly dominant H1 in breathy phonation. For Burmese, the methods in this study are too crude to tell these two phonation types apart, but they are sufficient to identify the cruder three-way categorisation of phonation types (modal, creaky and breathy), which, it has been suggested, is sufficient to give a satisfactory account of phonologically contrastive phonation type for most purposes. The findings suggest further that the relationship between the higher frequency region of the spectrum and phonation type merits further investigation.
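The relative prominence of H1 and H2 mentioned above is often summarised as the level difference H1 - H2 in dB. The sketch below is a minimal illustration of that quantity, not the paper's procedure: it assumes the fundamental frequency `f0_hz` is already known and that the analysis frame spans a whole number of pitch periods. All names are hypothetical.

```python
import math


def harmonic_level_db(samples, sample_rate, freq_hz):
    """Level (dB) of the sinusoidal component at freq_hz, via a single-bin DFT."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    amplitude = 2.0 * math.hypot(re, im) / n
    return 20.0 * math.log10(max(amplitude, 1e-12))  # floor avoids log10(0)


def h1_minus_h2_db(samples, sample_rate, f0_hz):
    """Difference in level between the first and second harmonics."""
    h1 = harmonic_level_db(samples, sample_rate, f0_hz)
    h2 = harmonic_level_db(samples, sample_rate, 2.0 * f0_hz)
    return h1 - h2


# Synthetic test frame: H1 has amplitude 1.0, H2 has amplitude 0.5.
SAMPLE_RATE = 8000
F0 = 100.0
frame = [
    math.sin(2.0 * math.pi * F0 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2.0 * math.pi * 2.0 * F0 * i / SAMPLE_RATE)
    for i in range(800)  # exactly 10 pitch periods
]
difference_db = h1_minus_h2_db(frame, SAMPLE_RATE, F0)
```

On real speech the harmonic levels are also shaped by the vocal tract, which is exactly the confound the abstract raises; this sketch does not correct for it.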

    Communicative functions integrate segments in prosodies and prosodies in segments

    This paper takes a new look at the traditionally established divide between sounds and prosodies, viewing it as a useful heuristic in language descriptions that focus on the segmental make-up of words. It pleads for a new approach that bridges this reified compartmentalization of speech in a more global communicative perspective. Data are presented from a German perception experiment in the framework of the Semantic Differential that shows interdependence of f0 contours and the spectral characteristics of a following fricative segment, for the expression of semantic functions along the scales questioning - asserting, excited - calm, forceful - not forceful, contrary - agreeable. The results lead to the conclusion that segments shape prosodies and are shaped by them in varying ways in the coding of semantic functions. This implies that the analysis of sentence prosodies needs to integrate the manifestation of segments, just as the analysis of segments needs to consider their prosodic embedding. In communicative interaction, speakers set broad prosodic time windows of varying sizes, and listeners respond to them. Future phonetic research therefore needs to concentrate on speech analysis in such windows.

    An Evaluation of an Auditory Neurophysiological Model

    Individuals with normal hearing are adept at understanding speech in the presence of noise, such as other speakers or environmental sounds. In contrast, individuals with hearing loss struggle to understand speech in the same adverse conditions. Neural processing in the inferior colliculus (IC) of the brainstem appears to contribute to the ability to separate simultaneous competing sounds. A computational model developed in the Sinex lab reproduces the responses of IC neurons to complex sound mixtures. It seems likely that the model can be applied to improve the processing of speech in noise. The computational model's effectiveness at improving the processing of speech in noise is evaluated through a perceptual experiment which uses the model to process sentences that are then presented to listeners. The experiment's data are analyzed to evaluate the pattern of errors. The analysis shows that low frequency speech features are being accurately transmitted by the model while high frequency speech features are not. This pattern suggests ways in which the computational model may be improved. Possible technological and clinical applications of the computational model for individuals with hearing loss will also be discussed.

    Reducing Audible Spectral Discontinuities

    In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering consonant contexts that have a similar effect on the surrounding vowels on the basis of the best performing distance measure. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities.

    Percepcijska utemeljenost kepstralnih mjera udaljenosti za primjene u obradi govora (Perceptual basis of cepstral distance measures for speech processing applications)

    Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCC). MFCCs are based on a filter bank algorithm whose filters are equally spaced on a perceptually motivated mel frequency scale. The value of the mel cepstral vector, as well as the properties of the corresponding cepstral distance, are determined by several parameters used in mel cepstral analysis. The aim of this work is to examine the compatibility of the MFCC measure with human perception for different values of the analysis parameters. By analysing mel filter bank parameters it is found that a filter bank with 24 bands, a bandwidth of 220 mels and a band overlap coefficient of one or higher gives the optimal spectral distortion (SD) distance measures, those that agree best with perception. For this kind of mel filter bank, the difference between vowels can be recognised when the SD RMS measure computed from full-length mel cepstral vectors exceeds 0.4-0.5 dB. We also show that the use of a truncated mel cepstral vector (12 coefficients) is justified for speech recognition, but may be questionable for speaker recognition. We also analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. The results showed a high correlation of SD distances calculated from the aperiodic and periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor. There are rare exceptions where aliasing is present, and these were also analysed.
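Two ideas from the abstract above can be sketched briefly: filters spaced equally on the mel scale, and a Euclidean distance between cepstral vectors. The mel formula used here (2595 log10(1 + f/700)) is one common convention and is an assumption; the paper may use a different one. The function names and the 0-4000 Hz range are hypothetical, and the cepstral vectors are made-up numbers, not data from the paper.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using a common convention (assumed, not from the paper)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(f_low_hz, f_high_hz, n_bands):
    """Centre frequencies (Hz) of n_bands filters equally spaced in mel."""
    low_mel = hz_to_mel(f_low_hz)
    high_mel = hz_to_mel(f_high_hz)
    step = (high_mel - low_mel) / (n_bands + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_bands)]


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors of equal length."""
    if len(c1) != len(c2):
        raise ValueError("cepstral vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


# 24 bands, as in the abstract; the frequency range is an illustrative choice.
centers = mel_center_frequencies(0.0, 4000.0, 24)

# Made-up 12-coefficient vectors, only to show the call.
frame_a = [0.0] * 12
frame_b = [0.0] * 10 + [3.0, 4.0]
distance = cepstral_distance(frame_a, frame_b)
```

Equal spacing in mel gives centre frequencies that are packed more densely at low frequencies in Hz, which is the perceptual motivation the abstract refers to.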

    DEVELOPMENT AND EVALUATION OF ENVELOPE, SPECTRAL AND TIME ENHANCEMENT ALGORITHMS FOR AUDITORY NEUROPATHY

    Auditory neuropathy (AN) is a hearing disorder that reduces the ability to detect temporal cues in speech, thus leading to impaired speech perception. Traditional amplification and frequency shifting techniques used in modern hearing aids are not suitable to assist individuals with AN due to the unique symptoms that result from the disorder. This study proposes a method that combines speech envelope enhancement with time scaling to obtain the proven benefits of each algorithm. In addition, spectral enhancement is cascaded with envelope and time enhancement to address the poor frequency discrimination in AN. The proposed speech enhancement strategy was evaluated using an AN simulator with normal-hearing listeners under varying degrees of AN severity. The results showed a significant increase in word recognition scores for time scaling combined with envelope enhancement over envelope enhancement alone. Furthermore, the addition of spectral enhancement resulted in a further increase in word recognition at profound AN severity.

    Perceptual aspects of voice-source parameters

    xii+114hlm.;24c