    Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

    Auditory modeling is a well-established methodology that provides insight into human perception and facilitates the extraction of the signal features most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation by the model; this representation is quantized and coded, and upon decoding it is transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating low-complexity quantization. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method for reconstructing the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the inherent redundancy of the human auditory system to be exploited for multiple description (joint source-channel) coding.

    Physiological and psychoacoustical correlates of perceiving natural and modified speech

    Determination and evaluation of clinically efficient stopping criteria for the multiple auditory steady-state response technique

    Background: Although the auditory steady-state response (ASSR) technique uses objective statistical detection algorithms to estimate behavioural hearing thresholds, the audiologist still has to decide when to terminate an ASSR recording, which reintroduces a degree of subjectivity. Aims: The present study aimed to establish clinically efficient stopping criteria for a multiple 80-Hz ASSR system. Methods: In Experiment 1, data from 31 normal-hearing subjects were analyzed off-line to propose stopping rules. Under these rules, an ASSR recording is stopped when (1) all 8 responses reach significance and significance is maintained for 8 consecutive sweeps; (2) the mean noise level is ≤ 4 nV (if p-values are between 0.05 and 0.1 when this “≤ 4-nV” criterion is met, the measurement is extended only once, by 8 sweeps); or (3) a maximum of 48 sweeps is reached. In Experiment 2, these stopping criteria were applied to 10 normal-hearing and 10 hearing-impaired adults to assess their efficiency. Results: Applying these stopping rules yielded ASSR thresholds comparable to those of other multiple-ASSR studies with normal-hearing and hearing-impaired adults. Furthermore, in 80% of cases, ASSR thresholds could be obtained within 1 hour. Cumulative curves of the significant response amplitudes of the hearing-impaired adults indicated that a noise-stop criterion higher than “≤ 4 nV” could probably be used. Conclusions: The proposed stopping rules can be used in adults to determine accurate ASSR thresholds within an acceptable time frame of about 1 hour. However, additional research with infants and with adults with varying degrees and configurations of hearing loss is needed to optimize these criteria.
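    The three stopping rules can be expressed as a decision procedure over successive sweeps. The Python sketch below is a hypothetical illustration, not the authors' implementation. It assumes that any one rule is sufficient to stop, that "significant" means p < 0.05 for each of the 8 responses, that the borderline extension applies when any response has 0.05 ≤ p < 0.1, and that an extension cannot run past the 48-sweep maximum. The `Sweep` record and `stop_sweep` function are invented names.

```python
from dataclasses import dataclass
from typing import List, Optional

ALPHA = 0.05          # significance level per response (assumed)
BORDERLINE_P = 0.1    # upper bound of the "borderline" p-value band
SIG_STREAK = 8        # rule 1: sweeps for which significance must be maintained
NOISE_STOP_NV = 4.0   # rule 2: mean noise criterion, in nV
EXTENSION = 8         # rule 2: one-off extension, in sweeps
MAX_SWEEPS = 48       # rule 3: hard maximum


@dataclass
class Sweep:
    p_values: List[float]  # one p-value per response (8 in this multiple-ASSR set-up)
    mean_noise_nv: float   # mean residual noise after this sweep, in nV


def stop_sweep(sweeps: List[Sweep]) -> Optional[int]:
    """Return the 1-based sweep at which recording stops, or None if no rule has fired."""
    streak = 0                          # consecutive sweeps with every response significant
    extend_until: Optional[int] = None  # sweep at which a granted extension ends
    for n, sw in enumerate(sweeps, start=1):
        streak = streak + 1 if all(p < ALPHA for p in sw.p_values) else 0
        if streak >= SIG_STREAK:        # rule 1: stable significance
            return n
        if n >= MAX_SWEEPS:             # rule 3: hard cap
            return n
        if extend_until is not None:    # rule 2: extension already granted
            if n >= extend_until:
                return n
        elif sw.mean_noise_nv <= NOISE_STOP_NV:  # rule 2: noise criterion met
            if any(ALPHA <= p < BORDERLINE_P for p in sw.p_values):
                extend_until = n + EXTENSION     # borderline: extend only once
            else:
                return n
    return None


# Usage with hypothetical sweeps: noise drops below 4 nV at sweep 5 with one borderline p.
quiet = Sweep([0.5] * 8, 10.0)
low_noise_borderline = Sweep([0.07] + [0.5] * 7, 3.0)
print(stop_sweep([quiet] * 4 + [low_noise_borderline] * 20))  # 13
```

    Checking rule 1 before the others means that a recording which reaches stable significance during a borderline extension still stops as early as possible.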

    Detectability index measures of binaural masking level difference across populations of inferior colliculus neurons

    In everyday life we continually need to detect signals against a background of interfering noise (the “cocktail party effect”), a task that is much easier to accomplish with two ears. The binaural masking level difference (BMLD) measures the ability of listeners to use a difference in binaural attributes to segregate sound sources and thus improve their discriminability against interfering noise. By computing the detectability of tones from rate-versus-level functions in the presence of a suprathreshold noise, we previously demonstrated that individual low-frequency delay-sensitive neurons in the inferior colliculus can show BMLDs. Here we consider the responses of a population of such neurons when the noise level is held constant (as is conventional in psychophysical paradigms). We sampled the responses of 121 units in the inferior colliculi of five guinea pigs to identical noise and 500 Hz tones at both ears (NoSo) and to identical noise with the 500 Hz tone inverted at one ear (NoSπ). The results suggest that the neurons subserving detection of So tones in No noise (identical noise at the two ears) are those with best frequencies (BFs) close to 500 Hz that respond to So tones with an increase in discharge rate above that attributable to the noise. Detection of the inverted (Sπ) signal is also attributable to neurons with BFs close to 500 Hz. However, among these neurons, the presence of the Sπ tone was indicated by an increased discharge rate in some neurons and by a decreased discharge rate in others.
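    A common way to express how well a tone can be detected from a neuron's discharge rate is the signal-detection index d′, computed from spike counts with and without the tone. The Python sketch below assumes the standard equal-weight pooled-variance definition, d′ = (μ_tone − μ_noise) / √((σ²_tone + σ²_noise) / 2), and a hypothetical threshold criterion of |d′| ≥ 1; the abstract does not state which definition or criterion the authors used. The sign of d′ separates neurons that signal the tone by an increase in rate (positive) from those that signal it by a decrease (negative), as reported for the Sπ condition. Function names and spike counts are illustrative.

```python
import math
from statistics import mean, pvariance


def d_prime(noise_counts, tone_counts):
    """d' = (mean_tone - mean_noise) / sqrt((var_tone + var_noise) / 2)."""
    pooled_sd = math.sqrt((pvariance(noise_counts) + pvariance(tone_counts)) / 2)
    if pooled_sd == 0:
        raise ValueError("d' is undefined when both count distributions have zero variance")
    return (mean(tone_counts) - mean(noise_counts)) / pooled_sd


def threshold_level(noise_counts, counts_by_level, criterion=1.0):
    """Lowest tone level (dB) whose |d'| reaches the criterion, or None."""
    for level in sorted(counts_by_level):
        if abs(d_prime(noise_counts, counts_by_level[level])) >= criterion:
            return level
    return None


# Hypothetical spike counts over repeated presentations (not data from the study).
noise = [10, 12, 11, 9, 13]
by_level = {40: [10, 12, 11, 9, 13], 50: [12, 14, 13, 11, 15], 60: [15, 17, 16, 14, 18]}
print(threshold_level(noise, by_level))  # 50
```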

    Acoustic signal processing based on the short-time spectrum

    Technical report. The frequency-domain representation of a time signal afforded by the Fourier transform is a powerful tool in acoustic signal processing. The usefulness of this representation is rooted in the mechanisms of sound production and perception. Many sound sources exhibit normal modes or natural frequencies of vibration and can be described concisely in the frequency domain. The human auditory system performs frequency analysis early in the hearing process, so perception is often best described by frequency-domain parameters. This dissertation investigates a new approach to acoustic signal processing based on the short-time Fourier transform, a two-dimensional representation that shows the time and frequency structure of sounds. This representation is appropriate for signals such as speech and music, in which the natural frequencies of the source change and the timing of these changes is important to perception. The principal advantage of this approach is that the signal processing domain is similar to the perceptual domain, so that signal modifications can be related to perceptual criteria. The mathematical basis for this type of processing is developed, and four examples are described: removal of broadband background noise, isolation of perceptually important speech features, dynamic range compression and expansion, and removal of locally periodic interfering signals.
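    A minimal Python/NumPy sketch of short-time Fourier analysis, modification and overlap-add resynthesis, using broadband noise removal as the example modification. It assumes a periodic Hann window with 50% overlap, chosen because shifted copies of this window sum exactly to one, so unmodified frames reconstruct the input. The noise removal step is simple magnitude spectral subtraction with a noise spectrum averaged over frames assumed to contain noise only; the report's own noise-removal method may differ. All function names are hypothetical.

```python
import numpy as np


def periodic_hann(n):
    # Periodic Hann window: copies shifted by n/2 sum exactly to 1.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def stft(x, n_fft=256):
    """Short-time spectra of x (frames x bins), Hann window, hop = n_fft // 2."""
    assert n_fft % 2 == 0, "n_fft must be even for 50% overlap"
    hop = n_fft // 2
    w = periodic_hann(n_fft)
    # Pad so every input sample is covered by exactly two overlapping frames.
    xp = np.concatenate([np.zeros(hop), np.asarray(x, dtype=float), np.zeros(hop + n_fft)])
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([w * xp[m * hop : m * hop + n_fft] for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, length, n_fft=256):
    """Overlap-add resynthesis; inverts stft() exactly when X is unmodified."""
    hop = n_fft // 2
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    y = np.zeros(hop * (len(frames) - 1) + n_fft)
    for m, frame in enumerate(frames):
        y[m * hop : m * hop + n_fft] += frame
    return y[hop : hop + length]  # drop the leading padding


def spectral_subtract(X, noise_mag, floor=0.0):
    """Subtract a noise magnitude spectrum from each frame, keeping the noisy phase."""
    mag = np.abs(X)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return clean_mag * np.exp(1j * np.angle(X))


# Usage: treat the first 10 frames as noise only (an assumption about the input).
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
X = stft(x)
y = istft(spectral_subtract(X, np.abs(X[:10]).mean(axis=0)), len(x))
```

    Working in the short-time spectrum keeps the modification (here, a per-bin magnitude change) separate from the analysis and resynthesis, which is the property the abstract highlights: signal changes can be stated directly in a time-frequency domain close to the perceptual one.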

    Improvement of Speech Perception for Hearing-Impaired Listeners

    Hearing impairment is becoming a prevalent health problem, affecting 5% of the world's adult population. Hearing aids and cochlear implants have played an essential role in helping patients for decades, but several open problems still prevent them from providing their maximum benefit. Because of cost and discomfort, only one in four patients chooses to use hearing aids, and cochlear implant users have persistent trouble understanding speech in noisy environments. In this dissertation, we address the limitations of hearing aids by proposing a new hearing aid signal processing system named the Open-source Self-fitting Hearing Aids System (OS SF hearing aids). The proposed system adopts state-of-the-art digital signal processing technologies, combined with accurate hearing assessment and a machine-learning-based self-fitting algorithm, to further improve speech perception and comfort for hearing aid users. Informal testing with hearing-impaired listeners showed that results from the proposed system differed, on average, by less than 10 dB from those obtained with a clinical audiometer. In addition, a sixteen-channel filter bank with an adaptive differential microphone array provides up to 6 dB of SNR improvement in noisy environments, and the machine-learning-based self-fitting algorithm provides more suitable hearing aid settings. To maximize cochlear implant users' speech understanding in noise, sequential (S) and parallel (P) coding strategies were proposed that integrate high-rate desynchronized pulse trains (DPT) into the continuous interleaved sampling (CIS) strategy. Ten participants with severe hearing loss took part in two rounds of cochlear implant testing. The results showed that the CIS-DPT-S strategy significantly improved speech perception in background noise (by 11%), while the CIS-DPT-P strategy significantly improved it in both quiet (7%) and noisy (9%) environments.
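    In continuous interleaved sampling (CIS), each electrode channel is stimulated with pulses whose amplitudes follow that channel's envelope, and pulses are staggered across channels so that no two occur at the same time. The Python sketch below shows only this interleaved sampling step. It assumes the band envelopes have already been extracted and compressed, omits the desynchronized pulse trains (DPT) added in the dissertation, and uses hypothetical names and parameters.

```python
def cis_pulses(envelopes, fs, rate_pps):
    """
    Interleaved pulse sequence for CIS-style stimulation (sketch).

    envelopes: per-channel envelope samples (equal lengths), sampled at fs Hz,
               assumed already band-pass filtered, rectified, smoothed and compressed.
    rate_pps:  stimulation rate per channel, in pulses per second.
    Returns (time_s, channel, amplitude) tuples in time order.
    """
    n_ch = len(envelopes)
    n_samples = len(envelopes[0])
    duration = n_samples / fs
    period = 1.0 / rate_pps   # time between successive pulses on one channel
    slot = period / n_ch      # each channel gets its own offset within a period
    pulses = []
    p = 0
    while p * period < duration:
        for ch in range(n_ch):
            t = p * period + ch * slot
            idx = min(int(t * fs), n_samples - 1)  # envelope sample at pulse onset
            pulses.append((t, ch, envelopes[ch][idx]))
        p += 1
    return pulses


# Four channels with constant (hypothetical) envelopes, 95 ms at 1 kHz, 100 pulses/s per channel.
env = [[float(ch + 1)] * 95 for ch in range(4)]
pulses = cis_pulses(env, fs=1000, rate_pps=100)
print(len(pulses))                     # 40
print([c for _, c, _ in pulses[:5]])   # [0, 1, 2, 3, 0]
```

    Staggering the channels by `period / n_ch` is what makes the sampling "interleaved": the stimulation order cycles through the channels, so current from one electrode never coincides in time with another.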

    Spectral discontinuity in concatenative speech synthesis – perception, join costs and feature transformations

    This thesis explores the problem of determining an objective measure to represent human perception of spectral discontinuity in concatenative speech synthesis. Such measures are used as join costs to quantify the compatibility of speech units for concatenation in unit selection synthesis. No previous study has reported a spectral measure that satisfactorily correlates with human perception of discontinuity. An analysis of the limitations of existing measures, together with our understanding of the human auditory system, guided the strategies adopted to advance a solution to this problem. A listening experiment was conducted using a database of concatenated speech, with results indicating the perceived continuity of each concatenation. These results were used to correlate proposed measures of spectral continuity with perception. A number of standard speech parametrisations and distance measures were tested as measures of spectral continuity and analysed to identify their limitations. Time-frequency resolution was found to limit the performance of standard speech parametrisations. As a solution to this problem, measures of continuity based on the wavelet transform were proposed and tested, as wavelets offer superior time-frequency resolution to standard spectral measures. A further limitation of standard speech parametrisations is that they are typically computed from the magnitude spectrum, whereas the auditory system combines information relating to the magnitude spectrum, phase spectrum and spectral dynamics. The potential of phase and spectral dynamics as measures of spectral continuity was investigated. One widely adopted approach to detecting discontinuities is to compute the Euclidean distance between feature vectors on either side of the join in concatenated speech. However, the detection of an auditory event, such as a discontinuity, involves processing high up the auditory pathway in the central auditory system, and the basic Euclidean distance cannot model such behaviour. A study was therefore conducted to investigate feature transformations with sufficient processing complexity to mimic high-level auditory processing; neural networks and principal component analysis were investigated as feature transformations. Wavelet-based measures were found to outperform all measures of continuity based on standard speech parametrisations. Phase and spectral dynamics based measures were found to correlate with human perception of discontinuity in the test database, although neither contributed a significant increase in performance when combined with standard measures of continuity. Neural network feature transformations were found to significantly outperform all other measures tested in this study, producing correlations with perceptual results in excess of 90%.
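    The Euclidean join cost and its evaluation against listening-test results can be sketched in a few lines of Python. The sketch assumes each speech unit is a sequence of per-frame feature vectors (for example, cepstral coefficients), takes the cost as the distance between the last frame of the left unit and the first frame of the right unit, and measures agreement with perceptual scores by Pearson correlation. The function names and the choice of frames are illustrative assumptions, not the thesis's exact procedure.

```python
import math
from statistics import mean


def join_cost(left_unit, right_unit):
    """Euclidean distance between the feature vectors on either side of the join."""
    return math.dist(left_unit[-1], right_unit[0])


def pearson(xs, ys):
    """Pearson correlation between objective join costs and perceptual scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical two-dimensional features; real systems use longer spectral vectors.
left = [[0.0, 0.0], [1.0, 1.0]]
right = [[4.0, 5.0], [9.0, 9.0]]
print(join_cost(left, right))  # 5.0
```

    Because this cost compares only two frames with a fixed, unweighted metric, it cannot capture the higher-level auditory processing the abstract describes, which is the motivation for the learned feature transformations.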

    Mathematical Modeling of Human Speech Processing Mechanism Based on the Principle of Brain Internal Model of Vocal Tract

    Article. Journal of the Faculty of Engineering, Shinshu University (信州大学工学部紀要) 75: 87-96 (1995). Departmental bulletin paper.