76 research outputs found

    Digital Signal Processing

    Get PDF
    Contains an introduction and reports on fourteen research projects.National Science Foundation FellowshipNational Science Foundation (Grant ECS84-07285)U.S. Navy - Office of Naval Research (Contract N00014-81-K-0742)Sanders Associates, Inc.U.S. Air Force - Office of Scientific Research (Contract F19628-85-K-0028)Advanced Television Research ProgramAmoco Foundation FellowshipHertz Foundation Fellowshi

    Techniques for the enhancement of linear predictive speech coding in adverse conditions

    Get PDF

    Improving the Speech Intelligibility By Cochlear Implant Users

    Get PDF
    In this thesis, we focus on improving the intelligibility of speech for cochlear implants (CI) users. As an auditory prosthetic device, CI can restore hearing sensations for most patients with profound hearing loss in both ears in a quiet background. However, CI users still have serious problems in understanding speech in noisy and reverberant environments. Also, bandwidth limitation, missing temporal fine structures, and reduced spectral resolution due to a limited number of electrodes are other factors that raise the difficulty of hearing in noisy conditions for CI users, regardless of the type of noise. To mitigate these difficulties for CI listener, we investigate several contributing factors such as the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to the binaural benefits and contribution of low-frequency harmonics to tone identification in quiet and six-talker babble background. These results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion in improving speech intelligibility for CI users, which was motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of the conversation. Research has shown that when adults are familiar with someone’s voice, they can more accurately – and even more quickly – process and understand what the person is saying. This theory identified as the “familiar talker advantage” was our motivation to examine its effect on CI patients using voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speeches for CI patients

    Cortical regions activated by spectrally degraded speech in adults with single sided deafness or bilateral normal hearing

    Get PDF
    Those with profound sensorineural hearing loss from single sided deafness (SSD) generally experience greater cognitive effort and fatigue in adverse sound environments. We studied cases with right ear, SSD compared to normal hearing (NH) individuals. SSD cases were significantly less correct in naming last words in spectrally degraded 8- and 16-band vocoded sentences, despite high semantic predictability. Group differences were not significant for less intelligible 4-band sentences, irrespective of predictability. SSD also had diminished BOLD percent signal changes to these same sentences in left hemisphere (LH) cortical regions of early auditory, association auditory, inferior frontal, premotor, inferior parietal, dorsolateral prefrontal, posterior cingulate, temporal-parietal-occipital junction, and posterior opercular. Cortical regions with lower amplitude responses in SSD than NH were mostly components of a LH language network, previously noted as concerned with speech recognition. Recorded BOLD signal magnitudes were averages from all vertices within predefined parcels from these cortex regions. Parcels from different regions in SSD showed significantly larger signal magnitudes to sentences of greater intelligibility (e.g., 8- or 16- vs. 4-band) in all except early auditory and posterior cingulate cortex. Significantly lower response magnitudes occurred in SSD than NH in regions prior studies found responsible for phonetics and phonology of speech, cognitive extraction of meaning, controlled retrieval of word meaning, and semantics. The findings suggested reduced activation of a LH fronto-temporo-parietal network in SSD contributed to difficulty processing speech for word meaning and sentence semantics. Effortful listening experienced by SSD might reflect diminished activation to degraded speech in the affected LH language network parcels. SSD showed no compensatory activity in matched right hemisphere parcels

    An investigation into glottal waveform based speech coding

    Get PDF
    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for giottai waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U S Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay

    Spectrogram inversion and potential applications for hearing research

    Get PDF

    An asynchronous,low-power architecture for interleaved neural stimulation, using envelope and phase information

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.Includes bibliographical references (p. 122-124).This thesis describes a low-power cochlear-implant processor chip and a charge-balanced stimulation chip that together form a complete processing-and-stimulation cochlear-implant system. The processor chip uses a novel Asynchronous Interleaved Stimulation (AIS) algorithm that preserves phase and amplitude cues in its spectral input while simultaneously minimizing electrode interactions and lowering average stimulation power per electrode. The stimulator chip obviates the need for large D.C. blocking capacitors in neural implants to achieve highly precise charge-balanced stimulation, thus lowering the size and cost of the implant. Thus, this thesis suggests that significant performance, power and cost improvements in the current generation of cochlear implants may be simultaneously possible. The 16-channel ~90 square mm AIS processor chip was built in a 1.5[mu]m VLSI process and consumed 107[mu]W of power over and above that of its analog spectral processing front end, which consumed 250gtW and which has been previously described. The AIS processor was found to faithfully mimic MATLAB implementations of the AIS algorithm. Two perceptual tests of the AIS algorithm with normal-hearing listeners verified that AIS signal reconstructions enabled better melody and speech recognition in noise than traditional envelope-only vocoder simulations of cochlear-implant processing. The average firing rate of the AIS processor was found to be significantly lower than in traditional synchronous stimulators, suggesting that the AIS algorithm and processor can potentially save power and improve hearing performance in cochlear-implant users. The stimulator chip was built in a 0.7glm high-voltage VLSI process and performed dynamic current balancing followed by a shorting phase.(cont.) It achieved <6nA of average DC current error, well below the targeted safety limit of 25nA for cochlear-implant patients. On +6 and -9V rails, the power consumption of a single channel of this chip was 47[mu]W when biasing power is shared by 16 channels. It puts out a charge-balanced stimulation pulse whenever it receives an asynchronous input signal from an AIS processor encoding phase information and 7-bit amplitude information, thus making the AIS processor chip and stimulator chip fully compatible in the cochlear-implant system. The AIS algorithm and charge-balancing circuits described in this work may be useful in other nerve-stimulation prosthetics where good fidelity in input-information encoding, minimization of electrode interactions, low-power strategies for stimulation, and compact charge-balanced stimulation are also important.by Ji-Jon Sit.Ph.D

    Phase-Distortion-Robust Voice-Source Analysis

    Get PDF
    This work concerns itself with the analysis of voiced speech signals, in particular the analysis of the glottal source signal. Following the source-filter theory of speech, the glottal signal is produced by the vibratory behaviour of the vocal folds and is modulated by the resonances of the vocal tract and radiation characteristic of the lips to form the speech signal. As it is thought that the glottal source signal contributes much of the non-linguistic and prosodical information to speech, it is useful to develop techniques which can estimate and parameterise this signal accurately. Because of vocal tract modulation, estimating the glottal source waveform from the speech signal is a blind deconvolution problem which necessarily makes assumptions about the characteristics of both the glottal source and vocal tract. A common assumption is that the glottal signal and/or vocal tract can be approximated by a parametric model. Other assumptions include the causality of the speech signal: the vocal tract is assumed to be a minimum phase system while the glottal source is assumed to exhibit mixed phase characteristics. However, as the literature review within this thesis will show, the error criteria utilised to determine the parameters are not robust to the conditions under which the speech signal is recorded, and are particularly degraded in the common scenario where low frequency phase distortion is introduced. Those that are robust to this type of distortion are not well suited to the analysis of real-world signals. This research proposes a voice-source estimation and parameterisation technique, called the Power-spectrum-based determination of the Rd parameter (PowRd) method. Illustrated by theory and demonstrated by experiment, the new technique is robust to the time placement of the analysis frame and phase issues that are generally encountered during recording. The method assumes that the derivative glottal flow signal is approximated by the transformed Liljencrants-Fant model and that the vocal tract can be represented by an all-pole filter. Unlike many existing glottal source estimation methods, the PowRd method employs a new error criterion to optimise the parameters which is also suitable to determine the optimal vocal-tract filter order. In addition to the issue of glottal source parameterisation, nonlinear phase recording conditions can also adversely affect the results of other speech processing tasks such as the estimation of the instant of glottal closure. In this thesis, a new glottal closing instant estimation algorithm is proposed which incorporates elements from the state-of-the-art techniques and is specifically designed for operation upon speech recorded under nonlinear phase conditions. The new method, called the Fundamental RESidual Search or FRESS algorithm, is shown to estimate the glottal closing instant of voiced speech with superior precision and comparable accuracy as other existing methods over a large database of real speech signals under real and simulated recording conditions. An application of the proposed glottal source parameterisation method and glottal closing instant detection algorithm is a system which can analyse and re-synthesise voiced speech signals. This thesis describes perceptual experiments which show that, iunder linear and nonlinear recording conditions, the system produces synthetic speech which is generally preferred to speech synthesised based upon a state-of-the-art timedomain- based parameterisation technique. In sum, this work represents a movement towards flexible and robust voice-source analysis, with potential for a wide range of applications including speech analysis, modification and synthesis

    Non-intrusive identification of speech codecs in digital audio signals

    Get PDF
    Speech compression has become an integral component in all modern telecommunications networks. Numerous codecs have been developed and deployed for efficiently transmitting voice signals while maintaining high perceptual quality. Because of the diversity of speech codecs used by different carriers and networks, the ability to distinguish between different codecs lends itself to a wide variety of practical applications, including determining call provenance, enhancing network diagnostic metrics, and improving automated speaker recognition. However, few research efforts have attempted to provide a methodology for identifying amongst speech codecs in an audio signal. In this research, we demonstrate a novel approach for accurately determining the presence of several contemporary speech codecs in a non-intrusive manner. The methodology developed in this research demonstrates techniques for analyzing an audio signal such that the subtle noise components introduced by the codec processing are accentuated while most of the original speech content is eliminated. Using these techniques, an audio signal may be profiled to gather a set of values that effectively characterize the codec present in the signal. This procedure is first applied to a large data set of audio signals from known codecs to develop a set of trained profiles. Thereafter, signals from unknown codecs may be similarly profiled, and the profiles compared to each of the known training profiles in order to decide which codec is the best match with the unknown signal. Overall, the proposed strategy generates extremely favorable results, with codecs being identified correctly in nearly 95% of all test signals. In addition, the profiling process is shown to require a very short analysis length of less than 4 seconds of audio to achieve these results. Both the identification rate and the small analysis window represent dramatic improvements over previous efforts in speech codec identification
    corecore