
    SIMULATION OF A COCHLEAR IMPLANT DEVICE WITH CONTINUOUS INTERLEAVED SAMPLING

    Cochlear implant hearing devices are known to help deaf individuals perceive sound to a significant degree. The device consists of an external part and an internal part, which respectively capture the sound signal and process it into electrical signals used to stimulate the auditory nerves. One signal processing scheme commonly used in cochlear implant devices is Continuous Interleaved Sampling (CIS), which separates the original sound signal into several frequency bands to stimulate the human cochlea at different points. In this paper, the operation of a cochlear implant with the CIS scheme is simulated using LabVIEW software. The simulation results show that a signal synthesized from the original signal after processing with eight filters according to the CIS scheme can be perceived by normal-hearing test subjects. Increasing the number of filters to 12 does not improve the intelligibility of the synthesized signal, whereas using fewer than five filters makes the synthesized signal difficult to understand. Keywords: cochlear implant, continuous interleaved sampling, speech simulation
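
    The CIS front end described above splits speech into bands and uses each band's slowly varying envelope to modulate interleaved stimulation pulses. Below is a minimal Python sketch of that band-splitting and envelope-extraction stage; the log-spaced band edges, filter orders, and 400 Hz envelope cutoff are illustrative assumptions, not the paper's LabVIEW settings.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def cis_envelopes(signal, fs, n_bands=8, f_lo=200.0, f_hi=7000.0):
            """Return the slow envelope of each analysis band; in CIS these
            envelopes modulate interleaved, non-overlapping pulse trains."""
            edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
            sos_lp = butter(2, 400.0, btype="low", fs=fs, output="sos")
            envelopes = []
            for lo, hi in zip(edges[:-1], edges[1:]):
                sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
                band = sosfiltfilt(sos, signal)             # isolate one channel
                env = np.abs(hilbert(band))                 # instantaneous amplitude
                envelopes.append(sosfiltfilt(sos_lp, env))  # keep only slow variation
            return np.array(envelopes)                      # (n_bands, len(signal))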

    Temporal fine structure: the missing component in speech processing algorithms

    A speech processing algorithm, which encodes the temporal fine structure of sound by extracting slowly varying frequency modulations, was shown to improve speech perception in noise, talker identification, melody recognition, and tonal language recognition.
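
    As a rough illustration of the idea, one common way to obtain a slowly varying FM track is to low-pass filter the instantaneous frequency of the analytic signal. The sketch below assumes a band-limited input and an illustrative 400 Hz smoothing cutoff; it is not the paper's algorithm.

        import numpy as np
        from scipy.signal import hilbert, butter, sosfiltfilt

        def slow_fm_track(band_signal, fs, fm_cutoff=400.0):
            """Instantaneous frequency of the analytic signal, low-pass
            filtered so only the slowly varying FM remains."""
            phase = np.unwrap(np.angle(hilbert(band_signal)))
            inst_freq = np.diff(phase) * fs / (2 * np.pi)   # Hz, per sample
            sos = butter(2, fm_cutoff, btype="low", fs=fs, output="sos")
            return sosfiltfilt(sos, inst_freq)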

    On automatic recognition of spectrally reduced speech synthesized from amplitude and frequency modulations

    This report investigates the behavior of an automatic speech recognition (ASR) system with spectrally reduced speech (SRS) synthesized from subband amplitude modulations (AMs) and frequency modulations (FMs). Acoustic analysis shows that resynthesizing SRS from only the AM components helps alleviate certain non-linguistic variabilities in the original speech signal. When the SRS spectral resolution is sufficiently good, this alleviation has no adverse consequence and yields comparable or even better ASR word accuracy than that attained with the original clean speech signal. In contrast, FM components support human speech recognition but yield no significant improvement in ASR word accuracy when the SRS spectral resolution is sufficiently good.
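
    A hedged sketch of the general AM/FM vocoder resynthesis this report builds on: each subband is regenerated as an envelope (AM) imposed on a carrier, with the FM track, when present, folded into the carrier phase. The band-center carrier handling is an assumption about the general approach, not the report's exact pipeline.

        import numpy as np

        def resynth_band(env, fs, fc, fm_track=None):
            """Rebuild one subband from its AM envelope and optional FM track.
            AM-only: fixed carrier at the band center fc.
            AM+FM:   the FM deviation (Hz) is integrated into the phase."""
            n = len(env)
            if fm_track is None:
                phase = 2 * np.pi * fc * np.arange(n) / fs
            else:
                phase = 2 * np.pi * np.cumsum(fc + fm_track) / fs
            return env * np.cos(phase)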

    The neural representation and behavioral detection of frequency modulation

    Understanding a speech signal relies on the ability of the auditory system to accurately encode rapidly changing spectral and temporal cues over time. Evidence from behavioral studies in humans suggests that relatively poor temporal fine structure (TFS) encoding ability is correlated with poorer performance on speech understanding tasks in quiet and in noise. Electroencephalography, including measurement of the frequency-following response (FFR), has been used to assess the human central auditory nervous system's ability to encode temporal patterns in steady-state and dynamic tonal stimuli and short syllables. To date, the FFR has been used to investigate the accuracy of phase-locked auditory encoding of various stimuli; however, no study has demonstrated an FFR evoked by dynamic TFS contained in the modulating frequency content of a carrier tone. Furthermore, the relationship between a physiological representation of TFS encoding and either behavioral perception or speech-in-noise understanding has not been studied. The present study investigated the feasibility of eliciting FFRs in young, normal-hearing listeners using frequency-modulated (FM) tones, which contain TFS. Brainstem responses were compared to the behavioral detection of frequency modulation as well as to speech-in-noise understanding. FFRs in response to FM tones were obtained from all listeners, indicating a reliable measurement of TFS encoding within the brainstem. FFRs were more accurate at lower carrier frequencies and at shallower FM depths. FM detection ability was consistent with previously reported findings in normal-hearing listeners. In the present study, however, FFR accuracy was not predictive of behavioral performance, nor was it predictive of speech-in-noise understanding. Further investigation of brainstem encoding of TFS may reveal a stronger brain-behavior relationship across an age continuum.
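
    For concreteness, a sinusoidally frequency-modulated tone of the kind described here can be generated as follows; the carrier frequency, modulation rate, and depth values are illustrative defaults, not the study's stimulus parameters.

        import numpy as np

        def fm_tone(fs, dur, fc=500.0, fm=2.0, depth_hz=25.0):
            """Sinusoidal FM: the instantaneous frequency sweeps fc +/- depth_hz
            at a rate of fm Hz (modulation index beta = depth / rate)."""
            t = np.arange(int(fs * dur)) / fs
            beta = depth_hz / fm
            return np.sin(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))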

    Improvement of Speech Perception for Hearing-Impaired Listeners

    Hearing impairment is a prevalent health problem, affecting 5% of the world's adult population. Hearing aids and cochlear implants have played an essential role in helping patients for decades, but several open problems still prevent them from providing maximum benefit. For reasons of cost and discomfort, only one in four patients chooses to use hearing aids, and cochlear implant users often have trouble understanding speech in noisy environments. In this dissertation, we addressed the limitations of hearing aids by proposing a new signal processing system named the Open-source Self-fitting Hearing Aids System (OS SF hearing aids). The proposed system adopts state-of-the-art digital signal processing technologies, combined with accurate hearing assessment and a machine-learning-based self-fitting algorithm, to further improve speech perception and comfort for hearing aid users. Informal testing with hearing-impaired listeners showed that results from the proposed system differed by less than 10 dB on average from those obtained with a clinical audiometer. In addition, sixteen-channel filter banks with an adaptive differential microphone array provide up to 6 dB of SNR improvement in noisy environments, and the machine-learning-based self-fitting algorithm provides more suitable hearing aid settings. To maximize cochlear implant users' speech understanding in noise, sequential (S) and parallel (P) coding strategies were proposed by integrating high-rate desynchronized pulse trains (DPT) into the continuous interleaved sampling (CIS) strategy. Ten participants with severe hearing loss took part in two rounds of cochlear implant testing. The results showed that the CIS-DPT-S strategy significantly improved speech perception in background noise (by 11%), while the CIS-DPT-P strategy yielded significant improvements in both quiet (7%) and noisy (9%) environments.
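
    A minimal sketch of a first-order (delay-and-subtract) differential microphone pair, the kind of directional front end the abstract pairs with a sixteen-channel filter bank. The mic spacing is an illustrative assumption; a practical device would use a fractional-delay filter and adaptive null steering rather than a whole-sample delay.

        import numpy as np

        def diff_mic(front, rear, fs, spacing_m=0.012, c=343.0):
            """Delay-and-subtract beamformer: delaying the rear mic by the
            acoustic travel time across the spacing puts a null to the rear."""
            delay = int(round(fs * spacing_m / c))          # whole-sample delay
            rear_delayed = np.concatenate([np.zeros(delay),
                                           rear[:len(rear) - delay]])
            return front - rear_delayed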

    Optimizing the neural response to electrical stimulation and exploring new applications of neurostimulation

    Electrical stimulation has been successful in treating patients who suffer from neurologic and neuropsychiatric disorders that are resistant to standard treatments. For deep brain stimulation (DBS), officially approved use has been limited mainly to motor disorders, such as Parkinson's disease and essential tremor. Alcohol use disorder, and addictive disorders in general, is a prevalent condition that is difficult to treat long-term. To determine whether DBS can reduce alcohol drinking in animals, the voluntary alcohol consumption of alcohol-preferring rats was compared before, during, and after stimulation of the nucleus accumbens shell. Intake in the low stimulus intensity group (n=3, 100 μA current) decreased by as much as 43% during stimulation, but the effect did not persist. In the high stimulus intensity group (n=4, 200 μA current), alcohol intake decreased by as much as 59%, and the effect was sustained. These results demonstrate the potent, reversible effects of DBS.

    Left vagus nerve stimulation (VNS) is approved for treating epilepsy and depression. However, the standard method of determining stimulus parameters is imprecise, and patient responses are highly variable. I developed a method of designing custom stimulus waveforms and assessing the nerve response in order to optimize stimulation selectivity and efficiency. VNS experiments were performed in rats with the aim of increasing selectivity for slow nerve fibers while assessing activation efficiency. When producing 50% of maximal activation of slow fibers, customized stimuli activated as few as 12.8% of fast fibers, whereas the lowest achieved by standard rectangular waveforms was 35.0% (n=4-6 animals). However, the stimulus with the highest selectivity required 19.6 nC of charge per stimulus phase, while the rectangular stimulus required only 13.2 nC.

    Right VNS is currently under clinical investigation for preventing sudden unexpected death in epilepsy and for treating heart failure. Activation of the right vagal parasympathetic fibers led to waveform-independent reductions in heart rate, ejection ratio, and stroke volume. Customized stimulus design with response feedback produces reproducible and predictable patterns of nerve activation and physiological effects, which will lead to more consistent patient responses.
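
    The charge figures compared above are simply the time integral of current over one stimulus phase. The snippet below reproduces the 13.2 nC rectangular figure under an assumed 200 μA x 66 μs phase; the pulse width is a back-calculated illustration, not a reported parameter.

        def charge_per_phase(current_a, fs):
            # Rectangle-rule integral of a sampled current waveform (A) -> coulombs
            return sum(current_a) / fs

        fs = 1e6                                   # waveform sampled at 1 MHz
        rect = [200e-6] * 66                       # 200 uA held for 66 us (assumed)
        print(charge_per_phase(rect, fs) * 1e9)    # -> 13.2 (nC)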

    On the mechanism of response latencies in auditory nerve fibers

    Despite the structural differences of the middle and inner ears, the latency pattern of auditory nerve fibers in response to an identical sound has been found to be similar across numerous species. Studies have shown this similarity even in species with distinct cochleae or without a basilar membrane. This stimulus-, neuron-, and species-independent similarity of latency cannot be simply explained by the concept of cochlear traveling waves, which is generally accepted as the main cause of the neural latency pattern. An original concept, the Fourier pattern, is defined to characterize a feature of temporal processing, specifically phase encoding, that is not readily apparent in more conventional analyses. The pattern is created by marking the first amplitude maximum of each sinusoid component of the stimulus, thereby encoding phase information. The hypothesis is that the hearing organ serves as a running analyzer whose output reflects synchronization of auditory neural activity consistent with the Fourier pattern. A combination of experimental, correlational, and meta-analytic approaches is used to test the hypothesis. Manipulations included phase encoding and stimuli to test their effects on the predicted latency pattern. Animal studies in the literature using the same stimuli were then compared to determine the degree of relationship. The results show that each marking accounts for a large percentage of a corresponding peak latency in the peristimulus-time histogram. For each of the stimuli considered, the latency predicted by the Fourier pattern is highly correlated with the observed latency in the auditory nerve fibers of representative species. The results suggest that the hearing organ analyzes not only the amplitude spectrum but also phase information in Fourier analysis, distributing specific spikes among auditory nerve fibers and within single units. This phase-encoding mechanism is proposed as the common mechanism that, despite species differences in peripheral auditory hardware, accounts for the considerable similarities across species in their latency-by-frequency functions, in turn assuring optimal phase encoding across species. The mechanism also has the potential to improve the phase encoding of cochlear implants.
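
    The "first amplitude maximum" marking has a simple closed form for a pure sinusoid component A*sin(2*pi*f*t + phi); the sketch below is basic trigonometry offered as an illustration, not code from the dissertation.

        import numpy as np

        def first_maximum_time(freq_hz, phase_rad):
            """Earliest t >= 0 at which sin(2*pi*f*t + phi) peaks, i.e. where
            the argument first reaches pi/2 (mod 2*pi)."""
            return ((np.pi / 2 - phase_rad) % (2 * np.pi)) / (2 * np.pi * freq_hz)

        # A 1 kHz cosine component (phi = pi/2) peaks at t = 0; a 1 kHz sine
        # (phi = 0) first peaks a quarter cycle later, at 0.25 ms.
        print(first_maximum_time(1000.0, np.pi / 2), first_maximum_time(1000.0, 0.0))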

    Speech Decomposition and Enhancement

    The goal of this study is to investigate the roles of steady-state speech sounds, and the transitions between these sounds, in the intelligibility of speech. The motivation for this approach is that the auditory system may be particularly sensitive to time-varying frequency edges, which in speech are produced primarily by transitions between vowels and consonants and within vowels. The possibility that selectively amplifying these edges may enhance speech intelligibility is examined. Computer algorithms were developed to decompose speech into two components. The first, defined as the tonal component, was intended to include predominantly formant activity; the second, defined as the non-tonal component, was intended to include predominantly transitions between and within formants.

    The decomposition uses a set of time-varying filters whose center frequencies and bandwidths are controlled to identify the strongest formant components in speech. Each center frequency and bandwidth is estimated from the FM and AM information of the corresponding formant component. The tonal component is the sum of the filter outputs; the non-tonal component is the difference between the original speech signal and the tonal component.

    The relative energy and intelligibility of the tonal and non-tonal components were compared to those of the original speech, with psychoacoustic growth functions used to assess intelligibility. Most of the speech energy was in the tonal component, but this component had significantly lower maximum word recognition than either the original speech or the non-tonal component. The non-tonal component averaged only 2% of the original speech energy, yet its maximum word recognition was almost equal to that of the original speech. The non-tonal component was then amplified and recombined with the original speech to generate enhanced speech, whose energy was adjusted to equal that of the original. The intelligibility of the enhanced speech was compared to the original speech in background noise: the enhanced speech showed significantly higher recognition scores at lower SNRs, while the two were similar at higher SNRs. These results suggest that amplifying transient information can enhance speech in noise and that this enhancement method is most effective in severe noise conditions.
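
    The enhancement step lends itself to a short sketch: subtract the tonal estimate to obtain the transition-rich residual, amplify it, recombine, and rescale to the original energy. The formant-tracking filters that produce the tonal component are beyond this sketch, and the gain value is an illustrative assumption.

        import numpy as np

        def enhance(speech, tonal, gain=4.0):
            """Amplify the non-tonal residual, recombine, and rescale so the
            enhanced signal has the same energy as the original."""
            non_tonal = speech - tonal                 # transition-rich residual
            enhanced = speech + gain * non_tonal
            scale = np.sqrt(np.sum(speech**2) / np.sum(enhanced**2))
            return enhanced * scale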

    Peripheral auditory processing and speech reception in impaired hearing


    Constructing Invariant Representation of Sound Using Optimal Features And Sound Statistics Adaptation

    The ability to convey information using sound is critical for the survival of many vocal species, including humans. These communication sounds (vocalizations or calls) are often composed of complex spectrotemporal features that require accurate detection to prevent mis-categorization. This task is made difficult by two factors: 1) the inherent variability in vocalization production, and 2) competing sounds from the environment. The auditory system must generalize across these variabilities while maintaining sufficient sensitivity to detect subtle differences in fine acoustic structures. While several studies have described vocalization-selective and noise-invariant neural responses in the auditory pathway at a phenomenological level, the algorithmic and mechanistic principles behind these observations remain speculative. In this thesis, we first adopted a theoretical approach to develop biologically plausible computational algorithms that categorize vocalizations while generalizing over sound-production and environmental variability. From an initial set of randomly chosen vocalization features, we used a greedy search algorithm to select the most informative features, maximizing vocalization categorization performance while minimizing redundancy between features. High classification performance could be achieved using only 10–20 features per vocalization category. The optimal features tended to be of intermediate complexity, offering an optimal compromise between fine and tolerant feature tuning. Predictions of the tuning properties of putative feature-selective neurons matched some observed auditory cortical responses. While this algorithm performed well in quiet listening conditions, it failed in noisy conditions. To address this shortcoming, we implemented biologically plausible algorithms to improve model performance in noise, exploring two model elements that aid adaptation to sound statistics: 1) de-noising of noisy inputs by thresholding based on wide-band energy, and 2) adjusting feature-detection parameters to offset noise-masking effects. These processes are consistent with physiological observations of gain-control mechanisms and principles of efficient encoding in the brain. With these additions, our model achieved near-physiological levels of performance. Our results suggest that an invariant representation of sound can be achieved using task-dependent features with adaptation to input sound statistics.
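
    A hedged sketch of the greedy feature search described above: repeatedly add the candidate feature that most improves categorization performance, penalized by its redundancy with features already chosen. The score and redundancy callables stand in for the thesis's actual classifier-performance and feature-similarity measures; the default feature count and penalty weight are illustrative.

        def greedy_select(candidates, score, redundancy, n_features=15, lam=0.5):
            """Forward selection: at each step add the feature with the best
            score gain minus a redundancy penalty against chosen features."""
            chosen = []
            for _ in range(n_features):
                def gain(f):
                    overlap = max((redundancy(f, g) for g in chosen), default=0.0)
                    return score(chosen + [f]) - lam * overlap
                best = max((f for f in candidates if f not in chosen), key=gain)
                chosen.append(best)
            return chosen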