380 research outputs found

    A psychoacoustic "NofM"-type speech coding strategy for cochlear implants

    Get PDF
    We describe a new signal processing technique for cochlear implants using a psychoacoustic-masking model. The technique is based on the principle of a so-called "NofM" strategy. These strategies stimulate fewer channels (N) per cycle than there are active electrodes (M), hence "NofM" with N < M. In "NofM" strategies such as ACE or SPEAK, only the N channels with the highest amplitudes are stimulated. The new strategy is based on the ACE strategy but uses a psychoacoustic-masking model to determine the essential components of any given audio signal. The new strategy was tested on device users in an acute study, with either 4 or 8 channels stimulated per cycle. For the first condition (4 channels), the mean improvement over the ACE strategy was 17%. For the second condition (8 channels), no significant difference was found between the two strategies.
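
    As a rough illustration of the channel selection these strategies share, the sketch below picks the N highest-amplitude channels out of M per stimulation cycle. It is a minimal sketch only; the function name and the random test data are illustrative, and the masking-model variant described above would replace the plain amplitude ranking with a selection driven by the masking analysis.

```python
import numpy as np

def select_n_of_m(envelopes, n):
    """Return a boolean mask over the M channels marking the N selected ones.

    envelopes: 1-D array of per-channel envelope amplitudes for one cycle.
    """
    m = envelopes.shape[0]
    selected = np.argsort(envelopes)[-n:]   # indices of the N largest amplitudes
    mask = np.zeros(m, dtype=bool)
    mask[selected] = True
    return mask

# Example: 22 analysis channels, 8 stimulated per cycle.
rng = np.random.default_rng(0)
print(select_n_of_m(rng.random(22), 8))
```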

    Audio Processing and Loudness Estimation Algorithms with iOS Simulations

    Get PDF
    The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined, and improved algorithms have been proposed to overcome limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, this thesis project also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS app "iJDSP". These functions are described in detail in this thesis. Several enhancements in the architecture of the application have also been introduced to provide the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing, and audio rendering have been developed to provide students, practitioners, and researchers with an enriched DSP simulation tool. Simulations and assessments have also been developed for use in classes and in the training of practitioners and students.
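
    As a hedged illustration of the frame-by-frame processing structure described above (not the auditory models used in the thesis), the sketch below computes a crude per-frame loudness proxy: band energies from an FFT are compressed with a Stevens-type power law and summed. Band edges, the exponent, and the function name are illustrative assumptions.

```python
import numpy as np

def frame_loudness(frame, sr, n_bands=24, exponent=0.3):
    """Crude per-frame loudness proxy: power-law-compressed band energies, summed."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # Stand-in for critical bands: equal-width bands on a log-frequency axis.
    edges = np.logspace(np.log10(50.0), np.log10(sr / 2.0), n_bands + 1)
    band_energy = np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                            for lo, hi in zip(edges[:-1], edges[1:])])
    return float(np.sum(band_energy ** exponent))

# Example: loudness proxy for a 32 ms frame of a 1 kHz tone at 16 kHz sampling rate.
sr = 16000
t = np.arange(int(0.032 * sr)) / sr
print(frame_loudness(np.sin(2 * np.pi * 1000 * t), sr))
```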

    Traditional Psychoacoustic Model and Daubechies Wavelets for Enhanced Speech Coder Performance

    Get PDF
    Speech compression techniques based on the traditional psychoacoustic model have been proposed by many researchers. We suggest a Discrete Wavelet Transform (DWT) supported by the same psychoacoustic model for speech compression. This paper presents a traditional psychoacoustic model that processes equal partitions of the total frequency spectrum of the audio signal, reducing redundancy by filtering out tone and noise maskers in the speech signal. Uniform filter banks are used for efficient computation and for selecting an appropriate threshold level for better compression of the DWT coefficients. The Daubechies wavelet filter bank is nonlinear and asymmetric and resembles the cochlear filter of the human hearing system. This resemblance between the Daubechies filter bank and our hearing system is used to develop the novel speech coder. Results show better performance in terms of compression factor (CF) and Signal-to-Noise Ratio (SNR) compared to previously suggested methods.
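
    A minimal sketch of the DWT-thresholding idea follows, using the PyWavelets (pywt) package: decompose a frame with a Daubechies wavelet, hard-threshold small coefficients, reconstruct, and report a crude compression factor and SNR. The fixed peak-relative threshold is an assumption; the paper derives its threshold from the psychoacoustic masking analysis instead.

```python
import numpy as np
import pywt

def dwt_compress(frame, wavelet="db8", level=4, keep_fraction=0.1):
    """Hard-threshold DWT coefficients and reconstruct; return signal, CF and SNR."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    threshold = keep_fraction * np.max(np.abs(np.concatenate(coeffs)))
    thresholded = [pywt.threshold(c, threshold, mode="hard") for c in coeffs]
    rec = pywt.waverec(thresholded, wavelet)[:len(frame)]
    kept = sum(int(np.count_nonzero(c)) for c in thresholded)
    total = sum(c.size for c in coeffs)
    cf = total / max(kept, 1)                                   # crude compression factor
    snr = 10 * np.log10(np.sum(frame ** 2) / (np.sum((frame - rec) ** 2) + 1e-12))
    return rec, cf, snr
```

    Raising keep_fraction zeroes more coefficients, increasing the compression factor at the cost of SNR.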

    An electrode stimulation strategy for cochlear implants based on a model of human hearing

    Get PDF
    Cochlear implants (CIs) combined with professional rehabilitation have enabled several hundred thousand hearing-impaired individuals to re-enter the world of verbal communication. Though very successful, current CI systems seem to have reached their peak potential. The fact that most recipients claim not to enjoy listening to music and are not capable of carrying on a conversation in noisy or reverberant environments shows that there is still room for improvement. This dissertation presents a new cochlear implant signal processing strategy called Stimulation based on Auditory Modeling (SAM), which is completely based on a computational model of the human peripheral auditory system. SAM has been evaluated through simplified models of CI listeners, with five cochlear implant users, and with 27 normal-hearing subjects using an acoustic model of CI perception. Results have always been compared to those acquired using the Advanced Combination Encoder (ACE), which is today's most prevalent CI strategy. First simulations showed that the speech intelligibility of CI users fitted with SAM should be just as good as that of CI listeners fitted with ACE. Furthermore, it has been shown that SAM provides more accurate binaural cues, which can potentially enhance the sound source localization ability of bilaterally fitted implantees. Simulations have also revealed an increased amount of temporal pitch information provided by SAM. The subsequent pilot study with five CI users revealed several benefits of using SAM. First, there was a significant improvement in pitch discrimination of pure tones and sung vowels. Second, CI users fitted with a contralateral hearing aid reported a more natural sound of both speech and music. Third, all subjects became accustomed to SAM in a very short period of time (on the order of 10 to 30 minutes), which is particularly important given that a successful CI strategy change typically takes weeks to months. An additional test with 27 normal-hearing listeners using an acoustic model of CI perception delivered further evidence for improved pitch discrimination with SAM as compared to ACE. Although SAM is not yet a market-ready alternative, it strives to pave the way for future strategies based on auditory models and is a promising candidate for further research and investigation.
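
    For readers unfamiliar with acoustic models of CI perception, the sketch below implements a generic sine-carrier vocoder of the kind commonly used to present CI-processed sound to normal-hearing listeners: band-pass analysis, envelope extraction, and re-synthesis on sine carriers at the band centre frequencies. It is not the specific model used in the SAM evaluation; band count, edges, and filter orders are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def sine_vocoder(x, sr, n_bands=8, f_lo=200.0, f_hi=7000.0):
    """Re-synthesise x from per-band envelopes imposed on sine carriers."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    t = np.arange(len(x)) / sr
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfilt(sos, x)
        env = np.abs(hilbert(band))                             # envelope of this analysis band
        out += env * np.sin(2 * np.pi * np.sqrt(lo * hi) * t)   # carrier at band centre
    return out / (np.max(np.abs(out)) + 1e-12)
```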

    An application of an auditory periphery model in speaker identification

    Get PDF
    The number of applications of automatic Speaker Identification (SID) is growing due to advanced technologies for secure access and authentication in services and devices. In a 2016 study, the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model achieved the best fit to a set of human auditory physiological data among seven recent cochlear models. Motivated by the performance of the CAR-FAC, I apply this cochlear model to an SID task for the first time, aiming to produce performance similar to that of the human auditory system. This thesis investigates the potential of the CAR-FAC model in an SID task, examining its capability in text-dependent and text-independent SID tasks. It also investigates the contributions of different parameters, nonlinearities, and stages of the CAR-FAC that enhance SID accuracy. The performance of the CAR-FAC is compared with another recent cochlear model, the Auditory Nerve (AN) model. In addition, three FFT-based auditory features – Mel Frequency Cepstral Coefficients (MFCC), Frequency Domain Linear Prediction (FDLP), and Gammatone Frequency Cepstral Coefficients (GFCC) – are included to compare their performance with the cochlear features. This comparison allows me to investigate a better front-end for a noise-robust SID system. Three statistical classifiers were used to evaluate performance: a Gaussian Mixture Model with a Universal Background Model (GMM-UBM), a Support Vector Machine (SVM), and an i-vector system. These classifiers allow me to investigate nonlinearities in the cochlear front-ends. Performance is evaluated under clean and noisy conditions over a wide range of noise levels. Techniques to improve the performance of a cochlear algorithm are also investigated; it was found that applying a cube root and a DCT to the cochlear output substantially enhances SID accuracy.
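
    The cube-root-plus-DCT post-processing mentioned above can be sketched as follows. Any filter bank energy matrix can stand in for the cochlear output; the CAR-FAC or AN front-end itself is not reproduced, and the function name and coefficient count are illustrative.

```python
import numpy as np
from scipy.fft import dct

def cochleagram_to_features(band_energies, n_coeffs=20):
    """band_energies: (n_frames, n_channels) array of non-negative filter bank outputs."""
    compressed = np.cbrt(band_energies)                   # cube-root nonlinearity
    # DCT across channels decorrelates the bands, giving cepstrum-like features.
    return dct(compressed, type=2, norm="ortho", axis=1)[:, :n_coeffs]
```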

    Binaural Cue Coding - Part I: Psychoacoustic Fundamentals and Design Principles

    Get PDF

    Frame Theory for Signal Processing in Psychoacoustics

    Full text link
    This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On one side, the basic concepts of frame theory are presented, and some proofs are provided to explain those concepts in detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant to their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
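
    To make the filter bank viewpoint concrete, the sketch below computes frame bounds for an undecimated filter bank on finite signals: with no subsampling, the optimal bounds are the minimum and maximum over frequency of the summed squared magnitude responses, and a strictly positive lower bound guarantees stable, perfect reconstruction with a dual frame. The Gaussian band shapes are chosen only for illustration.

```python
import numpy as np

def frame_bounds(filters_freq):
    """filters_freq: (n_channels, L) array of filter frequency responses."""
    lp_sum = np.sum(np.abs(filters_freq) ** 2, axis=0)   # Littlewood-Paley sum over channels
    return lp_sum.min(), lp_sum.max()

# Illustrative bank of Gaussian band weightings on a length-1024 frequency grid.
L, n_ch = 1024, 16
f = np.arange(L)
centres = np.linspace(0, L, n_ch, endpoint=False)
filters = np.exp(-0.5 * ((f[None, :] - centres[:, None]) / (L / n_ch)) ** 2)
A, B = frame_bounds(filters)
print(f"A = {A:.3f}, B = {B:.3f}, B/A = {B/A:.2f}")      # B/A close to 1 means a snug frame
```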

    Optimizing Stimulation Strategies in Cochlear Implants for Music Listening

    Get PDF
    Most cochlear implant (CI) strategies are optimized for speech characteristics, while music enjoyment remains significantly below normal-hearing performance. In this thesis, electrical stimulation strategies in CIs are analyzed for music input. A simulation chain consisting of two parallel paths, simulating normal hearing and electrical hearing respectively, is utilized. One thesis objective is to configure and develop the sound processor of the CI chain to analyze different compression and channel-selection strategies so as to optimally capture the characteristics of music signals. A new set of knee points (KPs) for the compression function is investigated together with clustering of frequency bands. The N-of-M electrode selection strategy models the effect of a psychoacoustic masking threshold. To evaluate the performance of the CI model, the normal hearing model is considered the true reference. Similarity between the resulting neurograms of the respective models is measured using the image analysis method Neurogram Similarity Index Measure (NSIM). Validation and the resolution of NSIM are another objective of the thesis. Results indicate that NSIM is sensitive to no-activity regions in the neurograms and has difficulty capturing small CI changes such as compression settings. Further verification of the model setup is suggested, together with investigation of an alternative optimal electric hearing reference and/or objective similarity measure.
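
    As a hedged sketch of an SSIM-style neurogram comparison in the spirit of NSIM, the code below averages intensity and structure terms over local windows of a reference and a degraded neurogram. Window size and the stabilising constants are SSIM-like defaults, not necessarily those used in the thesis, and the function name is illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def nsim_like(reference, degraded, win=3, c1=0.01, c3=0.005):
    """SSIM-style similarity (intensity and structure terms) between two neurograms."""
    mu_r, mu_d = uniform_filter(reference, win), uniform_filter(degraded, win)
    var_r = np.clip(uniform_filter(reference ** 2, win) - mu_r ** 2, 0, None)
    var_d = np.clip(uniform_filter(degraded ** 2, win) - mu_d ** 2, 0, None)
    cov = uniform_filter(reference * degraded, win) - mu_r * mu_d
    intensity = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
    structure = (cov + c3) / (np.sqrt(var_r * var_d) + c3)
    return float(np.mean(intensity * structure))
```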

    Efficient audio signal processing for embedded systems

    Get PDF
    We investigated two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management by suppressing signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field-programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. The machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.
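
    A minimal sketch of a static dynamic range compressor, the kind combined with bass extension in the loudspeaker enhancement above, is given below. Threshold, ratio, and time constants are illustrative assumptions; the masking-threshold energy reduction and the analog front-end are not reproduced.

```python
import numpy as np

def compress(x, sr, threshold_db=-20.0, ratio=4.0, attack_ms=5.0, release_ms=50.0):
    """Static compressor: reduce gain above the threshold by the given ratio."""
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env_db = -120.0                                       # smoothed signal level in dB
    out = np.empty(len(x))
    for i, s in enumerate(x):
        level_db = 20.0 * np.log10(abs(s) + 1e-12)
        a = a_att if level_db > env_db else a_rel         # faster tracking on attacks
        env_db = a * env_db + (1.0 - a) * level_db
        over = max(env_db - threshold_db, 0.0)
        gain_db = -over * (1.0 - 1.0 / ratio)             # amount of gain reduction
        out[i] = s * 10.0 ** (gain_db / 20.0)
    return out
```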

    Models and experiments of binaural interactions

    Get PDF
    This dissertation presents models and experiments of binaural interactions in human hearing. Rate-code models of the medial and lateral superior olive (MSO and LSO) are presented. The models are inspired by recent neurophysiological findings and were published in Bouse et al., J. Acoust. Soc. Am. 2019. A feature of these models is that they contain central stages of interaural time difference and interaural level difference (ITD and ILD) processing. These stages give subjective lateralization expressed in absolute numbers. The predictions made by both the MSO and LSO models are compared with subjective data on the lateralization of pure tones and narrow-band noises, discrimination of ITD and ILD, and discrimination of the phase warp. The lateralization and discrimination experiments show good agreement between the model predictions and the subjective data. The published models are further improved in this thesis to reduce their computational demands. The improved models' predictions are compared with both the subjective experiments and the former models on the same test pool. Additionally, a pure-tone lateralization experiment on ITD versus IPD (interaural phase difference) was added to the test pool. Both versions of the models show good agreement with the lateralization and discrimination subjective data; in some cases, the new models perform better than the old ones. The experiments of binaural interaction presented in this thesis are the lateralization of 1-ERB (equivalent rectangular bandwidth) wide narrow-band noises with IPD or ILD, and a subjective quality assessment of DHRTF (differential head-related transfer function) artifact reduction methods, published in Bouse et al., J. Acoust. Soc. Am. 2019, and Storek et al., J. Audio Eng. Soc. 2016.
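
    As a signal-level companion to the neural MSO/LSO models, the sketch below estimates the two cues they decode, ITD and ILD, directly from a binaural pair: ITD as the lag maximising the interaural cross-correlation and ILD as the interaural energy ratio in dB. This is only an illustration of the cues themselves, not of the rate-code models; the function name is illustrative.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def itd_ild(left, right, sr):
    """Estimate ITD (seconds) and ILD (dB) from a binaural signal pair."""
    xcorr = correlate(left, right, mode="full")
    lags = correlation_lags(len(left), len(right), mode="full")
    itd = lags[np.argmax(xcorr)] / sr            # lag maximising interaural cross-correlation
    ild = 10.0 * np.log10(np.sum(left ** 2) / (np.sum(right ** 2) + 1e-12))
    return itd, ild
```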