Auf einem menschlichen Gehörmodell basierende Elektrodenstimulationsstrategie für Cochleaimplantate (Electrode stimulation strategy for cochlear implants based on a human auditory model)
Cochlear implants (CIs) combined with professional rehabilitation have
enabled several hundreds of thousands of hearing-impaired individuals to
re-enter the world of verbal communication. Though very successful, current
CI systems seem to have reached their peak potential. The fact that most
recipients claim not to enjoy listening to music and are not capable of
carrying on a conversation in noisy or reverberant environments shows
that there is still room for improvement. This dissertation presents a new
cochlear implant signal processing strategy called Stimulation based on
Auditory Modeling (SAM), which is completely based on a computational model
of the human peripheral auditory system. SAM has been evaluated through
simplified models of CI listeners, with five cochlear implant users, and
with 27 normal-hearing subjects using an acoustic model of CI perception.
Results have always been compared to those acquired using Advanced
Combination Encoder (ACE), which is today’s most prevalent CI strategy.
First simulations showed that speech intelligibility of CI users fitted
with SAM should be just as good as that of CI listeners fitted with ACE.
Furthermore, it has been shown that SAM provides more accurate binaural
cues, which can potentially enhance the sound source localization ability
of bilaterally fitted implantees. Simulations have also revealed an
increased amount of temporal pitch information provided by SAM. The
subsequent pilot study, which ran smoothly, revealed several benefits of
using SAM. First, there was a significant improvement in pitch
discrimination of pure tones and sung vowels. Second, CI users fitted with
a contralateral hearing aid reported a more natural sound of both speech
and music. Third, all subjects became accustomed to SAM within a very short
period of time (on the order of 10 to 30 minutes), which is particularly
important given that a successful CI strategy change typically takes weeks
to months. An additional test with 27 normal-hearing listeners using an
acoustic model of CI perception delivered further evidence for improved
pitch discrimination ability with SAM as compared to ACE. Although SAM is
not yet a market-ready alternative, it strives to pave the way for future
strategies based on auditory models and is a promising candidate for
further research and investigation.
Object-based Modeling of Audio for Coding and Source Separation
This thesis studies several data decomposition algorithms for obtaining an object-based representation of an audio signal. The estimation of the representation parameters is coupled with audio-specific criteria, such as the spectral redundancy, sparsity, perceptual relevance and spatial position of sounds. The objective is to obtain an audio signal representation that is composed of meaningful entities called audio objects that reflect the properties of real-world sound objects and events. The estimation of the object-based model is based on magnitude spectrogram redundancy using non-negative matrix factorization with extensions to multichannel and complex-valued data. The benefits of working with object-based audio representations over the conventional time-frequency bin-wise processing are studied. The two main applications of the object-based audio representations proposed in this thesis are spatial audio coding and sound source separation from multichannel microphone array recordings.
In the proposed spatial audio coding algorithm, the audio objects are estimated from the multichannel magnitude spectrogram. The audio objects are used for recovering the content of each original channel from a single downmixed signal, using time-frequency filtering. The perceptual relevance of modeling the audio signal is considered in the estimation of the parameters of the object-based model, and the sparsity of the model is utilized in encoding its parameters. Additionally, a quantization of the model parameters is proposed that reflects the perceptual relevance of each quantized element.
The proposed object-based spatial audio coding algorithm is evaluated via listening tests that compare its overall perceptual quality to that of conventional time-frequency block-wise methods at the same bitrates. The proposed approach is found to produce comparable coding efficiency while providing additional functionality via the object-based coding domain representation, such as the blind separation of the mixture of sound sources in the encoded channels.
For the sound source separation from multichannel audio recorded by a microphone array, a method combining an object-based magnitude model and spatial covariance matrix estimation is considered. A direction of arrival-based model for the spatial covariance matrices of the sound sources is proposed. Unlike the conventional approaches, the estimation of the parameters of the proposed spatial covariance matrix model ensures a spatially coherent solution for the spatial parameterization of the sound sources. The separation quality is measured with objective criteria, and the proposed method is shown to improve over the state-of-the-art sound source separation methods for recordings made with a small microphone array.
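The magnitude-spectrogram factorization mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard multiplicative-update rule for Euclidean-distance NMF, which is not necessarily the variant used in the thesis, and it omits the multichannel and complex-valued extensions entirely. The function name `nmf`, the parameter `n_objects`, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf(V, n_objects, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x frames) as V ~= W @ H.

    Columns of W act as spectral templates ("audio objects") and rows
    of H as their time-varying gains. Uses multiplicative updates that
    reduce the squared Euclidean error (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_objects)) + eps
    H = rng.random((n_objects, n_frames)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(V, W, H):
    """Frobenius-norm reconstruction error relative to the norm of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Toy "spectrogram" built from two hidden objects (synthetic data).
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))

W1, H1 = nmf(V, n_objects=2, n_iter=1)
W, H = nmf(V, n_objects=2, n_iter=200)
err_start = relative_error(V, W1, H1)
err_final = relative_error(V, W, H)
```

In this toy setup the reconstruction error after 200 iterations is lower than after a single iteration, and each column of `W` paired with the matching row of `H` gives one candidate object.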
Evaluation of audio source separation in the context of 3D audio
The emergence and broader availability of 3D audio systems allow for new possibilities in mixing, post-production and playback of audio content. Used in movie post-production for cinemas, as a special effect by disc jockeys, and even for live concerts, 3D rendering immerses the listener more than ever before. When existing audio material is to be employed, Audio Source Separation (ASS) techniques enable the extraction of single sources from a mixture. Modern mixing approaches for 3D audio do not assign individual gains and delays for each source in every channel. Rather, a sound scene is designed, with individual sources treated as objects to be placed within the scene. The hardware layer is mostly irrelevant for mixing in such a setting. ASS is therefore a valuable tool to "disassemble" a more traditional monophonic, stereophonic, or multichannel mix. However, due to the complexity of the ASS problem, extracted sources are subject to degradations. While state-of-the-art objective measures for ASS quality build on monaural auditory models, they do not take into account binaural listening and the psychoacoustic phenomena that are involved, such as binaural unmasking. In this thesis, an extension to Perceptive Evaluation Methods for Audio Source Separation (PEASS) [41] is proposed with spatial rendering in mind. Additionally, a new binaural model for ASS evaluation in the context of 3D audio is presented. The performance of the basic and extended versions of PEASS, as well as that of the proposed binaural model, is evaluated in two subjective studies. The first study is conducted with binaural spatialisation presented over headphones, while the second experiment uses a 3D Wave Field Synthesis (WFS) system. A set of artificial ASS degradation algorithms is proposed and used for the stimuli of the subjective studies. Results of the studies indicate a monotonic decrease of the perceived quality as a function of the amount of degradation introduced.
The most important degradation is found to be target distortion, followed by onset misallocation and musical noise-type artifacts. Additionally, spatialising the extracted target source away from the residue or having it louder than the residue negatively affects the results, indicating a perceived quality degradation. In 3D WFS conditions, results show evidence for monaural and binaural unmasking. The performance of the proposed binaural model is consistently superior to that of the basic or extended PEASS versions. In the binaural spatialisation experiment, a correlation coefficient of 0.60 between subjective and objective results is achieved, versus 0.57 and 0.53 with the extended and basic PEASS versions respectively. For the 3D WFS study, the binaural model achieves a prediction accuracy of 0.67, whereas both PEASS versions reach 0.57. The perceptual validity of the WFS formulation is also verified in a localisation experiment. Vertical localisation is found to be nearly as good as physical source localisation for an extended listening area, with a localisation precision of 6° to 9°. The response time is also used as an indicator of localisation performance.
Creating music by listening
Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005. Includes bibliographical references (p. 127-139). Machines have the power and potential to make expressive music on their own. This thesis aims to computationally model the process of creating music using experience from listening to examples. Our unbiased signal-based solution models the life cycle of listening, composing, and performing, turning the machine into an active musician, instead of simply an instrument. We accomplish this through an analysis-synthesis technique by combined perceptual and structural modeling of the musical surface, which leads to a minimal data representation. We introduce a music cognition framework that results from the interaction of psychoacoustically grounded causal listening, a time-lag embedded feature representation, and perceptual similarity clustering. Our bottom-up analysis intends to be generic and uniform by recursively revealing metrical hierarchies and structures of pitch, rhythm, and timbre. Training is suggested for top-down unbiased supervision, and is demonstrated with the prediction of downbeat. This musical intelligence enables a range of original manipulations including song alignment, music restoration, cross-synthesis or song morphing, and ultimately the synthesis of original pieces. by Tristan Jehan. Ph.D.
Real-time Sound Source Separation For Music Applications
Sound source separation refers to the task of extracting individual sound sources from some number of mixtures of those sound sources. In this thesis, a novel sound source separation algorithm for musical applications is presented. It leverages the fact that the vast majority of commercially recorded music since the 1950s has been mixed down for two-channel reproduction, more commonly known as stereo. The algorithm presented in Chapter 3 of this thesis requires no prior knowledge or learning and performs the task of separation based purely on azimuth discrimination within the stereo field. The algorithm exploits the use of the pan pot as a means to achieve image localisation within stereophonic recordings. As such, only an interaural intensity difference exists between left and right channels for a single source. We use gain scaling and phase cancellation techniques to expose frequency-dependent nulls across the azimuth domain, from which source separation and resynthesis are carried out. The algorithm is demonstrated to be state of the art in the field of sound source separation and also to be a useful pre-processing step for other tasks such as music segmentation and surround sound upmixing.
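The idea of exposing frequency-dependent nulls by gain scaling and cancellation can be sketched as follows. This is a simplified illustration, not the algorithm from the thesis: it processes a single frame without windowing or resynthesis, and it only scans gains in [0, 1], so it only handles sources whose left-channel level does not exceed their right-channel level. The function name `azimuth_nulls`, the gain grid, and the synthetic test signal are assumptions made for this example.

```python
import numpy as np

def azimuth_nulls(left, right, n_gains=101):
    """Locate, per frequency bin, the gain g that best cancels L - g*R.

    A source panned with intensity ratio left = g0 * right vanishes
    from L - g*R when g == g0, producing a null at that gain.
    """
    gains = np.linspace(0.0, 1.0, n_gains)
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    # Frequency-azimuth plane: one row per bin, one column per gain.
    azimugram = np.abs(L[:, None] - gains[None, :] * R[:, None])
    null_gain = gains[azimugram.argmin(axis=1)]
    # Peak-minus-null depth serves as a rough magnitude estimate.
    null_depth = azimugram.max(axis=1) - azimugram.min(axis=1)
    return null_gain, null_depth

# Synthetic stereo frame: two sinusoids on exact FFT bins 10 and 25,
# panned with different left/right intensity ratios (0.3 and 0.8).
n = 256
t = np.arange(n)
source_a = np.sin(2 * np.pi * 10 * t / n)
source_b = np.sin(2 * np.pi * 25 * t / n)
right = source_a + source_b
left = 0.3 * source_a + 0.8 * source_b

null_gain, null_depth = azimuth_nulls(left, right)
```

Here `null_gain[10]` and `null_gain[25]` recover the two panning ratios, so bins could be grouped by null position to assign them to sources. A fuller treatment would repeat the scan with the channels swapped to cover sources panned to the left.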
Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing
otorhinolaryngology; neurosciences; hearin