1,231 research outputs found

    Probabilistic Modeling Paradigms for Audio Source Separation

    This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007. Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of each paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.

    Brian Hears: Online Auditory Processing Using Vectorization Over Channels

    The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in “Brian Hears,” a library for the spiking neural network simulator package “Brian.” This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
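    The idea of vectorizing over frequency channels can be illustrated with a minimal NumPy sketch. It does not use the Brian Hears API and is not a cochlear model: the two-pole resonator, the function names `resonator_coeffs` and `filterbank`, and all parameter values are assumptions chosen for illustration. The Python loop runs over time samples only, and each iteration updates every channel in one array operation, so the interpreter overhead is paid once per sample rather than once per sample per channel.

```python
import numpy as np

def resonator_coeffs(center_hz, bandwidth_hz, fs):
    """Per-channel two-pole resonator coefficients (illustrative filter, assumed)."""
    r = np.exp(-np.pi * bandwidth_hz / fs)   # pole radius, one value per channel
    theta = 2.0 * np.pi * center_hz / fs     # pole angle, one value per channel
    a1 = -2.0 * r * np.cos(theta)
    a2 = r ** 2
    b0 = 1.0 - r                             # rough gain scaling
    return b0, a1, a2

def filterbank(signal, center_hz, bandwidth_hz, fs):
    """Filter a mono signal through all channels, vectorized over channels."""
    b0, a1, a2 = resonator_coeffs(center_hz, bandwidth_hz, fs)
    n_channels = center_hz.shape[0]
    y1 = np.zeros(n_channels)                # previous output, all channels
    y2 = np.zeros(n_channels)                # output two samples ago, all channels
    out = np.empty((signal.shape[0], n_channels))
    for n, x in enumerate(signal):           # loop over time only
        y = b0 * x - a1 * y1 - a2 * y2       # one array update covers every channel
        out[n] = y
        y2, y1 = y1, y
    return out

# Hypothetical usage: a 1 kHz tone through five channels.
fs = 16000
centers = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
bandwidths = 0.1 * centers
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)
response = filterbank(tone, centers, bandwidths, fs)
energy = (response ** 2).sum(axis=0)         # energy per channel
```

    With this layout, adding channels enlarges the arrays but leaves the number of interpreted loop iterations unchanged, which is the property the abstract relies on.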

    Special Topics in Information Technology

    This open access book presents outstanding doctoral dissertations in Information Technology from the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. Information Technology has always been highly interdisciplinary, as many aspects have to be considered in IT systems. The doctoral studies program in IT at Politecnico di Milano emphasizes this interdisciplinary nature, which is becoming more and more important in recent technological advances, in collaborative projects, and in the education of young researchers. Accordingly, the focus of advanced research is on pursuing a rigorous approach to specific research topics starting from a broad background in various areas of Information Technology, especially Computer Science and Engineering, Electronics, Systems and Control, and Telecommunications. Each year, more than 50 PhDs graduate from the program. This book gathers the outcomes of the best theses defended in 2021-22 and selected for the IT PhD Award. Each of the authors provides a chapter summarizing his/her findings, including an introduction, description of methods, main achievements and future work on the topic. Hence, the book provides a cutting-edge overview of the latest research trends in Information Technology at Politecnico di Milano, presented in an easy-to-read format that will also appeal to non-specialists.

    EFFICACY OF THREE BACKWARD MASKING SIGNALS

    Increased backward masking has been correlated with Auditory Processing Disorders (APD). An efficacious test of the backward masking function that is compatible with naïve listeners could have clinical utility in diagnosing APDs. In order to determine an appropriate probe for such a test, three 20-ms signal-types were compared for ease-of-task. Response times (RT) were taken as a proxy for ease-of-task. Seven participants used a method-of-adjustment to track threshold in the presence of a 50-ms broadband-Gaussian-noise backward-masker. The signal-types yielded two comparisons: linear rise-fall on a 1000 Hz sine-wave versus a “chirp” (750 Hz to 4000 Hz), and linear rise-fall versus Blackman gating function on a 1000 Hz sine-wave. The results suggest that signal-type is a significant factor in participant response time and, hence, confidence. Moreover, the contribution of signal-type to RT is not confounded by any potential interaction terms, such as inter-stimulus interval (ISI). The signal-type that yielded the quickest RTs across all participants, ISIs, and intensity levels was the 20-ms, 1000 Hz sine-wave fitted with a trapezoidal gating function. This may be the most efficacious signal-type to serve as a probe in a clinical test of backward masking.
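    The two gating functions compared on the 1000 Hz sine-wave can be sketched as follows. This is an illustration only, not the study's stimulus code: the sample rate, the 5-ms rise/fall time, and the choice to spread the Blackman window over the whole 20 ms are assumptions, and the helper names `make_tone`, `linear_gate` and `blackman_gate` are hypothetical.

```python
import numpy as np

FS = 44100          # sample rate in Hz (assumed, not given in the abstract)
DURATION = 0.020    # 20-ms probe, as stated in the abstract
FREQ = 1000.0       # 1000 Hz sine-wave, as stated in the abstract
RAMP = 0.005        # rise/fall time in seconds (assumed)

def make_tone(freq, duration, fs):
    """Unit-amplitude sine wave of the requested duration."""
    t = np.arange(int(round(duration * fs))) / fs
    return np.sin(2.0 * np.pi * freq * t)

def linear_gate(n_samples, n_ramp):
    """Trapezoidal envelope: linear rise, flat top, linear fall."""
    gate = np.ones(n_samples)
    ramp = np.linspace(0.0, 1.0, n_ramp, endpoint=False)
    gate[:n_ramp] = ramp
    gate[n_samples - n_ramp:] = ramp[::-1]
    return gate

def blackman_gate(n_samples):
    """Blackman envelope spanning the whole signal (assumed usage)."""
    return np.blackman(n_samples)

carrier = make_tone(FREQ, DURATION, FS)
n_samples = carrier.shape[0]
trapezoid = linear_gate(n_samples, int(round(RAMP * FS)))
linear_probe = carrier * trapezoid
blackman_probe = carrier * blackman_gate(n_samples)
```

    Under these assumptions the trapezoidal probe holds full amplitude for its central portion, whereas the Blackman probe reaches full amplitude only near its midpoint.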

    Studies on auditory processing of spatial sound and speech by neuromagnetic measurements and computational modeling

    This thesis addresses the auditory processing of spatial sound and speech. The thesis consists of two research branches: one, magnetoencephalographic (MEG) brain measurements on spatial localization and speech perception, and two, construction of computational auditory scene analysis models, which exploit spatial cues and other cues that are robust in reverberant environments. In the MEG research branch, we have addressed the processing of spatial stimuli in the auditory cortex through studies concentrating on the following issues: processing of sound source location with realistic spatial stimuli, spatial processing of speech vs. non-speech stimuli, and finally processing of a range of spatial location cues in the auditory cortex. Our main findings are as follows: Both auditory cortices respond more vigorously to contralaterally presented sound, whereby responses exhibit systematic tuning to the sound source direction. Responses and response dynamics are generally larger in the right hemisphere, which indicates right hemispheric specialization in spatial processing. These observations hold over the range of speech and non-speech stimuli. The responses to speech sounds are decreased markedly if the natural periodic speech excitation is changed to a random noise sequence. Moreover, the activation strength of the right auditory cortex seems to reflect processing of spatial cues, so that the dynamical differences are larger and the angular organization is more orderly for realistic spatial stimuli compared to impoverished spatial stimuli (e.g. isolated interaural time and level difference cues). In the auditory modeling part, we constructed models for the recognition of speech in the presence of interference. Firstly, we constructed a system using binaural cues in order to segregate target speech from spatially separated interference, and showed that the system outperforms a conventional approach at low signal-to-noise ratios. Secondly, we constructed a single channel system that is robust in room reverberation using strong speech modulations as robust cues, and showed that it outperforms a baseline approach in the most reverberant test conditions. In this case, the baseline approach was specifically optimized for recognition of speech in reverberation. In summary, this thesis addresses the auditory processing of spatial sound and speech in both brain measurement and auditory modeling. The studies aim to clarify cortical processes of sound localization, and to construct computational auditory models for sound segregation exploiting spatial cues, and strong speech modulations as robust cues in reverberation.

    Rapid Detection and Use of Non-verbal Confidence Cues During Adaptive Memory Biasing

    Prior literature has demonstrated that participants use probabilistic, verbal memory cues (‘Likely Old’ or ‘Likely New’) to adaptively bias their recognition judgments. Here we tested whether this is more effective when the cues are the actual videotaped responses of others taking the same recognition test, based on the possibility that observers might use non-verbal confidence signs to modulate their degree of cue reliance on each trial. Experiment 1 demonstrated observers could reliably rate the confidence of others (Models) from single recognition responses (‘old’ or ‘new’) and that when doing so, the latency of the model’s response was the primary influence, with a secondary influence of non-latency information presumably linked to prosody or facial expression. In Experiment 2, subjects were asked to use these videotaped recognition responses as memory cues while they undertook the same recognition test. The model’s responses reliably biased the observer’s recognition judgments and were reliably moderated by the model’s response latency; non-latency signs of confidence were not reliably influential. In Experiment 3, observers were asked to explicitly rate the confidence of the model’s responses before using them during their own recognition judgments. Their initial ratings of the model’s confidence were sensitive to latency and non-latency confidence signs; however, the subsequent recognition judgments of the observers were again only sensitive to the latency of the model’s recognition judgments. Overall, subjects can rapidly read non-verbal confidence information contained in brief single recognition responses. However, when using these to inform their own recognition judgments, only response latency appears to reliably moderate the biasing of recognition judgments.