
    Acoustic Event Detection from Weakly Labeled Data Using Auditory Salience

    Acoustic Event Detection (AED) is an important task in machine listening which, in recent years, has been addressed using common machine learning methods such as Non-negative Matrix Factorization (NMF) or deep learning. However, most of these approaches do not take into consideration the way the human auditory system detects salient sounds. In this work, we propose a method for AED using weakly labeled data that combines a Non-negative Matrix Factorization model with a salience model based on predictive coding in the form of Kalman filters. We show that models of auditory perception, particularly auditory salience, can be successfully incorporated into existing AED methods and improve their performance on rare event detection. We evaluate the method on Task 2 of the DCASE 2017 Challenge.
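    The NMF component of such a pipeline factorises a non-negative magnitude spectrogram V into spectral templates W and per-frame activations H. As a hypothetical illustration (not the authors' implementation), here is a minimal sketch of NMF with the classic multiplicative updates for the Euclidean cost, applied to a toy random "spectrogram":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-negative "spectrogram": 20 frequency bins x 50 time frames.
V = rng.random((20, 50))

# Rank of the factorisation (number of spectral templates).
k = 4
W = rng.random((20, k)) + 1e-3   # spectral basis vectors
H = rng.random((k, 50)) + 1e-3   # per-frame activations

eps = 1e-9
for _ in range(200):
    # Multiplicative updates for the Frobenius cost (Lee & Seung style);
    # they keep W and H non-negative by construction.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

approx = W @ H
error = np.linalg.norm(V - approx) / np.linalg.norm(V)
```

    In an AED setting, rows of H would then be thresholded or classified to decide which event template is active in each frame.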

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
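    To make the dominant feature representation concrete, the following is a hedged, pure-NumPy sketch of a log-mel spectrogram; the frame sizes, number of mel bands, and the common mel break-point formula 2595·log10(1 + f/700) are illustrative conventions, not values taken from the article:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr, n_fft=512, hop=256, n_mels=40):
    # Frame the signal and apply a Hann window, then take the power spectrum.
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    window = np.hanning(n_fft)
    spec = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

    # Triangular mel filterbank between 0 Hz and Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):
            fb[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fb[m - 1, k] = (hi - k) / max(hi - ctr, 1)

    return np.log(spec @ fb.T + 1e-10)  # shape: (frames, n_mels)

# 1 s of a 440 Hz tone at 16 kHz as a toy input.
sr = 16000
t = np.arange(sr) / sr
logmel = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
```

    The resulting (frames × mel-bands) matrix is the typical 2-D input to the convolutional or recurrent models the review discusses.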

    Spatial aspects of auditory salience

    Models of auditory salience aim to predict which sounds attract people’s attention, and their proposed applications range from soundscape design to machine listening systems and object-based broadcasting. A few different types of models have been proposed, but one of the areas where most of them still fall short is spatial aspects of sound: they usually operate on mono signals and do not consider spatial auditory scenes. Part of the reason may be that the relationship between auditory salience and the position of a sound is still not clear. In addition, methods used to measure auditory salience vary greatly, and authors in the field do not always use the same definition of salience.

    In Part I, this thesis aims to answer questions about the effect of the spatial location of sound on auditory salience. This is done in four different experiments, which are based on previously published experimental methods but adapted to measure spatial effects. In general, the combined results of these experiments do not support the hypothesis that the spatial position of a sound alone influences how salient the sound is. However, they do show that unexpected changes in position might activate the deviance detection mechanism and therefore be salient. In addition, an experiment comparing three of the methods used reveals at least two dimensions of salience, which are measured by different methods to different extents. This emphasises the importance of carefully considering which experimental methods are used to measure auditory salience, and of providing a clear definition of what type of salience is of interest.

    Part II demonstrates how the spatial position of sound can be incorporated into an auditory salience model. The results of the experiments described in this thesis support the idea that the basis of auditory salience is the violation of expectations. The surprise caused by a sudden change in sound position can therefore be modelled by a Kalman-filter-based deviance detection model, which predicts the experimental data discussed above with good accuracy. Finally, an example is given of how an application of such a model can improve the performance of a machine learning algorithm for acoustic event detection.
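    As an illustration of the idea (a sketch under assumed parameters, not the thesis's actual model), a random-walk Kalman filter tracking a one-dimensional feature such as sound azimuth can flag an abrupt change through its variance-normalised innovation, i.e. the surprise:

```python
import numpy as np

def kalman_surprise(z, q=1e-3, r=1e-1):
    """Track a 1-D feature with a random-walk Kalman filter and return the
    squared, variance-normalised innovation per step. Large values mark
    observations that violate the model's expectation (deviants)."""
    x, p = z[0], 1.0              # state estimate and its variance
    surprise = np.zeros(len(z))
    for t, obs in enumerate(z):
        p = p + q                 # predict: random-walk state, variance grows
        innov = obs - x           # innovation = observation - prediction
        s = p + r                 # innovation variance
        surprise[t] = innov ** 2 / s
        k = p / s                 # Kalman gain
        x = x + k * innov         # update state toward the observation
        p = (1 - k) * p
    return surprise

# A position trace that jumps abruptly at sample 50.
z = np.concatenate([np.zeros(50), np.ones(50) * 2.0])
s = kalman_surprise(z)
```

    The surprise trace spikes exactly at the unexpected change and then decays as the filter re-converges, matching the intuition that only the violation of expectation, not the position itself, is salient.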

    Dissociable mechanisms of concurrent speech identification in noise at cortical and subcortical levels

    When two vowels with different fundamental frequencies (F0s) are presented concurrently, listeners often hear two voices producing different vowels on different pitches. Parsing of this simultaneous speech can also be affected by the signal-to-noise ratio (SNR) in the auditory scene. The extraction and interaction of F0 and SNR cues may occur at multiple levels of the auditory system. The major aims of this dissertation are to elucidate the neural mechanisms and time course of concurrent speech perception in clean and degraded listening conditions, and its behavioral correlates. In two complementary experiments, electrical brain activity (EEG) was recorded at cortical (EEG Study #1) and subcortical (FFR Study #2) levels while participants heard double-vowel stimuli whose fundamental frequencies (F0s) differed by zero or four semitones (STs), presented in either clean or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in identifying both vowels for larger F0 separations (i.e., 4 ST, with pitch cues), and this F0 benefit was more pronounced at more favorable SNRs. Time-frequency analysis of cortical EEG oscillations (i.e., brain rhythms) revealed a dynamic time course for concurrent speech processing that depended on both extrinsic (SNR) and intrinsic (pitch) acoustic factors. Early high-frequency activity reflected pre-perceptual encoding of acoustic features (~200 ms) and the quality (i.e., SNR) of the speech signal (~250-350 ms), whereas later-evolving low-frequency rhythms (~400-500 ms) reflected post-perceptual, cognitive operations that covaried with listening effort and task demands. Analysis of subcortical responses indicated that while FFRs provided a high-fidelity representation of the double-vowel stimuli and the spectro-temporal nonlinear properties of the peripheral auditory system, FFR activity largely reflected the neural encoding of stimulus features (exogenous coding) rather than perceptual outcomes, although timbre (F1) cues could predict identification speed in noise conditions. Taken together, the results of this dissertation suggest that subcortical auditory processing reflects mostly exogenous (acoustic) feature encoding, in stark contrast to cortical activity, which reflects perceptual and cognitive aspects of concurrent speech perception. By studying multiple brain indices underlying an identical task, these studies provide a more comprehensive window into the hierarchy of brain mechanisms and the time course of concurrent speech processing.
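    The time-frequency analysis described above can be sketched, in a much-simplified form, as short-time band power over a synthetic oscillation; the sampling rate, band edges, and test signal below are illustrative assumptions, not the study's parameters:

```python
import numpy as np

def band_power_over_time(x, sr, band, n_fft=256, hop=64):
    # Short-time Fourier power, averaged over the frequency band of interest.
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    sel = (freqs >= band[0]) & (freqs <= band[1])
    return spec[:, sel].mean(axis=1)

# Synthetic "EEG": a 10 Hz rhythm whose amplitude doubles halfway through.
sr = 250
t = np.arange(4 * sr) / sr
x = np.sin(2 * np.pi * 10 * t) * np.where(t < 2.0, 1.0, 2.0)
alpha = band_power_over_time(x, sr, (8.0, 12.0))
```

    The band-power trace rises after the amplitude change, which is the basic quantity that such oscillation analyses track across conditions and time windows.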

    Evaluation of product sound design within the context of emotion design and emotional branding

    Thesis (Master)--Izmir Institute of Technology, Industrial Design, Izmir, 2005. Includes bibliographical references (leaves: 111-122). Text in English; Abstract: Turkish and English. xi, 127 leaves.

    The main purpose of this thesis is to set out the relationships between the work of product designers and the perceptions of customers regarding the acceptability of product sounds. Product design that provides aesthetic appeal, pleasure and satisfaction can greatly influence the success of a product. Sound, as a cognitive artifact, plays a significant role in the cognition of product interaction and in shaping its identity. This thesis reviews emotion theories and their application to sound design and sound quality modeling, the measurement of emotional responses to sound, and the relationship between psycho-acoustical sound descriptions and emotions. In addition, the effects of sounds on emotionally significant brands are evaluated so as to examine their marketing value. One of the main purposes of chapter 2 is to provide knowledge about psychoacoustics, as product sound quality rests on a basic understanding of the underlying psychoacoustic phenomena. Perception, particularly sound perception and its elements, is described in chapter 2: starting with a description of the sound wave and how the ear works, sound perception and auditory sensation are then reviewed. In chapter 3, the product sound quality concept and its evaluation principles are reviewed, since understanding the coupling between acoustic perception and product design requires knowledge of the general principles of product sound quality. Chapter 4 can be considered as two main sections. In the first, the question "How does emotion act as a delighter in product design?" is examined to better understand customer and user experiences impacting pleasurability. In the second section, emotion is evaluated through sound design. A qualitative evaluation is carried out so as to examine cognition and emotion in sound perception. Chapter 5 leads the subject through emotional branding: sounds that carry the brand's identity are evaluated within this context. Sound design is re-evaluated as a marketing strategy and examined with several instances. Keywords: Product sound design, psychoacoustics, product sound quality, emotion design, emotional branding.