Acoustic Event Detection from Weakly Labeled Data Using Auditory Salience
Acoustic Event Detection (AED) is an important machine listening task which, in recent years, has been addressed using common machine learning methods such as Non-negative Matrix Factorization (NMF) or deep learning. However, most of these approaches do not take into consideration the way the human auditory system detects salient sounds. In this work, we propose a method for AED from weakly labeled data that combines a Non-negative Matrix Factorization model with a salience model based on predictive coding in the form of Kalman filters. We show that models of auditory perception, particularly auditory salience, can be successfully incorporated into existing AED methods and improve their performance on rare event detection. We evaluate the method on Task 2 of the DCASE 2017 Challenge.
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 PDF figures
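Since log-mel spectra are singled out above as a dominant feature representation, the following NumPy-only sketch shows how one is computed (the window size, hop, filter count, and the simplified triangular filterbank are illustrative choices, not a drop-in replacement for a library implementation):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr, n_fft=512, hop=256, n_mels=40):
    """Log-mel spectrogram: STFT power spectrum -> triangular
    mel filterbank -> log compression."""
    # Short-time Fourier transform with a Hann window.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # power spectrum

    # Triangular filters with centres equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)  # rising slope
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)  # falling slope

    return np.log(spec @ fb.T + 1e-10)  # shape: (frames, n_mels)

# 1 s of a 440 Hz tone at 16 kHz.
sr = 16000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
M = log_mel_spectrogram(x, sr)
```

The resulting (time, mel-band) matrix is the typical input to the convolutional and recurrent models the review covers.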
Spatial aspects of auditory salience
Models of auditory salience aim to predict which sounds attract people's attention, and their proposed applications range from soundscape design to machine listening systems and object-based broadcasting. A few different types of models have been proposed, but one area where most of them still fall short is the spatial aspect of sound: they usually operate on mono signals and do not consider spatial auditory scenes. Part of the reason may be that the relationship between auditory salience and the position of a sound is still not clear. In addition, the methods used to measure auditory salience vary greatly, and authors in the field do not always use the same definition of salience.

In Part I, this thesis aims to answer questions about the effect of the spatial location of a sound on its auditory salience. This is done in four experiments, which are based on previously published experimental methods but adapted to measure spatial effects. In general, the combined results of these experiments do not support the hypothesis that the spatial position of a sound alone influences how salient the sound is. However, they do show that unexpected changes in position might activate the deviance detection mechanism and therefore be salient. In addition, an experiment comparing three of the methods used reveals at least two dimensions of salience, which are measured by different methods to different extents. This emphasises the importance of carefully considering which experimental methods are used to measure auditory salience, and of providing a clear definition of what type of salience is of interest.

Part II demonstrates how the spatial position of sound can be incorporated into an auditory salience model. The results of the experiments described in this thesis support the idea that the basis of auditory salience is the violation of expectations. The surprise caused by a sudden change in sound position can therefore be modelled by a Kalman-filter-based deviance detection model, which predicts the experimental data discussed above with good accuracy. Finally, an example is given of how such a model can improve the performance of a machine learning algorithm for acoustic event detection.
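A minimal sketch of the kind of Kalman-filter deviance detection the abstract describes: a 1-D random-walk filter tracks a signal (here framed as sound azimuth), and per-sample surprise is taken as the negative log predictive likelihood of each observation. The noise parameters and the azimuth framing are illustrative assumptions, not the thesis's actual model:

```python
import numpy as np

def kalman_surprise(z, q=0.01, r=0.1):
    """Track a 1-D observation sequence z with a random-walk Kalman
    filter and return per-sample surprise (negative log predictive
    likelihood). Large values mark violations of expectation."""
    x, p = z[0], 1.0                      # state estimate and variance
    surprise = np.zeros(len(z))
    for t, obs in enumerate(z):
        p_pred = p + q                    # predict (random-walk model)
        s = p_pred + r                    # innovation variance
        innov = obs - x                   # prediction error
        surprise[t] = 0.5 * (np.log(2 * np.pi * s) + innov ** 2 / s)
        k = p_pred / s                    # Kalman gain
        x = x + k * innov                 # update state estimate
        p = (1 - k) * p_pred              # update state variance
    return surprise

# A source fixed at azimuth 0 that jumps to 60 degrees at t = 50:
# surprise peaks at the jump, then decays as the filter adapts.
z = np.zeros(100)
z[50:] = 60.0
s = kalman_surprise(z)
```

The decaying surprise after the jump mirrors the experimental finding that a change in position, rather than position itself, is what drives salience.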
Dissociable Mechanisms of Concurrent Speech Identification in Noise at Cortical and Subcortical Levels
When two vowels with different fundamental frequencies (F0s) are presented concurrently, listeners often hear two voices producing different vowels on different pitches. Parsing of this simultaneous speech can also be affected by the signal-to-noise ratio (SNR) in the auditory scene. The extraction and interaction of F0 and SNR cues may occur at multiple levels of the auditory system. The major aims of this dissertation are to elucidate the neural mechanisms and time course of concurrent speech perception in clean and degraded listening conditions, and their behavioral correlates.

In two complementary experiments, electrical brain activity (EEG) was recorded at cortical (EEG Study #1) and subcortical (FFR Study #2) levels while participants heard double-vowel stimuli whose F0s differed by zero or four semitones (STs), presented in either clean or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in identifying both vowels for larger F0 separations (i.e., 4 ST, with pitch cues), and this F0 benefit was more pronounced at more favorable SNRs. Time-frequency analysis of cortical EEG oscillations (i.e., brain rhythms) revealed a dynamic time course for concurrent speech processing that depended on both extrinsic (SNR) and intrinsic (pitch) acoustic factors. Early high-frequency activity reflected pre-perceptual encoding of acoustic features (~200 ms) and of the quality (i.e., SNR) of the speech signal (~250-350 ms), whereas later-evolving low-frequency rhythms (~400-500 ms) reflected post-perceptual, cognitive operations that covaried with listening effort and task demands. Analysis of subcortical responses indicated that, while FFRs provided a high-fidelity representation of the double-vowel stimuli and of the spectro-temporal nonlinear properties of the peripheral auditory system, FFR activity largely reflected the neural encoding of stimulus features (exogenous coding) rather than perceptual outcomes, although timbre (F1) coding could predict identification speed in noise.

Taken together, the results of this dissertation suggest that subcortical auditory processing reflects mostly exogenous (acoustic) feature encoding, in stark contrast to cortical activity, which reflects perceptual and cognitive aspects of concurrent speech perception. By studying multiple brain indices during an identical task, these studies provide a more comprehensive window into the hierarchy of brain mechanisms and the time course of concurrent speech processing.
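For reference, the semitone separations used in double-vowel paradigms follow the standard equal-tempered relation n = 12 * log2(f2 / f1); a quick sketch (the frequencies below are illustrative, not the stimulus values from this dissertation):

```python
import math

def semitone_gap(f0_a, f0_b):
    """Distance in semitones between two fundamental frequencies."""
    return 12.0 * math.log2(f0_b / f0_a)

def shift_semitones(f0, n):
    """Shift a fundamental frequency by n semitones."""
    return f0 * 2.0 ** (n / 12.0)

f0 = 100.0
f0_up4 = shift_semitones(f0, 4)   # 4 ST above 100 Hz, about 125.99 Hz
```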
Evaluation of product sound design within the context of emotion design and emotional branding
Thesis (Master)--Izmir Institute of Technology, Industrial Design, Izmir, 2005. Includes bibliographical references (leaves: 111-122). Text in English; abstract in Turkish and English. xi, 127 leaves.

The main purpose of this thesis is to set out the relationships between the work of product designers and the perceptions of customers regarding the acceptability of product sounds. Product design that provides aesthetic appeal, pleasure and satisfaction can greatly influence the success of a product. Sound, as a cognitive artifact, plays a significant role in the cognition of product interaction and in shaping a product's identity. This thesis reviews emotion theories and their application to sound design and sound quality modeling, the measurement of emotional responses to sound, and the relationship between psychoacoustical sound descriptions and emotions. In addition, the effects of sound on emotionally significant brands are evaluated in order to examine marketing values.

One of the main purposes of Chapter 2 is to provide knowledge about psychoacoustics, as product sound quality rests on a basic understanding of the underlying psychoacoustic phenomena. Perception, particularly sound perception and its elements, is described in Chapter 2, which starts with a description of the sound wave and how the ear works and then reviews sound perception and auditory sensation. In Chapter 3, the product sound quality concept and its evaluation principles are reviewed: in order to understand the coupling between acoustic perception and product design, knowledge of the general principles of product sound quality is required. Chapter 4 can be considered as two main sections. In the first section, the question "How does emotion act as a delighter in product design?" is examined to better understand the customer and user experiences that affect pleasurability. In the second section, emotion is evaluated through sound design, and a qualitative evaluation is carried out in order to examine cognition and emotion in sound perception. Chapter 5 leads the subject through emotional branding: sounds that carry the brand's identity are evaluated, and sound design is re-evaluated as a marketing strategy and examined with several examples.

Keywords: Product sound design, psychoacoustics, product sound quality, emotion design, emotional branding