136 research outputs found

    What makes audio event detection harder than classification?

    Audio event classification and detection (AEC/D) have been an active field of research in recent years [1]–[3]. So far, besides the majority of works focusing on improving overall performance in terms of accuracy [2], [1], [4], [5], many other aspects have also been studied, including noise robustness [6]–[7], [8], overlapping event handling [9], [10], [11], [12], early event detection [13], multi-channel fusion [14], as well as generic representation [15]. However, little attention has been paid to an important aspect of event detection systems on continuous streams: false positive reduction. False positives, i.e., event instances that are spuriously detected by a detection system and subsequently draw attention to themselves, are arguably one of the most important problems faced by applications such as ambient intelligence and surveillance. To the best of the authors’ knowledge, this is the first work explicitly addressing this problem.

    Weighted and Multi-Task Loss for Rare Audio Event Detection

    In this paper, we present two loss functions tailored for rare audio event detection in audio streams. The weighted loss is designed to tackle the common issue of imbalanced data in background/foreground classification, while the multi-task loss enables the networks to simultaneously model the class distribution and the temporal structures of the target events for recognition. We study the proposed loss functions with deep neural networks (DNNs) and convolutional neural networks (CNNs) coupled with state-of-the-art phase-aware signal enhancement. Experiments on the DCASE 2017 challenge’s data show that our system with the proposed losses significantly outperforms not only the DCASE 2017 baseline but also our own baseline, which has a similar network architecture and a standard loss function.
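
The idea of weighting a loss to counter background/foreground imbalance can be illustrated with a minimal sketch. This is not the loss defined in the paper, whose exact form the abstract does not give; it is a generic weighted binary cross-entropy, and the names `weighted_bce` and `inverse_frequency_weight`, the inverse-frequency weighting rule, and the toy frame labels are all assumptions made for illustration.

```python
import math

def weighted_bce(probs, labels, w_pos, w_neg=1.0, eps=1e-12):
    """Mean binary cross-entropy with separate weights for the
    foreground (label 1) and background (label 0) frames."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / len(labels)

def inverse_frequency_weight(labels):
    """Hypothetical weighting rule: ratio of background to foreground frames."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return n_neg / max(n_pos, 1)

# Toy stream: one rare event frame among ten, and a model that
# predicts "background" (p = 0.1) for every frame.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
probs = [0.1] * 10

w_pos = inverse_frequency_weight(labels)
plain = weighted_bce(probs, labels, w_pos=1.0)
weighted = weighted_bce(probs, labels, w_pos=w_pos)
print(f"w_pos={w_pos}, plain={plain:.4f}, weighted={weighted:.4f}")
```

With the unweighted loss, always predicting background is cheap because the single missed event frame is averaged away; raising `w_pos` makes that miss dominate the loss, which is the general motivation for weighting in imbalanced detection tasks.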

    Intelligent Control of Dynamic Range Compressor

    PhD Thesis. Music production is an essential element in the value chain of modern music. It includes enhancing the recorded audio tracks, balancing the loudness level of multiple tracks, and making artistic decisions to satisfy music genre, style and emotion. As in related professions in creative media production, the tools for music making are now highly computerised. However, many parts of the work remain labour intensive and time consuming, so the demand for intelligent tools is growing. This situation encourages the emerging trend of ever-increasing research into intelligent music production tools. Since audio effects are among the main tools used by music producers, there are many discussions and developments targeting the control mechanisms of audio effects. This thesis aims to push the boundaries in this field by investigating the intelligent control of one of the essential audio effects, the dynamic range compressor. This research presents an innovative control system design. The core of this design is to learn from a reference audio and control the dynamic range compressor so that the processed input audio sounds as close as possible to the reference. One of the proposed approaches can be divided into three stages: a feature extractor, a trained regression model, and an objective evaluation algorithm. In the feature extractor stage, we first test feature sets using conventional audio features commonly used in speech and audio signal analysis. Subsequently, we test handcrafted audio features specifically designed to characterise audio properties related to the dynamic range of audio samples. Research into feature design has been completed at different levels of complexity. A series of feature selection schemes is also assessed to select the optimal feature sets from both conventional and specifically designed audio features.
In the subsequent stage of the research, feature extraction is replaced by a feature learning deep neural network (DNN). This addresses the problem that the previous features are exclusive to each parameter, whereas a general feature extractor may be formed using a DNN. A universal feature extractor can reduce the computational cost and adapt more easily to more complex audio materials as well. The second stage of the control system is a trained regression model. Random forest regression is selected from several algorithms using experimental validation. Since different feature extractors are tested with increasingly complex audio material and are exclusive to the DRC’s parameters, e.g., attack time or compression ratio, separate models are trained and tested for each. The third component of our approach is a method for evaluation. A computational audio similarity algorithm was designed to verify the results using auditory models. This algorithm is based on estimating the distance between two statistical models fitted on perceptually motivated audio features characterising similarity in loudness and timbre. Finally, the overall system is evaluated with both objective and subjective methods. The main contribution of this thesis is a method for using a reference audio to control a dynamic range compressor. Besides the system design, the analysis of the evaluation provides useful insights into the relations between audio effects and audio features as well as auditory perception. The research is conducted in such a way that the knowledge can be transferred to other audio effects and other use case scenarios, providing an alternative research direction in the field of intelligent music production and simplifying how audio effects are controlled for end users.

    Self-Supervised Pretraining and Transfer Learning on fMRI Data with Transformers

    Transfer learning is a machine learning technique founded on the idea that knowledge acquired by a model during “pretraining” on a source task can be transferred to the learning of a target task. Successful transfer learning can result in improved performance, faster convergence, and reduced demand for data. This technique is particularly desirable for the task of brain decoding in the domain of functional magnetic resonance imaging (fMRI), wherein even the most modern machine learning methods can struggle to decode labelled features of brain images. This challenge is due to the highly complex underlying signal, physical and neurological differences between brains, low data collection throughput, and other factors. Transfer learning is exciting in its potential to mitigate these challenges, but with this application still in its infancy, we must begin on the ground floor. The goals of this thesis were to design, implement, and evaluate a framework for pretraining and transfer learning on arbitrary fMRI datasets, then demonstrate its performance with respect to the literature, and achieve substantive progress toward generalized pretrained models of the brain. The primary contribution is our novel framework which achieves these goals, called BEAT, which stands for Bi-directional Encoders for Auditory Tasks. The design and implementation of BEAT include adapting state-of-the-art deep learning architectures to sequences of fMRI data, as well as a novel self-supervised pretraining task called Next Thought Prediction and several novel supervised brain decoding tasks. To evaluate BEAT, we pretrained it on Next Thought Prediction and performed transfer learning to the brain decoding tasks, which are specific to one of three fMRI datasets. Demonstrating significant benefits of transfer learning, BEAT decoded instrumental timbre from one of the fMRI datasets, which standard methods failed to decode, in addition to improving downstream performance.
Toward generalized pretrained models of the brain, BEAT learned Next Thought Prediction on one fMRI dataset, and then successfully transferred that learning to a supervised brain decoding task on an entirely distinct dataset, with different participants and stimuli. To our knowledge, this is the first instance of transfer learning across participants and stimuli, which is a necessity for whole-brain pretrained models.

    Learning feature hierarchies for musical audio signals


    Mapping Brain Development and Decoding Brain Activity with Diffuse Optical Tomography

    Functional neuroimaging has been used to map brain function as well as decode information from brain activity. However, applications like studying early brain development or enabling augmentative communication in patients with severe motor disabilities have been constrained by extant imaging modalities, which can be challenging to use in young children and entail major tradeoffs between logistics and image quality. Diffuse optical tomography (DOT) is an emerging method combining logistical advantages of optical imaging with enhanced image quality. Here, we developed one of the world’s largest DOT systems for high-performance optical brain imaging in children. From visual cortex activity in adults, we decoded the locations of checkerboard visual stimuli, e.g., localizing a 60 degree wedge rotating through 36 positions with an error of 25.8±24.7 degrees. Using animated movies as more child-friendly stimuli, we mapped reproducible responses to speech and faces with DOT in awake, typically developing 1-7 year-old children and adults. We then decoded, with accuracy significantly above chance, which movie a participant was watching or listening to from DOT data. This work lays a valuable foundation for ongoing research with wearable imaging systems and increasingly complex algorithms to map atypical brain development and decode covert semantic information in clinical populations.

    Functional imaging studies of visual-auditory integration in man.

    This thesis investigates the central nervous system's ability to integrate visual and auditory information from the sensory environment into unified conscious perception. It develops the possibility that the principle of functional specialisation may be applicable in the multisensory domain. The first aim was to establish the neuroanatomical location at which visual and auditory stimuli are integrated in sensory perception. The second was to investigate the neural correlates of visual-auditory synchronicity, which would be expected to play a vital role in establishing which visual and auditory stimuli should be perceptually integrated. Four functional magnetic resonance imaging studies identified brain areas specialised for: the integration of dynamic visual and auditory cues derived from the same everyday environmental events (Experiment 1), discriminating relative synchronicity between dynamic, cyclic, abstract visual and auditory stimuli (Experiments 2 and 3), and the aesthetic evaluation of visually and acoustically perceived art (Experiment 4). Experiment 1 provided evidence to suggest that the posterior temporo-parietal junction may be an important site of crossmodal integration. Experiment 2 revealed for the first time significant activation of the right anterior frontal operculum (aFO) when visual and auditory stimuli cycled asynchronously. Experiment 3 confirmed and developed this observation, as the right aFO was activated only during crossmodal (visual-auditory) asynchrony, not during intramodal (visual-visual, auditory-auditory) asynchrony. Experiment 3 also demonstrated activation of the amygdala bilaterally during crossmodal synchrony. Experiment 4 revealed the neural correlates of supramodal, contemplative, aesthetic evaluation within the medial fronto-polar cortex. Activity at this locus varied parametrically according to the degree of subjective aesthetic beauty, for both visual art and musical extracts.
The most robust finding of this thesis is that activity in the right aFO increases when concurrently perceived visual and auditory sensory stimuli deviate from crossmodal synchrony, which may veto the crossmodal integration of unrelated stimuli into unified conscious perception.

    Wavelets and sparse methods for image reconstruction and classification in neuroimaging

    This dissertation contributes to neuroimaging literature in the fields of compressed sensing magnetic resonance imaging (CS-MRI) and image-based detection of Alzheimer’s disease (AD). It consists of three main contributions, based on wavelets and sparse methods. The first contribution is a method for wavelet packet basis optimisation for sparse approximation and compressed sensing reconstruction of magnetic resonance (MR) images of the brain. The proposed method is based on the basis search algorithm developed by Coifman and Wickerhauser, with a cost function designed specifically for compressed sensing. It is tested on MR images available from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The second contribution consists of evaluating and comparing several sparse classification methods in an application to detection of AD based on positron emission tomography (PET) images of the brain. This comparison includes univariate feature selection, feature clustering and classifiers that automatically select a small subset of features due to their mathematical or algorithmic construction. The evaluation is based on PET images available from ADNI. The third contribution is proposing an extension of wavelet-based scattering networks (originally proposed by Mallat and Bruna) to three-dimensional tomographic images. The proposed extension is evaluated as a feature representation in an application to detection of AD based on MR images available from ADNI. There are several possible extensions of the work presented in this dissertation. The wavelet packet basis search method proposed in the first contribution can be improved to take into account the coherence between the sparse approximation basis and the sensing basis. The evaluation presented in the second contribution can be extended with additional algorithms to make it more comprehensive. 
The three-dimensional scattering networks that are the core part of the third contribution can be combined with other machine learning methods, such as manifold learning or deep convolutional neural networks. As a whole, the methods proposed in this dissertation contribute to the work towards efficient screening for Alzheimer’s disease, by making MRI scans of the brain faster and helping to automate image analysis for AD detection.