108 research outputs found

    Evolutionary multi-objective training set selection of data instances and augmentations for vocal detection

    Get PDF
    © Springer Nature Switzerland AG 2019. The size of publicly available music data sets has grown significantly in recent years, which allows training better classification models. However, training on large data sets is time-intensive and cumbersome, and some training instances might be unrepresentative and thus hurt classification performance regardless of the used model. On the other hand, it is often beneficial to extend the original training data with augmentations, but only if they are carefully chosen. Therefore, identifying a “smart” selection of training instances should improve performance. In this paper, we introduce a novel, multi-objective framework for training set selection with the target to simultaneously minimise the number of training instances and the classification error. Experimentally, we apply our method to vocal activity detection on a multi-track database extended with various audio augmentations for accompaniment and vocals. Results show that our approach is very effective at reducing classification error on a separate validation set, and that the resulting training set selections either reduce classification error or require only a small fraction of training instances for comparable performance

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread also in other aspects of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years always in Firenze, Italy

    Listening-Mode-Centered Sonification Design for Data Exploration

    Get PDF
    Grond F. Listening-Mode-Centered Sonification Design for Data Exploration. Bielefeld: Bielefeld University; 2013.From the Introduction to this thesis: Through the ever growing amount of data and the desire to make them accessible to the user through the sense of listening, sonification, the representation of data by using sound has been subject of active research in the computer sciences and the field of HCI for the last 20 years. During this time, the field of sonification has diversified into different application areas: today, sound in auditory display informs the user about states and actions on the desktop and in mobile devices; sonification has been applied in monitoring applications, where sound can range from being informative to alarming; sonification has been used to give sensory feedback in order to close the action and perception loop; last but not least, sonifications have also been developed for exploratory data analysis, where sound is used to represent data with unknown structures for hypothesis building. Coming from the computer sciences and HCI, the conceptualization of sonification has been mostly driven by application areas. On the other hand, the sonic arts who have always contributed to the community of auditory display have a genuine focus on sound. Despite this close interdisciplinary relation of communities of sound practitioners, a rich and sound- (or listening)-centered concept about sonification is still missing as a point of departure for a more application and task overarching approach towards design guidelines. Complementary to the useful organization along fields of applications, a conceptual framework that is proper to sound needs to abstract from applications and also to some degree from tasks, as both are not directly related to sound. I hence propose in this thesis to conceptualize sonifications along two poles where sound serves either a normative or a descriptive purpose. In the beginning of auditory display research, a continuum between a symbolic and an analogic pole has been proposed by Kramer (1994a, page 21). In this continuum, symbolic stands for sounds that coincide with existing schemas and are more denotative, analogic stands for sounds that are informative through their connotative aspects. (compare Worrall (2009, page 315)). The notions of symbolic and analogic illustrate the struggle to find apt descriptions of how the intention of the listener subjects audible phenomena to a process of meaning making and interpretation. Complementing the analogic-symbolic continuum with descriptive and normative purposes of displays is proposed in the light of the recently increased research interest in listening modes and intentions. Similar to the terms symbolic and analogic, listening modes have been discussed in auditory display since the beginning usually in dichotomic terms which were either identified with the words listening and hearing or understood as musical listening and everyday listening as proposed by Gaver (1993a). More than 25 years earlier, four direct listening modes have been introduced by Schaeffer (1966) together with a 5th synthetic mode of reduced listening which leads to the well-known sound object. Interestingly, Schaeffer’s listening modes remained largely unnoticed by the auditory display community. Particularly the notion of reduced listening goes beyond the connotative and denotative poles of the continuum proposed by Kramer and justifies the new terms descriptive and normative. Recently, a new taxonomy of listening modes has been proposed by Tuuri and Eerola (2012) that is motivated through an embodied cognition approach. The main contribution of their taxonomy is that it convincingly diversifies the connotative and denotative aspects of listening modes. In the recently published sonification handbook, multimodal and interactive aspects in combination with sonification have been discussed as promising options to expand and advance the field by Hunt and Hermann (2011), who point out that there is a big need for a better theoretical foundation in order to systematically integrate these aspects. The main contribution of this thesis is to address this need by providing alternative and complementary design guidelines with respect to existing approaches, all of which have been conceived before the recently increased research interest in listening modes. None of the existing contributions to design frameworks integrates multimodality, and listening modes with a focus on exploratory data analysis, where sonification is conceived to support the understanding of complex data potentially helping to identify new structures therein. In order to structure this field the following questions are addressed in this thesis: • How do natural listening modes and reduced listening relate to the proposed normative and descriptive display purposes? • What is the relationship of multimodality and interaction with listening modes and display purposes? • How can the potential of embodied cognition based listening modes be put to use for exploratory data sonification? • How can listening modes and display purposes be connected to questions of aesthetics in the display? • How do data complexity and Parameter-mapping sonification relate to exploratory data analysis and listening modes

    Deep Learning for Music Information Retrieval in Limited Data Scenarios.

    Get PDF
    PhD ThesisWhile deep learning (DL) models have achieved impressive results in settings where large amounts of annotated training data are available, over tting often degrades performance when data is more limited. To improve the generalisation of DL models, we investigate \data-driven priors" that exploit additional unlabelled data or labelled data from related tasks. Unlike techniques such as data augmentation, these priors are applicable across a range of machine listening tasks, since their design does not rely on problem-speci c knowledge. We rst consider scenarios in which parts of samples can be missing, aiming to make more datasets available for model training. In an initial study focusing on audio source separation (ASS), we exploit additionally available unlabelled music and solo source recordings by using generative adversarial networks (GANs), resulting in higher separation quality. We then present a fully adversarial framework for learning generative models with missing data. Our discriminator consists of separately trainable components that can be combined to train the generator with the same objective as in the original GAN framework. We apply our framework to image generation, image segmentation and ASS, demonstrating superior performance compared to the original GAN. To improve performance on any given MIR task, we also aim to leverage datasets which are annotated for similar tasks. We use multi-task learning (MTL) to perform singing voice detection and singing voice separation with one model, improving performance on both tasks. Furthermore, we employ meta-learning on a diverse collection of ten MIR tasks to nd a weight initialisation for a \universal MIR model" so that training the model on any MIR task with this initialisation quickly leads to good performance. Since our data-driven priors encode knowledge shared across tasks and datasets, they are suited for high-dimensional, end-to-end models, instead of small models relying on task-speci c feature engineering, such as xed spectrogram representations of audio commonly used in machine listening. To this end, we propose \Wave-U-Net", an adaptation of the U-Net, which can perform ASS directly on the raw waveform while performing favourably to its spectrogrambased counterpart. Finally, we derive \Seq-U-Net" as a causal variant of Wave- U-Net, which performs comparably to Wavenet and Temporal Convolutional Network (TCN) on a variety of sequence modelling tasks, while being more computationally e cient.

    Building Machines That Learn and Think Like People

    Get PDF
    Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.Comment: In press at Behavioral and Brain Sciences. Open call for commentary proposals (until Nov. 22, 2016). https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar

    Bias in Deep Learning and Applications to Face Analysis

    Get PDF
    Deep learning has fostered the progress in the field of face analysis, resulting in the integration of these models in multiple aspects of society. Even though the majority of research has focused on optimizing standard evaluation metrics, recent work has exposed the bias of such algorithms as well as the dangers of their unaccountable utilization.n this thesis, we explore the bias of deep learning models in the discriminative and the generative setting. We begin by investigating the bias of face analysis models with regards to different demographics. To this end, we collect KANFace, a large-scale video and image dataset of faces captured ``in-the-wild’'. The rich set of annotations allows us to expose the demographic bias of deep learning models, which we mitigate by utilizing adversarial learning to debias the deep representations. Furthermore, we explore neural augmentation as a strategy towards training fair classifiers. We propose a style-based multi-attribute transfer framework that is able to synthesize photo-realistic faces of the underrepresented demographics. This is achieved by introducing a multi-attribute extension to Adaptive Instance Normalisation that captures the multiplicative interactions between the representations of different attributes. Focusing on bias in gender recognition, we showcase the efficacy of the framework in training classifiers that are more fair compared to generative and fairness-aware methods.In the second part, we focus on bias in deep generative models. In particular, we start by studying the generalization of generative models on images of unseen attribute combinations. To this end, we extend the conditional Variational Autoencoder by introducing a multilinear conditioning framework. The proposed method is able to synthesize unseen attribute combinations by modeling the multiplicative interactions between the attributes. Lastly, in order to control protected attributes, we investigate controlled image generation without training on a labelled dataset. We leverage pre-trained Generative Adversarial Networks that are trained in an unsupervised fashion and exploit the clustering that occurs in the representation space of intermediate layers of the generator. We show that these clusters capture semantic attribute information and condition image synthesis on the cluster assignment using Implicit Maximum Likelihood Estimation.Open Acces

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Get PDF
    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.https://digitalcommons.chapman.edu/scs_books/1014/thumbnail.jp
    • …
    corecore