
    Sequential decision making in artificial musical intelligence

    Over the past 60 years, artificial intelligence has grown from a largely academic field of research to a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such additional facets cover a wide array of complex mental tasks which humans carry out easily, yet are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents, able to mimic (at least partially) the complexity with which humans approach music. One key aspect which hasn't been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: Can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in a setting where music plays a role (either directly or indirectly).
Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.
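The preference-tracking contribution can be pictured as a simple sequential loop: maintain an estimate of the user's taste, recommend greedily, and nudge the estimate toward songs the user responded well to. The following is a minimal sketch under invented assumptions (the update rule, learning rate, and toy song catalogue are all illustrative), not the thesis's actual algorithms:

```python
import numpy as np

def update_preference(pref, song_features, reward, lr=0.1):
    """Move the preference estimate toward features of songs the user
    liked. `reward` in [0, 1] is observed feedback (e.g. listen vs. skip)."""
    return pref + lr * reward * (song_features - pref)

def recommend(pref, candidates):
    """Greedy step of the sequential policy: pick the candidate song
    whose feature vector is closest to the current preference estimate."""
    dists = [np.linalg.norm(pref - c) for c in candidates]
    return int(np.argmin(dists))

# Simulated interaction: the estimate drifts toward the liked song.
pref = np.zeros(3)
catalogue = [np.array([1.0, 0.0, 0.0]),
             np.array([0.0, 1.0, 0.0]),
             np.array([0.0, 0.0, 1.0])]
for _ in range(50):
    pref = update_preference(pref, catalogue[0], reward=1.0)
```

Because the update is exponentially weighted, older feedback decays, which is one simple way of tracking preferences that change over time.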

    Evolutionary multi-objective training set selection of data instances and augmentations for vocal detection

    © Springer Nature Switzerland AG 2019. The size of publicly available music data sets has grown significantly in recent years, which allows training better classification models. However, training on large data sets is time-intensive and cumbersome, and some training instances might be unrepresentative and thus hurt classification performance regardless of the model used. On the other hand, it is often beneficial to extend the original training data with augmentations, but only if they are carefully chosen. Therefore, identifying a “smart” selection of training instances should improve performance. In this paper, we introduce a novel multi-objective framework for training set selection with the aim of simultaneously minimising the number of training instances and the classification error. Experimentally, we apply our method to vocal activity detection on a multi-track database extended with various audio augmentations for accompaniment and vocals. Results show that our approach is very effective at reducing classification error on a separate validation set, and that the resulting training set selections either reduce classification error or require only a small fraction of the training instances for comparable performance.
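The two competing objectives (training-set size and classification error) make this a Pareto-optimisation problem. The core dominance test can be sketched as follows; this is a generic illustration with invented candidate tuples, not the paper's evolutionary algorithm:

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b. Each solution is a tuple
    (n_training_instances, classification_error), both minimised."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Keep only the non-dominated training-set selections."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Hypothetical candidate selections: (subset size, validation error).
cands = [(1000, 0.10), (400, 0.12), (400, 0.10), (1500, 0.09), (200, 0.20)]
front = pareto_front(cands)
```

The surviving front exposes the trade-off directly: a practitioner can then pick a small subset with acceptable error, or the most accurate one regardless of size.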

    An Investigation into the Use of Artificial Intelligence Techniques for the Analysis and Control of Instrumental Timbre and Timbral Combinations

    Researchers have investigated harnessing computers as a tool to aid in the composition of music for over 70 years. In major part, such research has focused on creating algorithms to work with pitches and rhythm, which has resulted in a selection of sophisticated systems. Although the musical possibilities of these systems are vast, they do not directly consider another important characteristic of sound. Timbre can be defined as all the sound attributes, except pitch, loudness and duration, which allow us to distinguish and recognize that two sounds are dissimilar. This feature plays an essential role in combining instruments as it involves mixing instrumental properties to create unique textures conveying specific sonic qualities. Within this thesis, we explore techniques for the analysis and control of instrumental timbre and timbral combinations. This thesis begins with investigating the link between musical timbre, auditory perception and psychoacoustics for sounds emerging from instrument mixtures. This resulted in the choice of verbal descriptors of timbral qualities to represent auditory perception of instrument combination sounds. Therefore, this thesis reports on the development of methods and tools designed to automatically retrieve and identify perceptual qualities of timbre within audio files, using specific musical acoustic features and artificial intelligence algorithms. Different perceptual experiments have been conducted to evaluate the correlation between selected acoustic cues and human perception. Results of these evaluations confirmed the potential and suitability of the presented approaches. Finally, these developments have helped to design a perceptually-orientated generative system harnessing aspects of artificial intelligence to combine sampled instrument notes.
The findings of this exploration demonstrate that an artificial intelligence approach can help to harness the perceptual aspect of instrumental timbre and timbral combinations. This investigation suggests that established methods of measuring timbral qualities, based on a diverse selection of sounds, also work for sounds created by combining instrument notes. The development of tools designed to automatically retrieve and identify perceptual qualities of timbre also helped in designing a comparative scale that goes towards standardising metrics for comparing timbral attributes. Finally, this research demonstrates that perceptual characteristics of timbral qualities, using verbal descriptors as a representation, can be implemented in an intelligent computing system designed to combine sampled instrument notes conveying specific perceptual qualities.
Arts and Humanities Research Council-funded 3D3 Centre for Doctoral Training.
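As a concrete example of mapping an acoustic feature onto a verbal timbre descriptor, the spectral centroid is a widely used correlate of perceived "brightness". The sketch below is a minimal, generic illustration of this feature-to-descriptor idea; the thesis's actual feature set and models are more elaborate:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one audio frame: a common
    acoustic correlate of the verbal descriptor 'brightness'."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))

sr = 16000
t = np.arange(2048) / sr
dull = np.sin(2 * np.pi * 220 * t)                  # single low partial
bright = dull + 0.8 * np.sin(2 * np.pi * 3520 * t)  # added high partial
```

Adding energy in high partials pulls the centroid upward, so a combined-instrument sound with more high-frequency content is classified as "brighter" than its duller counterpart.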

    An Interval-based Multiobjective Approach to Feature Subset Selection Using Joint Modeling of Objectives and Variables

    This paper studies feature subset selection in classification using a multiobjective estimation of distribution algorithm. We consider six functions, namely area under ROC curve, sensitivity, specificity, precision, F1 measure and Brier score, for evaluation of feature subsets and as the objectives of the problem. One of the characteristics of these objective functions is the existence of noise in their values that should be appropriately handled during optimization. Our proposed algorithm consists of two major techniques which are specially designed for the feature subset selection problem. The first one is a solution ranking method based on interval values to handle the noise in the objectives of this problem. The second one is a model estimation method for learning a joint probabilistic model of objectives and variables which is used to generate new solutions and advance through the search space. To simplify model estimation, l1-regularized regression is used to select a subset of problem variables before model learning. The proposed algorithm is compared with a well-known ranking method for interval-valued objectives and a standard multiobjective genetic algorithm. In particular, the effects of the two new techniques are experimentally investigated. The experimental results show that the proposed algorithm is able to obtain comparable or better performance on the tested datasets.
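The interval-based ranking idea can be illustrated with a dominance test over interval-valued objectives: repeated noisy evaluations of each objective give a [low, high] interval, and one solution only dominates another when its intervals lie entirely at or below the other's. This is a generic sketch with invented numbers, not the paper's specific ranking method:

```python
def interval_dominates(a, b):
    """a, b: lists of (lo, hi) intervals, one per minimised objective.
    a dominates b when every interval of a lies entirely at or below the
    corresponding interval of b, strictly so in at least one objective."""
    return (all(ahi <= blo for (_, ahi), (blo, _) in zip(a, b))
            and any(ahi < blo for (_, ahi), (blo, _) in zip(a, b)))

# Hypothetical noisy evaluations of two objectives for three subsets.
s1 = [(0.10, 0.12), (0.20, 0.22)]
s2 = [(0.15, 0.18), (0.25, 0.30)]
s3 = [(0.11, 0.16), (0.21, 0.24)]
```

Overlapping intervals (as for s1 and s3) leave the pair incomparable, which is exactly how the noise is prevented from producing spurious rankings.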

    Intelligent Control of Dynamic Range Compressor

    PhD Thesis. Music production is an essential element in the value chain of modern music. It includes enhancing the recorded audio tracks, balancing the loudness level of multiple tracks as well as making artistic decisions to satisfy music genre, style and emotion. As in related professions in creative media production, the tools for music making are now highly computerised. However, many parts of the work remain labour intensive and time consuming. The demand for intelligent tools is therefore growing. This situation encourages the emerging trend of ever increasing research into intelligent music production tools. Since audio effects are among the main tools used by music producers, there are many discussions and developments targeting the controlling mechanism of audio effects. This thesis aims to push the boundaries in this field by investigating the intelligent control of one of the essential audio effects, the dynamic range compressor. This research presents an innovative control system design. The core of this design is to learn from a reference audio, and control the dynamic range compressor to make the processed input audio sound as close as possible to the reference. One of the proposed approaches can be divided into three stages: a feature extractor, a trained regression model, and an objective evaluation algorithm. In the feature extractor stage we first test feature sets using conventional audio features commonly used in speech and audio signal analyses. Subsequently, we test handcrafted audio features specifically designed to characterise audio properties related to the dynamic range of audio samples. Research into feature design has been completed at different levels of complexity. A series of feature selection schemes are also assessed to select the optimal feature sets from both conventional and specifically designed audio features.
In the subsequent stage of the research, feature extraction is replaced by a feature learning deep neural network (DNN). This addresses the problem that the previous features are exclusive to each parameter, while a general feature extractor may be formed using a DNN. A universal feature extractor can reduce the computational cost and is also easier to adapt to more complex audio materials. The second stage of the control system is a trained regression model. Random forest regression is selected from several algorithms using experimental validation. Since different feature extractors are tested with increasingly complex audio material, and are specific to each of the DRC's parameters, e.g., attack time or compression ratio, separate models are trained and tested respectively. The third component of our approach is a method for evaluation. A computational audio similarity algorithm was designed to verify the results using auditory models. This algorithm is based on estimating the distance between two statistical models fitted on perceptually motivated audio features characterising similarity in loudness and timbre. Finally, the overall system is evaluated with both objective and subjective methods. The main contribution of this thesis is a method for using a reference audio to control a dynamic range compressor. Besides the system design, the analysis of the evaluation provides useful insights into the relations between audio effects, audio features and auditory perception. The research is conducted in a way that makes it possible to transfer the knowledge to other audio effects and other use case scenarios, providing an alternative research direction in the field of intelligent music production and simplifying how audio effects are controlled for end users.
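The reference-driven control idea can be caricatured in a few lines: measure a dynamic-range feature on the reference, then search the compressor parameter until the processed input matches it. The sketch below uses a sample-wise hard-knee compressor, the crest factor as the feature, and a grid search in place of the thesis's learned regression models; all of these simplifications are assumptions for illustration:

```python
import numpy as np

def compress(x, threshold_db, ratio):
    """Hard-knee, sample-wise dynamic range compressor (no attack/release
    smoothing; illustration only)."""
    level_db = 20 * np.log10(np.abs(x) + 1e-12)
    over = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over * (1.0 - 1.0 / ratio)
    return x * 10 ** (gain_db / 20)

def crest_factor(x):
    """Peak-to-RMS ratio: a simple dynamic-range feature."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

def match_ratio(x, target_crest, threshold_db=-20,
                grid=np.linspace(1, 20, 191)):
    """Grid-search the ratio whose processed crest factor is closest to
    the value measured on the reference audio."""
    errs = [abs(crest_factor(compress(x, threshold_db, r)) - target_crest)
            for r in grid]
    return float(grid[int(np.argmin(errs))])

# Toy demonstration: recover the ratio used to make a 'reference'.
t = np.arange(8000) / 8000.0
x = np.concatenate([0.9 * np.sin(2 * np.pi * 220 * t),
                    0.05 * np.sin(2 * np.pi * 220 * t)])
reference = compress(x, -20, 4.0)
est = match_ratio(x, crest_factor(reference))
```

Replacing the crest factor with learned features and the grid search with a trained regressor is, at a high level, the jump from this toy to the system the thesis describes.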

    Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016)


    Deep Learning for Music Information Retrieval in Limited Data Scenarios.

    PhD Thesis. While deep learning (DL) models have achieved impressive results in settings where large amounts of annotated training data are available, overfitting often degrades performance when data is more limited. To improve the generalisation of DL models, we investigate “data-driven priors” that exploit additional unlabelled data or labelled data from related tasks. Unlike techniques such as data augmentation, these priors are applicable across a range of machine listening tasks, since their design does not rely on problem-specific knowledge. We first consider scenarios in which parts of samples can be missing, aiming to make more datasets available for model training. In an initial study focusing on audio source separation (ASS), we exploit additionally available unlabelled music and solo source recordings by using generative adversarial networks (GANs), resulting in higher separation quality. We then present a fully adversarial framework for learning generative models with missing data. Our discriminator consists of separately trainable components that can be combined to train the generator with the same objective as in the original GAN framework. We apply our framework to image generation, image segmentation and ASS, demonstrating superior performance compared to the original GAN. To improve performance on any given MIR task, we also aim to leverage datasets which are annotated for similar tasks. We use multi-task learning (MTL) to perform singing voice detection and singing voice separation with one model, improving performance on both tasks. Furthermore, we employ meta-learning on a diverse collection of ten MIR tasks to find a weight initialisation for a “universal MIR model” so that training the model on any MIR task with this initialisation quickly leads to good performance.
Since our data-driven priors encode knowledge shared across tasks and datasets, they are suited for high-dimensional, end-to-end models, instead of small models relying on task-specific feature engineering, such as fixed spectrogram representations of audio commonly used in machine listening. To this end, we propose “Wave-U-Net”, an adaptation of the U-Net, which can perform ASS directly on the raw waveform while performing favourably compared to its spectrogram-based counterpart. Finally, we derive “Seq-U-Net” as a causal variant of Wave-U-Net, which performs comparably to Wavenet and Temporal Convolutional Network (TCN) on a variety of sequence modelling tasks, while being more computationally efficient.
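The multi-scale shape of a Wave-U-Net-style model can be caricatured with plain resampling: repeatedly downsample the raw waveform, then upsample back while adding skip connections from matching resolutions. Only the tensor shapes are faithful here; the learned convolutions at every level are deliberately omitted, so this is a structural sketch, not the published architecture:

```python
import numpy as np

def downsample(x):
    """Discard every second time step (decimation by 2)."""
    return x[::2]

def upsample(x):
    """Linear interpolation back to twice the length."""
    out = np.empty(2 * len(x))
    out[0::2] = x
    out[1::2] = np.concatenate([(x[:-1] + x[1:]) / 2, x[-1:]])
    return out

def unet_1d(x, depth=3):
    """Skeleton of a 1-D U-Net pass on a raw waveform: repeated
    downsampling, then upsampling with additive skip connections."""
    skips = []
    for _ in range(depth):
        skips.append(x)
        x = downsample(x)
    for _ in range(depth):
        x = upsample(x) + skips.pop()
    return x

wave = np.sin(np.linspace(0, 4 * np.pi, 64))
out = unet_1d(wave)
```

The skip connections are what let the model reintroduce fine temporal detail lost during downsampling, which matters for waveform-level source separation.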

    A Bio-Inspired Music Genre Classification Framework using Modified AIS-Based Classifier

    For decades, the scientific community has worked to automate the human process of recognizing different types of music, for example by the different instruments used. These efforts imitate the human method of recognizing music by considering every essential component of a song, from the artist's voice and the melody through to the type of instruments used. Various approaches and mechanisms have since been introduced and developed to automate the classification process. The results of these studies have been remarkable, yet can still be improved. The aim of this research is to investigate the Artificial Immune System (AIS) domain, focusing on a modified AIS-based classifier, and in particular on its censoring and monitoring modules. The stages of music recognition are emphasized, and the feature extraction, feature selection, and feature classification processes are explained. A comparison of performance between the proposed classifier and the WEKA application is discussed. Classification accuracy increased by roughly 20 to 30 percent in this study.
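The censoring and monitoring modules mentioned above follow the classic negative-selection pattern of AIS: the censoring phase generates detectors that match no "self" (known) sample, and the monitoring phase flags any input that activates a detector. The following is a generic textbook sketch with invented data and threshold, not the paper's modified classifier:

```python
import random

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def censoring(self_set, n_detectors, dim, threshold, rng):
    """Negative-selection censoring: keep only random candidate
    detectors that do NOT match any self sample."""
    detectors = []
    while len(detectors) < n_detectors:
        cand = [rng.random() for _ in range(dim)]
        if all(distance(cand, s) > threshold for s in self_set):
            detectors.append(cand)
    return detectors

def monitoring(sample, detectors, threshold):
    """Monitoring: a sample activating any detector is flagged as
    non-self (outside the learned region)."""
    return any(distance(sample, d) <= threshold for d in detectors)

rng = random.Random(0)
self_set = [[0.1, 0.1], [0.15, 0.2], [0.2, 0.1]]
dets = censoring(self_set, 30, 2, 0.3, rng)
```

In a genre-classification setting, "self" would be feature vectors of one genre, so monitoring separates in-genre from out-of-genre inputs.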