
    Conditional Restricted Boltzmann Machines for Structured Output Prediction

    Conditional Restricted Boltzmann Machines (CRBMs) are rich probabilistic models that have recently been applied to a wide range of problems, including collaborative filtering, classification, and modeling motion capture data. While much progress has been made in training non-conditional RBMs, these algorithms are not applicable to conditional models, and there has been almost no work on training and generating predictions from conditional RBMs for structured output problems. We first argue that standard Contrastive Divergence-based learning may not be suitable for training CRBMs. We then identify two distinct types of structured output prediction problems and propose an improved learning algorithm for each. The first problem type is one where the output space has arbitrary structure but the set of likely output configurations is relatively small, such as in multi-label classification. The second problem is one where the output space is arbitrarily structured but where the output space variability is much greater, such as in image denoising or pixel labeling. We show that the new learning algorithms can work much better than Contrastive Divergence on both types of problems.
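
    As background (the abstract does not spell out the parameterization), a common CRBM formulation makes the usual RBM biases affine functions of the conditioning input x, so for output y and hidden units h:

```latex
E(y, h \mid x) = -\,h^{\top} W y \;-\; b(x)^{\top} y \;-\; c(x)^{\top} h,
\qquad b(x) = b + A x, \quad c(x) = c + B x,
```

    with P(y | x) proportional to the sum over h of exp(-E(y, h | x)). Exact prediction requires a search or sum over all output configurations y, which is what makes structured output prediction, and CD-style gradient estimates, difficult for these models.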

    Capturing the dynamics of cellular automata, for the generation of synthetic Persian music, using conditional restricted Boltzmann machines

    In this paper, the generative and feature-extracting powers of the Boltzmann machine family are employed in an algorithmic music composition system. The Liquid Persian Music (LPM) system is an audio generator that uses cellular automata progressions as its creative core source. LPM provides an infrastructure for creating novel Dastgāh-like Persian music. Pattern matching rules extract features from the cellular automata sequences and populate the parameters of a Persian musical instrument synthesizer [1]. Applying restricted Boltzmann machines and conditional restricted Boltzmann machines, two members of the Boltzmann machine family, provides new ways of interpreting the patterns emanating from the cellular automata. Conditional restricted Boltzmann machines are particularly employed for capturing the dynamics of cellular automata.
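
    As an illustrative sketch only (the abstract does not specify the architecture, the CA rule, or the synthesizer mapping), a CRBM can capture CA dynamics by conditioning on a window of previous CA rows and learning with one-step Contrastive Divergence. The rule-110 choice and all sizes below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CRBM:
    """Binary CRBM: visible v = current CA row, context x = previous rows (flattened)."""
    def __init__(self, n_vis, n_ctx, n_hid, lr=0.01):
        self.W = 0.01 * rng.standard_normal((n_hid, n_vis))   # hidden-visible weights
        self.A = 0.01 * rng.standard_normal((n_vis, n_ctx))   # context -> visible bias
        self.B = 0.01 * rng.standard_normal((n_hid, n_ctx))   # context -> hidden bias
        self.b = np.zeros(n_vis)
        self.c = np.zeros(n_hid)
        self.lr = lr

    def cd1(self, v0, x):
        bv, bh = self.b + self.A @ x, self.c + self.B @ x     # conditional biases
        ph0 = sigmoid(self.W @ v0 + bh)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(self.W.T @ h0 + bv)                     # one-step reconstruction
        ph1 = sigmoid(self.W @ pv1 + bh)
        # CD-1 parameter updates: positive phase minus negative phase
        self.W += self.lr * (np.outer(ph0, v0) - np.outer(ph1, pv1))
        self.A += self.lr * np.outer(v0 - pv1, x)
        self.B += self.lr * np.outer(ph0 - ph1, x)
        self.b += self.lr * (v0 - pv1)
        self.c += self.lr * (ph0 - ph1)

# An elementary CA (rule 110, assumed here) provides the training sequence.
def ca_step(row, rule=110):
    table = [(rule >> i) & 1 for i in range(8)]
    idx = 4 * np.roll(row, 1) + 2 * row + np.roll(row, -1)
    return np.array([table[int(i)] for i in idx], dtype=float)

width, order = 32, 3                      # condition on the 3 previous rows
rows = [rng.integers(0, 2, width).astype(float)]
for _ in range(200):
    rows.append(ca_step(rows[-1]))

crbm = CRBM(n_vis=width, n_ctx=order * width, n_hid=64)
for t in range(order, len(rows)):
    crbm.cd1(rows[t], np.concatenate(rows[t - order:t]))
```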

    Learning Musical Representations with Deep and Multi-Scale Architectures

    Machine learning (ML) is an important tool in the field of music information retrieval (MIR). Many MIR tasks can be solved by training a classifier over a set of features. For MIR tasks based on music audio, it is possible to extract features from the audio with signal processing techniques. However, some musical aspects are hard to extract with simple heuristics. To obtain richer features, we can use ML to learn a representation from the audio. These learned features can often improve performance on a given MIR task. In order to learn interesting musical representations, it is important to consider the particular aspects of music audio when building learning models. Given the temporal and spectral structure of music audio, deep and multi-scale representations are particularly well suited to represent music. This thesis focuses on learning representations from music audio. Deep and multi-scale models that improve the state of the art for tasks such as instrument recognition, genre recognition and automatic annotation are presented.
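
    The thesis's exact models are not described in the abstract, but one simple way to build the kind of multi-scale spectral input such architectures consume is to stack mel spectrograms computed at several analysis window sizes. A minimal sketch using librosa, where the window sizes and mel-band count are arbitrary choices:

```python
import numpy as np
import librosa

def multiscale_mels(y, sr, scales=(512, 2048, 8192), n_mels=64):
    """Mel spectrograms at several window sizes, stacked along the feature axis.

    Short windows resolve onsets and timbre; long windows resolve harmony
    and longer-range structure. All scales share one hop length so the
    frames stay time-aligned across scales.
    """
    hop = 512
    feats = []
    for n_fft in scales:
        S = librosa.feature.melspectrogram(
            y=y, sr=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels)
        feats.append(librosa.power_to_db(S))
    n = min(f.shape[1] for f in feats)           # trim to a common frame count
    return np.concatenate([f[:, :n] for f in feats], axis=0)

y, sr = librosa.load(librosa.ex('trumpet'))      # bundled librosa example clip
X = multiscale_mels(y, sr)                       # shape: (3 * n_mels, frames)
```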

    Machine learning techniques for music information retrieval

    Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2015. The advent of digital music has changed the rules of music consumption, distribution and sales. With it has emerged the need to effectively search and manage vast music collections. Music information retrieval is an interdisciplinary field of research that focuses on the development of new techniques with that aim in mind. This dissertation addresses a specific aspect of this field: methods that automatically extract musical information exclusively based on the audio signal. We propose a method for automatic music-based classification, label inference, and music similarity estimation. Our method consists of representing the audio with a finite set of symbols and then modeling the symbols' time evolution. The symbols are obtained via vector quantization, in which a single codebook is used to quantize the audio descriptors. The symbols' time evolution is modeled via a first-order Markov process. Based on systematic evaluations carried out on publicly available sets, we show that our method achieves performance on par with most techniques found in the literature. We also present and discuss the problems that appear when computers try to classify or annotate songs using the audio as the only source of information. In our method, the separation of the quantization process from the creation and training of classification models helped us in that analysis. It enabled us to examine how instantaneous sound attributes (henceforth features) are distributed in terms of musical genre, and how designing codebooks specially tailored for these distributions affects the performance of our system and of other classification systems commonly used for this task. On this issue, we show that there is no apparent benefit in seeking a thorough representation of the feature space. This is somewhat unexpected, since it goes against the assumption, implicit in many genre recognition methods, that features carry equally relevant information loads and somehow capture the specificities of musical facets. Label inference is the task of automatically annotating songs with semantic words; this task is also known as autotagging. In this context, we illustrate the importance of a number of issues that, in our perspective, are often overlooked. We show that current techniques are fragile in the sense that small alterations in the set of labels may lead to dramatically different results. Furthermore, through a series of experiments, we show that autotagging systems fail to learn tag models capable of generalizing to datasets of different origins. We also show that the performance achieved with these techniques is not sufficient to take advantage of the correlations between tags. Fundação para a Ciência e a Tecnologia (FCT).
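
    A minimal sketch of the pipeline this abstract describes: a single global codebook obtained by vector quantization, a symbol sequence per track, and a first-order Markov model over the symbols. The descriptor stand-ins, codebook size, and add-one smoothing below are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

K = 64  # codebook size (assumed; the thesis studies this choice)
rng = np.random.default_rng(0)

# Stand-in for per-track audio descriptors (e.g. MFCC frames): a list of
# (n_frames, n_dims) arrays, one per track.
tracks = [rng.standard_normal((500, 13)) + i % 3 for i in range(9)]

# 1) A single global codebook quantizes all descriptors.
codebook = KMeans(n_clusters=K, n_init=4, random_state=0).fit(np.vstack(tracks))

# 2) Each track becomes a symbol sequence, modeled as a first-order
#    Markov chain (row-normalized transition counts, add-one smoothed).
def markov_model(track):
    s = codebook.predict(track)
    T = np.ones((K, K))
    np.add.at(T, (s[:-1], s[1:]), 1)
    return T / T.sum(axis=1, keepdims=True)

# 3) Similarity: average log-likelihood of one track's symbol sequence
#    under another track's (or a class's) Markov model.
def loglik(track, model):
    s = codebook.predict(track)
    return np.log(model[s[:-1], s[1:]]).sum() / (len(s) - 1)

models = [markov_model(t) for t in tracks]
print(round(loglik(tracks[0], models[1]), 3))
```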

    Retrieval and Annotation of Music Using Latent Semantic Models

    PhD thesis. This thesis investigates the use of latent semantic models for annotation and retrieval from collections of musical audio tracks. In particular, latent semantic analysis (LSA) and aspect models (or probabilistic latent semantic analysis, pLSA) are used to index words in descriptions of music drawn from hundreds of thousands of social tags. A new discrete audio feature representation is introduced to encode musical characteristics of automatically-identified regions of interest within each track, using a vocabulary of audio muswords. Finally, a joint aspect model is developed that can learn from both tagged and untagged tracks by indexing both conventional words and muswords. This model is used as the basis of a music search system that supports query by example and by keyword, and of a simple probabilistic machine annotation system. The models are evaluated by their performance in a variety of realistic retrieval and annotation tasks, motivated by applications including playlist generation, internet radio streaming, music recommendation and catalogue search. Engineering and Physical Sciences Research Council.
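
    The joint aspect model and the muswords vocabulary are beyond a short example, but the underlying LSA indexing step can be sketched by treating each track's social tags as a document; the toy data and latent dimensionality here are assumptions:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy tag documents: each string holds the social tags applied to one track.
docs = [
    "mellow jazz saxophone late night",
    "jazz piano trio swing",
    "aggressive metal guitar drums",
    "metal thrash guitar fast",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)                        # tracks x tag-words matrix
lsa = TruncatedSVD(n_components=2, random_state=0)
Z = lsa.fit_transform(X)                           # latent track vectors

# Keyword query: project the query into the same latent space and rank
# tracks by cosine similarity (query-by-example works the same way).
q = lsa.transform(vec.transform(["jazz piano"]))
print(cosine_similarity(q, Z).argsort()[0][::-1])  # track indices, best first
```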

    Creating Persian-like music using computational intelligence

    Dastgāh are modal systems in traditional Persian music. Each Dastgāh consists of a group of melodies called Gushé, classified into twelve groups about a century ago (Farhat, 1990). Prior to that time, musical pieces were transferred through oral tradition. Traditional music production revolves around the existing Dastgāh and Gushé pieces. In this thesis, computational intelligence tools are employed in creating novel Dastgāh-like music. There are three types of creativity: combinational, exploratory, and transformational (Boden, 2000). In exploratory creativity, a conceptual space is navigated to discover new forms. Sometimes the exploration results in transformational creativity, due to meaningful alterations happening on one or more of the governing dimensions of an item. In combinational creativity, new links are established between items not previously connected. Boden stated that all these types of creativity can be implemented using artificial intelligence. Various tools and techniques are employed in the research reported in this thesis for generating Dastgāh-like music. Evolutionary algorithms are responsible for navigating the space of sequences of musical motives. Aesthetic critics are employed to constrain the search space in the exploratory (and hopefully transformational) type of creativity. Boltzmann machine models are applied to assimilate some of the mechanisms involved in combinational creativity. The creative processes involved are guided by aesthetic critics, some of which are derived from a traditional Persian music database. In this project, Cellular Automata (CA) are the main pattern generators employed to produce raw creative material. Various methodologies are suggested for extracting features from CA progressions, mapping them to musical space, and feeding them to audio synthesizers. The evaluation of the results of this thesis is assisted by publishing surveys which targeted both public and professional audiences. The generated audio samples are evaluated regarding their Dastgāh-likeness and the level of creativity of the systems involved.
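
    As a toy illustration of the evolutionary search described here (the thesis's motive library and database-derived critics cannot be reconstructed from the abstract), an evolutionary loop over motive sequences with a stand-in aesthetic critic as the fitness function might look like:

```python
import random
random.seed(0)

MOTIVES = list(range(8))          # indices into a motive library (toy stand-in)

def critic(seq):
    """Toy aesthetic critic: rewards stepwise motion between motives.
    The thesis derives its critics from a traditional Persian music database."""
    return -sum(abs(a - b) for a, b in zip(seq, seq[1:]))

def mutate(seq):
    s = seq[:]
    s[random.randrange(len(s))] = random.choice(MOTIVES)
    return s

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# Evolve a population of 16-motive sequences under the critic's guidance.
pop = [[random.choice(MOTIVES) for _ in range(16)] for _ in range(30)]
for gen in range(50):
    pop.sort(key=critic, reverse=True)
    elite = pop[:10]                               # keep the best sequences
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(20)]

print(max(pop, key=critic))
```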

    Learning Contextualized Music Semantics from Tags via a Siamese Network

    Music information retrieval faces a challenge in modeling contextualized musical concepts formulated by a set of co-occurring tags. In this paper, we investigate the suitability of our recently proposed approach based on a Siamese neural network in addressing this challenge. By means of tag features and probabilistic topic models, the network captures contextualized semantics from tags via unsupervised learning. This leads to a distributed semantics space and a potential solution to the out-of-vocabulary problem, which has yet to be sufficiently addressed. We explore the nature of the resultant music-based semantics and address computational needs. We conduct experiments on three public music tag collections, namely CAL500, MagTag5K and the Million Song Dataset, and compare our approach to a number of state-of-the-art semantics learning approaches. Comparative results suggest that this approach outperforms previous approaches in terms of semantic priming and music tag completion.
    Comment: 20 pages. To appear in ACM TIST: Intelligent Music Systems and Applications.
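
    The paper's exact network is not given in the abstract, but the general Siamese construction it refers to (two inputs, one shared encoder, a contrastive objective) can be sketched in PyTorch; the feature dimensionality, layer sizes, and margin below are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Shared tower: maps a tag-feature vector into the semantics space."""
    def __init__(self, n_in, n_out=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(),
                                 nn.Linear(64, n_out))
    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, y, margin=1.0):
    """y = 1 for tag pairs that share a context (co-occur), 0 otherwise.
    Pulls positive pairs together, pushes negatives past the margin."""
    d = F.pairwise_distance(z1, z2)
    return (y * d.pow(2) + (1 - y) * F.relu(margin - d).pow(2)).mean()

enc = Encoder(n_in=100)                                # 100-dim tag features (assumed)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)

x1, x2 = torch.randn(16, 100), torch.randn(16, 100)    # toy tag-feature pairs
y = torch.randint(0, 2, (16,)).float()                 # co-occurrence labels
loss = contrastive_loss(enc(x1), enc(x2), y)           # shared weights = Siamese
opt.zero_grad(); loss.backward(); opt.step()
```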