
    Structured Sparsity for Automatic Music Transcription

    Analysis, Visualization, and Transformation of Audio Signals Using Dictionary-based Methods

    Automatic music transcription: challenges and future directions

    Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse the limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.
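
    A minimal sketch of the forced-alignment idea mentioned above: a score is synthesized to audio, both streams are reduced to chroma features, and dynamic time warping links score time to recording time. librosa, the function name, and the file paths are assumptions for illustration, not tools named in the paper.

```python
# Hedged sketch: DTW-based score-to-audio alignment over chroma features.
# librosa and all names here are illustrative assumptions.
import librosa

HOP = 512  # analysis hop size in samples

def align_score_to_audio(audio_path, synth_path, sr=22050):
    """Return (synth_frame, audio_frame) index pairs along the DTW path."""
    y_audio, _ = librosa.load(audio_path, sr=sr)
    y_synth, _ = librosa.load(synth_path, sr=sr)
    # Chroma is fairly robust to timbre differences between a cheap
    # synthesized rendition of the score and the real recording.
    c_audio = librosa.feature.chroma_stft(y=y_audio, sr=sr, hop_length=HOP)
    c_synth = librosa.feature.chroma_stft(y=y_synth, sr=sr, hop_length=HOP)
    _, wp = librosa.sequence.dtw(X=c_synth, Y=c_audio, metric='cosine')
    return wp[::-1]  # ascending (synth_frame, audio_frame) pairs
```

    Mapping each note onset in the score rendition through this path yields time-aligned note labels on the recording, i.e. the kind of training data the paragraph envisions.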

    Automatic transcription of polyphonic music exploiting temporal evolution

    Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without restrictions on the degree of polyphony or the instrument type remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features that utilise temporal characteristics. Techniques for note onset and offset detection are also utilised to improve transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modelling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. The proposed systems have been evaluated both privately and publicly within the Music Information Retrieval Evaluation eXchange (MIREX) framework, and have been shown to outperform several state-of-the-art transcription approaches. The developed techniques have also been employed for other tasks in music technology, such as key modulation detection, temperament estimation, and automatic piano tutoring. Finally, the proposed music transcription models have also been utilised in a wider context, namely for modelling acoustic scenes.
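
    The thesis leans on note onset and offset detection to sharpen transcription; the sketch below shows a generic spectral-flux onset detector of the kind commonly used as a transcription front end. It is a plain-numpy baseline under my own threshold choices, not the detector developed in the thesis.

```python
# Hedged sketch: half-wave-rectified spectral flux with a median-based
# adaptive threshold; a common onset-detection baseline, not the thesis's.
import numpy as np

def spectral_flux_onsets(S, scale=1.5):
    """S: magnitude spectrogram, shape (freq_bins, frames).
    Returns indices of frames detected as note onsets."""
    diff = np.diff(S, axis=1)
    flux = np.maximum(diff, 0.0).sum(axis=0)      # energy increases only
    thresh = scale * np.median(flux) + 1e-12      # simple adaptive threshold
    peaks = [t for t in range(1, len(flux) - 1)   # local maxima above it
             if flux[t] > thresh and flux[t] >= flux[t - 1]
             and flux[t] > flux[t + 1]]
    return np.array(peaks, dtype=int) + 1         # diff shifts frames by one
```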

    Synthesis and Analysis of Sounds Developed from a Bose-Einstein Condensate: Theory and Experimental Results

    Two seemingly incompatible worlds, quantum physics and acoustics, have their meeting point in experiments with the Bose-Einstein condensate. From the very beginning, the Quantum Music project was based on the idea of converting the acoustic phenomena of quantum physics that appear in experiments into the sound domain accessible to the human ear. The first part of this paper describes the experimental conditions in which these acoustic phenomena occur. The second part describes the sound synthesis process used to generate the final sounds. Sound synthesis was based on two types of basic data: theoretical formulas and the results of experiments with the Bose-Einstein condensate. The synthesis based on theoretical equations followed the principles of additive synthesis and was realized using JavaScript and Max/MSP; the synthesis based on experimental results was done using MATLAB. The third part of the article deals with the acoustic analysis of the generated sounds, indicating some of the acoustic phenomena that have emerged. We also discuss possible ways of using such sounds in composing and performing contemporary music.

    It is a little-known fact that there is a field of quantum physics known as quantum acoustics. Two seemingly incompatible worlds, the world of sound, which is an inseparable part of our physical reality, and the world of quantum particles, meet in a series of experiments that can be performed on matter cooled to unimaginably low temperatures, on the order of a billionth of a kelvin. This temperature, officially considered the lowest in the entire universe, brings atoms into a special state of being known as the Bose-Einstein condensate. Once the technology required for such experiments was mastered, methodologies developed rapidly for observing the many properties that matter in this remarkable state begins to exhibit, including the possibility of exciting mechanical, that is, acoustic, waves. This experiment is the main starting point of the Quantum Music project, as its results and the mathematical models that describe them enable sound synthesis. The synthesis was carried out by applying the experimental results directly in MATLAB, and by using the mathematical formulas that describe the vibrations of the condensate in a purpose-designed additive synthesizer realized in the Max/MSP software environment. As a result, banks of sounds based on the vibrations of the quantum system were created. The sounds generated in this way exhibit very interesting acoustic properties, embodied above all in the strong presence of the acoustic phenomenon of beating. This gives rise to interesting sound effects that turn sounds whose spectral content does not change in time into time-varying sonic events with remarkably interesting rhythmic and melodic structures, which can subsequently be controlled and used in the process of composing and performing music. The first part of this paper describes the experimental conditions in which these acoustic phenomena appear. The second part describes the sound synthesis process used to generate the audio files; the synthesis was based on two basic types of data: theoretical formulas and the results of the experiments with the Bose-Einstein condensate. The third part deals with the acoustic analysis of the generated sounds, pointing out some of the acoustic phenomena that emerged during synthesis. Basic guidelines are also given on how such sounds can be used in composing and performing contemporary music.
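
    Since the abstract describes additive synthesis from theoretical formulas and highlights the beating phenomenon, here is a minimal numpy sketch of both: summing sinusoidal partials, two of which are closely spaced in frequency. The frequencies and amplitudes are illustrative, not condensate measurements.

```python
# Hedged sketch: additive synthesis of sinusoidal partials; closely spaced
# partials interfere and produce audible beats. All parameter values are
# illustrative, not taken from the condensate experiments.
import numpy as np

def additive(partials, sr=44100, dur=4.0):
    """partials: iterable of (frequency_hz, amplitude) pairs."""
    t = np.arange(int(sr * dur)) / sr
    y = sum(a * np.sin(2 * np.pi * f * t) for f, a in partials)
    return y / max(1e-12, np.max(np.abs(y)))   # normalize to [-1, 1]

# Partials at 440 Hz and 443 Hz beat at 3 Hz: a spectrum that is static in
# time becomes a time-varying amplitude pattern, as described above.
signal = additive([(440.0, 1.0), (443.0, 1.0)])
```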

    Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription

    This work was supported by EPSRC Platform Grant EP/K009559/1, EPSRC Grant EP/L027119/1, and EPSRC Grant EP/J010375/1.

    Sparse and Nonnegative Factorizations For Music Understanding

    In this dissertation, we propose methods for sparse and nonnegative factorization that are specifically suited for analyzing musical signals. First, we discuss two constraints that aid factorization of musical signals: harmonic and co-occurrence constraints. We propose a novel dictionary learning method that imposes harmonic constraints upon the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral-temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of learned dictionary atoms. We also propose co-occurrence constraints: three simple and convenient multiplicative update rules for nonnegative matrix factorization (NMF) that enforce dependence among atoms. Using examples in music transcription, we demonstrate the ability of these updates to represent each musical note with multiple atoms and to cluster the atoms for source separation purposes. Second, we study how spectral and temporal information extracted by nonnegative factorizations can improve musical instrument recognition. Instrument recognition in melodic signals is difficult, especially for classification systems that rely entirely upon spectral information instead of temporal information. Here, we propose a simple and effective method of combining spectral and temporal information for instrument recognition. While existing classification methods use traditional features such as statistical moments, we extract novel features from spectral and temporal atoms generated by NMF using a biologically motivated multiresolution gamma filterbank. Unlike other methods that require thresholds, safeguards, and hierarchies, the proposed spectral-temporal method requires only simple filtering and a flat classifier. Finally, we study how to perform sparse factorization when a large dictionary of musical atoms is already known. Sparse coding methods such as matching pursuit (MP) have been applied to problems in music information retrieval such as transcription and source separation with moderate success. However, when the set of dictionary atoms is large, identification of the best match in the dictionary with the residual is slow, being linear in the size of the dictionary. Here, we propose a variant called approximate matching pursuit (AMP) that is faster than MP while maintaining scalability and accuracy. Unlike MP, AMP uses an approximate nearest-neighbor (ANN) algorithm to find the closest match in a dictionary in sublinear time. One such ANN algorithm, locality-sensitive hashing (LSH), is a probabilistic hash algorithm that places similar, yet not identical, observations into the same bin. While the accuracy of AMP is comparable to that of similar MP methods, the computational complexity is reduced. Also, by using LSH, this method scales easily: the dictionary can be expanded without reorganizing any data structures.
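
    As context for the multiplicative update rules the dissertation proposes, the sketch below shows the classic Lee-Seung updates for NMF under the KL divergence, the unconstrained baseline that harmonic and co-occurrence constraints would modify. It is not the dissertation's constrained algorithm.

```python
# Hedged sketch: standard KL-divergence NMF via Lee-Seung multiplicative
# updates on a magnitude spectrogram V ~ W @ H. This is the generic
# baseline, not the dissertation's constrained update rules.
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-9, seed=0):
    """V: nonnegative spectrogram (freq_bins x frames). Returns W, H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps          # spectral atoms (columns)
    H = rng.random((rank, T)) + eps          # per-frame activations
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

    In a transcription setting, each column of W ideally captures one note's spectrum and the corresponding row of H indicates when that note sounds.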

    Data-Driven Sound Track Generation

    Background music is often used to generate a specific atmosphere or to draw our attention to specific events. For example, in movies or computer games it is often the accompanying music that conveys the emotional state of a scene and plays an important role in immersing the viewer or player in the virtual environment. For home-made videos, slide shows, and other consumer-generated visual media streams, there is a need for computer-assisted tools that allow users to generate aesthetically appealing music tracks in an easy and intuitive way. In this contribution, we consider a data-driven scenario where the musical raw material is given in the form of a database containing a variety of audio recordings. Then, for a given visual media stream, the task consists in identifying, manipulating, overlaying, concatenating, and blending suitable music clips to generate a music stream that satisfies certain constraints imposed by the visual data stream and by user specifications. Our main goal is to give an overview of various content-based music processing and retrieval techniques that become important in data-driven sound track generation. In particular, we sketch a general pipeline that highlights how the various techniques act together and come into play when generating musically plausible transitions between subsequent music clips.
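
    Blending is the most basic operation in the pipeline sketched above; the snippet below shows an equal-power crossfade between two clips. It deliberately ignores the musical constraints (key, tempo, beat alignment) that the paper's transition techniques take into account.

```python
# Hedged sketch: equal-power crossfade between two mono clips; the simplest
# blending primitive, without the musical matching the paper discusses.
import numpy as np

def crossfade(a, b, sr=44100, fade_s=2.0):
    """Concatenate clips a and b, overlapping the last/first fade_s seconds."""
    n = min(int(sr * fade_s), len(a), len(b))
    t = np.linspace(0.0, np.pi / 2, n)
    fade_out, fade_in = np.cos(t), np.sin(t)   # cos^2 + sin^2 = 1 keeps power
    overlap = a[-n:] * fade_out + b[:n] * fade_in
    return np.concatenate([a[:-n], overlap, b[n:]])
```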

    Efficient methods for joint estimation of multiple fundamental frequencies in music signals

    This study presents efficient techniques for multiple fundamental frequency estimation in music signals. The proposed methodology can infer harmonic patterns from a mixture considering interactions with other sources and evaluate them in a joint estimation scheme. For this purpose, a set of fundamental frequency candidates is first selected at each frame, and several hypothetical combinations of them are generated. Combinations are independently evaluated, and the most likely is selected taking into account the intensity and spectral smoothness of its inferred patterns. The method is extended to consider adjacent frames in order to smooth the detection in time, and a pitch tracking stage is finally performed to increase the temporal coherence. The proposed algorithms were evaluated in MIREX contests, yielding state-of-the-art results with a very low computational burden. This study was supported by the project DRIMS (code TIN2009-14247-C02), the Consolider Ingenio 2010 research programme (project MIPRCV, CSD2007-00018), and the PASCAL2 Network of Excellence, IST-2007-216886.
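
    A simplified illustration of the joint evaluation scheme described above: enumerate hypothetical combinations of f0 candidates for one frame and score each by the intensity and spectral smoothness of its inferred harmonic patterns. The measures and weights are stand-ins, and this brute-force version omits the source-interaction handling and efficiency measures of the actual method.

```python
# Hedged sketch: brute-force joint evaluation of f0 candidate combinations
# by harmonic intensity and spectral smoothness; simplified stand-ins for
# the paper's criteria.
import numpy as np
from itertools import combinations

def harmonic_pattern(spec, freqs, f0, n_harm=10):
    """Magnitudes at the bins nearest the first n_harm partials of f0."""
    idx = [int(np.abs(freqs - h * f0).argmin()) for h in range(1, n_harm + 1)]
    return spec[idx]

def score_combination(spec, freqs, f0s, w_smooth=0.5):
    score = 0.0
    for f0 in f0s:
        p = harmonic_pattern(spec, freqs, f0)
        intensity = p.sum()                       # energy explained by f0
        smoothness = -np.abs(np.diff(p)).sum()    # penalize jagged envelopes
        score += intensity + w_smooth * smoothness
    return score

def best_combination(spec, freqs, candidates, max_polyphony=4):
    """spec, freqs: one frame's magnitude spectrum and its bin frequencies."""
    hyps = [c for k in range(1, max_polyphony + 1)
            for c in combinations(candidates, k)]
    return max(hyps, key=lambda c: score_combination(spec, freqs, c))
```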

    Analysis and resynthesis of polyphonic music

    This thesis examines applications of digital signal processing to the analysis, transformation, and resynthesis of musical audio. First, I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music, and describe and compare the possible hardware and software platforms. After this, I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next, I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments.
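
    Since the second system is based on Gabor wavelets, the sketch below generates a single Gabor atom, a Gaussian-windowed sinusoid, which is the elementary grain such an analysis/resynthesis scheme projects onto and sums back up. The parameterization is illustrative, not the thesis's.

```python
# Hedged sketch: a unit-energy Gabor atom (Gaussian-windowed sinusoid),
# the building block of a Gabor-wavelet analysis/resynthesis scheme.
import numpy as np

def gabor_atom(freq_hz, center_s, width_s, sr=44100, dur_s=1.0, phase=0.0):
    """One Gabor atom sampled at sr over dur_s seconds."""
    t = np.arange(int(sr * dur_s)) / sr
    envelope = np.exp(-0.5 * ((t - center_s) / width_s) ** 2)
    atom = envelope * np.cos(2 * np.pi * freq_hz * t + phase)
    return atom / (np.linalg.norm(atom) + 1e-12)

# Resynthesis approximates a signal as a weighted sum of such atoms,
# e.g. y ~ sum of c_i * gabor_atom(f_i, t_i, s_i) over analysed components.
```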