STRUCTURED SPARSITY FOR AUTOMATIC MUSIC TRANSCRIPTION
© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Analysis, Visualization, and Transformation of Audio Signals Using Dictionary-based Methods
date-added: 2014-01-07 09:15:58 +0000 date-modified: 2014-01-07 09:15:58 +0000
Automatic music transcription: challenges and future directions
Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.
Automatic transcription of polyphonic music exploiting temporal evolution
Automatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the creation
of an automated system able to transcribe polyphonic music without setting
restrictions on the degree of polyphony and the instrument type still remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. Proposed systems have been privately as well as
publicly evaluated within the Music Information Retrieval Evaluation eXchange
(MIREX) framework. Proposed systems have been shown to outperform several
state-of-the-art transcription approaches.
Developed techniques have also been employed for other tasks related to music
technology, such as for key modulation detection, temperament estimation,
and automatic piano tutoring. Finally, proposed music transcription models
have also been utilized in a wider context, namely for modeling acoustic scenes.
Synthesis and analysis of sounds developed from the Bose-Einstein condensate: theory and experimental results
Two seemingly incompatible worlds of quantum physics and acoustics have their
meeting point in experiments with the Bose-Einstein Condensate. From the
very beginning, the Quantum Music project was based on the idea of converting
the acoustic phenomena of quantum physics that appear in experiments into the
sound domain accessible to the human ear. The first part of this paper describes the
experimental conditions in which these acoustic phenomena occur. The second
part of the paper describes the process of sound synthesis which was used to
generate the final sounds. Sound synthesis was based on the use of two types of basic
data: theoretical formulas and the results of experiments with the Bose-Einstein
condensate. The process of sound synthesis based on theoretical equations was
conducted following the principles of additive synthesis, realized using the JavaScript
and Max MSP software. The synthesis of sounds based on the results of
experiments was done using the MatLab software. The third part of the article deals
with the acoustic analysis of the generated sounds, indicating some of the acoustic
phenomena that have emerged. Also, we discuss the possible ways of using such
sounds in the process of composing and performing contemporary music.
It is a little-known fact that there is an area of quantum physics designated by the term quantum acoustics. Two seemingly incompatible worlds (the world of sound, which is an inseparable part of our physical reality, and the world of quantum particles) meet in a series of experiments that can be carried out on matter cooled to unimaginably low temperatures, on the order of a billionth of a kelvin. This temperature, officially considered the lowest in the entire universe, brings atoms into a special state of being called the Bose-Einstein condensate. Once the technology needed to perform such experiments was mastered, there was rapid development of methodology for observing the most diverse properties that matter in this remarkable state begins to exhibit, including the possibility of exciting mechanical, i.e. acoustic, waves. The basic starting point of the Quantum Music project is precisely this experiment, whose results, together with the mathematical models that describe it, make sound synthesis possible. This synthesis was performed by directly applying the experimental results in the MatLab software, as well as by using mathematical formulas that describe the vibrations of the condensate in a purpose-designed additive synthesizer implemented in the Max MSP software environment. As a result of this synthesis, banks of sounds based on the vibrations of a quantum system were created. The sounds generated in this way exhibit very interesting acoustic properties, embodied above all in the pronounced occurrence of the acoustic phenomenon of beating. This gives rise to interesting sonic effects that turn sounds whose spectral content does not change over time into time-varying sound events with exceptionally interesting rhythmic-melodic structures, which can subsequently be controlled and used in composing and performing music. The first part of this paper describes the experimental conditions in which these acoustic phenomena occur.
The second part of the paper describes the sound synthesis process used to generate the audio files. Sound synthesis was based on two basic types of data: theoretical formulas and the results of the experiment with the Bose-Einstein condensate. The third part of the paper deals with the acoustic analysis of the generated sounds, pointing out some of the acoustic phenomena that emerged during sound synthesis. Basic guidelines are also given on ways of using sounds generated in this way in composing and performing music.
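The abstract above names additive synthesis as the basis of the sound generation. As a rough illustration only, the following Python sketch sums two steady sinusoidal partials that lie close in frequency and shows that the summed signal is loud at some moments and nearly silent at others, even though neither partial changes over time. The frequencies (440 Hz and 444 Hz), the sample rate, and the function names are assumptions chosen for the example; they do not come from the paper, which used JavaScript, Max MSP and MatLab.

```python
import math

def additive_synth(partials, duration_s, sample_rate=8000):
    """Sum sinusoidal partials given as (frequency_hz, amplitude) pairs."""
    n_samples = int(duration_s * sample_rate)
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in partials)
        for t in range(n_samples)
    ]

def peak(samples):
    """Largest absolute sample value in a window."""
    return max(abs(s) for s in samples)

# Hypothetical partials: two steady tones 4 Hz apart (values are illustrative).
signal = additive_synth([(440.0, 0.5), (444.0, 0.5)], duration_s=0.5)

# The sum equals sin(2*pi*442*t) * cos(2*pi*2*t), so its envelope pulses
# although each partial has constant amplitude.
loud = peak(signal[:200])        # first 25 ms: partials nearly in phase
quiet = peak(signal[900:1100])   # around 0.125 s: partials nearly cancel
print(f"peak near 0 s: {loud:.3f}, peak near 0.125 s: {quiet:.3f}")
```

Comparing the two window peaks is a crude stand-in for a proper envelope analysis, but it is enough to see a stationary spectrum producing a time-varying sound event.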
Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription
This work was supported by EPSRC Platform Grant EP/K009559/1, EPSRC Grant EP/L027119/1, and EPSRC Grant EP/J010375/1.
Sparse and Nonnegative Factorizations For Music Understanding
In this dissertation, we propose methods for sparse and nonnegative factorization that are specifically suited for analyzing musical signals. First, we discuss two constraints that aid factorization of musical signals: harmonic and co-occurrence constraints. We propose a novel dictionary learning method that imposes harmonic constraints upon the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral-temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of learned dictionary atoms. We also propose co-occurrence constraints -- three simple and convenient multiplicative update rules for nonnegative matrix factorization (NMF) that enforce dependence among atoms. Using examples in music transcription, we demonstrate the ability of these updates to represent each musical note with multiple atoms and cluster the atoms for source separation purposes.
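For readers unfamiliar with multiplicative updates for NMF, the sketch below shows the standard Euclidean-distance updates of Lee and Seung on a toy matrix. It is not the dissertation's co-occurrence update rules, which the abstract does not spell out; the toy "spectrogram" `V`, the rank, and all function names are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Standard multiplicative updates; W and H stay nonnegative throughout."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frob_error(v, w, h)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frob_error(v, w, h))
    return w, h, errors

# Toy magnitude "spectrogram" (4 frequency bins x 5 frames) built from two
# hypothetical note templates; the values are illustrative only.
V = [
    [1.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.0, 0.0],
]
w, h, errors = nmf(V, rank=2)
print(f"error before: {errors[0]:.4f}, after: {errors[-1]:.6f}")
```

Because every update multiplies the current value by a nonnegative ratio, nonnegativity is preserved without any projection step, which is the usual reason such rules are described as simple and convenient.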
Second, we study how spectral and temporal information extracted by nonnegative factorizations can improve upon musical instrument recognition. Musical instrument recognition in melodic signals is difficult, especially for classification systems that rely entirely upon spectral information instead of temporal information. Here, we propose a simple and effective method of combining spectral and temporal information for instrument recognition. While existing classification methods use traditional features such as statistical moments, we extract novel features from spectral and temporal atoms generated by NMF using a biologically motivated multiresolution gamma filterbank. Unlike other methods that require thresholds, safeguards, and hierarchies, the proposed spectral-temporal method requires only simple filtering and a flat classifier.
Finally, we study how to perform sparse factorization when a large dictionary of musical atoms is already known. Sparse coding methods such as matching pursuit (MP) have been applied to problems in music information retrieval such as transcription and source separation with moderate success. However, when the set of dictionary atoms is large, identification of the best match in the dictionary with the residual is slow -- linear in the size of the dictionary. Here, we propose a variant called approximate matching pursuit (AMP) that is faster than MP while maintaining scalability and accuracy. Unlike MP, AMP uses an approximate nearest-neighbor (ANN) algorithm to find the closest match in a dictionary in sublinear time. One such ANN algorithm, locality-sensitive hashing (LSH), is a probabilistic hash algorithm that places similar, yet not identical, observations into the same bin. While the accuracy of AMP is comparable to similar MP methods, the computational complexity is reduced. Also, by using LSH, this method scales easily; the dictionary can be expanded without reorganizing any data structures
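To make the cost argument above concrete, here is a minimal sketch of plain matching pursuit with an exhaustive atom search, which is the step that is linear in the dictionary size. It does not implement AMP or locality-sensitive hashing; the comment marks where an approximate nearest-neighbor lookup would be substituted. The tiny orthonormal dictionary and all names are assumptions for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def matching_pursuit(signal, dictionary, n_iters):
    """Greedy decomposition of `signal` over unit-norm atoms."""
    residual = list(signal)
    picks = []
    for _ in range(n_iters):
        # Exhaustive search: one inner product per atom, so the cost is
        # linear in the dictionary size. An approximate nearest-neighbor
        # lookup would replace this line in an AMP-style variant.
        best = max(range(len(dictionary)),
                   key=lambda k: abs(dot(residual, dictionary[k])))
        coef = dot(residual, dictionary[best])
        residual = [r - coef * d for r, d in zip(residual, dictionary[best])]
        picks.append((best, coef))
    return picks, residual

# Hypothetical 4-atom orthonormal dictionary (illustrative values only).
atoms = [
    normalize([1.0, 1.0, 0.0, 0.0]),
    normalize([1.0, -1.0, 0.0, 0.0]),
    normalize([0.0, 0.0, 1.0, 1.0]),
    normalize([0.0, 0.0, 1.0, -1.0]),
]
signal = [2.0 * a + 0.5 * b for a, b in zip(atoms[0], atoms[2])]
picks, residual = matching_pursuit(signal, atoms, n_iters=2)
print(picks, math.sqrt(dot(residual, residual)))
```

With orthonormal atoms the greedy search recovers the two components exactly; with a large, correlated dictionary of musical atoms the same loop still works, but the per-iteration search dominates the running time.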
Data-Driven Sound Track Generation
Background music is often used to generate a specific atmosphere or to draw our attention to specific events. For example, in movies or computer games it is often the accompanying music that conveys the emotional state of a scene and plays an important role in immersing the viewer or player in the virtual environment. In view of home-made videos, slide shows, and other consumer-generated visual media streams, there is a need for computer-assisted tools that allow users to generate aesthetically appealing music tracks in an easy and intuitive way. In this contribution, we consider a data-driven scenario where the musical raw material is given in the form of a database containing a variety of audio recordings. Then, for a given visual media stream, the task consists in identifying, manipulating, overlaying, concatenating, and blending suitable music clips to generate a music stream that satisfies certain constraints imposed by the visual data stream and by user specifications. It is our main goal to give an overview of various content-based music processing and retrieval techniques that become important in data-driven sound track generation. In particular, we sketch a general pipeline that highlights how the various techniques act together and come into play when generating musically plausible transitions between subsequent music clips.
Efficient methods for joint estimation of multiple fundamental frequencies in music signals
This study presents efficient techniques for multiple fundamental frequency estimation in music signals. The proposed methodology can infer harmonic patterns from a mixture considering interactions with other sources and evaluate them in a joint estimation scheme. For this purpose, a set of fundamental frequency candidates is first selected at each frame, and several hypothetical combinations of them are generated. Combinations are independently evaluated, and the most likely is selected taking into account the intensity and spectral smoothness of its inferred patterns. The method is extended considering adjacent frames in order to smooth the detection in time, and a pitch tracking stage is finally performed to increase the temporal coherence. The proposed algorithms were evaluated in MIREX contests, yielding state-of-the-art results with a very low computational burden. This study was supported by the project DRIMS (code TIN2009-14247-C02), the Consolider Ingenio 2010 research programme (project MIPRCV, CSD2007-00018), and the PASCAL2 Network of Excellence, IST-2007-216886.
Analysis and resynthesis of polyphonic music
This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music. I then describe and compare the possible hardware and software platforms. After this I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments.