Final Research Report for Sound Design and Audio Player
This deliverable describes the work on Task 4.3, Algorithms for sound design and feature development for the audio player. The audio player runs on the in-store player (ISP) and renders the music playlists via beat-synchronous automatic DJ mixing, taking advantage of the rich musical content description extracted in T4.2 (beat markers, structural segmentation into intro and outro, musical and sound content classification).
The deliverable covers prototypes and final results on: (1) automatic beat-synchronous mixing by beat alignment and time stretching, for which we developed an algorithm for beat alignment and scheduling of time-stretched tracks; (2) compensation of play-duration changes introduced by time stretching: in order to make the playlist generator independent of beat mixing, we readjust the tempo of played tracks such that their stretched duration equals their original duration; (3) prospective research on the extraction of data from DJ mixes: to alleviate the lack of extensive ground-truth databases of DJ mixing practices, we propose steps towards extracting this data from existing mixes by alignment and unmixing of the tracks in a mix; we also show how these methods can be evaluated even without labelled test data, and propose an open dataset for further research; (4) a description of the software player module, a GUI-less application that runs on the ISP and performs streaming of tracks from disk and beat-synchronous mixing.
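The duration-compensation idea in point (2) reduces to a short calculation: if a transition region is time-stretched at some rate, the rest of the track can be played at a compensating rate so that the stretched total duration equals the original. The function below is a minimal sketch of that arithmetic under the convention rate = input time / output time; it is not the deliverable's actual algorithm, and all names are hypothetical.

```python
def compensated_body_rate(total_dur, trans_dur, trans_rate):
    """Playback rate for the track body so that the stretched total
    duration equals the original duration.

    Convention: output duration = input duration / rate.
    total_dur  -- original track duration in seconds
    trans_dur  -- duration of the beat-matched transition region
    trans_rate -- stretch rate applied inside the transition
    """
    body_dur = total_dur - trans_dur
    trans_out = trans_dur / trans_rate        # transition output duration
    body_out_target = total_dur - trans_out   # remaining time budget
    return body_dur / body_out_target


# A transition slowed to rate 0.95 lengthens its output, so the body
# must play slightly faster than 1.0 to keep the total duration fixed.
rate = compensated_body_rate(total_dur=180.0, trans_dur=15.0, trans_rate=0.95)
```

This keeps the playlist generator's timing assumptions intact: from its point of view every track still occupies its original duration, as the deliverable describes.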
The estimation of cue points where tracks should cross-fade is now described in D4.7, Final Research Report on Auto-Tagging of Music.
EC/H2020/688122/EU/Artist-to-Business-to-Business-to-Consumer Audio Branding System/ABC D
Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning
Separating audio mixtures into individual instrument tracks has been a long-standing, challenging task. We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning. Specifically, our loss function adopts the Wasserstein distance, which directly measures the distribution distance between the separated sources and the real sources for each individual source. Moreover, a global regularization term is added to fulfill the spectrum energy preservation property regardless of separation. Unlike state-of-the-art weakly supervised models, which often involve deliberately devised constraints or careful model selection, our approach needs little prior model specification on the data and can be learned straightforwardly in an end-to-end fashion. We show that the proposed method performs competitively against state-of-the-art weakly supervised methods on public benchmarks.
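The spectrum-energy-preservation term can be pictured as a penalty on the gap between the summed energy of the separated sources and the energy of the mixture. The snippet below is an illustrative stand-in for that regularizer only (in the paper it sits inside a Wasserstein adversarial loss); the function and variable names are assumptions, not the authors' API.

```python
import numpy as np

def energy_preservation_penalty(mixture_spec, source_specs):
    """Mean squared gap between summed source energy and mixture energy.

    mixture_spec -- magnitude spectrogram of the mixture, shape (freq, time)
    source_specs -- list of magnitude spectrograms, one per separated source
    """
    total_energy = sum(np.abs(s) ** 2 for s in source_specs)
    mix_energy = np.abs(mixture_spec) ** 2
    return float(np.mean((total_energy - mix_energy) ** 2))
```

When the separated sources exactly account for the mixture's energy the penalty is zero; any energy leaked or invented by the separator pushes it up, which is the property the global regularizer enforces.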
The songwriting coalface: where multiple intelligences collide
This paper investigates pedagogy around songwriting professional practice. Particular focus is given to the multiple intelligence theory of Howard Gardner as a lens through which to view songwriting practice, referenced to recent songwriting-specific research (e.g. McIntyre, Bennett). Songwriting education provides some unique challenges: firstly, due to the qualitative nature of assessment and the complex and multi-faceted nature of the skills necessary (lyric writing, composing, recording, and performing), and secondly, in some less-tangible capacities beneficial to the songwriter (creative skills and nuanced choice-making). From the perspective of songwriting education, Gardner's MI theory provides a "useful fiction" (his term) for knowledge transfer in the domain, especially (and for this researcher, surprisingly) in naturalistic intelligence.
Nonparametric estimation of the dynamic range of music signals
The dynamic range is an important parameter which measures the spread of sound power; for music signals it is a measure of recording quality. There are various descriptive measures of sound power, none of which has strong statistical foundations. We start from a nonparametric model for sound waves in which an additive stochastic term captures transient energy. This component is recovered by a simple rate-optimal kernel estimator that requires a single data-driven tuning parameter. The distribution of its variance is approximated by a consistent random subsampling method that is able to cope with the massive size of the typical dataset. Based on the latter, we propose a statistic and an estimation method that represent the dynamic range concept consistently. The behavior of the statistic is assessed in a large numerical experiment where we simulate dynamic compression on a selection of real music signals. Application to real data also shows how the proposed method can predict subjective experts' opinions about the hi-fi quality of a recording.
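The two-stage pipeline described in the abstract (a kernel estimator to recover the transient component, then random subsampling to approximate its variance distribution) can be sketched as follows. This is a simplified illustration, not the paper's estimator: the moving-average kernel, the fixed subsample size, and the median over subsample variances are all assumptions made for the sketch.

```python
import numpy as np

def transient_variance(x, bandwidth=64, n_sub=200, sub_size=1024, seed=0):
    """Rough variance estimate of the transient residual of a signal.

    1. Smooth the signal with a simple moving-average kernel.
    2. Treat the residual as the transient component.
    3. Approximate its variance by the median of variances computed
       on randomly subsampled windows (cheap on very long signals).
    """
    kernel = np.ones(bandwidth) / bandwidth
    smooth = np.convolve(x, kernel, mode="same")
    resid = x - smooth                      # transient component
    rng = np.random.default_rng(seed)
    sub_vars = []
    for _ in range(n_sub):
        start = rng.integers(0, len(resid) - sub_size)
        sub_vars.append(resid[start:start + sub_size].var())
    return float(np.median(sub_vars))
```

Subsampling is what makes the scheme workable at audio scale: a three-minute track at 44.1 kHz has about eight million samples, but each variance is computed on a short window.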
Final Research Report on Auto-Tagging of Music
The deliverable D4.7 concerns the work achieved by IRCAM until M36 for the "auto-tagging of music". The deliverable is a research report. The software libraries resulting from the research have been integrated into the Fincons/HearDis! Music Library Manager or are used by TU Berlin. The final software libraries are described in D4.5.
The research work on auto-tagging has concentrated on four aspects:
1) Further improving IRCAM's machine-learning system ircamclass. This has been done by developing the new MASSS audio features and by integrating audio augmentation and audio segmentation into ircamclass. The system has then been applied to train the HearDis! "soft" features (Vocals-1, Vocals-2, Pop-Appeal, Intensity, Instrumentation, Timbre, Genre, Style). This is described in Part 3.
2) Developing two sets of "hard" features (i.e. features related to musical or musicological concepts) as specified by HearDis! (for integration into the Fincons/HearDis! Music Library Manager) and TU Berlin (as input for the prediction model of the GMBI attributes). Such features are either derived from previously estimated higher-level concepts (such as structure, key, or succession of chords) or obtained by developing new signal processing algorithms (such as HPSS or main melody estimation). This is described in Part 4.
3) Developing audio features to characterize the audio quality of a music track. The goal is to describe the quality of the audio independently of its apparent encoding. This is then used to estimate audio degradation or the decade of a recording, and serves to ensure that playlists contain tracks with similar audio quality. This is described in Part 5.
4) Developing innovative algorithms to extract specific audio features to improve music mixes. So far, innovative techniques (based on various blind audio source separation algorithms and convolutional neural networks) have been developed for singing voice separation, singing voice segmentation, music structure boundary estimation, and DJ cue-region estimation. This is described in Part 6.
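One of the signal-processing building blocks named in this report, HPSS (harmonic-percussive source separation), is commonly implemented by median-filtering a spectrogram along time and along frequency and deriving soft masks, in the spirit of Fitzgerald (2010). The sketch below shows that generic textbook approach; it is not IRCAM's implementation, and the function name and kernel size are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(S, kernel=17):
    """Harmonic/percussive soft masks from a magnitude spectrogram.

    S -- magnitude spectrogram, shape (freq, time)
    Median filtering along time enhances horizontal (harmonic) ridges;
    filtering along frequency enhances vertical (percussive) ridges.
    """
    H = median_filter(S, size=(1, kernel))   # smooth across time
    P = median_filter(S, size=(kernel, 1))   # smooth across frequency
    eps = 1e-10
    mask_h = H ** 2 / (H ** 2 + P ** 2 + eps)  # Wiener-style soft mask
    return mask_h, 1.0 - mask_h
```

Multiplying the complex spectrogram by each mask and inverting the STFT yields the two component signals; the masks sum to one by construction, so no energy is lost in the split.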
- …