54,921 research outputs found
Final Research Report for Sound Design and Audio Player
This deliverable describes the work on Task 4.3 Algorithms for sound design and feature developments for audio player. The audio player runs on the in-store player (ISP) and takes care of rendering the music playlists via beat-synchronous automatic DJ mixing, taking advantage of the rich musical content description extracted in T4.2 (beat markers, structural segmentation into intro and outro, musical and sound content classification).
The deliverable covers prototypes and final results on: (1) automatic beat-synchronous mixing by beat alignment and time stretching â we developed an algorithm for beat alignment and scheduling of time-stretched tracks; (2) compensation of play duration changes introduced by time stretching â in order to make the playlist generator independent of beat mixing, we chose to readjust the tempo of played tracks such that their stretched duration is the same as their original duration; (3) prospective research on the extraction of data from DJ mixes â to alleviate the lack of extensive ground truth databases of DJ mixing practices, we propose steps towards extracting this data from existing mixes by alignment and unmixing of the tracks in a mix. We also show how these methods can be evaluated even without labelled test data, and propose an open dataset for further research; (4) a description of the software player module, a GUI-less application to run on the ISP that performs streaming of tracks from disk and beat-synchronous mixing.
The estimation of cue points where tracks should cross-fade is now described in D4.7 Final Research Report on Auto-Tagging of Music.EC/H2020/688122/EU/Artist-to-Business-to-Business-to-Consumer Audio Branding System/ABC D
Methodological considerations concerning manual annotation of musical audio in function of algorithm development
In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audiostream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, both in the syntactic and semantic sense. As a consequence, annotation has been approached from a variety of perspectives (but mainly linguistic-symbolic oriented), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in function of a computational approach to musical audio-mining that is based on algorithms that learn from annotated data. 1
Recommended from our members
Factors in human recognition of timbre lexicons generated by data clustering
Since the development of sound recording technologies, the palette of sound timbres available for music creation was extended way beyond traditional musical instruments. The organization and categorization of timbre has been a common endeavor. The availability of large databases of sound clips provides an opportunity for obtaining datadriven timbre categorizations via content-based clustering. In this article we describe an experiment aimed at understanding what factors influence the process of learning a given clustering of sound samples. We clustered a large database of short sound clips, and analyzed the success of participants in assigning sounds to the âcorrectâ clusters after listening to a few examples of each. The results of the experiment suggest a number of relevant factors related both to the strategies followed by users and to the quality measures of the clustering solution, which can guide the design of creative applications based on audio clip clustering
Problems and opportunities of applying data-& audio-mining techniques to ethnic music
[TODO] Add abstract here
An experiment in audio classification from compressed data
In this paper we present an algorithm for automatic classification of sound into speech, instrumental sound/ music and silence. The method is based on thresholding of features derived from the modulation envelope of the frequency limited audio signal. Four characteristics are examined for discrimination: the occurrence and duration of energy peaks, rhythmic content and the level of harmonic content. The proposed algorithm allows classification directly on MPEG-1 audio bitstreams. The performance of the classifier was evaluated on TRECVID test data. The test results are above-average among all TREC participants. The approaches adopted by other research groups participating in TREC are also discussed
Audio Caption: Listen and Tell
Increasing amount of research has shed light on machine perception of audio
events, most of which concerns detection and classification tasks. However,
human-like perception of audio scenes involves not only detecting and
classifying audio sounds, but also summarizing the relationship between
different audio events. Comparable research such as image caption has been
conducted, yet the audio field is still quite barren. This paper introduces a
manually-annotated dataset for audio caption. The purpose is to automatically
generate natural sentences for audio scene description and to bridge the gap
between machine perception of audio and image. The whole dataset is labelled in
Mandarin and we also include translated English annotations. A baseline
encoder-decoder model is provided for both English and Mandarin. Similar BLEU
scores are derived for both languages: our model can generate understandable
and data-related captions based on the dataset.Comment: accepted by ICASSP201
- âŠ