Multimodal Music Information Processing and Retrieval: Survey and Future Challenges
To improve performance on various music information processing tasks, recent studies exploit different modalities that capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.
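The information fusion approaches such surveys analyze are commonly grouped into early (feature-level) and late (decision-level) fusion. A minimal sketch of both, using toy per-modality vectors and hypothetical function names (not from the paper):

```python
import numpy as np

def early_fusion(audio_feat, lyrics_feat):
    # Feature-level fusion: concatenate modality features
    # before feeding them to a single classifier.
    return np.concatenate([audio_feat, lyrics_feat])

def late_fusion(audio_prob, lyrics_prob, w=0.5):
    # Decision-level fusion: mix per-modality class
    # probabilities with a weighting factor w.
    return w * audio_prob + (1 - w) * lyrics_prob

audio = np.array([0.1, 0.9])   # toy per-class probabilities
lyrics = np.array([0.4, 0.6])
fused = late_fusion(audio, lyrics)  # [0.25, 0.75]
```

The survey's point is that the right fusion level depends on the application; the sketch only shows the two endpoints of that design space.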
Improving Structure Evaluation Through Automatic Hierarchy Expansion
Structural segmentation is the task of partitioning a recording into non-overlapping time intervals and labeling each segment with an identifying marker such as A, B, or verse. Hierarchical structure annotation expands this idea to allow an annotator to segment a song at multiple levels of granularity. While there has been recent progress in developing evaluation criteria for comparing two hierarchical annotations of the same recording, the existing methods have known deficiencies when dealing with inexact label matches and sequential label repetition. In this article, we investigate methods for automatically enhancing structural annotations by inferring (and expanding) hierarchical information from the segment labels. The proposed method complements existing techniques for comparing hierarchical structural annotations by coarsening or refining labels with variation markers to either collapse similarly labeled segments together, or separate identically labeled segments from each other. Using the multi-level structure annotations provided in the SALAMI dataset, we demonstrate that automatic hierarchy expansion allows structure comparison methods to more accurately assess similarity between annotations.
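The two expansion directions described above can be illustrated on a toy label sequence; the helper names here are hypothetical, and the actual method operates on SALAMI-style multi-level annotations:

```python
def coarsen(labels):
    # Collapse variation markers so "A" and "A'" share one label.
    return [lab.rstrip("'") for lab in labels]

def refine(labels):
    # Distinguish repeats of an identical label by occurrence index,
    # e.g. the second plain "A" becomes "A.2".
    seen, out = {}, []
    for lab in labels:
        seen[lab] = seen.get(lab, 0) + 1
        out.append(f"{lab}.{seen[lab]}")
    return out

segments = ["A", "A'", "B", "A"]
coarsen(segments)  # ['A', 'A', 'B', 'A']
refine(segments)   # ['A.1', "A'.1", 'B.1', 'A.2']
```

Coarsening adds a level above the annotation (merging variants), while refinement adds a level below it (splitting repeats), which is what lets existing hierarchy comparison metrics see the structure implicit in the labels.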
The Skipping Behavior of Users of Music Streaming Services and its Relation to Musical Structure
The behavior of users of music streaming services is investigated from the point of view of the temporal dimension of individual songs; specifically, the main object of the analysis is the point in time within a song at which users stop listening and start streaming another song (a "skip"). The main contribution of this study is the identification of a correlation between the distribution in time of skipping events and the musical structure of songs. It is also shown that this distribution is not only specific to individual songs, but also independent of the cohort of users and, under stationary conditions, of the date of observation. Finally, user behavioral data is used to train a predictor of the musical structure of a song solely from its acoustic content; it is shown that the use of such data, available in large quantities to music streaming services, yields significant improvements in accuracy over the customary approach of training this class of algorithms, in which only smaller amounts of hand-labeled data are available.
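The per-song skip distribution at the center of this analysis can be approximated as a histogram of skip events over normalized song time; the function name and binning are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def skip_distribution(skip_times, duration, n_bins=20):
    # Histogram of skip events over normalized song time (0..1),
    # so distributions are comparable across songs of any length.
    positions = np.asarray(skip_times, dtype=float) / duration
    hist, _ = np.histogram(positions, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()  # empirical probability per bin

# Toy example: four skips in a 200-second song.
dist = skip_distribution([10, 30, 30, 170], duration=200, n_bins=4)
# dist -> [0.75, 0.0, 0.0, 0.25]
```

Peaks in such a distribution are the quantity the study correlates with structural boundaries, and what serves as weak supervision for the structure predictor.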
Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings
Structure perception is a fundamental aspect of music cognition in humans. Historically, the hierarchical organization of music into structures has served as a narrative device for conveying meaning, creating expectancy, and evoking emotions in the listener. Musical structures therefore play an essential role in music composition, as they shape the musical discourse through which the composer organizes their ideas. In this paper, we present a novel music segmentation method, pitchclass2vec, based on symbolic chord annotations, which are embedded into continuous vector representations using both natural language processing techniques and custom-made encodings. Our algorithm is based on a long short-term memory (LSTM) neural network and outperforms the state-of-the-art techniques based on symbolic chord annotations in the field.
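As a rough intuition for pitch-class-based chord encodings (a hand-crafted stand-in, not the learned pitchclass2vec embeddings; names and simplifications are my own), a chord symbol can be mapped to a 12-dimensional multi-hot vector over pitch classes:

```python
# Semitone offsets of natural note names from C.
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def chord_to_pitch_classes(symbol):
    # Root plus a major or minor triad; extensions are ignored.
    root = NOTE_TO_PC[symbol.rstrip("m")]
    third = 3 if symbol.endswith("m") else 4
    return sorted({root % 12, (root + third) % 12, (root + 7) % 12})

def encode(symbol):
    # 12-dim multi-hot vector marking the chord's pitch classes.
    vec = [0.0] * 12
    for pc in chord_to_pitch_classes(symbol):
        vec[pc] = 1.0
    return vec

chord_to_pitch_classes("Am")  # [0, 4, 9]: A, C, E
```

Sequences of such vectors are the kind of input a segmentation LSTM can consume; the paper's contribution is learning the embedding instead of hand-crafting it.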
Crowdsourcing Emotions in Music Domain
An important source of intelligence for music emotion recognition today is user-provided community tags about songs or artists. Recent crowdsourcing approaches, such as harvesting social tags, designing collaborative games and web services, or using Mechanical Turk, are becoming popular in the literature. They provide a cheap, quick, and efficient alternative to professional labeling of songs, which is expensive and does not scale to large datasets. In this paper we discuss the viability of various crowdsourcing instruments, providing examples from published research. We also share our own experience, illustrating the steps we followed using tags collected from Last.fm to create two public music mood datasets. While processing the affect tags from Last.fm, we observed that they tend to be biased towards positive emotions; the resulting datasets thus contain more positive songs than negative ones.
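The tag-to-label step described above can be sketched as a simple vote over affect lexicons; the word lists and function are hypothetical illustrations, not the paper's actual pipeline:

```python
# Toy affect lexicons standing in for the curated mood vocabularies.
POSITIVE = {"happy", "joyful", "upbeat", "cheerful"}
NEGATIVE = {"sad", "melancholy", "depressing", "gloomy"}

def label_song(tags):
    # Majority vote of affect tags; songs with no clear
    # polarity are dropped from the dataset.
    pos = sum(t in POSITIVE for t in tags)
    neg = sum(t in NEGATIVE for t in tags)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # ambiguous or no affect tags

songs = [["happy", "rock"], ["sad"], ["upbeat", "cheerful"]]
labels = [label_song(t) for t in songs]
```

Even in toy data like this, more songs end up positive than negative, mirroring the positive-skew bias the authors report in Last.fm tags.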
Development of the Trubadur Platform and New Challenges in the Coming Years
Trubadur is an open-source platform for ear training with automated rhythmic and interval dictation exercises. We evaluated the platform with students of the Konservatorij za glasbo in balet Ljubljana during the 2018/19–2020/21 school years. The evaluation results showed that using the platform can improve test performance and that it complements distance learning.
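The core of an automated interval-dictation exercise is comparing the student's answer with the interval that was played; this is a minimal hypothetical sketch, not Trubadur's actual grading code:

```python
# Interval names by size in semitones (a small illustrative subset).
INTERVALS = {0: "unison", 1: "minor 2nd", 2: "major 2nd", 3: "minor 3rd",
             4: "major 3rd", 5: "perfect 4th", 7: "perfect 5th", 12: "octave"}

def grade_interval(midi_low, midi_high, answer):
    # Map the played interval to its name and check the answer.
    correct = INTERVALS.get(abs(midi_high - midi_low))
    return answer == correct

grade_interval(60, 64, "major 3rd")  # True: C4 to E4 is 4 semitones
```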
Automatic Personalized Playlist Generation
This master's thesis presents a study of approaches to the problem of automatic personalized playlist generation. In addition to a brief overview of the theoretical background, we document our own approach: the experiments we conducted and their results. Our algorithm consists of two main parts: constructing a playlist evaluation function and choosing a playlist generation strategy. For the first task, we chose a Naive Bayes classifier and a 5-element vector of audio content-based attributes produced by the MIRtoolbox toolkit, which classifies a playlist as good or bad with 82% accuracy, much better than a random classifier (50%). For the second problem, we tried three generation algorithms: Shuffle, Randomized Search, and a Genetic Algorithm. According to the experimental results, the Randomized Search algorithm works best and fastest. All experiments were performed on playlists of 5 and 10 elements.
In summary, we have developed an automated personalized playlist generation algorithm that, according to our evaluations, matches user expectations better than random shuffling. The algorithm can be used to construct more complex playlists.
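The winning strategy, Randomized Search over a learned evaluation function, can be sketched as follows; the scorer here is a purely illustrative stand-in for the thesis's Naive Bayes evaluator:

```python
import random

def randomized_search(candidates, score, length=5, iters=1000, seed=0):
    # Repeatedly sample random playlists of the requested length and
    # keep the best-scoring one under the evaluation function.
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(iters):
        playlist = rng.sample(candidates, length)
        s = score(playlist)
        if s > best_score:
            best, best_score = playlist, s
    return best

# Toy scorer: prefer playlists whose track ids are small
# (purely illustrative; the thesis scores audio-content features).
tracks = list(range(100))
best = randomized_search(tracks, score=lambda p: -sum(p), length=5)
```

The appeal of this strategy is that it needs nothing from the evaluation function beyond a score per candidate playlist, which is why it pairs naturally with a black-box classifier.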