54 research outputs found
Deep Polyphonic ADSR Piano Note Transcription
We investigate a late-fusion approach to piano transcription, combined with a
strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM).
The network architecture under consideration is compact in terms of its number
of parameters and easy to train with gradient descent. The network outputs are
fused over time in the final stage to obtain note segmentations, with an HMM
whose transition probabilities are chosen based on a model of attack, decay,
sustain, release (ADSR) envelopes, commonly used for sound synthesis. The note
segments are then subject to a final binary decision rule that rejects note
segment hypotheses that are too weak. We obtain state-of-the-art results on the
MAPS dataset and outperform other approaches by a large margin when
predicting complete note regions from onsets to offsets.
Comment: 5 pages, 2 figures, published as ICASSP'1
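The ADSR-based temporal prior described above can be pictured as a small left-to-right HMM per note. A minimal sketch, assuming a five-state chain (off/attack/decay/sustain/release) with illustrative, hand-picked probabilities rather than the paper's actual values:

```python
# Hypothetical sketch of an ADSR-shaped HMM transition matrix for one note.
# States: off -> attack -> decay -> sustain -> release -> off.
# Probabilities are illustrative placeholders, not the paper's values.

STATES = ["off", "attack", "decay", "sustain", "release"]

def adsr_transitions(p_onset=0.01, p_adv=0.4, p_rel=0.05):
    """Build a left-to-right transition matrix: each state either
    self-loops or advances to the next ADSR stage."""
    n = len(STATES)
    T = [[0.0] * n for _ in range(n)]
    T[0][0], T[0][1] = 1 - p_onset, p_onset  # off: stay silent or trigger attack
    T[1][1], T[1][2] = 1 - p_adv, p_adv      # attack: hold or move to decay
    T[2][2], T[2][3] = 1 - p_adv, p_adv      # decay: hold or move to sustain
    T[3][3], T[3][4] = 1 - p_rel, p_rel      # sustain: hold or start release
    T[4][4], T[4][0] = 1 - p_adv, p_adv      # release: hold or return to off
    return T

T = adsr_transitions()
assert all(abs(sum(row) - 1.0) < 1e-9 for row in T)  # each row is a distribution
```

Decoding the network's framewise outputs against such a matrix (e.g. with Viterbi) is what turns per-frame activations into contiguous note segments.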
From Music Ontology Towards Ethno-Music-Ontology
This paper presents exploratory work investigating the suitability of the Music Ontology - the most widely used formal specification of the music domain - for modelling non-Western musical traditions. Four contrasting case studies from a variety of musical cultures are analysed: Dutch folk song research, reconstructive performance of rural Russian traditions, contemporary performance and composition of Persian classical music, and recreational use of a personal world music collection. We propose semantic models describing the respective domains and examine the applications of the Music Ontology for these case studies: which concepts can be successfully reused, where they need adjustments, and which parts of the reality in these case studies are not covered by the Music Ontology. The variety of traditions, contexts and modelling goals covered by our case studies sheds light on the generality of the Music Ontology and on the limits of generalisation “for all musics” that could be aspired to on the Semantic Web.
An efficient temporally-constrained probabilistic model for multiple-instrument music transcription
In this paper, an efficient, general-purpose model for multiple-instrument polyphonic music transcription is proposed. The model is based on probabilistic latent component analysis and supports the use of sound state spectral templates, which represent the temporal evolution of each note (e.g. attack, sustain, decay). As input, a variable-Q transform (VQT) time-frequency representation is used. Computational efficiency is achieved by supporting the use of pre-extracted and pre-shifted sound state templates. Two variants are presented: without temporal constraints and with hidden Markov model-based constraints controlling the appearance of sound states. Experiments are performed on benchmark transcription datasets: MAPS, TRIOS, MIREX multiF0, and Bach10; results on multi-pitch detection and instrument assignment show that the proposed models outperform the state of the art for multiple-instrument transcription and are more than 20 times faster than a previous sound-state-based model. We finally show that a VQT representation can lead to improved multi-pitch detection performance compared with constant-Q representations.
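The pre-shifting trick mentioned above relies on a property of log-frequency representations such as the VQT: transposing a note by a semitone is approximately a shift along the frequency axis, so per-pitch templates can be precomputed by shifting one prototype. A minimal sketch (the template values and bin resolution are made up for illustration):

```python
# Illustrative sketch, not the paper's code: in a log-frequency
# representation, per-pitch templates are obtained by shifting one
# prototype template along the frequency-bin axis.

BINS_PER_SEMITONE = 3  # assumed VQT resolution

def shift_template(template, semitones):
    """Shift a spectral template along log-frequency bins,
    zero-padding at the edges and preserving length."""
    k = semitones * BINS_PER_SEMITONE
    if k == 0:
        return list(template)
    pad = [0.0] * abs(k)
    return (pad + template[:-k]) if k > 0 else (template[-k:] + pad)

# toy prototype with a peak at bin 2
proto = [0.0, 0.0, 1.0, 0.5, 0.2, 0.0, 0.0, 0.0, 0.0]
templates = {p: shift_template(proto, p) for p in range(-1, 2)}
```

Because the shifted copies are computed once up front, the expensive per-pitch template estimation drops out of the inference loop, which is where the reported speed-up comes from.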
Cross-cultural mood perception in pop songs and its alignment with mood detection algorithms
Do people from different cultural backgrounds perceive the mood in music the same way? How closely do human ratings across different cultures approximate automatic mood detection algorithms that are often trained on corpora of predominantly Western popular music? Analyzing 166 participants' responses from Brazil, South Korea, and the US, we examined the similarity between the ratings of nine categories of perceived moods in music and estimated their alignment with four popular mood detection algorithms. We created a dataset of 360 recent pop songs drawn from major music charts of the countries and constructed semantically identical mood descriptors across the English, Korean, and Portuguese languages. Multiple participants from the three countries rated their familiarity, preference, and perceived moods for a given song. Ratings were highly similar within and across cultures for basic mood attributes such as sad, cheerful, and energetic. However, we found significant cross-cultural differences for more complex characteristics such as dreamy and love. To our surprise, the results of mood detection algorithms were uniformly correlated with human ratings from all three countries and did not show a detectable bias towards any particular culture. Our study thus suggests that mood detection algorithms can be considered an objective measure, at least within the popular music context.
K-Pop Genres: A Cross-Cultural Exploration
The Proceedings can be viewed at: http://www.ppgia.pucpr.br/ismir2013/wp-content/uploads/2013/10/Proceedings-ISMIR2013-Final.pdf
Current music genre research tends to focus heavily on
classical and popular music from Western cultures. Few
studies discuss the particular challenges and issues related
to non-Western music. The objective of this study is to
improve our understanding of how genres are used and
perceived in different cultures. In particular, this study
attempts to fill gaps in our understanding by examining
K-pop music genres used in Korea and comparing them
with genres used in North America. We provide background
information on K-pop genres by analyzing 602
genre-related labels collected from eight major music distribution
websites in Korea. In addition, we report upon a
user study in which American and Korean users annotated
genre information for 1894 K-pop songs in order to
understand how their perceptions might differ or agree.
The results show higher consistency among Korean users
than among American users, as demonstrated by differences in
Fleiss’ Kappa values and in the proportion of agreed genre labels.
Asymmetric disagreements between Americans and
Koreans on specific genres reveal some interesting differences
in the perception of genres. Our findings provide
some insights into challenges developers may face in creating
global music services.
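Fleiss’ Kappa, the agreement statistic used above to compare Korean and American annotators, can be computed directly from per-item category counts. A hedged sketch with toy data (the rating matrix is made up, not the study's annotations):

```python
# Sketch of Fleiss' kappa for fixed-size rater panels.
# The toy ratings below are illustrative, not the study's data.

def fleiss_kappa(counts):
    """counts[i][j]: number of raters assigning item i to category j.
    Every item must be rated by the same number of raters."""
    N = len(counts)
    n = sum(counts[0])                       # raters per item
    # observed agreement per item, averaged over items
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in counts) / N
    # chance agreement from the marginal category proportions
    k = len(counts[0])
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(x * x for x in p)
    return (P_bar - P_e) / (1 - P_e)

# four songs, four raters, two genre labels (toy data)
ratings = [[4, 0], [0, 4], [3, 1], [4, 0]]
kappa = fleiss_kappa(ratings)  # ≈ 0.709: substantial but imperfect agreement
```

Higher kappa for one rater group than another, on the same items, is exactly the kind of evidence the study reports for Korean versus American users.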
TarsosDSP, a real-time audio processing framework in Java
This paper presents TarsosDSP, a framework for real-time audio analysis and processing. Most libraries and frameworks offer either audio analysis and feature extraction or audio synthesis and processing. TarsosDSP is one of only a few frameworks that offer analysis, processing, and feature extraction in real time, a unique feature in the Java ecosystem. The framework contains practical audio processing algorithms, can be extended easily, and has no external dependencies. Each algorithm is implemented as simply as possible thanks to a straightforward processing pipeline. TarsosDSP's features include a resampling algorithm, onset detectors, a number of pitch estimation algorithms, a time stretch algorithm, a pitch shifting algorithm, and an algorithm to calculate the Constant-Q transform. The framework also offers simple audio synthesis, some audio effects, and several filters. The Open Source framework is a valuable contribution to the MIR community and an ideal fit for interactive MIR applications on Android.
Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings
Structure perception is a fundamental aspect of music cognition in humans.
Historically, the hierarchical organization of music into structures served as
a narrative device for conveying meaning, creating expectancy, and evoking
emotions in the listener. Musical structures thus play an essential role in
music composition, as they shape the musical discourse through which the
composer organises their ideas. In this paper, we present a novel music
segmentation method, pitchclass2vec, based on symbolic chord annotations, which
are embedded into continuous vector representations using both natural language
processing techniques and custom-made encodings. Our algorithm is based on a
long short-term memory (LSTM) neural network and outperforms
state-of-the-art techniques based on symbolic chord annotations in the field.
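The name pitchclass2vec suggests that chord symbols are grounded in their pitch-class content before being embedded. A minimal sketch of such an encoding, with a deliberately simplified chord vocabulary that is our assumption, not the paper's implementation:

```python
# Illustrative sketch: map a chord symbol to a 12-dimensional binary
# pitch-class vector. The chord vocabulary here is a simplification,
# not pitchclass2vec's actual parsing or embedding scheme.

NOTE_TO_PC = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
              "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}
QUALITY_INTERVALS = {"maj": (0, 4, 7), "min": (0, 3, 7), "7": (0, 4, 7, 10)}

def chord_to_pitchclass_vector(root, quality):
    """Encode a chord as a 12-dim binary vector over pitch classes."""
    vec = [0] * 12
    for interval in QUALITY_INTERVALS[quality]:
        vec[(NOTE_TO_PC[root] + interval) % 12] = 1
    return vec

# C major -> pitch classes {C, E, G} = {0, 4, 7}
assert chord_to_pitchclass_vector("C", "maj") == [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

Such vectors (or learned refinements of them) can then be fed as a sequence to an LSTM, one chord annotation per timestep, for the segmentation task.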
Semantic Integration of MIR Datasets with the Polifonia Ontology Network
Integration between different data formats, and between data belonging to different collections, is an ongoing challenge in the MIR field. Semantic Web tools have proved to be promising resources for making different types of music information interoperable. However, the use of these technologies has so far been limited and scattered in the field. To address this, the Polifonia project is developing an ontological ecosystem that can cover a wide variety of musical aspects (musical features, instruments, emotions, performances). In this paper, we present the Polifonia Ontology Network, an ecosystem that enables and fosters the transition towards semantic MIR.