140 research outputs found

    Comparative analysis of music recordings from western and non-western traditions by automatic tonal feature extraction

    No full text
    The automatic analysis of large musical corpora by means of computational models overcomes some limitations of manual analysis, and the unavailability of scores for most existing music makes it necessary to work with audio recordings. Until now, research in this area has focused on music from the Western tradition. Nevertheless, we might ask whether the available methods are suitable for analyzing music from other cultures. We present an empirical approach to the comparative analysis of audio recordings, focusing on tonal features and data mining techniques. Tonal features are related to the pitch class distribution, pitch range and employed scale, gamut and tuning system. We provide initial but promising results on automatically distinguishing music from Western and non-Western traditions; we analyze which descriptors are most relevant and study their distribution over 1500 pieces from different traditions and styles. We find that some feature distributions differ between Western and non-Western music, and that the classification accuracy is higher than 80% for different classification algorithms and an independent test set. These results show that automatic description of audio signals, together with data mining techniques, provides a means to characterize huge music collections from different traditions and to complement manual musicological analyses.
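The pitch class distribution mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' feature extractor: the function name `pitch_class_histogram`, the 440 Hz reference, and the default of 12 bins per octave are assumptions made for illustration, and the input is assumed to be a list of already estimated fundamental frequencies in Hz.

```python
import math

def pitch_class_histogram(f0_hz, ref_hz=440.0, bins_per_octave=12):
    """Normalized pitch class histogram from F0 estimates in Hz.

    Hypothetical helper, not the method of the paper. Bin 0 is the
    pitch class of ref_hz (A when ref_hz is 440 Hz). Octave
    information is discarded by the modulo operation.
    """
    counts = [0] * bins_per_octave
    for f in f0_hz:
        if f <= 0:
            continue  # skip unvoiced or invalid frames
        # Distance from the reference in bins, folded into one octave.
        steps = round(bins_per_octave * math.log2(f / ref_hz))
        counts[steps % bins_per_octave] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Three A's in different octaves and one E (7 semitones above A).
hist = pitch_class_histogram([440.0, 880.0, 220.0, 659.26])
```

Raising `bins_per_octave` (for example to 120) gives a finer grid, which may be more appropriate when the music is not assumed to follow twelve-tone equal temperament.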

    Voice assignment in vocal quartets using deep learning models based on pitch salience

    No full text
    This paper deals with the automatic transcription of audio performances of four-part a cappella singing. In particular, we exploit an existing deep-learning-based multiple F0 estimation method and complement it with two neural network architectures for voice assignment (VA) in order to create a music transcription system that converts an input audio mixture into four pitch contours. To train our VA models, we create a novel synthetic dataset by collecting 5381 choral music scores from public-domain music archives, which we make publicly available for further research. We compare the performance of the proposed VA models on different types of input data, as well as against a hidden Markov model (HMM)-based baseline system. In addition, we assess the generalization capabilities of these models on audio recordings with differing pitch distributions and vocal music styles. Our experiments show that the two proposed models, a CNN and a ConvLSTM, have very similar performance, and both of them outperform the baseline HMM-based system. We also observe a high confusion rate between the alto and tenor voice parts, which commonly have overlapping pitch ranges, while the bass voice has the highest scores in all evaluated scenarios. This work is partially supported by the European Commission under the TROMPA project (H2020 770376), the Spanish Ministry of Science and Innovation under the Musical AI project (PID2019-111403GB-I00), and by AGAUR (Generalitat de Catalunya) through an FI Predoctoral Grant (2018FI-B01015).

    Automatic transcription of flamenco singing from polyphonic music recordings

    No full text
    Automatic note-level transcription is considered one of the most challenging tasks in music information retrieval. The specific case of flamenco singing transcription poses a particular challenge due to its complex melodic progressions, intonation inaccuracies, high degree of ornamentation, and the presence of guitar accompaniment. In this study, we explore the limitations of existing state-of-the-art transcription systems for the case of flamenco singing and propose a specific solution for this genre. We first extract the predominant melody and apply a novel contour filtering process to eliminate segments of the pitch contour which originate from the guitar accompaniment. We formulate a set of onset detection functions based on volume and pitch characteristics to segment the resulting vocal pitch contour into discrete note events. A quantised pitch label is assigned to each note event by combining global pitch class probabilities with local pitch contour statistics. The proposed system outperforms state-of-the-art singing transcription systems with respect to voicing accuracy, onset detection, and overall performance when evaluated on flamenco singing datasets. This work was supported in part by the Ph.D. Fellowship of the Department of Information and Communication Technologies, Universitat Pompeu Fabra and in part by the projects SIGMUS (TIN2012-36650) and COFLA II (P12-TIC-1362).

    A Model for evaluating popularity and semantic information variations in radio listening sessions

    No full text
    Paper presented at: 1st Workshop on the Impact of Recommender Systems, part of the 13th ACM Conference on Recommender Systems (RecSys 2019), held on 19 September 2019 in Copenhagen, Denmark. Listening to music radio is an activity that has been part of the cultural habits of people all over the world since the 20th century. While in analog radio DJs are in charge of selecting the music to be broadcast, nowadays recommender systems that analyze users’ behaviours can automatically generate radio stations tailored to users’ musical taste. Nonetheless, in both cases listening sessions do not depend on the listener's choices, but on a set of external recommendations received. In this preliminary study, we propose a model for estimating the variation of features during listening sessions, comparing different scenarios, namely analog radio, and personalized and non-personalized streaming radio. In particular, we focus on the analysis of track popularity and semantic information, features well established in the Music Information Retrieval literature. The presented model aims to quantify the possible impact of the sessions’ variation on the user listening experience. This work is partially supported by the European Commission under the TROMPA project (H2020 770376).

    20 years of playlists: A statistical analysis on popularity and diversity

    No full text
    Paper presented at: 20th annual conference of the International Society for Music Information Retrieval (ISMIR), held from 4 to 8 November 2019 in Delft, the Netherlands. Grouping songs together according to music preferences, mood or other characteristics is an activity which reflects personal listening behaviours and tastes. In the last two decades, due to the increasing size of accessible music catalogues and to improvements in recommendation algorithms, people have been exposed to new ways of creating playlists. In this work, through the statistical analysis of more than 400K playlists from four datasets, created in different temporal and technological contexts, we aim to understand whether it is possible to extract information about the evolution of human strategies for playlist creation. We focus our analysis on two driving concepts of the Music Information Retrieval literature: popularity and diversity. This work is partially supported by the European Commission under the TROMPA project (H2020 770376).

    Melody extraction from polyphonic music signals using pitch contour characteristics

    No full text
    We present a novel system for the automatic extraction of the main melody from polyphonic music recordings. Our approach is based on the creation and characterisation of pitch contours: time-continuous sequences of pitch candidates grouped using auditory streaming cues. We define a set of contour characteristics and show that by studying their distributions we can devise rules to distinguish between melodic and non-melodic contours. This leads to the development of new voicing detection, octave error minimisation and melody selection techniques. A comparative evaluation of the proposed approach shows that it outperforms current state-of-the-art melody extraction systems in terms of overall accuracy. Further evaluation of the algorithm is provided in the form of a qualitative error analysis and a study of the effect of key parameters and algorithmic components on system performance. Finally, we conduct a glass ceiling analysis to study the current limitations of the method, and propose possible directions for future work.

    Trustworthy artificial intelligence requirements in the autonomous driving domain

    No full text
    We identify the maturity level of the different requirements for artificial intelligence (AI) in autonomous driving and outline the main challenges to be addressed in the future to ensure that automotive AI systems are developed in a trustworthy way. The authors acknowledge main funding from the Human Behavior and Machine Intelligence project of the European Commission Joint Research Center. Other research grants that have partially contributed to this work were provided by the Community Region of Madrid (S2018/EMT-4362 and SEGVAUTO 4.0-CM) and the Spanish Ministry of Science and Innovation (DPI2017-90035-R and PID2020-114924RB-I00).

    Hierarchical multi-scale set-class analysis

    No full text
    This work presents a systematic methodology for set-class surface analysis using temporal multi-scale techniques. The method extracts the set-class content of all possible temporal segments, addressing the representational problems derived from the massive overlapping of segments. A time versus time-scale representation, named class-scape, provides a global hierarchical overview of the class content in the piece, and serves as a visual index for interactive inspection. Additional data structures summarize the set-class inclusion relations over time and quantify the class and subclass content in pieces or collections, helping to identify sets of analytical interest. Case studies include the comparative subclass characterization of diatonicism in Victoria's masses (in Ionian mode) and Bach's preludes and fugues (in major mode), as well as the structural analysis of Webern's Variations for piano op. 27, under different class-equivalences. This work was supported by the EU Seventh Framework Programme FP7/2007-2013 through the PHENICX project [grant no. 601166].
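As a rough illustration of what a set-class under "different class-equivalences" means, the sketch below maps a collection of pitch classes to a canonical representative of its class. It is not the method of the paper: the function name `set_class` and the `inversion` flag are hypothetical, and the representative is simply the lexicographically smallest sorted form over all transpositions (and, optionally, inversions), which identifies the class but does not always coincide with Forte's prime form.

```python
def set_class(pitch_classes, inversion=True):
    """Canonical representative of a pitch-class set's equivalence class.

    Hypothetical helper for illustration only. With inversion=False the
    equivalence is transposition only; with inversion=True it is
    transposition plus inversion.
    """
    pcs = {p % 12 for p in pitch_classes}
    forms = [pcs]
    if inversion:
        # Inversion about pitch class 0: p -> -p (mod 12).
        forms.append({(-p) % 12 for p in pcs})
    # Try all 12 transpositions of each form; keep the smallest tuple.
    candidates = (
        tuple(sorted((p + t) % 12 for p in form))
        for form in forms
        for t in range(12)
    )
    return min(candidates)

# C major (C, E, G) and A minor (A, C, E) share a class only when
# inversion is included in the equivalence.
major = set_class([0, 4, 7])
minor = set_class([9, 0, 4])
major_tn = set_class([0, 4, 7], inversion=False)
minor_tn = set_class([9, 0, 4], inversion=False)
```

Applying such a function to the pitch-class content of every temporal segment would yield the kind of per-segment labels that a multi-scale representation could then summarize.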