The Skipping Behavior of Users of Music Streaming Services and its Relation to Musical Structure
The behavior of users of music streaming services is investigated from the
point of view of the temporal dimension of individual songs; specifically, the
main object of the analysis is the point in time within a song at which users
stop listening and start streaming another song ("skip"). The main contribution
of this study is the ascertainment of a correlation between the distribution in
time of skipping events and the musical structure of songs. It is also shown
that such distribution is not only specific to the individual songs, but also
independent of the cohort of users and, under stationary conditions, of the date of
observation. Finally, user behavioral data is used to train a predictor of the
musical structure of a song solely from its acoustic content; it is shown that
the use of such data, available in large quantities to music streaming
services, yields significant improvements in accuracy over the customary
fashion of training this class of algorithms, in which only smaller amounts of
hand-labeled data are available.
Accelerating the Mixing Phase in Studio Recording Productions by Automatic Audio Alignment
We propose a system for accelerating the mixing phase in a recording production, by making use of audio alignment techniques to automatically align multiple takes of excerpts of a music piece against a performance of the whole work. We extend the approach of our previous work, based on sequential Monte Carlo inference techniques, which was targeted at real-time alignment for score/audio following. The proposed approach is capable of producing partial alignments as well as identifying relevant regions in the partial results with regard to the reference, for better integration within a studio mix workflow. The approach is evaluated using data obtained from two recording sessions of classical music pieces, and we discuss its effectiveness for reducing manual work in a production chain.
Alignment and Identification of Multimedia Data: Application to Music and Gesture Processing
The overwhelming availability of large multimedia collections poses increasingly challenging research problems regarding the organization of, and access to data. A general consensus has been reached in the Information Retrieval community, asserting the need for tools that move past metadata-based techniques and exploit directly the information contained in the media. At the same time, interaction with content has evolved beyond the traditional passive enjoyment paradigm, bringing forth the demand for advanced control and manipulation options.
The aim of this thesis is to investigate techniques for multimedia data alignment and identification. In particular, music audio streams and gesture-capture time series are considered. Special attention is given to the efficiency of the proposed approaches, namely the realtime applicability of alignment algorithms and the scalability of identification strategies.
The concept of alignment refers to the identification and matching of corresponding substructures in related entities. The focus of this thesis is directed towards alignment of sequences with respect to a single dimension, aiming at the identification and matching of significant events in related time series.
The alignment of audio recordings of music to their symbolic representations serves as a starting point to explore different methodologies based on statistical models. A unified model for the real time alignment of music audio streams to both symbolic scores and audio references is proposed. Its advantages are twofold: unlike most state-of-the-art systems, tempo is an explicit parameter within the stochastic framework; moreover, both alignment problems can be formulated within a common framework by exploiting a continuous representation of the reference content. A novel application of audio alignment techniques was found in the domain of studio recording productions, reducing the human effort spent in manual repetitive tasks.
Gesture alignment is closely related to the domain of music alignment, as the artistic aims and engineering solutions of both areas largely overlap. Expressivity in gesture performance can be characterized by both the choice of a particular gesture and the way the gesture is executed. The former aspect involves a gesture recognition task, while the latter is addressed by considering the time-evolution of features and the way these differ from pre-recorded templates. A model, closely related to the music alignment strategy mentioned above, is proposed; it is capable of simultaneously recognizing a gesture among many templates and aligning it against the correct reference in real time, while jointly estimating signal features such as rotation, scaling, and velocity.
Due to the increasingly large volume of music collections, the organization of media items according to their perceptual characteristics has become of fundamental importance. In particular, content-based identification technologies provide the tools to retrieve and organize music documents. Music identification techniques should ideally be able to identify a recording -- by comparing it against a set of known recordings -- independently from the particular performance, even in case of significantly different arrangements and interpretations.
Even though alignment techniques play a central role in many works of the music identification literature, the proposed methodology addresses the task using techniques that are usually associated with textual IR. Similarity computation is based on hashing, attempting to create collisions between vectors that are close in the feature space. The resulting compactness of the representation of audio content allows index-based retrieval strategies to be exploited for maximizing computational efficiency.
A particular application is considered, regarding Cultural Heritage preservation institutions. A methodology is proposed to automatically identify recordings in collections of digitized tapes and vinyl discs. This scenario differs significantly from that of a typical identification task, as a query most often contains more than one relevant result (distinct music work). The audio alignment methodology mentioned above is finally exploited to carry out a precise segmentation of recordings into their individual tracks.
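As an illustration of the hashing-based similarity computation mentioned in the thesis abstract above, the following is a minimal sketch. The abstract does not say which hash family is used, so the random-hyperplane scheme here is an assumption, and all function names (`make_hyperplanes`, `hash_vector`, `build_index`, `query`) are hypothetical. It only shows how vectors pointing in similar directions can be made to collide in the same bucket, so that lookup becomes an index access rather than a scan of the collection.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec, hyperplanes):
    """Pack one bit per hyperplane: 1 if vec lies on its non-negative side."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


def build_index(vectors, hyperplanes):
    """Map each hash code to the list of item ids that fall into that bucket."""
    index = {}
    for item_id, vec in vectors.items():
        index.setdefault(hash_vector(vec, hyperplanes), []).append(item_id)
    return index


def query(index, vec, hyperplanes):
    """Return the ids stored in the bucket the query vector hashes to."""
    return index.get(hash_vector(vec, hyperplanes), [])
```

In this toy scheme a vector and any positively scaled copy of it always share a bucket, since only the sign of each projection is kept. Practical systems usually combine several such tables to trade recall against bucket size.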
A Discrete Filter Bank Approach to Audio to Score Matching for Polyphonic Music
This paper presents a system developed for tracking the position of a polyphonic music performance in a symbolic score, possibly in real time. The system, based on Hidden Markov Models, is briefly presented, focusing on specific aspects such as observation modeling based on discrete filterbanks, in contrast with traditional FFT-based approaches, and describing the approaches to decoding. Experimental results are provided to assess the validity of the presented model. Proof-of-concept applications are shown, which effectively employ the described approach beyond the traditional automatic accompaniment system.
Automatic Alignment of Music Performances with Scores Aimed at Educational Applications
We present a system for automatic real-time alignment of an acoustic music performance with a digital representation of its score, a problem usually referred to as score following. The alignment is based on an application of hidden Markov models. A model is automatically built from a music score, while decoding is used to compute the most probable location of the performance along the score model. The effectiveness of the proposed approach has been tested with a collection of recordings of orchestral music. Even if the typical application of a score following system is automatic accompaniment, in this paper we propose a set of novel applications that are also targeted at non-musicians, for educational use.
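The decoding step described in this abstract can be sketched as online forward filtering over a left-to-right hidden Markov model with one state per score event. This is a toy under stated assumptions and not the authors' model: the function name `follow_score`, the single `p_stay` self-transition probability, and the precomputed `obs_likelihoods` table are all hypothetical, and a real system would derive the likelihoods from audio features.

```python
def follow_score(obs_likelihoods, p_stay=0.6):
    """Return the most probable score event index after each audio frame.

    obs_likelihoods[t][s] is the (assumed, precomputed) likelihood of audio
    frame t given score event s. Each state may either repeat or advance to
    the next event; the last state can only repeat.
    """
    n_states = len(obs_likelihoods[0])
    belief = [1.0] + [0.0] * (n_states - 1)  # the performance starts at event 0
    positions = []
    for t, frame in enumerate(obs_likelihoods):
        if t > 0:
            # Prediction: apply the left-to-right transition model.
            predicted = []
            for s in range(n_states):
                stay = belief[s] * (p_stay if s < n_states - 1 else 1.0)
                advance = belief[s - 1] * (1.0 - p_stay) if s > 0 else 0.0
                predicted.append(stay + advance)
            belief = predicted
        # Update: weight by the observation likelihoods and renormalize.
        updated = [b * like for b, like in zip(belief, frame)]
        total = sum(updated)
        if total == 0.0:
            raise ValueError("observation has zero likelihood under every state")
        belief = [u / total for u in updated]
        positions.append(max(range(n_states), key=belief.__getitem__))
    return positions
```

Because each frame only needs the belief from the previous frame, the estimate can be produced while the performance is still in progress, which is what real-time use requires.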
A Unified Approach to Real Time Audio-to-Score and Audio-to-Audio Alignment Using Sequential Montecarlo Inference Techniques
We present a methodology for the real-time alignment of music signals using sequential Monte Carlo inference techniques. The alignment problem is formulated as the state tracking of a dynamical system, and differs from traditional systems based on Hidden Markov Models and Dynamic Time Warping in that the hidden state is continuous rather than discrete. The major contribution of this paper is addressing both problems of audio-to-score and audio-to-audio alignment within the same framework in a real-time setting. The performance of the proposed methodology on both problems is then evaluated and discussed.
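To make the idea of tracking a continuous hidden state concrete, here is a minimal bootstrap particle filter sketch. It is an illustration only and not the paper's model: the state layout (reference position plus tempo), the Gaussian observation model, the noise levels, and the names `reference_feature` and `track` are all assumptions, and the one-dimensional feature stands in for real audio descriptors.

```python
import math
import random


def reference_feature(reference, pos):
    """Linearly interpolate the reference feature at a continuous position."""
    pos = min(max(pos, 0.0), len(reference) - 1.0)
    i = min(int(pos), len(reference) - 2)
    frac = pos - i
    return (1.0 - frac) * reference[i] + frac * reference[i + 1]


def track(observations, reference, n_particles=500, obs_sigma=0.5, seed=0):
    """Estimate the reference position for each observed frame."""
    rng = random.Random(seed)
    # Each particle is (position, tempo); tempo is reference units per frame.
    particles = [(0.0, rng.uniform(0.5, 2.0)) for _ in range(n_particles)]
    estimates = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Propagate: advance by the tempo, with small random perturbations.
            particles = [
                (pos + tempo + rng.gauss(0.0, 0.05),
                 max(0.1, tempo + rng.gauss(0.0, 0.02)))
                for pos, tempo in particles
            ]
        # Weight each particle by a Gaussian likelihood of the observation.
        weights = [
            math.exp(-0.5 * ((obs - reference_feature(reference, pos)) / obs_sigma) ** 2)
            for pos, _ in particles
        ]
        total = sum(weights)
        if total == 0.0:  # every particle is far off: fall back to uniform
            weights = [1.0] * n_particles
            total = float(n_particles)
        estimates.append(
            sum(w * pos for w, (pos, _) in zip(weights, particles)) / total
        )
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Since tempo is part of the state, it is estimated alongside the position instead of being implied by a transition matrix. The reference is accessed through interpolation, so the same loop could in principle follow either a rendered score or another recording.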
Content-based Cover Song Identification in Music Digital Libraries
The availability of large music repositories poses challenging research problems. Among these, content-based identification is gaining increasing interest because it can provide new tools for easy music access and retrieval. In this paper, we propose a methodology for cover identification, focusing in particular on pop and rock genres, which is also motivated by the large amount of user-generated music content that is available online. Identification is based on the use of chroma features, which are indexed to achieve high scalability with respect to the size of the collection. To this end, the approach does not exploit techniques whose complexity is linear in the size of the collection, such as alignment between the query and the documents or other forms of direct comparison. Evaluation results are provided to show the validity of the approach in two different scenarios, either as an independent cover identification system or as a clustering technique for a query-based partitioning of the music collection.