Addressing Tempo Estimation Octave Errors in Electronic Music by Incorporating Style Information Extracted From Wikipedia
(Abstract to follow)
Masked Conditional Neural Networks for sound classification
The remarkable success of deep convolutional neural networks in image-related applications has led to their adoption for sound processing as well. Typically the input is a time–frequency representation such as a spectrogram, which in some cases is treated as a two-dimensional image. However, spectrogram properties are very different from those of natural images. Instead of an object occupying a contiguous region, as in a natural image, the frequencies of a sound are scattered along the frequency axis of a spectrogram in a pattern unique to that particular sound. Applying conventional convolutional neural networks has therefore required extensive hand-tuning, and presented the need for an architecture better suited to the time–frequency properties of audio. We introduce the ConditionaL Neural Network (CLNN) and its extension, the Masked ConditionaL Neural Network (MCLNN), both designed to exploit the nature of sound in a time–frequency representation. The CLNN is, broadly speaking, linear across frequencies but non-linear across time: it conditions its inference at a particular time on preceding and succeeding time slices, and the MCLNN uses a controlled systematic sparseness that embeds a filterbank-like behavior within the network. Additionally, the MCLNN automates the concurrent exploration of several feature combinations, analogous to hand-crafting the optimum combination of features for a recognition task. We have applied the MCLNN to music genre classification and environmental sound recognition, on several music (Ballroom, GTZAN, ISMIR2004, and Homburg) and environmental sound (Urbansound8K, ESC-10, and ESC-50) datasets. The classification accuracy of the MCLNN surpasses neural network-based architectures, including state-of-the-art convolutional neural networks, as well as several hand-crafted approaches.
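The conditioning and masking ideas above can be sketched numerically. The mask pattern, bandwidth/stride parameters, and the ReLU nonlinearity below are illustrative assumptions rather than the paper's exact formulation; the sketch only shows a hidden layer whose output at one time step depends on neighboring frames through masked weight matrices.

```python
import numpy as np

def banded_mask(n_in, n_out, bandwidth, stride):
    """Hypothetical systematic-sparseness pattern: each hidden unit
    sees only a contiguous band of input (frequency) bins, and the
    band shifts by `stride` from one unit to the next, giving the
    filterbank-like behavior described for the MCLNN."""
    mask = np.zeros((n_out, n_in))
    for j in range(n_out):
        start = (j * stride) % n_in
        for k in range(bandwidth):
            mask[j, (start + k) % n_in] = 1.0
    return mask

def clnn_step(frames, weights, mask):
    """One conditional step: the output at a time t depends on the
    frame at t plus its preceding and succeeding frames (2n+1 in
    total), each with its own weight matrix; the same mask is
    applied to every matrix."""
    z = sum((W * mask) @ x for W, x in zip(weights, frames))
    return np.maximum(z, 0.0)  # ReLU nonlinearity (an assumption)
```

With the mask set to all ones this reduces to an ordinary dense conditional layer; the banded mask is what restricts each unit to a frequency neighborhood.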
Interacting Attention-gated Recurrent Networks for Recommendation
Capturing the temporal dynamics of user preferences over items is important for recommendation. Existing methods mainly assume that all time steps in the user-item interaction history are equally relevant to recommendation, which does not hold in real-world scenarios where user-item interactions can often happen accidentally. More importantly, they learn user and item dynamics separately, thus failing to capture their joint effects on user-item interactions. To better model user and item dynamics, we present the Interacting Attention-gated Recurrent Network (IARN), which adopts the attention model to measure the relevance of each time step. In particular, we propose a novel attention scheme to learn the attention scores of user and item history in an interacting way, so as to account for the dependencies between user and item dynamics in shaping user-item interactions. By doing so, IARN can selectively memorize different time steps of a user's history when predicting her preferences over different items. Our model can therefore provide meaningful interpretations for recommendation results, which can be further enhanced by auxiliary features. Extensive validation on real-world datasets shows that IARN consistently outperforms state-of-the-art methods.
Comment: Accepted by ACM International Conference on Information and Knowledge Management (CIKM), 201
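The interacting attention idea can be illustrated with a minimal sketch. This is not IARN itself (which learns the scoring jointly inside attention-gated recurrent networks); it only shows how a query derived from the *other* sequence reweights the time steps of a history instead of treating them equally. The hidden states and query vectors are assumed inputs.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attended_summary(hidden_states, query):
    """Score each time step of one sequence's hidden states against
    a query from the other sequence (user history attends to the
    item representation, and vice versa), then return the
    attention-weighted summary instead of a uniform average."""
    scores = hidden_states @ query   # relevance of each time step
    weights = softmax(scores)        # normalized attention scores
    return weights @ hidden_states, weights
```

Running the user-side summary with an item-derived query, and the item-side summary with a user-derived query, is the "interacting" part: each side's attention depends on the other.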
Potential Passenger Flow Prediction: A Novel Study for Urban Transportation Development
Recently, practical applications of passenger flow prediction have brought many benefits to urban transportation development. With increasing urbanization, a real-world demand from transportation managers is to construct a new metro station in a city area where none was previously planned. Before constructing a new station, authorities want a picture of its future commuter volume and an estimate of how it would affect other areas. In this paper, this specific problem is termed potential passenger flow (PPF) prediction, a novel and important study connected with urban computing and intelligent transportation systems. For example, an accurate PPF predictor can provide invaluable knowledge to designers, such as advice on station scale and influence on other areas. To address this problem, we propose a multi-view localized correlation learning method. The core idea of our strategy is to learn the passenger flow correlations between the target areas and their localized areas with adaptive weights. To improve prediction accuracy, additional domain knowledge is incorporated via a multi-view learning process. We conduct intensive experiments to evaluate the effectiveness of our method on real-world official transportation datasets. The results demonstrate that our method achieves excellent performance compared with the available baselines. Moreover, our method also provides an effective solution to the cold-start problem in recommender systems, as shown by its superior experimental results.
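The adaptive-weight idea at the core of the method above can be sketched as a small regression. The paper's multi-view localized correlation learning is richer (multiple views, domain knowledge, cold-start handling); this sketch only illustrates expressing a target area's flow through learned weights over its localized areas, with hypothetical names and a plain least-squares fit standing in for the actual learning procedure.

```python
import numpy as np

def localized_weights(neighbor_flows, target_flow_history):
    """Fit adaptive weights expressing the target area's flow as a
    weighted combination of its localized (nearby) areas' flows,
    via least squares on historical data. Illustrative only."""
    X = np.asarray(neighbor_flows)        # shape (T, k): k nearby areas
    y = np.asarray(target_flow_history)   # shape (T,)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# A multi-view variant could learn one weight vector per view
# (e.g. weekday vs. weekend patterns) and combine the predictions.
```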
Automatic transcription of polyphonic music exploiting temporal evolution
PhD thesis.
Automatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the creation
of an automated system able to transcribe polyphonic music without setting
restrictions on the degree of polyphony and the instrument type still remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. The proposed systems have been evaluated both
privately and publicly within the Music Information Retrieval Evaluation
eXchange (MIREX) framework, and have been shown to outperform several
state-of-the-art transcription approaches.
Developed techniques have also been employed for other tasks related to music
technology, such as for key modulation detection, temperament estimation,
and automatic piano tutoring. Finally, the proposed music transcription models
have also been utilized in a wider context, namely for modeling acoustic scenes.
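As a rough illustration of the decomposition family the thesis builds on, the sketch below factorizes a magnitude spectrogram with plain NMF (multiplicative KL updates) into spectral templates and temporal activations. SI-PLCA differs in that it is probabilistic and shift-invariant along a log-frequency axis, which is what lets it track frequency modulations and share templates across instruments; that machinery is not reproduced here.

```python
import numpy as np

def nmf_decompose(V, n_templates, n_iter=200, seed=0):
    """Factorize a non-negative magnitude spectrogram V (freq x time)
    as V ~ W @ H, with W holding spectral templates in its columns
    and H their activations over time. Multiplicative updates for
    the KL divergence; a simplified stand-in for SI-PLCA."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_templates)) + 1e-9
    H = rng.random((n_templates, T)) + 1e-9
    for _ in range(n_iter):
        WH = W @ H + 1e-9
        W *= ((V / WH) @ H.T) / H.sum(axis=1)           # update templates
        WH = W @ H + 1e-9
        H *= (W.T @ (V / WH)) / W.sum(axis=0)[:, None]  # update activations
    return W, H
```

Peaks in a row of H indicate when the corresponding template (e.g. a note) is active, which is the starting point for note-level transcription.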
Music feature extraction and analysis through Python
In the digital era, platforms like Spotify have become the primary channels of music consumption, broadening the possibilities for analyzing and understanding music through data. This project focuses on a comprehensive examination of a dataset sourced from Spotify, with Python as the tool for data extraction and analysis. The primary objective centers on the creation of this dataset, emphasizing a diverse range of songs from various subgenres. The intention is to represent both mainstream and niche musical landscapes, aligning with the Long Tail distribution concept, which highlights the market potential of less popular niche products.
Through analysis, patterns in the evolution of musical features over past decades become evident. Shifts in features such as energy, loudness, danceability, and valence, and their correlation with popularity, emerge from the dataset. In parallel with this analysis, a content-based music recommendation system is conceived on top of the created dataset. The aim is to connect tracks, especially lesser-known ones, with potential listeners. This project provides insights beneficial to music enthusiasts, data scientists, and industry professionals. The methodologies and analyses presented mark a convergence of data science and the music industry in today's digital context.
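The decade-level trend analysis described above can be sketched with pandas. The column names are assumptions about the dataset's schema (a `year` column plus Spotify's standard audio-feature fields); the actual project's dataset may be organized differently.

```python
import pandas as pd

def decade_trends(df):
    """Group tracks by decade and average the audio features:
    the kind of aggregation behind the reported evolution of
    energy, loudness, danceability, and valence over time."""
    df = df.assign(decade=(df["year"] // 10) * 10)
    features = ["energy", "loudness", "danceability", "valence"]
    return df.groupby("decade")[features].mean()
```

Correlating each feature column with a `popularity` column (e.g. via `df.corr()`) would give the popularity relationships the abstract mentions.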
Audio source separation for music in low-latency and high-latency scenarios
This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
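The low-latency decomposition step can be sketched as a closed-form ridge solve, which is what makes Tikhonov regularization attractive here: each frame needs only one small linear system. The variable names and the non-negativity clipping are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def tikhonov_decompose(dictionary, spectrum, lam=0.1):
    """Decompose one observed magnitude spectrum x over a dictionary
    B of spectral templates (freq x components) by solving
        min_g ||B g - x||^2 + lam * ||g||^2
    in closed form: g = (B^T B + lam I)^{-1} B^T x."""
    B = np.asarray(dictionary)
    x = np.asarray(spectrum)
    k = B.shape[1]
    g = np.linalg.solve(B.T @ B + lam * np.eye(k), B.T @ x)
    return np.maximum(g, 0.0)  # keep template gains non-negative
```

Since `B.T @ B + lam * I` depends only on the dictionary, it can be factorized once offline, leaving per-frame work at a single back-substitution, consistent with the low-latency goal.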