4 research outputs found
Motivic Pattern Classification of Music Audio Signals Combining Residual and LSTM Networks
Motivic pattern classification from music audio recordings is a challenging task. This is especially true for a cappella flamenco cantes, which are characterized by complex melodic variations, pitch instability, timbre changes, extreme vibrato oscillations, microtonal ornamentations, and noisy recording conditions. Convolutional Neural Networks (CNNs) have proven to be very effective in image classification. Recent work in large-scale audio classification has shown that CNN architectures, originally developed for image problems, can be applied successfully to audio event recognition and classification with little or no modification to the networks. In this paper, CNN architectures are tested on a more nuanced problem: intra-style classification of flamenco cantes using small motivic patterns. A new architecture is proposed that combines residual CNNs as feature extractors with a bidirectional LSTM layer that exploits the sequential nature of musical audio data. We present a full end-to-end pipeline for audio music classification that includes a sequential pattern mining technique and a contour simplification method to extract relevant motifs from audio recordings. Mel-spectrograms of the extracted motifs are then used as the input for the different architectures tested. We investigate the usefulness of motivic patterns for the automatic classification of music recordings and the effect of audio length and corpus size on the overall classification accuracy. Results show a relative accuracy improvement of up to 20.4% when CNN architectures are trained using acoustic representations from motivic patterns.
NATSA: A Near-Data Processing Accelerator for Time Series Analysis
Time series analysis is a key technique for extracting and predicting events
in domains as diverse as epidemiology, genomics, neuroscience, environmental
sciences, economics, and more. Matrix profile, the state-of-the-art algorithm
to perform time series analysis, computes the most similar subsequence for a
given query subsequence within a sliced time series. Matrix profile has low
arithmetic intensity, but it typically operates on large amounts of time series
data. In current computing systems, this data needs to be moved between the
off-chip memory units and the on-chip computation units for performing matrix
profile. This causes a major performance bottleneck as data movement is
extremely costly in terms of both execution time and energy.
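As a rough illustration of what the matrix profile computes, the following is a minimal brute-force sketch in plain Python. It is not the NATSA implementation or any optimized algorithm: the function names, the z-normalized Euclidean distance, and the exclusion zone of `m // 4` around each subsequence (used to skip trivial self-matches) are assumptions made for this example. The nested loop over all subsequence pairs also shows why the computation touches a lot of data while doing little arithmetic per element.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def matrix_profile(series, m):
    """Brute-force matrix profile for subsequence length m.

    Returns (profile, index): profile[i] is the distance from subsequence i
    to its nearest non-trivial neighbour, index[i] is where that neighbour starts.
    """
    n_sub = len(series) - m + 1
    if n_sub < 2:
        raise ValueError("series too short for the requested subsequence length")
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 4)  # assumed exclusion zone to avoid trivial matches
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


# Hypothetical toy series in which the shape 0, 1, 3, 1 occurs twice.
toy_series = [0, 1, 3, 1, 0, 7, 2, 9, 4, 0, 1, 3, 1, 0]
toy_profile, toy_index = matrix_profile(toy_series, 4)
```

In this toy series the subsequence starting at position 0 reappears at position 9, so the profile value there drops to zero and the index points at the repeat.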
In this work, we present NATSA, the first Near-Data Processing accelerator
for time series analysis. The key idea is to exploit modern 3D-stacked High
Bandwidth Memory (HBM) to enable efficient and fast specialized matrix profile
computation near memory, where time series data resides. NATSA provides three
key benefits: 1) quickly computing the matrix profile for a wide range of
applications by building specialized energy-efficient floating-point arithmetic
processing units close to HBM, 2) improving the energy efficiency and execution
time by reducing the need for data movement over slow and energy-hungry buses
between the computation units and the memory units, and 3) analyzing time
series data at scale by exploiting low-latency, high-bandwidth, and
energy-efficient memory access provided by HBM. Our experimental evaluation
shows that NATSA improves performance by up to 14.2x (9.9x on average) and
reduces energy by up to 27.2x (19.4x on average), over the state-of-the-art
multi-core implementation. NATSA also improves performance by 6.3x and reduces
energy by 10.2x over a general-purpose NDP platform with 64 in-order cores. Comment: To appear in the 38th IEEE International Conference on Computer
Design (ICCD 2020).
Repertoire-Specific Vocal Pitch Data Generation for Improved Melodic Analysis of Carnatic Music
Deep Learning methods achieve state-of-the-art results in many tasks, including vocal pitch extraction. However, these methods rely on the availability of error-free pitch track annotations, which are scarce and expensive to obtain for Carnatic Music. Here we identify the tradition-related challenges and propose tailored solutions to generate a novel, large, and open dataset, the Saraga-Carnatic-Melody-Synth (SCMS), comprising audio mixtures and time-aligned vocal pitch annotations. Through a cross-cultural evaluation leveraging this novel dataset, we show improvements in the performance of Deep Learning vocal pitch extraction methods on Indian Art Music recordings. Additional experiments show that the trained models outperform the currently used heuristic-based pitch extraction solutions for the computational melodic analysis of Carnatic Music and that this improvement leads to better results in the musicologically relevant task of repeated melodic pattern discovery when evaluated using expert annotations. The code and annotations are made available for reproducibility. The novel dataset and trained models are also integrated into the Python package compIAM, which allows them to be used out-of-the-box.
Mining melodic patterns in large audio collections of Indian art music
Paper presented at the 10th International Conference on Signal Image Technology and Internet Based Systems, held 23 to 27 November 2014 in Marrakech (Morocco).
Discovery of repeating structures in music is
fundamental to its analysis, understanding and interpretation.
We present a data-driven approach for the discovery of short-time
melodic patterns in large collections of Indian art music.
The approach first discovers melodic patterns within an audio
recording and subsequently searches for their repetitions in the
entire music collection. We compute similarity between melodic
patterns using dynamic time warping (DTW). Furthermore, we
investigate four different variants of the DTW cost function
for rank refinement of the obtained results. The music collection
used in this study comprises 1,764 audio recordings
with a total duration of 365 hours. Over 13 trillion DTW
distance computations are done for the entire dataset. Due
to the computational complexity of the task, different lower
bounding and early abandoning techniques are applied during
DTW distance computation. An evaluation based on expert
feedback on a subset of the dataset shows that the discovered
melodic patterns are musically relevant. Several musically
interesting relationships are discovered, yielding further scope
for establishing novel similarity measures based on melodic
patterns. The discovered melodic patterns can further be used
in challenging computational tasks such as automatic rāga
recognition, composition identification and music recommendation.
This work is partly supported by the European Research
Council under the European Union's Seventh Framework
Program, as part of the CompMusic project (ERC
grant agreement 267583). J.S. acknowledges 2009-SGR-1434
from Generalitat de Catalunya, ICT-2011-8-318770
from the European Commission, JAEDOC069/2010 from
CSIC, and European Social Funds.
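The abstract above mentions lower bounding and early abandoning during DTW distance computation without naming the specific techniques. The sketch below shows one common way these two ideas can fit together in plain Python. LB_Keogh is swapped in here as a well-known lower bound and is an assumption, as are the Sakoe-Chiba band, the squared point-wise cost, the equal-length patterns, and every function and variable name.

```python
import math


def lb_keogh(query, candidate, window):
    """LB_Keogh lower bound on band-constrained DTW (equal-length inputs)."""
    total = 0.0
    for i, q in enumerate(query):
        lo = max(0, i - window)
        hi = min(len(candidate), i + window + 1)
        upper = max(candidate[lo:hi])
        lower = min(candidate[lo:hi])
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return total


def dtw_distance(a, b, window, best_so_far=math.inf):
    """DTW with a Sakoe-Chiba band; returns inf once best_so_far is exceeded."""
    n, m = len(a), len(b)
    window = max(window, abs(n - m))
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        # Early abandoning: every warping path crosses this row and costs are
        # non-negative, so the final distance cannot fall below the row minimum.
        if min(curr[lo:hi + 1]) > best_so_far:
            return math.inf
        prev = curr
    return prev[m]


def nearest_pattern(query, candidates, window):
    """Return (index, distance) of the candidate closest to query under DTW."""
    best_dist, best_idx = math.inf, -1
    for idx, cand in enumerate(candidates):
        # Skip the full DTW when the cheap lower bound already rules it out.
        if lb_keogh(query, cand, window) >= best_dist:
            continue
        d = dtw_distance(query, cand, window, best_dist)
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_idx, best_dist


# Hypothetical pitch-contour fragments, for illustration only.
toy_query = [0, 1, 2, 1, 0]
toy_candidates = [[5, 5, 5, 5, 5], [0, 0, 1, 2, 1], [3, 3, 3, 3, 3]]
toy_match = nearest_pattern(toy_query, toy_candidates, window=1)
```

With these toy inputs the third candidate is discarded by the lower bound alone, so its full DTW is never computed. At the scale described in the abstract, pruning of this kind is what makes trillions of comparisons tractable.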