795 research outputs found
Identification of expressive descriptors for style extraction in music analysis using linear and nonlinear models
Formalizing expressive performance is still considered relevant because of the complexity of music. Expressive performance is an important aspect of music, given the different conventions, such as genres or styles, that a performance may develop over time. Modeling the relationship between musical expression and the structural aspects of acoustic information requires a minimal probabilistic and statistical foundation to ensure the robustness, validation and reproducibility of computational applications. A cohesive account of, and justification for, the results is therefore necessary. This thesis is grounded in the theory and applications of discriminative and generative models within the machine learning framework, and relates systematic procedures to concepts from musicology using signal processing and data mining techniques. The results were validated through statistical tests and non-parametric experimentation, implementing a set of metrics that measure acoustic and temporal aspects of audio files in order to train a discriminative model and improve the synthesis process of a deep neural model. Additionally, the implemented model offers opportunities for applying systematic procedures, automating transcription into musical notation, ear training for music students, and improving the implementation of deep neural networks on CPU rather than GPU, owing to the advantages of convolutional networks for processing audio files as vectors or matrices containing a sequence of notes. Master's thesis. Magister en Ingeniería Electrónic
Prediction in polyphony: modelling musical auditory scene analysis
PhD thesis. How do we know that a melody is a melody? In other words, how does the human brain extract
melody from a polyphonic musical context? This thesis begins with a theoretical presentation
of musical auditory scene analysis (ASA) in the context of predictive coding and rule-based
approaches and takes methodological and analytical steps to evaluate selected components of
a proposed integrated framework for musical ASA, unified by prediction. Predictive coding
has been proposed as a grand unifying model of perception, action and cognition and is based
on the idea that brains process error to refine models of the world. Existing models of ASA
tackle distinct subsets of ASA and are currently unable to integrate all the acoustic and
extensive contextual information needed to parse auditory scenes. This thesis proposes a
framework capable of integrating all relevant information contributing to the understanding of
musical auditory scenes, including auditory features, musical features, attention, expectation
and listening experience, and examines a subset of ASA issues – timbre perception in relation
to musical training, modelling temporal expectancies, the relative salience of musical
parameters and melody extraction – using probabilistic approaches. Using behavioural
methods, attention is shown to influence streaming perception based on timbre more than
instrumental experience. Using probabilistic methods, information content (IC) for temporal
aspects of music, as generated by IDyOM (information dynamics of music; Pearce, 2005), is
validated and, along with IC for pitch and harmonic aspects of the music, is subsequently
linked to perceived complexity but not to salience. Furthermore, based on the hypotheses that
a melody is internally coherent and the most complex voice in a piece of polyphonic music,
IDyOM has been extended to extract melody from symbolic representations of chorales by J.S.
Bach and a selection of string quartets by W.A. Mozart
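The information content mentioned in this abstract is the negative log probability of an event given its context, IC = -log2 p(event | context). The sketch below illustrates that quantity only. It assumes a first-order (bigram) model with add-one smoothing over MIDI pitch numbers, which is far simpler than IDyOM itself; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigram_information_content(training, sequence, alphabet_size):
    """Return IC = -log2 p(e_i | e_{i-1}) for each event after the first.

    Assumption: a first-order model with add-one smoothing, used here
    only to illustrate the quantity; this is not IDyOM's model.
    """
    # Count how often each event follows each preceding event.
    pair_counts = defaultdict(Counter)
    for melody in training:
        for prev, cur in zip(melody, melody[1:]):
            pair_counts[prev][cur] += 1

    ics = []
    for prev, cur in zip(sequence, sequence[1:]):
        total = sum(pair_counts[prev].values())
        # Add-one smoothing keeps unseen continuations at non-zero probability.
        p = (pair_counts[prev][cur] + 1) / (total + alphabet_size)
        ics.append(-math.log2(p))
    return ics


# Toy corpus of one melody over three distinct pitches (hypothetical data).
corpus = [[60, 62, 64, 62, 60]]
expected_step = bigram_information_content(corpus, [60, 62], alphabet_size=3)
surprising_step = bigram_information_content(corpus, [60, 64], alphabet_size=3)
```

A continuation seen in the corpus receives a lower IC than an unseen one, which is the sense in which IC tracks unexpectedness.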
Measuring Expressive Music Performances: a Performance Science Model using Symbolic Approximation
Music Performance Science (MPS), sometimes termed systematic musicology in Northern Europe, is concerned with designing, testing and applying quantitative measurements to music performances. It has applications in art musics, jazz and other genres. It is least concerned with aesthetic judgements or with ontological considerations of artworks that stand alone from their instantiations in performances. Musicians deliver expressive performances by manipulating multiple, simultaneous variables including, but not limited to: tempo, acceleration and deceleration, dynamics, rates of change of dynamic levels, intonation and articulation. There are significant complexities when handling multivariate music datasets of significant scale. A critical issue in analyzing any type of large dataset is that the likelihood of detecting meaningless relationships grows as more dimensions are included. One possible choice is to create algorithms that address both volume and complexity. Another, and the approach chosen here, is to apply techniques that reduce both the dimensionality and numerosity of the music datasets while assuring the statistical significance of results. This dissertation describes a flexible computational model, based on symbolic approximation of timeseries, that can extract time-related characteristics of music performances to generate performance fingerprints (dissimilarities from an ‘average performance’) to be used for comparative purposes. The model is applied to recordings of Arnold Schoenberg’s Phantasy for Violin with Piano Accompaniment, Opus 47 (1949), having initially been validated on Chopin Mazurkas. The results are subsequently used to test hypotheses about evolution in performance styles of the Phantasy since its composition. It is hoped that further research will examine other works and types of music in order to improve this model and make it useful to other music researchers.
In addition to its benefits for performance analysis, it is suggested that the model has clear applications at least in music fraud detection, Music Information Retrieval (MIR) and in pedagogical applications for music education
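One widely used technique for symbolic approximation of time series is SAX (Symbolic Aggregate approXimation), which reduces dimensionality by averaging over segments and reduces numerosity by mapping each average to a letter. The abstract does not say which variant or parameters the dissertation uses, so the following is only a sketch of the standard SAX recipe. The function name, the four-letter alphabet and the tempo-like input values are assumptions.

```python
import statistics
from bisect import bisect_right


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (standard recipe, assumed)."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")

    # Step 1: z-normalize so series on different scales become comparable.
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mean) / std for x in series]

    # Step 2: piecewise aggregate approximation (mean of each segment).
    seg_len = len(series) // n_segments
    paa = [
        statistics.fmean(normalized[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]

    # Step 3: breakpoints that split the standard normal into equal-probability bins.
    k = len(alphabet)
    breakpoints = [statistics.NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # Step 4: map each segment mean to the letter of the bin it falls into.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


# Hypothetical tempo-like values: slow first half, fast second half.
word = sax([1, 1, 2, 2, 8, 8, 9, 9], n_segments=4)
```

Two performances reduced to such words can then be compared letter by letter, which is one plausible basis for a fingerprint of the kind the abstract describes.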
Text-based Sentiment Analysis and Music Emotion Recognition
Nowadays, with the expansion of social media, large amounts of user-generated
texts like tweets, blog posts or product reviews are shared online. Sentiment polarity
analysis of such texts has become highly attractive and is utilized in recommender
systems, market predictions, business intelligence and more. We also witness deep
learning techniques becoming top performers on those types of tasks. There are
however several problems that need to be solved for efficient use of deep neural
networks on text mining and text polarity analysis.
First of all, deep neural networks are data hungry. They need to be fed with
datasets that are big in size, cleaned and preprocessed as well as properly labeled.
Second, the modern natural language processing concept of word embeddings as a
dense and distributed text feature representation solves sparsity and dimensionality
problems of the traditional bag-of-words model. Still, there are various uncertainties
regarding the use of word vectors: should they be generated from the same dataset
that is used to train the model or is it better to source them from big and popular
collections that work as generic text feature representations? Third, it is not easy for
practitioners to find a simple and highly effective deep learning setup for various
document lengths and types. Recurrent neural networks are weak with longer texts
and optimal convolution-pooling combinations are not easily conceived. It is thus
convenient to have generic neural network architectures that are effective and can
adapt to various texts, encapsulating much of design complexity.
This thesis addresses the above problems to provide methodological and practical
insights for utilizing neural networks on sentiment analysis of texts and achieving
state of the art results. Regarding the first problem, the effectiveness of various
crowdsourcing alternatives is explored and two medium-sized and emotion-labeled
song datasets are created utilizing social tags. One of the research interests of Telecom
Italia was the exploration of relations between music emotional stimulation and
driving style. Consequently, a context-aware music recommender system that aims
to enhance driving comfort and safety was also designed. To address the second
problem, a series of experiments with large text collections of various contents and
domains were conducted. Word embeddings of different parameters were exercised
and results revealed that their quality is influenced (mostly but not only) by the
size of texts they were created from. When working with small text datasets, it is
thus important to source word features from popular and generic word embedding
collections. Regarding the third problem, a series of experiments involving convolutional
and max-pooling neural layers were conducted. Various patterns relating
text properties and network parameters with optimal classification accuracy were
observed. Combining convolutions of words, bigrams, and trigrams with regional
max-pooling layers in a couple of stacks produced the best results. The derived
architecture achieves competitive performance on sentiment polarity analysis of
movie, business and product reviews.
Given that labeled data are becoming the bottleneck of the current deep learning
systems, a future research direction could be the exploration of various data programming
possibilities for constructing even bigger labeled datasets. Investigation
of feature-level or decision-level ensemble techniques in the context of deep neural
networks could also be fruitful. Different feature types do usually represent complementary
characteristics of data. Combining word embedding and traditional text
features or utilizing recurrent networks on document splits and then aggregating the
predictions could further increase prediction accuracy of such models
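The combination of word, bigram and trigram convolutions with regional max-pooling described in this abstract can be illustrated with a small dependency-free sketch. It assumes one filter per n-gram width, no nonlinearity and a single stack, so it shows the mechanics only and is not the thesis architecture. All function names, filters and embedding values are hypothetical.

```python
def conv1d(embeddings, kernel):
    """Slide a filter spanning len(kernel) consecutive words over the text."""
    width = len(kernel)
    outputs = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the window of word vectors.
        outputs.append(
            sum(w * x
                for k_vec, vec in zip(kernel, window)
                for w, x in zip(k_vec, vec))
        )
    return outputs


def regional_max_pool(values, region):
    """Keep the maximum of each consecutive region instead of one global max."""
    return [max(values[i:i + region]) for i in range(0, len(values), region)]


def ngram_features(embeddings, kernels, region):
    """Concatenate pooled outputs of filters with different n-gram widths."""
    features = []
    for kernel in kernels:
        features.extend(regional_max_pool(conv1d(embeddings, kernel), region))
    return features


# Four words with 2-dimensional embeddings (toy values).
text = [[1, 0], [0, 1], [1, 1], [0, 0]]
unigram_filter = [[1, 1]]
bigram_filter = [[1, 0], [0, 1]]
features = ngram_features(text, [unigram_filter, bigram_filter], region=2)
```

Regional pooling preserves coarse positional information that a single global maximum would discard, which may matter for longer documents.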
Pattern Recognition
A wealth of advanced pattern recognition algorithms is emerging from the intersection of effective visual feature technologies and the study of the human-brain cognition process. Effective visual features are made possible through rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures. Meanwhile, the understanding of the human-brain cognition process broadens the ways in which a computer can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition
Backward glances: The cultural and industrial uses of nostalgia in 2010s Hollywood cinema
Over the course of the 2010s, one identifiable trend in Hollywood cinema was the significant presence of nostalgia films. These films stage idealized recollections of the past, appealing to affective longing for its perceived comforts and stability. This thesis utilizes an interdisciplinary approach to present a historical narrative of recent Hollywood cinema and its intersection with broader American culture and society. I argue that the most recent cinematic “nostalgia wave” is attributable to the broad, epochal conditions of modernity and late modernity, specific historical events and trends of the 2010s, and Hollywood-specific technological and industrial discontinuities. In an attempt to weather this multitude of discontinuities, the contemporary American film industry can be seen to have internalized the logic of cultural nostalgia in a plea for continuity. This nostalgic outlook is also positioned alongside simultaneous attempts to contend with social progress in recent Hollywood cinema. Nostalgia is thus theorized as a potentially productive way of negotiating past and future, providing a narrative and industrial model for processing social change during a period of widespread uncertainty
Verifying tag annotation and performing genre classification in music data via association analysis
Music Information Retrieval aims to automate the access to large-volume music data, including browsing, retrieval, storage, etc. The work presented in this thesis tackles two non-trivial problems in the field.
The first problem deals with music tags, which provide descriptive and rich information about a music piece, including its genre, artist, emotion, instrument, etc. At present, tag annotation is largely a manual process, which often results in tags that are subjective, ambiguous, and error-prone. We propose a novel approach to verify the quality of tag annotation in a music dataset through association analysis.
Second, we employ association analysis to predict music genres based on features extracted directly from music. We build an association-based classifier, which finds inherent associations between music features and genres.
We demonstrate the effectiveness of our approaches through a series of simulations and experiments using various benchmark music datasets
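An association-based classifier of the general kind described in this abstract can be sketched as follows: mine rules of the form "feature items imply genre" that meet support and confidence thresholds, then label a track with the genre of the strongest matching rule. The abstract does not specify the thesis's rule-mining algorithm, thresholds or features, so everything below (function names, discretized feature strings, thresholds) is an assumption for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, min_support=0.2, min_confidence=0.6, max_len=2):
    """Mine itemset -> genre rules from (feature_items, genre) records."""
    n = len(records)
    itemset_counts = Counter()  # how often each itemset occurs at all
    rule_counts = Counter()     # how often it occurs together with a genre
    for items, genre in records:
        for size in range(1, max_len + 1):
            for itemset in combinations(sorted(items), size):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, genre)] += 1

    rules = []
    for (itemset, genre), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((confidence, support, itemset, genre))
    # Strongest rules first: by confidence, then by support.
    rules.sort(reverse=True)
    return rules


def classify(rules, items, default=None):
    """Return the genre of the first (strongest) rule whose itemset matches."""
    items = set(items)
    for _confidence, _support, itemset, genre in rules:
        if items.issuperset(itemset):
            return genre
    return default


# Hypothetical discretized audio features and genre labels.
records = [
    ({"tempo=fast", "timbre=bright"}, "rock"),
    ({"tempo=fast", "timbre=dark"}, "rock"),
    ({"tempo=slow", "timbre=dark"}, "jazz"),
    ({"tempo=slow", "timbre=bright"}, "jazz"),
]
rules = mine_rules(records)
predicted = classify(rules, {"tempo=fast", "timbre=dark"})
```

In this toy data tempo alone determines genre, so timbre-only rules fall below the confidence threshold and are discarded.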