
    Identification of expressive descriptors for style extraction in music analysis using linear and nonlinear models

    The formalization of expressive performance is still considered relevant because of the complexity of music. Expressive performance is an important aspect of music, shaped by conventions such as the genres or styles that a performance may develop over time. Modelling the relationship between musical expression and the structural aspects of acoustic information requires a minimal probabilistic and statistical foundation for the robustness, validation and reproducibility of computational applications; a cohesive relationship and a justification of the results are therefore necessary. This thesis draws on the theory and applications of discriminative and generative models within machine learning, and relates systematic procedures to musicological concepts using signal-processing and data-mining techniques. The results were validated through statistical tests and non-parametric experimentation, implementing a set of metrics that measure acoustic and temporal aspects of audio files in order to train a discriminative model and improve the synthesis process of a deep neural model. Additionally, the implemented model opens opportunities for applying systematic procedures, automating transcription into musical notation, training aural skills in music students, and improving the implementation of deep neural networks on CPU rather than GPU, owing to the advantages of convolutional networks for processing audio files as vectors or matrices containing a sequence of notes. Master's thesis (Magister en Ingeniería Electrónica).

    Prediction in polyphony: modelling musical auditory scene analysis

    PhD thesis. How do we know that a melody is a melody? In other words, how does the human brain extract melody from a polyphonic musical context? This thesis begins with a theoretical presentation of musical auditory scene analysis (ASA) in the context of predictive coding and rule-based approaches, and takes methodological and analytical steps to evaluate selected components of a proposed integrated framework for musical ASA, unified by prediction. Predictive coding has been proposed as a grand unifying model of perception, action and cognition and is based on the idea that brains process error to refine models of the world. Existing models of ASA tackle distinct subsets of ASA and are currently unable to integrate all the acoustic and extensive contextual information needed to parse auditory scenes. This thesis proposes a framework capable of integrating all relevant information contributing to the understanding of musical auditory scenes, including auditory features, musical features, attention, expectation and listening experience. It then examines a subset of ASA issues (timbre perception in relation to musical training, modelling of temporal expectancies, the relative salience of musical parameters, and melody extraction) using probabilistic approaches. Behavioural experiments show that attention influences timbre-based streaming perception more strongly than instrumental experience does. Using probabilistic methods, the information content (IC) of temporal aspects of music, as generated by IDyOM (information dynamics of music; Pearce, 2005), is validated; together with IC for pitch and harmonic aspects of the music, it is subsequently shown to be linked to perceived complexity but not to salience. Furthermore, based on the hypotheses that a melody is internally coherent and is the most complex voice in a piece of polyphonic music, IDyOM has been extended to extract melody from symbolic representations of chorales by J.S. Bach and a selection of string quartets by W.A. Mozart.
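IC is the negative log-probability of an event given its context, IC = -log2 P(event | context). IDyOM itself uses variable-order models over multiple musical viewpoints; the sketch below is a much simpler stand-in, assuming a first-order (previous-note) model over MIDI pitches with add-one smoothing. The function names are hypothetical, and the last function only illustrates the stated hypothesis that the melody is the voice with the highest mean IC.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count first-order transitions (previous event -> next event)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def information_content(seq, counts, alphabet_size):
    """IC = -log2 P(e_i | e_{i-1}) for every event after the first.

    Add-one (Laplace) smoothing gives unseen transitions a nonzero probability.
    """
    ics = []
    for prev, nxt in zip(seq, seq[1:]):
        row = counts.get(prev, Counter())
        p = (row[nxt] + 1) / (sum(row.values()) + alphabet_size)
        ics.append(-math.log2(p))
    return ics


def most_complex_voice(voices, counts, alphabet_size):
    """Index of the voice with the highest mean IC (each voice needs >= 2 notes)."""
    means = []
    for voice in voices:
        ic = information_content(voice, counts, alphabet_size)
        means.append(sum(ic) / len(ic))
    return max(range(len(voices)), key=means.__getitem__)


# Hypothetical training melodies over the 13 pitches 60..72.
training = [[60, 62, 64, 65, 67], [60, 62, 64, 62, 60]]
counts = train_bigram(training)
ics = information_content([60, 62, 64, 72], counts, alphabet_size=13)
print([round(x, 2) for x in ics])  # the unseen leap to 72 has the highest IC
```

A first-order model keeps the example short; longer contexts would generally give sharper predictions at the cost of sparser counts, which is the trade-off variable-order models are designed to manage.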

    Measuring Expressive Music Performances: a Performance Science Model using Symbolic Approximation

    Music Performance Science (MPS), sometimes termed systematic musicology in Northern Europe, is concerned with designing, testing and applying quantitative measurements to music performances. It has applications in art musics, jazz and other genres. It is least concerned with aesthetic judgements or with ontological considerations of artworks that stand apart from their instantiations in performance. Musicians deliver expressive performances by manipulating multiple simultaneous variables, including but not limited to tempo, acceleration and deceleration, dynamics, rates of change of dynamic levels, intonation and articulation. Handling multivariate music datasets at scale involves considerable complexity. A critical issue in analyzing any type of large dataset is that the more dimensions are included, the more likely it becomes that meaningless relationships will be detected. One possible response is to create algorithms that address both volume and complexity. Another, and the approach chosen here, is to apply techniques that reduce both the dimensionality and the numerosity of the music datasets while ensuring the statistical significance of results. This dissertation describes a flexible computational model, based on symbolic approximation of time series, that extracts time-related characteristics of music performances to generate performance fingerprints (dissimilarities from an ‘average performance’) for comparative purposes. The model is applied to recordings of Arnold Schoenberg’s Phantasy for Violin with Piano Accompaniment, Opus 47 (1949), having first been validated on Chopin Mazurkas. The results are then used to test hypotheses about the evolution of performance styles of the Phantasy since its composition. It is hoped that further research will examine other works and types of music in order to improve this model and make it useful to other music researchers.
In addition to its benefits for performance analysis, it is suggested that the model has clear applications in at least three areas: music fraud detection, Music Information Retrieval (MIR) and music education.
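The abstract names symbolic approximation of time series without giving details. One well-known technique of that kind is SAX (Symbolic Aggregate approXimation) with its MINDIST lower-bounding distance; the sketch below assumes SAX, equal-length tempo curves sampled once per beat, and a point-wise mean curve as the ‘average performance’. It illustrates the fingerprint idea and is not the dissertation's model; all function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def znorm(series):
    """Z-normalize so that only the shape of the curve matters."""
    mu, sd = fmean(series), pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mu) / sd for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: mean of w equal-length segments."""
    n = len(series)
    if n % w:
        raise ValueError("series length must be divisible by w")
    step = n // w
    return [fmean(series[i:i + step]) for i in range(0, n, step)]


def breakpoints(a):
    """Cut points giving a equiprobable regions under a standard normal."""
    nd = NormalDist()
    return [nd.inv_cdf(k / a) for k in range(1, a)]


def sax(series, w, a):
    """SAX word as integer symbols 0..a-1."""
    beta = breakpoints(a)
    return [sum(v > b for b in beta) for v in paa(znorm(series), w)]


def mindist(q, c, n, a):
    """SAX MINDIST between two words built from length-n series."""
    beta = breakpoints(a)
    total = 0.0
    for r, s in zip(q, c):
        lo, hi = min(r, s), max(r, s)
        if hi - lo > 1:  # adjacent symbols count as distance 0
            total += (beta[hi - 1] - beta[lo]) ** 2
    return math.sqrt(n / len(q)) * math.sqrt(total)


def fingerprints(performances, w, a):
    """Distance of each performance's SAX word from the average curve's word."""
    n = len(performances[0])
    average = [fmean(vals) for vals in zip(*performances)]
    ref = sax(average, w, a)
    return [mindist(sax(p, w, a), ref, n, a) for p in performances]


# Hypothetical tempo curves (BPM per beat) for three performances.
perfs = [
    [60, 62, 64, 66, 66, 64, 62, 60],
    [58, 60, 63, 67, 68, 65, 61, 57],
    [66, 64, 62, 60, 60, 62, 64, 66],
]
print(fingerprints(perfs, w=4, a=4))
```

Because each curve is z-normalized, the fingerprint reflects the shape of tempo fluctuations rather than the overall speed; PAA reduces numerosity and the symbol alphabet reduces dimensionality, matching the two reductions the abstract describes.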

    Text-based Sentiment Analysis and Music Emotion Recognition

    With the expansion of social media, large amounts of user-generated text such as tweets, blog posts and product reviews are now shared online. Sentiment polarity analysis of such texts has become highly attractive and is used in recommender systems, market prediction, business intelligence and more. Deep learning techniques have also become top performers on these tasks. There are, however, several problems that need to be solved for the efficient use of deep neural networks in text mining and text polarity analysis. First, deep neural networks are data hungry: they need large datasets that are cleaned, preprocessed and properly labeled. Second, word embeddings, the modern natural language processing representation of text as dense, distributed features, solve the sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties about the use of word vectors: should they be generated from the same dataset used to train the model, or is it better to source them from large, popular collections that serve as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks struggle with longer texts, and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective, adapt to various texts, and encapsulate much of the design complexity. This thesis addresses these problems to provide methodological and practical insights for using neural networks in sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored, and two medium-sized, emotion-labeled song datasets are created using social tags.
One of the research interests of Telecom Italia was the relation between emotional stimulation by music and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments was conducted with large text collections of various contents and domains. Word embeddings with different parameters were tested, and the results revealed that their quality is influenced mostly, but not only, by the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular, generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers was conducted. Various patterns relating text properties and network parameters to optimal classification accuracy were observed. Combining convolutions of words, bigrams and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business and product reviews. Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be the exploration of data programming possibilities for constructing even larger labeled datasets. Investigating feature-level or decision-level ensemble techniques for deep neural networks could also be fruitful, since different feature types usually represent complementary characteristics of the data. Combining word embeddings with traditional text features, or applying recurrent networks to document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
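The best-performing configuration combined convolutions over words, bigrams and trigrams with regional max-pooling. The following is a minimal forward-pass sketch in plain Python of one such layer, assuming filter widths 1, 2 and 3, ReLU activations, random untrained weights and a single unstacked layer; the names and sizes are hypothetical, and training, stacking and the final classifier are omitted.

```python
import random


def conv1d(embeddings, kernel, bias):
    """Valid 1-D convolution of a k-word kernel over d-dim word vectors, then ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(embeddings) - k + 1):
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(kernel[j], embeddings[i + j]))
        out.append(max(0.0, s))
    return out


def regional_max_pool(feature_map, regions):
    """Split the map into `regions` contiguous chunks and keep each chunk's max."""
    n = len(feature_map)
    if n < regions:
        raise ValueError("feature map shorter than the number of regions")
    bounds = [r * n // regions for r in range(regions + 1)]
    return [max(feature_map[bounds[r]:bounds[r + 1]]) for r in range(regions)]


def text_features(embeddings, filters, regions):
    """Concatenate the regionally pooled map of every (kernel, bias) filter."""
    feats = []
    for kernel, bias in filters:
        feats.extend(regional_max_pool(conv1d(embeddings, kernel, bias), regions))
    return feats


def random_filter(width, dim, rng):
    """Hypothetical untrained filter spanning `width` consecutive words."""
    kernel = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
    return kernel, 0.0


rng = random.Random(0)
dim = 4
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(10)]  # 10 word vectors
filters = [random_filter(width, dim, rng) for width in (1, 2, 3)]  # word, bigram, trigram
features = text_features(sentence, filters, regions=2)
print(len(features))  # 3 filter widths x 2 regions = 6
```

Regional pooling keeps one maximum per part of the text instead of a single global maximum, so some coarse positional information survives; this is one plausible reason it can help with longer documents.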

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the intersection of technologies for effective visual features and the study of human-brain cognition. Effective visual features are made possible by rapid developments in appropriate sensor equipment, novel filter designs and viable information processing architectures, while a better understanding of human-brain cognition broadens the ways in which computers can perform pattern recognition tasks. This book collects representative research from around the globe on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. Its 27 chapters present recent advances and new ideas for promoting the techniques, technology and applications of pattern recognition.

    Backward glances: The cultural and industrial uses of nostalgia in 2010s Hollywood cinema

    Over the course of the 2010s, one identifiable trend in Hollywood cinema was the significant presence of nostalgia films. These films stage idealized recollections of the past, appealing to affective longing for its perceived comforts and stability. This thesis takes an interdisciplinary approach to present a historical narrative of recent Hollywood cinema and its intersection with broader American culture and society. I argue that the most recent cinematic “nostalgia wave” is attributable to the broad, epochal conditions of modernity and late modernity, specific historical events and trends of the 2010s, and Hollywood-specific technological and industrial discontinuities. In an attempt to weather this multitude of discontinuities, the contemporary American film industry can be seen to have internalized the logic of cultural nostalgia in a plea for continuity. This nostalgic outlook is also positioned alongside simultaneous attempts to contend with social progress in recent Hollywood cinema. Nostalgia is thus theorized as a potentially productive way of negotiating past and future, providing a narrative and industrial model for processing social change during a period of widespread uncertainty.

    Verifying tag annotation and performing genre classification in music data via association analysis

    Music Information Retrieval aims to automate access to large volumes of music data, including browsing, retrieval and storage. The work presented in this thesis tackles two non-trivial problems in the field. The first concerns music tags, which provide rich descriptive information about a music piece, including its genre, artist, emotion and instrumentation. At present, tag annotation is largely a manual process, which often produces tags that are subjective, ambiguous and error-prone. We propose a novel approach to verify the quality of tag annotation in a music dataset through association analysis. Second, we employ association analysis to predict music genres from features extracted directly from the music. We build an association-based classifier that finds inherent associations between music features and genres. We demonstrate the effectiveness of our approaches through a series of simulations and experiments on various benchmark music datasets.
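The abstract does not specify the rule-mining or scoring scheme. The sketch below shows the basic idea of an association-based genre classifier, assuming music features have already been discretized into items such as "tempo:fast", rules with a single feature as antecedent, and prediction by summing the confidences of matching rules. Feature names, thresholds and the scoring rule are hypothetical, not the thesis's classifier.

```python
from collections import Counter, defaultdict


def mine_rules(songs, min_support=0.2, min_confidence=0.6):
    """Mine rules {feature} -> genre from (feature_set, genre) pairs.

    support    = fraction of songs containing both the feature and the genre
    confidence = count(feature and genre) / count(feature)
    """
    n = len(songs)
    feature_count = Counter()
    pair_count = Counter()
    for features, genre in songs:
        for f in features:
            feature_count[f] += 1
            pair_count[(f, genre)] += 1
    rules = {}
    for (f, genre), c in pair_count.items():
        support = c / n
        confidence = c / feature_count[f]
        if support >= min_support and confidence >= min_confidence:
            rules[(f, genre)] = confidence
    return rules


def predict(features, rules):
    """Genre with the highest summed confidence among matching rules, else None."""
    scores = defaultdict(float)
    for (f, genre), conf in rules.items():
        if f in features:
            scores[genre] += conf
    return max(scores, key=scores.get) if scores else None


# Hypothetical discretized features for five labelled songs.
songs = [
    ({"tempo:fast", "distortion:high"}, "rock"),
    ({"tempo:fast", "distortion:high"}, "rock"),
    ({"tempo:slow", "distortion:low"}, "jazz"),
    ({"tempo:slow", "swing:yes"}, "jazz"),
    ({"tempo:fast", "swing:yes"}, "jazz"),
]
rules = mine_rules(songs, min_support=0.3, min_confidence=0.6)
print(predict({"tempo:fast", "swing:yes"}, rules))  # jazz
```

Low-support rules such as "distortion:low" -> jazz (one song) are filtered out, and the low-confidence rule "tempo:fast" -> jazz is discarded, so the strong "swing:yes" -> jazz rule outweighs "tempo:fast" -> rock in the example query.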