    Semantic Priming of Familiar Songs

    We explored the functional organization of semantic memory for music by comparing priming across familiar songs both within modalities (Experiment 1, tune to tune; Experiment 3, category label to lyrics) and across modalities (Experiment 2, category label to tune; Experiment 4, tune to lyrics). Participants judged whether or not the target tune or lyrics were real (akin to lexical decision tasks). We found significant priming, analogous to linguistic associative-priming effects, in reaction times for related primes as compared to unrelated primes, but primarily for within-modality comparisons. Reaction times to tunes (e.g., Silent Night) were faster following related tunes (Deck the Hall) than following unrelated tunes (God Bless America). However, a category label (e.g., Christmas) did not prime tunes from within that category. Lyrics were primed by a related category label, but not by a related tune. These results support the conceptual organization of music in semantic memory, but with potentially weaker associations across modalities.

    Music emotion recognition: a multimodal machine learning approach

    Music emotion recognition (MER) is an emerging domain of the Music Information Retrieval (MIR) scientific community; moreover, searching for music by emotion is one of the selection methods most preferred by web users. As the world goes digital, the musical contents of online databases such as Last.fm have expanded exponentially, requiring substantial manual effort to manage and keep updated. The demand for innovative and adaptable search mechanisms that can be personalized according to users’ emotional state has therefore gained increasing consideration in recent years. This thesis addresses the music emotion recognition problem by presenting several classification models fed by textual features as well as audio attributes extracted from the music. In this study, we build both supervised and semi-supervised classification designs across four research experiments, which address the emotional role of audio features such as tempo, acousticness, and energy, as well as the impact of textual features extracted by two different approaches, TF-IDF and Word2Vec. Furthermore, we propose a multi-modal approach using a combined feature set consisting of features from the audio content as well as from context-aware data. For this purpose, we generated a ground-truth dataset containing over 1,500 labeled song lyrics, along with an unlabeled collection of more than 2.5 million Turkish documents, in order to build an accurate automatic emotion classification system. The analytical models were built by applying several algorithms to cross-validated data using Python. The best performance attained with audio features alone was 44.2% accuracy, whereas textual features yielded better results, with accuracy scores of 46.3% and 51.3% under the supervised and semi-supervised learning paradigms, respectively. Finally, even though we created a comprehensive feature set combining audio and textual features, this approach did not yield any significant improvement in classification performance.
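
    A minimal sketch of the multi-modal setup this abstract describes, combining TF-IDF lyric features with audio attributes in a cross-validated supervised classifier. The toy data, column names, and choice of logistic regression are illustrative assumptions, not the thesis's exact configuration:

```python
# Toy sketch (not the thesis's exact setup): TF-IDF lyric features
# concatenated with audio attributes, scored with cross-validation.
# All rows, column names, and the classifier choice are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

songs = pd.DataFrame({
    "lyrics": ["sunny happy day", "tears in the rain",
               "dancing all night", "lonely quiet road"],
    "tempo": [128.0, 72.0, 140.0, 80.0],          # beats per minute
    "acousticness": [0.10, 0.85, 0.05, 0.70],
    "energy": [0.90, 0.20, 0.95, 0.30],
    "emotion": ["happy", "sad", "happy", "sad"],  # ground-truth labels
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "lyrics"),                          # textual features
    ("audio", "passthrough", ["tempo", "acousticness", "energy"]),  # audio features
])
model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])

# Cross-validated accuracy, analogous to the scores reported above.
print(cross_val_score(model, songs, songs["emotion"], cv=2).mean())
```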

    Text-based Sentiment Analysis and Music Emotion Recognition

    Nowadays, with the expansion of social media, large amounts of user-generated texts such as tweets, blog posts, or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market prediction, business intelligence, and more. We also witness deep learning techniques becoming top performers on those types of tasks. There are, however, several problems that need to be solved for efficient use of deep neural networks on text mining and text polarity analysis. First of all, deep neural networks are data hungry. They need to be fed with datasets that are big in size, cleaned and preprocessed, as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves the sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model, or is it better to source them from big and popular collections that serve as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts, and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of the design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks on sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored, and two medium-sized, emotion-labeled song datasets are created utilizing social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains was conducted. Word embeddings of different parameters were tested, and the results revealed that their quality is influenced (mostly, but not only) by the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers was conducted. Various patterns relating text properties and network parameters to optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business, and product reviews. Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types usually represent complementary characteristics of the data. Combining word embeddings with traditional text features, or utilizing recurrent networks on document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
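
    As a concrete illustration of the pattern described in the third problem, here is a minimal PyTorch sketch of parallel convolutions over word, bigram, and trigram windows followed by max-pooling and concatenation. Global max-pooling stands in for the regional pooling stacks of the actual architecture, and the vocabulary size, embedding width, and filter counts are illustrative assumptions:

```python
# Minimal sketch of an n-gram CNN for sentiment classification.
# Dimensions are illustrative; global max-pooling simplifies the
# regional pooling stacks described in the thesis.
import torch
import torch.nn as nn

class NgramCNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=300,
                 n_filters=100, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Kernel sizes 1, 2, 3 cover word, bigram, and trigram windows.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_filters, kernel_size=k) for k in (1, 2, 3)]
        )
        self.fc = nn.Linear(3 * n_filters, n_classes)

    def forward(self, token_ids):                   # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        # Convolve, then max-pool each feature map over the whole sequence.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))    # (batch, n_classes)

logits = NgramCNN()(torch.randint(0, 20000, (8, 50)))  # dummy batch of 8 docs
```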

    The Effect of Background Music on the Visual Categorization of Printed Words in Normal Younger and Older Adults

    Aim: Research has shown that background music, with and without vocal content, has a detrimental effect on cognitive task performance. Research has also shown a decline in processing speed with increasing age. The present study seeks to answer the following questions: 1. Will background vocal music have any detrimental effect on performance of a visual semantic word categorization task? 2. Does age have any effect on performance of visual semantic word categorization in the presence of background music? Participants: Participants consisted of 36 adult native speakers of English with normal speech and language, divided into two groups based on age: an older group (63-79 years) and a younger group (18-33 years). The younger group was recruited from the student population of the University of Tennessee and the Knoxville community. The older group was recruited from the Knoxville Office on Aging and the Knoxville community. Stimuli: Printed words were chosen from superordinate categories such as tools, utensils, animals, food, clothing, furniture, body parts, vehicles, toys, instruments, and insects. The auditory stimulus was Adele’s song “Someone Like You,” from the commercial CD recording. Instrumental recordings of the song were constructed using the music notation software program Finale and sampled instruments. Procedure: Participants performed a categorization task on printed words shown on a computer screen in the presence of background music. Participants’ reaction times and response accuracy were recorded by the software program SuperLab Pro. The experiment was presented four times consecutively, once for each of four randomized auditory conditions, with 26 word sets per condition. A questionnaire was administered at the end of the final experiment. Statistical Analysis: A 2x4 mixed-design ANOVA (between-subjects factor: age group; within-subjects factor: condition) was performed to test for main effects and interactions between and within groups. Paired-sample t-tests were computed to test for significant differences among conditions within groups. Correlation and covariate analyses were performed on the questionnaire data. Results: The results did not indicate any significant effect of auditory condition on categorization task performance. Vocal music did not increase reaction times or decrease the accuracy of word categorization. On the other hand, a significant effect of age was found for both reaction time and accuracy: older adults performed significantly more slowly and less accurately than younger adults.
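
    The statistical design described above lends itself to a compact worked example. The following is a minimal sketch, assuming hypothetical condition names and synthetic long-format data, of how a 2x4 mixed-design ANOVA and follow-up paired comparisons could be run in Python with the pingouin library; it is not the study's actual analysis code:

```python
# Sketch of a 2x4 mixed-design ANOVA like the one described above.
# The condition names and synthetic data are illustrative assumptions.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(36), 4)                       # 36 participants
age_group = np.repeat(["younger"] * 18 + ["older"] * 18, 4)  # between-subjects
condition = np.tile(["quiet", "vocal", "instrumental", "melody"], 36)  # within
rt = rng.normal(700, 60, size=144) + np.where(age_group == "older", 150, 0)

df = pd.DataFrame({"subject": subjects, "age_group": age_group,
                   "condition": condition, "rt": rt})

# Mixed ANOVA: between-subjects factor age_group, within-subjects factor condition.
print(pg.mixed_anova(data=df, dv="rt", within="condition",
                     subject="subject", between="age_group").round(3))

# Paired-sample t-tests among conditions, as in the follow-up analysis.
print(pg.pairwise_tests(data=df, dv="rt", within="condition",
                        subject="subject").round(3))
```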

    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has been given increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. In order to make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users’ experiences. This thesis is based on a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users, and the shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework, the first of its kind, is conceived as a coordinating structure between the automatic description of low-level music content and the description of high-level content by the system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results from these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified. In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study has uncovered relationships between the users’ demographic background and their perception of expressive and structural features of music. Such a multi-level approach is exceptional in that it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results of the empirical investigations are put into practice in three music information retrieval applications: a prototype of a taxonomy-based user interface, an annotated database of experimental findings, and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, knowledge about users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections.

    The Constructivistly-Organised Dimensional-Appraisal (CODA) Model and Evidence for the Role of Goal-directed Processes in Emotional Episodes Induced by Music

    The study of affective responses to music is a flourishing field. Advancements in the study of this phenomenon have been complemented by the introduction of several music-specific models of emotion, two of the most well-cited being the BRECVEMA and the Multifactorial Process Model. These two models have undoubtedly contributed to the field. However, contemporary developments in the wider affective sciences (broadly described as the ‘rise of affectivism’) have yet to be incorporated into the music emotion literature. These developments in the affective sciences may aid in addressing remaining gaps in the music literature, in particular in acknowledging individual and contextual differences. The first aim of this thesis was to outline contemporary theories from the wider affective sciences and subsequently critique current popular models of musical emotions through the lens of these advancements. The second aim was to propose a new model based on this critique: the Constructivistly-Organised Dimensional-Appraisal (CODA) model. The CODA model draws together multiple competing models into a single framework centralised around goal-directed appraisal mechanisms, which are key to the wider affective sciences but are a less commonly acknowledged component of musical affect. The third aim was to empirically test some of the core hypotheses of the CODA model, in particular examining goal-directed mechanisms, their validity in a musical context, and their ability to address individual and contextual differences in musically induced affect. Across four experiments, ranging from exploratory and lab-based designs through to real-world applications, the results support the role of goal-directed mechanisms in musically induced emotional episodes. Experiment one presents a first test battery of multiple appraisal dimensions developed for music. The results show that several of the hypothesised appraisal dimensions are valid in a musical context. Moreover, these mechanisms cluster into goal-directed latent variables. Experiment two develops a new set of stimulus annotations relating to musical goals, showing that music can be more or less appropriate for different musical goals (functions). Experiment three, using the new stimulus set from experiment two, tests the effects of different goals, with more or less appropriate music, on musically induced affect. These results show that goal-directed mechanisms can change induced core affect (valence and arousal) and intensity, even for the same piece of music. Experiment four extends the study of goal-directed mechanisms into a real-world context through an interdisciplinary and cross-cultural design. This final experiment demonstrates how goal-directed mechanisms can be manipulated through different algorithms to induce negative affect in a Colombian population. The main conclusion of this thesis is that the CODA model, and more specifically its goal-directed mechanisms, provides a valuable, non-reductive, and more efficient approach to addressing individual and contextual differences in musically induced emotional episodes in the new era of affectivism.

    Music Information Retrieval: An Inspirational Guide to Transfer from Related Disciplines

    The emerging field of Music Information Retrieval (MIR) has been influenced by neighboring domains in signal processing and machine learning, including automatic speech recognition, image processing, and text information retrieval. In this contribution, we start with concrete examples of methodology transfer between speech and music processing, oriented around the building blocks of pattern recognition: preprocessing, feature extraction, and classification/decoding. We then assume a higher-level viewpoint when describing sources of mutual inspiration derived from text and image information retrieval. We conclude that dealing with the peculiarities of music in MIR research has contributed to advancing the state of the art in other fields, and that many future challenges in MIR are strikingly similar to those that other research areas have been facing.

    The Automatic Prediction of Pleasure and Arousal Ratings

    Music’s allure lies in its power to stir the emotions. But the relation between the physical properties of an acoustic signal and its emotional impact remains an open area of research. This paper reports the results and possible implications of a pilot study and survey used to construct an emotion index for subjective ratings of music. The dimensions of pleasure and arousal exhibit high reliability. Eighty-five participants’ ratings of 100 song excerpts are used to benchmark the predictive accuracy of several combinations of acoustic preprocessing and statistical learning algorithms. The Euclidean distance between acoustic representations of an excerpt and corresponding emotion-weighted visualizations of a corpus of music excerpts provided predictor variables for linear regression that resulted in the highest predictive accuracy for mean pleasure and arousal values of test songs. This new technique also generated visualizations that show how rhythm, pitch, and loudness interrelate to influence our appreciation of the emotional content of music.
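
    A minimal sketch of the distance-based predictor idea described above, assuming synthetic acoustic features and ratings: Euclidean distances from each excerpt's feature vector to emotion-weighted anchor representations of the corpus serve as regressors for mean pleasure. The anchor construction and feature dimensionality are assumptions, not the paper's exact method:

```python
# Sketch: Euclidean distances to emotion-weighted corpus anchors as
# predictors in a linear regression of mean pleasure ratings.
# All data and the anchor construction are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_acoustic = rng.normal(size=(100, 24))  # 100 excerpts x 24 acoustic features
pleasure = rng.uniform(1, 9, size=100)   # mean ratings across participants

# Emotion-weighted anchors: corpus features averaged with rating weights.
anchors = np.stack([
    np.average(X_acoustic, axis=0, weights=pleasure),                     # high
    np.average(X_acoustic, axis=0, weights=pleasure.max() + 1 - pleasure),  # low
])

# Predictors: Euclidean distance from each excerpt to each anchor.
dists = np.linalg.norm(X_acoustic[:, None, :] - anchors[None, :, :], axis=2)

model = LinearRegression().fit(dists, pleasure)
print(model.score(dists, pleasure))  # in-sample fit of the distance model
```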