
    Hybrid Approach for Emotion Classification of Audio Conversation Based on Text and Speech Mining

    One of the greatest challenges in speech technology is estimating the speaker's emotion. Most existing approaches concentrate on either audio or text features. In this work, we propose a novel approach for emotion classification of audio conversation based on both speech and text. The novelty of this approach lies in the choice of features and the generation of a single feature vector for classification. Our main intention is to increase the accuracy of emotion classification of speech by considering both audio and text features. In this work we use standard methods such as Natural Language Processing, Support Vector Machines, WordNet Affect and SentiWordNet. The datasets for this work have been taken from SemEval-2007 and the eNTERFACE’05 EMOTION Database.
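    A minimal sketch of the single-feature-vector idea, assuming pre-extracted audio features and lexicon-based text scores; the data, dimensions, and SVM settings below are illustrative placeholders, not the paper's actual pipeline.

```python
# Sketch: concatenate audio features (e.g., prosodic/spectral statistics) with
# text-derived affect scores into one vector and train an SVM.
# All feature values and labels below are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
audio_feats = rng.normal(size=(200, 20))   # placeholder audio features per utterance
text_feats = rng.normal(size=(200, 6))     # placeholder lexicon scores of the transcript
labels = rng.integers(0, 4, size=200)      # placeholder emotion labels

# Single feature vector per utterance: simple concatenation of both modalities.
X = np.hstack([audio_feats, text_feats])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, labels)
print(clf.score(X, labels))                # training accuracy on the toy data
```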

    Inferring semantics from lyrics using weakly annotated data

    Integrated master's thesis. Informatics and Computing Engineering. Universidade do Porto. Faculdade de Engenharia. 201

    Pop psych: the impact of music and lyrics on emotion

    While the effects of music on emotion have been heavily researched, the added influence of lyrics is notoriously difficult to measure. Generally, negative music has been linked with decreased wellbeing and increased aggressive behaviour, but the specific contribution of lyrics remains largely unexplored. To further understand this interaction, original pop songs were written and produced to test the effect of lyrics while controlling for the effect of music. Using a 3 x 2 within-subject design, participants (N = 61) listened to songs in three categories – vitality, unease and sublimity – building on research by Zentner et al. (2008). Each category had two versions, with either positive or negative lyrics. 172 words (86 positive, 86 negative) were selected from Warriner et al.’s (2013) database and incorporated into the three song pairs. The track order was counterbalanced between participants. After each song, perceived emotions were reported using the three-dimensional model (Schimmack & Grob, 2000). Participants also reported felt levels of prosocial (or antisocial) sentiment induced by the stimuli. Intended music emotions were accurately perceived by participants. Importantly, songs with negative lyrics led to lower feelings of prosociality than songs with positive lyrics. This is the first empirical demonstration that lyrics have an effect on felt emotion above and beyond music category. By using such stimuli in future research, along with more subconscious measures, the effects of music and lyrics could be harnessed to facilitate emotions associated with wellbeing and prosocial behaviour.
    Keywords: music, lyrics, pop, emotion, valence, stimuli, prosocial
    Thesis (B.PsychSc(Hons)) -- University of Adelaide, School of Psychology, 201

    Music emotion recognition: a multimodal machine learning approach

    Music emotion recognition (MER) is an emerging domain of the Music Information Retrieval (MIR) scientific community, and searching for music by emotion is one of the selection methods most preferred by web users. As the world goes digital, the musical content in online databases such as Last.fm has expanded exponentially, requiring substantial manual effort to manage and keep up to date. Therefore, the demand for innovative and adaptable search mechanisms, which can be personalized according to users’ emotional state, has gained increasing consideration in recent years. This thesis concentrates on the music emotion recognition problem by presenting several classification models fed by textual features as well as audio attributes extracted from the music. In this study, we build both supervised and semi-supervised classification designs across four research experiments that address the emotional role of audio features, such as tempo, acousticness, and energy, and also the impact of textual features extracted by two different approaches, TF-IDF and Word2Vec. Furthermore, we propose a multi-modal approach using a combined feature set consisting of features from the audio content as well as from context-aware data. For this purpose, we generated a ground-truth dataset containing over 1500 labeled song lyrics, along with an unlabeled corpus of more than 2.5 million Turkish documents, in order to build an accurate automatic emotion classification system. The analytical models were built by applying several algorithms to the cross-validated data using Python. In the experiments, the best performance attained when employing only audio features was 44.2%, whereas with textual features better performances were observed, with accuracy scores of 46.3% and 51.3% for the supervised and semi-supervised learning paradigms, respectively. Finally, even though we created a comprehensive feature set combining audio and textual features, this approach did not yield any significant improvement in classification performance.
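    A minimal sketch of the supervised-versus-semi-supervised setup described above, using TF-IDF lyric features and scikit-learn's self-training wrapper; the lyrics, labels, and threshold are invented placeholders, and logistic regression stands in for whichever base learners the thesis actually used.

```python
# Sketch: TF-IDF lyric features, a small labeled set, and a larger unlabeled pool
# used via self-training. All texts, labels, and thresholds are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

labeled_lyrics = ["bright sun and dancing all night", "tears falling in the rain"]
labels = np.array([0, 1])                       # e.g., 0 = happy, 1 = sad (toy labels)
unlabeled_lyrics = ["driving alone on an empty road", "party lights and loud bass"]

vec = TfidfVectorizer()
X = vec.fit_transform(labeled_lyrics + unlabeled_lyrics)

# scikit-learn's convention: unlabeled samples carry the label -1.
y = np.concatenate([labels, -np.ones(len(unlabeled_lyrics), dtype=int)])

# Self-training wraps a probabilistic base classifier and iteratively
# pseudo-labels confident unlabeled examples.
semi = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.6)
semi.fit(X, y)
print(semi.predict(vec.transform(["sunshine on the dance floor"])))
```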

    INTERACTIVE SONIFICATION STRATEGIES FOR THE MOTION AND EMOTION OF DANCE PERFORMANCES

    The Immersive Interactive SOnification Platform, or iISoP for short, is a research platform for the creation of novel multimedia art, as well as exploratory research in the fields of sonification, affective computing, and gesture-based user interfaces. The goal of the iISoP’s dancer sonification system is to “sonify the motion and emotion” of a dance performance via musical auditory display. An additional goal of this dissertation is to develop and evaluate musical strategies for adding a layer of emotional mappings to data sonification. The series of dancer sonification design exercises led to the development of a novel musical sonification framework. The overall design process is divided into three main iterative phases: requirement gathering, prototype generation, and system evaluation. In the first phase, dancers and musicians provided help in a participatory design fashion as domain experts in the field of non-verbal affective communication. Knowledge extraction procedures took the form of semi-structured interviews, stimuli feature evaluation, workshops, and think-aloud protocols. In phase two, the expert dancers and musicians helped create testable stimuli for prototype evaluation. In phase three, system evaluation, experts (dancers, musicians, etc.) and novice participants were recruited to provide subjective feedback from the perspectives of both performer and audience. Based on the results of the iterative design process, a novel sonification framework that translates motion and emotion data into descriptive music is proposed and described.
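    As a rough illustration of parameter-mapping sonification (not the iISoP framework itself), the sketch below maps normalized motion speed and an estimated valence/arousal pair to tempo, pitch register, and loudness; all ranges and mappings are assumptions.

```python
# Sketch: map normalized motion/emotion data (0..1) to simple musical parameters.
# The specific mappings and ranges below are illustrative assumptions.
def sonification_params(speed, valence, arousal):
    """Map a frame of (speed, valence, arousal) data to musical parameters."""
    tempo_bpm = 60 + 120 * speed           # faster motion -> faster tempo
    midi_pitch = 48 + int(24 * valence)    # more positive valence -> higher register
    loudness = 0.2 + 0.8 * arousal         # higher arousal -> louder
    return tempo_bpm, midi_pitch, loudness

# Example frame, e.g., from a motion-tracking and emotion-estimation front end.
print(sonification_params(speed=0.7, valence=0.9, arousal=0.6))
```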

    Text-based Sentiment Analysis and Music Emotion Recognition

    Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market predictions, business intelligence and more. We also witness deep learning techniques becoming top performers on these types of tasks. There are, however, several problems that need to be solved for efficient use of deep neural networks on text mining and text polarity analysis. First of all, deep neural networks are data hungry. They need to be fed with datasets that are big in size, cleaned and preprocessed, as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves the sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model, or is it better to source them from big and popular collections that serve as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts, and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of the design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks on sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored and two medium-sized, emotion-labeled song datasets are created utilizing social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains were conducted. Word embeddings with different parameters were exercised, and results revealed that their quality is influenced (mostly but not only) by the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers were conducted. Various patterns relating text properties and network parameters with optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business and product reviews. Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types usually represent complementary characteristics of the data. Combining word embedding and traditional text features, or utilizing recurrent networks on document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
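    An illustrative Keras sketch of the architecture pattern described above: parallel convolutions over words, bigrams, and trigrams of word embeddings, each followed by regional and global max-pooling, then concatenated for sentiment classification. Layer sizes and hyperparameters are placeholders rather than the thesis's exact configuration.

```python
# Sketch: convolutions over unigrams, bigrams, and trigrams with stacked
# regional max-pooling, concatenated for binary sentiment classification.
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, max_len = 20000, 100, 200   # placeholder sizes

inputs = layers.Input(shape=(max_len,), dtype="int32")
emb = layers.Embedding(vocab_size, embed_dim)(inputs)  # could be preloaded word2vec/GloVe

branches = []
for kernel_size in (1, 2, 3):                           # words, bigrams, trigrams
    x = layers.Conv1D(128, kernel_size, activation="relu")(emb)
    x = layers.MaxPooling1D(pool_size=4)(x)             # regional max-pooling
    x = layers.Conv1D(128, kernel_size, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)                  # second pooling stack
    branches.append(x)

merged = layers.Concatenate()(branches)
outputs = layers.Dense(1, activation="sigmoid")(merged) # positive vs. negative polarity

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```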

    Lyrics Matter: Using Lyrics to Solve Music Information Retrieval Tasks

    Music Information Retrieval (MIR) research tends to focus on audio features like melody and timbre of songs while largely ignoring lyrics. Lyrics and poetry adhere to a specific rhyme and meter structure which sets them apart from prose. This structure can be exploited to obtain useful information that can be used to solve Music Information Retrieval tasks. In this thesis we show the usefulness of lyrics in solving MIR tasks. For our first result, we show that the presence of lyrics has a variety of significant effects on how people perceive songs, though it is unable to significantly increase the agreement between Canadian and Chinese listeners about the mood of a song. We find that the mood assigned to a song depends on whether people listen to it, read the lyrics or do both together. Our results suggest that music mood is so dependent on cultural and experiential context that it is difficult to claim it as a true concept. We also show that we can predict the genre of a document based on the adjective choices made by the authors. Using this approach, we show that adjectives more likely to be used in lyrics are more rhymable than those more likely to be used in poetry, and we are also able to successfully separate poetic lyricists like Bob Dylan from non-poetic lyricists like Bryan Adams. We then proceed to develop a hit song detection model using 31 rhyme, meter and syllable features and commonly used machine learning algorithms (Bayesian Network and SVM). We find that our lyrics features outperform audio features at separating hits from flops. Using the same features we can also detect songs which are likely to be shazamed heavily. Since most of the Shazam Hall of Fame songs are by upcoming artists, our advice to them is to write lyrically complicated songs with lots of complicated rhymes in order to rise above the "sonic wallpaper", get noticed and shazamed, and become famous. We argue that complex rhyme and meter is a detectable property of lyrics that indicates quality songmaking and artisanship and allows artists to become successful.
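    A toy sketch in the spirit of the lyrics-only approach described above: a few hand-rolled rhyme and syllable features fed to an SVM. The feature definitions, songs, and hit/flop labels are illustrative inventions, not the thesis's 31 features or its dataset.

```python
# Sketch: crude rhyme/syllable features from lyrics, classified with an SVM.
# Feature definitions and labels below are toy assumptions for illustration.
import re
import numpy as np
from sklearn.svm import SVC

def syllables(word):
    """Very rough syllable estimate: count groups of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def lyric_features(lyrics):
    lines = [l.split() for l in lyrics.lower().splitlines() if l.split()]
    endings = [l[-1][-2:] for l in lines]                  # last two letters of each line
    rhyme_density = sum(endings.count(e) > 1 for e in endings) / len(endings)
    syl_per_line = np.mean([sum(syllables(w) for w in l) for l in lines])
    words_per_line = np.mean([len(l) for l in lines])
    return [rhyme_density, syl_per_line, words_per_line]

songs = ["I saw the light tonight\nholding on so tight\neverything feels right",
         "walking down the street\nthinking about the past\nnothing ever stays"]
labels = [1, 0]                                            # toy hit / flop labels

X = np.array([lyric_features(s) for s in songs])
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X))
```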

    Modelling Chart Trajectories using Song Features

    Over the years, hit song science has been a controversial topic within music information retrieval. Researchers have debated whether an unbiased dataset can be constructed to model song performance in a meaningful way. Often, classes for modelling are derived from one dimension of song performance, such as a song's peak position on some chart. We aim to develop target variables for modelling song performance as trajectory patterns that consider both a song's lasting power and its listener reach. We model our target variables over various datasets using a wide array of features across different domains, including metadata, audio, and lyric features. We found that the metadata features, which act as baseline song attributes, often had the most power in distinguishing our proposed task classes. When modelling hits and flops along a single dimension of song success, we observed that the two dimensions carried contrasting information, justifying their fusion into a two-dimensional target variable, which could be useful for future researchers who want to better understand the relationships between song features and performance. We were unable to show that our target variables were especially useful for modelling more than two classes, but we believe this is more a limitation of the features, which were often high-level, than of the target variables' separability. Along with our model analysis, we also carried out a re-implementation of a related study by Askin & Mauskapf and considered different applications of our data using methods from time series analysis.
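    An illustrative sketch of fusing the two dimensions of song performance into a single target variable: lasting power (weeks on chart) and listener reach (peak position) are binarized and combined into joint trajectory classes. The thresholds and chart data are assumptions for demonstration, not the thesis's class definitions.

```python
# Sketch: derive a two-dimensional chart-trajectory target from weekly chart rows.
# The chart data and thresholds are invented for illustration.
import pandas as pd

charts = pd.DataFrame({
    "song": ["A", "A", "A", "B", "C", "C"],
    "week": [1, 2, 3, 1, 1, 2],
    "rank": [40, 12, 5, 88, 60, 55],
})

traj = charts.groupby("song").agg(weeks_on_chart=("week", "nunique"),
                                  peak_rank=("rank", "min"))

# Binarize each dimension, then fuse them into a single joint target label.
traj["lasting"] = (traj["weeks_on_chart"] >= 3).map({True: "long", False: "short"})
traj["reach"] = (traj["peak_rank"] <= 10).map({True: "high", False: "low"})
traj["target"] = traj["lasting"] + "_" + traj["reach"]
print(traj)
```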

    An evaluation of the efficacy of digital real-time noise control techniques in evoking the musical effect

    This study sought to determine whether it may be possible to evoke ‘the musical effect’ – the emotional response perceived by music listeners – using white noise as a sound source and real-time digital signal processing techniques. This information was considered valuable because, in a world driven by technological progress, the potential use of new or different technologies in creating music could lead to the development of new methods of – and tools for – composition and performance. More specifically, this research asked the question ‘what is music?’ and investigated how humans – both trained musicians and untrained people – perceive it. The elements of music were investigated for their affective strengths, and new fields of research were explored for insights into emotion identification in music. Thereafter the focus shifted into the realm of Digital Signal Processing. Common operations and techniques for signal manipulation were investigated and an understanding of the field as a whole was sought. The culmination of these two separate, yet related, investigations was the design and implementation of a listening experiment conducted on adult subjects. They were asked to listen to various manipulated noise signals and answer a questionnaire about their perceptions of the audio material. The data from the listening experiment suggest that certain DSP techniques can evoke ‘the musical effect’. Various musical elements were represented via digital techniques, and in many cases respondents reported perceptions which suggest that some effect was felt. The techniques implemented and musical elements represented were discussed, and possible applications for these techniques, both musical and non-musical, were explored. Areas for further research were discussed, including the implementation of further DSP techniques and garnering a more specific idea of the emotions perceived by respondents in response to the experiment material.
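    A minimal sketch of the kind of manipulation described above: broadband white noise is shaped with a narrow band-pass filter (giving a pitch centre) and an amplitude envelope (giving a note-like contour). The filter band, envelope, and sample rate are illustrative choices, not those used in the study.

```python
# Sketch: shape white noise into a pitched, note-like signal with DSP.
# Filter band, envelope decay, and sample rate are illustrative assumptions.
import numpy as np
from scipy.signal import butter, lfilter

fs = 44100                                   # sample rate in Hz
duration = 1.0
noise = np.random.default_rng(0).uniform(-1, 1, int(fs * duration))

# A narrow band-pass around 440 Hz turns broadband hiss into a breathy, pitched tone.
b, a = butter(4, [430, 450], btype="bandpass", fs=fs)
tone = lfilter(b, a, noise)

# An exponential decay envelope gives the signal a plucked, note-like contour.
envelope = np.exp(-3 * np.linspace(0, duration, tone.size))
signal = tone * envelope
signal /= np.max(np.abs(signal))             # normalize before playback or export

print(signal.shape, signal.dtype)
```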