56 research outputs found
Music Emotion Recognition based on Feature Combination, Deep Learning and Chord Detection
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. As one of the most classic human inventions, music appears in many artworks, such as songs, films and theatre. It can be seen as another language, used to express the author's thoughts and emotions: in many cases, music conveys both the meaning the author hopes to express and the feeling the audience experiences. However, the emotions that arise while enjoying music are complex and difficult to explain precisely. Music Emotion Recognition (MER), the task of recognising emotions in music, is therefore an interesting research topic in the field of artificial intelligence. Recognition methods and tools for music signals have grown rapidly in recent years, and with developments in signal processing, machine learning and algorithm optimisation, recognition accuracy continues to improve. This thesis focuses on three significant parts of MER, namely features, learning methods and music emotion theory, to explain and illustrate how to build effective MER systems. Firstly, an automatic MER system for classifying four emotions was proposed, in which OpenSMILE was used for feature extraction and the IS09 feature set was selected. After combination with STAT statistical features, a Random Forest classifier produced the best performance, outperforming previous systems. Under suitable parameter settings, this approach to feature selection and machine learning improved MER accuracy by at least 3.5% over other feature combinations, with the new combination of IS09 and STAT features reaching 83.8% accuracy. Secondly, another MER system for four emotions was proposed based on the dynamic properties of music signals, in which features are extracted from segments of the music signal instead of the whole recording in the APM database. A Long Short-Term Memory (LSTM) deep learning model was then used for classification.
The model can exploit the dynamic, continuous information between different time-frame segments for more effective emotion recognition. However, the final performance reached only 65.7%, which was not as good as expected. The reason may be that the database does not suit the LSTM as well as initially thought: the information between segments may not be sufficient to improve recognition performance compared with traditional methods. This shows that a complex deep learning method is not suitable for every database, since the LSTM dynamic deep learning method did not work well on this continuous database. Finally, the research targeted recognising emotion through the identification of chords, as particular chords carry emotional information according to previous theoretical work. The research started by building a new chord database, using Adobe Audition to extract chord clips from piano chord teaching audio. FFT features based on 1000-point sampled pre-processed data, together with STAT features, were then extracted for the selected samples from the database. After calculation and comparison using Euclidean distance and correlation, the results showed that the STAT features work well for most chords, with the exception of the augmented chord. This new approach of recognising six emotions from music was used for the first time in this research and achieved 75% accuracy in chord identification. In summary, the research proposed new MER methods through three different approaches; some achieved good recognition performance and some have broad application prospects.
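The chord-matching step can be pictured as a nearest-template search. The sketch below is a minimal illustration assuming toy feature vectors (the names and numbers are invented, not the thesis's data), using the Euclidean-distance measure; the correlation measure compared in the thesis would instead rank templates by similarity.

```python
import math

def match_chord(query, templates):
    """Return the chord label whose stored feature vector lies closest
    to the query clip's features by Euclidean distance."""
    return min(templates, key=lambda name: math.dist(query, templates[name]))

# Toy per-chord templates standing in for STAT feature vectors; the
# values are illustrative, not taken from the thesis's chord database.
templates = {
    "major": [0.8, 0.1, 0.3],
    "minor": [0.2, 0.7, 0.4],
    "augmented": [0.5, 0.5, 0.9],
}

query = [0.75, 0.15, 0.35]            # features of an unknown chord clip
print(match_chord(query, templates))  # nearest template here is "major"
```

A correlation-based variant would compute a similarity score per template and take the maximum instead of the minimum distance.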
Platform, culture, identities: exploring young people's game-making
Digital games are an important component of the contemporary media landscape. They are cultural artefacts and, as such, are subject to specific conventions. These conventions shape our imaginary about games, defining, for example, what a game is, who can play them and where. Different strands of research have been developed to understand and challenge these conventions, and one of the strategies often adopted is fostering game-making among "gaming minorities". By popularising games and their means of production, critical skills towards these objects could be developed, these conventions could be contested, and our perceptions of those artefacts could be transformed. Nevertheless, digital games, as obvious as it sounds, are also digital: they depend on technology to exist and are subject to different technologies' affordances and constraints. Technologies, however, are not neutral and objective, but are also cultural: they too are influenced by values and conventions. This means that, even if the means of production of digital games are distributed among more diverse groups, we should not ignore the role played by technology in shaping our imaginary about games. Cultural and technical aspects of digital media are not, therefore, as conflicting as it might seem, finding themselves entangled in digital games. They are also equally influential in our understanding and our cultural uses of these artefacts; but how influential are they? How easily can one go against cultural and technical conventions when producing a game as a non-professional? Can anyone make any kind of game? In this research, I explore young people's game-making practices in non-professional contexts to understand how repertoires, gaming conventions, and platform affordances and constraints can influence this creative process.
I organised two different game-making clubs for young people in London, UK (one at a community-led centre for Latin American migrants and the other at a comprehensive primary school). The clubs consisted of a series of workshops offered on a weekly basis, totalling a minimum of 12 hours of instruction and production at each research site. The participants were aged between 11 and 18 and produced a total of 11 games across the two sites with MissionMaker, software that facilitates the creation of 3D games by non-specialists through ready-made 3D assets, custom audio and image files, and a simplified drop-down-list-based scripting language. Three games and their production teams were selected as case studies and investigated through qualitative methods under a descriptive-interpretive approach. Throughout the game-making clubs, short surveys, observations, unstructured and semi-structured interviews and a game archive (with week-by-week saves of participants' games) were employed to generate data, which was then analysed through a Multimodal Sociosemiotics framework to explore how cultural and technical conventions were appropriated by participants during this experience. Discourses, gaming conventions and MissionMaker's affordances and constraints were appropriated in different ways by participants in the process of game production, culminating in the realisation of different discourses and the construction of diverse identities. These results are relevant because they restate the value of a more holistic approach, one that looks at both culture and technology, to critical videogame production within non-professional contexts. They are also useful for mapping the influence of repertoires, conventions and platforms in non-professional game-making contexts, highlighting how these elements are influential but not prescriptive of the games produced, and how game development processes within these contexts are better understood as dialogical.
One Deep Music Representation to Rule Them All? A comparative analysis of different representation learning strategies
Inspired by the success of deploying deep learning in the fields of Computer Vision and Natural Language Processing, this learning paradigm has also found its way into the field of Music Information Retrieval. In order to benefit from deep learning in an effective, but also efficient manner, deep transfer learning has become a common approach. In this approach, it is possible to reuse the output of a pre-trained neural network as the basis for a new learning task. The underlying hypothesis is that if the initial and new learning tasks show commonalities and are applied to the same type of input data (e.g. music audio), the generated deep representation of the data is also informative for the new task. Since, however, most of the networks used to generate deep representations are trained using a single initial learning source, their representation is unlikely to be informative for all possible future tasks. In this paper, we present the results of our investigation of what are the most important factors to generate deep representations for the data and learning tasks in the music domain. We conducted this investigation via an extensive empirical study that involves multiple learning sources, as well as multiple deep learning architectures with varying levels of information sharing between sources, in order to learn music representations. We then validate these representations considering multiple target datasets for evaluation. The results of our experiments yield several insights on how to approach the design of methods for learning widely deployable deep data representations in the music domain.
Comment: This work has been accepted to "Neural Computing and Applications: Special Issue on Deep Learning for Music and Audio".
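The deep transfer learning setup the abstract describes, reusing a pretrained network's output as the basis for a new task, can be sketched as follows. The frozen encoder, data and labels are toy stand-ins, not the paper's architectures or learning sources.

```python
import random

random.seed(0)

# Stand-in for a pretrained network: a frozen feature map whose output
# ("deep representation") is reused for a new task. Hypothetical, not
# one of the paper's actual architectures.
def encoder(x0, x1):
    return [max(0.0, x0 + x1), max(0.0, x0 - x1), 1.0]  # ReLU units + bias

# New-task data: toy points labelled by the sign of x0 + x1, filtered to
# keep a margin so the task is cleanly learnable from the representation.
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(300)]
points = [p for p in points if abs(p[0] + p[1]) > 0.2]
labels = [1 if x0 + x1 > 0 else 0 for x0, x1 in points]

# Transfer learning: only a lightweight linear head is trained
# (perceptron updates); the encoder itself is never modified.
w = [0.0, 0.0, 0.0]
for _ in range(20):
    for (x0, x1), y in zip(points, labels):
        h = encoder(x0, x1)
        pred = 1 if sum(wi * hi for wi, hi in zip(w, h)) > 0 else 0
        w = [wi + 0.1 * (y - pred) * hi for wi, hi in zip(w, h)]

accuracy = sum(
    (1 if sum(wi * hi for wi, hi in zip(w, encoder(x0, x1))) > 0 else 0) == y
    for (x0, x1), y in zip(points, labels)
) / len(points)
print(f"head-only training accuracy: {accuracy:.2f}")
```

The paper's question then becomes: which initial learning sources and architectures produce an encoder whose frozen representation transfers well across many such downstream tasks.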
Understanding Agreement and Disagreement in Listeners' Perceived Emotion in Live Music Performance
Emotion perception of music is subjective and time dependent. Most computational music emotion recognition (MER) systems overlook time- and listener-dependent factors by averaging emotion judgments across listeners. In this work, we investigate the influence of music, setting (live vs lab vs online), and individual factors on music emotion perception over time. In an initial study, we explore changes in perceived music emotions among audience members during live classical music performances. Fifteen audience members used a mobile application to annotate time-varying emotion judgments based on the valence-arousal model. Inter-rater reliability analyses indicate that consistency in emotion judgments varies significantly across rehearsal segments, with systematic disagreements in certain segments. In a follow-up study, we examine listeners' reasons for their ratings in segments with high and low agreement. We relate these reasons to acoustic features and individual differences. Twenty-one listeners annotated perceived emotions while watching a recorded video of the live performance. They then reflected on their judgments and provided explanations retrospectively. Disagreements were attributed to listeners attending to different musical features or being uncertain about the expressed emotions. Emotion judgments were significantly associated with personality traits, gender, cultural background, and music preference. Thematic analysis of explanations revealed cognitive processes underlying music emotion perception, highlighting attributes less frequently discussed in MER studies, such as instrumentation, arrangement, musical structure, and multimodal factors related to performer expression. Exploratory models incorporating these semantic features and individual factors were developed to predict perceived music emotion over time. 
Regression analyses confirmed the significance of listener-informed semantic features as independent variables, with individual factors acting as moderators between loudness, pitch range, and arousal. In our final study, we analyzed the effects of individual differences on music emotion perception among 128 participants with diverse backgrounds. Participants annotated perceived emotions for 51 piano performances of different compositions from the Western canon, spanning various eras. Linear mixed-effects models revealed significant variations in valence and arousal ratings, as well as in the frequency of emotion ratings, with regard to several individual factors: music sophistication, music preferences, personality traits, and mood states. Additionally, participants' ratings of arousal, valence, and emotional agreement were significantly associated with the historical time periods of the examined clips. This research highlights the complexity of music emotion perception, revealing it to be a dynamic, individual and context-dependent process. It paves the way for the development of more individually nuanced, time-based models in music psychology, opening up new avenues for personalised music emotion recognition and recommendation, music emotion-driven generation, and therapeutic applications.
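The moderation effect reported above, in which individual factors moderate the link between loudness and arousal, can be illustrated with a simple-slopes sketch on synthetic data; the "sophistication" grouping and every number below are invented purely for illustration.

```python
import random

random.seed(0)

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# Synthetic listeners: a toy individual factor ("sophistication") moderates
# how strongly loudness drives arousal ratings. Invented generative model.
rows = []
for _ in range(400):
    loud = random.uniform(-1, 1)
    soph = random.choice([0.0, 1.0])                    # low vs high group
    arousal = (0.2 + 0.6 * soph) * loud + random.gauss(0, 0.05)
    rows.append((loud, soph, arousal))

low = [(l, a) for l, s, a in rows if s == 0.0]
high = [(l, a) for l, s, a in rows if s == 1.0]

slope_low = slope([l for l, _ in low], [a for _, a in low])
slope_high = slope([l for l, _ in high], [a for _, a in high])
print(f"loudness->arousal slope: low={slope_low:.2f}, high={slope_high:.2f}")
```

In a regression framework the same moderation appears as a significant loudness-by-sophistication interaction term; here it shows up directly as the gap between the two subgroup slopes.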
Handbook of Stemmatology
Stemmatology studies aspects of textual criticism that use genealogical methods. This handbook is the first to cover the entire field, encompassing both theoretical and practical aspects, ranging from traditional to digital methods. Authors from all the disciplines involved examine topics such as the material aspects of text traditions, methods of traditional textual criticism and their genesis, and modern digital approaches used in the field.
Music Encoding Conference Proceedings 2021, 19–22 July 2021, University of Alicante (Spain): Onsite & Online
This document includes the articles and posters presented at the Music Encoding Conference 2021, held in Alicante from 19 to 22 July 2021. Funded by project Multiscore, MCIN/AEI/10.13039/50110001103
2018 FSDG Combined Abstracts
Low-resource speech translation
We explore the task of speech-to-text translation (ST), where speech in one language
(source) is converted to text in a different one (target). Traditional ST systems go
through an intermediate step where the source language speech is first converted to
source language text using an automatic speech recognition (ASR) system, which
is then converted to target language text using a machine translation (MT) system.
However, this pipeline-based approach is impractical for unwritten languages spoken by
millions of people around the world, leaving them without access to free and automated
translation services such as Google Translate. The lack of such translation services can
have important real-world consequences. For example, in the aftermath of a disaster
scenario, easily available translation services can help better co-ordinate relief efforts.
How can we expand the coverage of automated ST systems to include scenarios which
lack source language text? In this thesis we investigate one possible solution: we
build ST systems to directly translate source language speech into target language text,
thereby forgoing the dependency on source language text. To build such a system, we
use only speech data paired with text translations as training data. We also specifically
focus on low-resource settings, where we expect at most tens of hours of training data
to be available for unwritten or endangered languages.
Our work can be broadly divided into three parts. First we explore how we can leverage
prior work to build ST systems. We find that neural sequence-to-sequence models are
an effective and convenient method for ST, but produce poor quality translations when
trained in low-resource settings.
In the second part of this thesis, we explore methods to improve the translation performance
of our neural ST systems which do not require labeling additional speech
data in the low-resource language, a potentially tedious and expensive process. Instead
we exploit labeled speech data for high-resource languages which is widely available
and relatively easier to obtain. We show that pretraining a neural model with ASR data
from a high-resource language, different from both the source and target ST languages,
improves ST performance.
In the final part of our thesis, we study whether ST systems can be used to build
applications which have traditionally relied on the availability of ASR systems, such
as information retrieval, clustering audio documents, or question answering. We build
proof-of-concept systems for two downstream applications: topic prediction for speech
and cross-lingual keyword spotting. Our results indicate that low-resource ST systems
can still outperform simple baselines for these tasks, leaving the door open for further
exploratory work.
This thesis provides, for the first time, an in-depth study of neural models for the
task of direct ST across a range of training data settings on a realistic multi-speaker
speech corpus. Our contributions include a set of open-source tools to encourage further
research.
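The contrast between the cascaded pipeline and direct ST at the heart of the thesis can be sketched as follows; the lookup-table "models" are hypothetical stand-ins for trained components, not the systems actually built in this work.

```python
# Toy lookup-table "models": hypothetical stand-ins for trained ASR,
# MT and direct ST components; "speech" is a tuple of acoustic tokens.
asr_model = {("s1", "s2"): "bonjour le monde"}   # speech -> source text
mt_model = {"bonjour le monde": "hello world"}   # source -> target text
st_model = {("s1", "s2"): "hello world"}         # speech -> target text

def cascade_st(speech):
    """Traditional pipeline: ASR into source-language text, then MT.
    Unusable when the source language has no written form."""
    return mt_model[asr_model[speech]]

def direct_st(speech):
    """Direct ST: speech straight to target-language text, trainable
    from speech paired only with its translations."""
    return st_model[speech]

speech = ("s1", "s2")
print(cascade_st(speech), "|", direct_st(speech))
```

The key asymmetry is in the training data each path requires: the cascade needs source-language transcripts to train its ASR component, while the direct model needs only speech paired with target-language translations.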
A Critical Look at the Music Classification Experiment Pipeline: Using Interventions to Detect and Account for Confounding Effects
PhD Thesis
This dissertation focuses on the problem of confounding in the design and analysis of music
classification experiments. Classification experiments dominate evaluation of music
content analysis systems and methods, but achieving high performance on such experiments
does not guarantee systems properly address the intended problem. The research
presented here proposes and illustrates modifications to the conventional experimental
pipeline, which aim at improving the understanding of the evaluated systems and methods,
facilitating valid conclusions on their suitability for the target problem.
Firstly, multiple analyses are conducted to determine which cues scattering-based systems
use to predict the annotations of the GTZAN music genre collection. In-depth system
analysis informs empirical approaches that alter the experimental pipeline. In particular,
deflation manipulations and targeted interventions on the partitioning strategy,
the learning algorithm and the frequency content of the data reveal that systems using
scattering-based features exploit faults in GTZAN and previously unknown information
at inaudible frequencies.
Secondly, the use of interventions on the experimental pipeline is extended and systematised
into a procedure for characterising the effects of confounding information in the
results of classification experiments. Regulated bootstrap, a novel resampling strategy,
is proposed to address challenges associated with interventions dealing with partitioning.
The procedure is demonstrated on GTZAN, analysing the effect of artist replication
and infrasonic information on performance measurements using a wide range of
system-construction methods.
Finally, mathematical models relating measurements from classification experiments
and potentially contributing factors are proposed and discussed. Such models enable decomposing
measurements into contributions of interest, which may differ depending on
the goals of the study, including those from pipeline interventions. The adequacy of some
conventional assumptions underlying such models for classification experiments is also
examined.
The reported research highlights the need for evaluation procedures that go beyond
performance maximisation. Accounting for the effects of confounding information using
procedures grounded on the principles of experimental design promises to facilitate the
development of systems that generalise beyond the restricted experimental settings.
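One family of interventions discussed above, those targeting the partitioning strategy, can be sketched in its simplest form as an artist-filtered split; the track data below are invented, and this sketch is not the dissertation's regulated bootstrap procedure.

```python
# Hedged sketch of a partitioning intervention: an artist-filtered split
# ensures no artist appears in both train and test, so a classifier
# cannot exploit artist identity as a confound (cf. artist replication
# in GTZAN). Track data are invented for illustration.
def artist_filtered_split(tracks, test_artists):
    """Partition tracks so the train and test sets share no artists."""
    train = [t for t in tracks if t["artist"] not in test_artists]
    test = [t for t in tracks if t["artist"] in test_artists]
    return train, test

tracks = [
    {"id": 1, "artist": "A", "genre": "jazz"},
    {"id": 2, "artist": "A", "genre": "jazz"},
    {"id": 3, "artist": "B", "genre": "rock"},
    {"id": 4, "artist": "C", "genre": "rock"},
]
train, test = artist_filtered_split(tracks, {"B"})
print(len(train), len(test))  # 3 1
```

Comparing performance under random and artist-filtered partitions is one way to expose how much of a measured accuracy is owed to the confound rather than to the intended genre cues.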