56 research outputs found

    Platform, culture, identities: exploring young people's game-making

    Digital games are an important component of the contemporary media landscape. They are cultural artefacts and, as such, are subject to specific conventions. These conventions shape our imaginary about games, defining, for example, what a game is, who can play them and where. A range of research has sought to understand and challenge these conventions, and one strategy often adopted is fostering game-making among "gaming minorities". By popularising games and their means of production, critical skills towards these objects could be developed, these conventions could be contested, and our perceptions of these artefacts could be transformed. Nevertheless, digital games, as obvious as it sounds, are also digital: they depend on technology to exist and are subject to different technologies' affordances and constraints. Technologies, however, are not neutral and objective, but are also cultural: they too are shaped by values and conventions. This means that, even if the means of producing digital games are distributed among more diverse groups, we should not ignore the role played by technology in shaping our imaginary about games. Cultural and technical aspects of digital media are not, therefore, as conflicting as they might seem, finding themselves entangled in digital games. They are also equally influential in our understanding and our cultural uses of these artefacts; but how influential are they? How easily can one go against cultural and technical conventions when producing a game as a non-professional? Can anyone make any kind of game? In this research, I explore young people's game-making practices in non-professional contexts to understand how repertoires, gaming conventions and platform affordances and constraints can influence this creative process.
I organised two game-making clubs for young people in London, UK (one at a community-led centre for Latin American migrants and the other at a comprehensive primary school). The clubs consisted of a series of weekly workshops, totalling a minimum of 12 hours of instruction and production at each research site. The participants were aged between 11 and 18 and produced a total of 11 games across the two sites with MissionMaker, software that facilitates the creation of 3D games by non-specialists through ready-made 3D assets, custom audio and image files, and a simplified drop-down-list-based scripting language. Three games and their production teams were selected as case studies and investigated through qualitative methods under a descriptive-interpretive approach. Throughout the game-making clubs, short surveys, observations, unstructured and semi-structured interviews and a game archive (with week-by-week saves of participants' games) were employed to generate data, which was then analysed through a Multimodal Sociosemiotics framework to explore how cultural and technical conventions were appropriated by participants during this experience. Discourses, gaming conventions and MissionMaker's affordances and constraints were appropriated in different ways by participants in the process of game production, culminating in the realisation of different discourses and the construction of diverse identities. These results are relevant since they restate the value of a more holistic approach – one that looks at both culture and technology – to critical videogame production within non-professional contexts.
These results are also useful for mapping the influence of repertoires, conventions and platforms in non-professional game-making contexts, highlighting how these elements are influential yet not prescriptive of the games produced, and how game development processes within these contexts are better understood as dialogical.

    One Deep Music Representation to Rule Them All? : A comparative analysis of different representation learning strategies

    Inspired by the success of deploying deep learning in the fields of Computer Vision and Natural Language Processing, this learning paradigm has also found its way into the field of Music Information Retrieval. In order to benefit from deep learning in an effective, but also efficient manner, deep transfer learning has become a common approach. In this approach, it is possible to reuse the output of a pre-trained neural network as the basis for a new learning task. The underlying hypothesis is that if the initial and new learning tasks show commonalities and are applied to the same type of input data (e.g. music audio), the generated deep representation of the data is also informative for the new task. Since, however, most of the networks used to generate deep representations are trained using a single initial learning source, their representation is unlikely to be informative for all possible future tasks. In this paper, we present the results of our investigation into the most important factors for generating deep representations for the data and learning tasks in the music domain. We conducted this investigation via an extensive empirical study that involves multiple learning sources, as well as multiple deep learning architectures with varying levels of information sharing between sources, in order to learn music representations. We then validate these representations considering multiple target datasets for evaluation. The results of our experiments yield several insights on how to approach the design of methods for learning widely deployable deep data representations in the music domain. Comment: This work has been accepted to "Neural Computing and Applications: Special Issue on Deep Learning for Music and Audio".
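The transfer-learning recipe the abstract describes, reusing a frozen pre-trained network's output as input features for a new task, can be sketched in a few lines. This is an illustrative sketch only: a fixed random projection stands in for the pre-trained network, and all names and shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed (frozen) non-linear feature extractor standing in for a
# pre-trained deep network; in practice these weights would come from
# training on the initial learning source (e.g. music tagging).
W_pre = rng.normal(size=(20, 64)) / np.sqrt(20)

def deep_representation(x):
    """Map raw input features to the frozen 'deep' representation."""
    return np.tanh(x @ W_pre)

# New learning task: reuse the frozen representation and fit only a
# lightweight linear head on top.
X = rng.normal(size=(200, 20))
y = (X[:, 0] > 0).astype(int)            # toy labels for the new task

Z = deep_representation(X)               # reused pre-trained features
Zb = np.hstack([Z, np.ones((200, 1))])   # add a bias column
w, *_ = np.linalg.lstsq(Zb, 2.0 * y - 1.0, rcond=None)  # linear head
accuracy = ((Zb @ w > 0).astype(int) == y).mean()       # training fit
```

The underlying hypothesis from the abstract appears here directly: the transfer only works to the extent that the frozen representation still carries information relevant to the new labels.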

    Understanding Agreement and Disagreement in Listeners' Perceived Emotion in Live Music Performance

    Emotion perception of music is subjective and time dependent. Most computational music emotion recognition (MER) systems overlook time- and listener-dependent factors by averaging emotion judgments across listeners. In this work, we investigate the influence of music, setting (live vs lab vs online), and individual factors on music emotion perception over time. In an initial study, we explore changes in perceived music emotions among audience members during live classical music performances. Fifteen audience members used a mobile application to annotate time-varying emotion judgments based on the valence-arousal model. Inter-rater reliability analyses indicate that consistency in emotion judgments varies significantly across rehearsal segments, with systematic disagreements in certain segments. In a follow-up study, we examine listeners' reasons for their ratings in segments with high and low agreement. We relate these reasons to acoustic features and individual differences. Twenty-one listeners annotated perceived emotions while watching a recorded video of the live performance. They then reflected on their judgments and provided explanations retrospectively. Disagreements were attributed to listeners attending to different musical features or being uncertain about the expressed emotions. Emotion judgments were significantly associated with personality traits, gender, cultural background, and music preference. Thematic analysis of explanations revealed cognitive processes underlying music emotion perception, highlighting attributes less frequently discussed in MER studies, such as instrumentation, arrangement, musical structure, and multimodal factors related to performer expression. Exploratory models incorporating these semantic features and individual factors were developed to predict perceived music emotion over time. 
Regression analyses confirmed the significance of listener-informed semantic features as independent variables, with individual factors acting as moderators between loudness, pitch range, and arousal. In our final study, we analyzed the effects of individual differences on music emotion perception among 128 participants with diverse backgrounds. Participants annotated perceived emotions for 51 piano performances of different compositions from the Western canon, spanning various eras. Linear mixed effects models revealed significant variations in valence and arousal ratings, as well as the frequency of emotion ratings, with regard to several individual factors: music sophistication, music preferences, personality traits, and mood states. Additionally, participants' ratings of arousal, valence, and emotional agreement were significantly associated with the historical time periods of the examined clips. This research highlights the complexity of music emotion perception, revealing it to be a dynamic, individual and context-dependent process. It paves the way for the development of more individually nuanced, time-based models in music psychology, opening up new avenues for personalised music emotion recognition and recommendation, music emotion-driven generation and therapeutic applications.
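The inter-rater reliability analysis described above can be illustrated with a minimal sketch: one simple agreement statistic for time-varying ratings is the mean pairwise Pearson correlation across listeners. The synthetic ratings below are purely illustrative and not the study's data or its actual reliability measure.

```python
import numpy as np

def mean_pairwise_corr(ratings):
    """Mean pairwise Pearson correlation across raters.

    ratings: array of shape (n_raters, n_timepoints) holding one
    segment's time-varying emotion ratings (e.g. arousal).
    """
    r = np.corrcoef(ratings)
    iu = np.triu_indices(r.shape[0], k=1)   # upper-triangle rater pairs
    return r[iu].mean()

rng = np.random.default_rng(1)
contour = np.sin(np.linspace(0.0, 6.0, 100))  # shared arousal contour

# High-agreement segment: raters track the same contour plus noise.
agree = contour + 0.2 * rng.normal(size=(5, 100))
# Low-agreement segment: idiosyncratic, unrelated ratings.
disagree = rng.normal(size=(5, 100))

high = mean_pairwise_corr(agree)
low = mean_pairwise_corr(disagree)
```

Segment-wise statistics like this are what make it possible to say that agreement "varies significantly across rehearsal segments" rather than reporting a single collection-wide reliability figure.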

    Handbook of Stemmatology

    Stemmatology studies aspects of textual criticism that use genealogical methods. This handbook is the first to cover the entire field, encompassing both theoretical and practical aspects, ranging from traditional to digital methods. Authors from all the disciplines involved examine topics such as the material aspects of text traditions, methods of traditional textual criticism and their genesis, and modern digital approaches used in the field.

    Music Encoding Conference Proceedings 2021, 19–22 July, 2021, University of Alicante (Spain): Onsite & Online

    This document includes the papers and posters presented at the Music Encoding Conference 2021, held in Alicante from 19 to 22 July 2021. Funded by project Multiscore, MCIN/AEI/10.13039/50110001103.

    2018 FSDG Combined Abstracts

    https://scholarworks.gvsu.edu/fsdg_abstracts/1000/thumbnail.jp

    Low-resource speech translation

    We explore the task of speech-to-text translation (ST), where speech in one language (source) is converted to text in a different one (target). Traditional ST systems go through an intermediate step where the source language speech is first converted to source language text using an automatic speech recognition (ASR) system, which is then converted to target language text using a machine translation (MT) system. However, this pipeline-based approach is impractical for unwritten languages spoken by millions of people around the world, leaving them without access to free and automated translation services such as Google Translate. The lack of such translation services can have important real-world consequences. For example, in the aftermath of a disaster scenario, easily available translation services can help better co-ordinate relief efforts. How can we expand the coverage of automated ST systems to include scenarios which lack source language text? In this thesis we investigate one possible solution: we build ST systems to directly translate source language speech into target language text, thereby forgoing the dependency on source language text. To build such a system, we use only speech data paired with text translations as training data. We also specifically focus on low-resource settings, where we expect at most tens of hours of training data to be available for unwritten or endangered languages. Our work can be broadly divided into three parts. First we explore how we can leverage prior work to build ST systems. We find that neural sequence-to-sequence models are an effective and convenient method for ST, but produce poor quality translations when trained in low-resource settings. In the second part of this thesis, we explore methods to improve the translation performance of our neural ST systems which do not require labeling additional speech data in the low-resource language, a potentially tedious and expensive process.
Instead we exploit labeled speech data for high-resource languages, which is widely available and relatively easier to obtain. We show that pretraining a neural model with ASR data from a high-resource language, different from both the source and target ST languages, improves ST performance. In the final part of our thesis, we study whether ST systems can be used to build applications which have traditionally relied on the availability of ASR systems, such as information retrieval, clustering audio documents, or question answering. We build proof-of-concept systems for two downstream applications: topic prediction for speech and cross-lingual keyword spotting. Our results indicate that low-resource ST systems can still outperform simple baselines for these tasks, leaving the door open for further exploratory work. This thesis provides, for the first time, an in-depth study of neural models for the task of direct ST across a range of training data settings on a realistic multi-speaker speech corpus. Our contributions include a set of open-source tools to encourage further research.
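The pretraining strategy described above, training on high-resource ASR and reusing the encoder for a low-resource ST model, amounts to copying encoder parameters and training the decoder from scratch. The sketch below is schematic: parameter names, shapes, and the flat dict layout are all hypothetical, not the thesis's actual models.

```python
import numpy as np

rng = np.random.default_rng(2)

def init_params(names):
    # Random 4x4 matrices stand in for trained layer weights.
    return {n: rng.normal(size=(4, 4)) for n in names}

layer_names = ["enc.layer0", "enc.layer1", "dec.layer0"]

# An ASR model trained on a high-resource language (speech encoder +
# text decoder), and a freshly initialised ST model with the same
# architecture.
asr = init_params(layer_names)
st = init_params(layer_names)

# Pretraining transfer: initialise the ST encoder from the ASR
# encoder; the ST decoder stays randomly initialised and is trained
# from scratch on the (small) speech-translation dataset.
for name in st:
    if name.startswith("enc."):
        st[name] = asr[name].copy()

transferred = [n for n in layer_names if np.array_equal(st[n], asr[n])]
```

The key design point is that the encoder learns language-independent acoustic regularities from plentiful ASR data, so it transfers even when the ASR language differs from both the ST source and target languages.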

    A Critical Look at the Music Classification Experiment Pipeline: Using Interventions to Detect and Account for Confounding Effects

    PhD Thesis. This dissertation focuses on the problem of confounding in the design and analysis of music classification experiments. Classification experiments dominate evaluation of music content analysis systems and methods, but achieving high performance on such experiments does not guarantee systems properly address the intended problem. The research presented here proposes and illustrates modifications to the conventional experimental pipeline, which aim at improving the understanding of the evaluated systems and methods, facilitating valid conclusions on their suitability for the target problem. Firstly, multiple analyses are conducted to determine which cues scattering-based systems use to predict the annotations of the GTZAN music genre collection. In-depth system analysis informs empirical approaches that alter the experimental pipeline. In particular, deflation manipulations and targeted interventions on the partitioning strategy, the learning algorithm and the frequency content of the data reveal that systems using scattering-based features exploit faults in GTZAN and previously unknown information at inaudible frequencies. Secondly, the use of interventions on the experimental pipeline is extended and systematised into a procedure for characterising effects of confounding information in the results of classification experiments. Regulated bootstrap, a novel resampling strategy, is proposed to address challenges associated with interventions dealing with partitioning. The procedure is demonstrated on GTZAN, analysing the effect of artist replication and infrasonic information on performance measurements using a wide range of system construction methods. Finally, mathematical models relating measurements from classification experiments and potentially contributing factors are proposed and discussed.
Such models enable decomposing measurements into contributions of interest, which may differ depending on the goals of the study, including those from pipeline interventions. The adequacy for classification experiments of some conventional assumptions underlying such models is also examined. The reported research highlights the need for evaluation procedures that go beyond performance maximisation. Accounting for the effects of confounding information using procedures grounded in the principles of experimental design promises to facilitate the development of systems that generalise beyond restricted experimental settings.
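One concrete instance of an intervention on the partitioning strategy is a split that holds out whole artists, which removes the artist replication confound mentioned above: a classifier can otherwise score well by recognising artists rather than genres. This is a minimal illustrative sketch with hypothetical data, not the regulated bootstrap procedure itself.

```python
import numpy as np

def artist_filtered_split(artists, test_artists):
    """Partition track indices so no artist spans train and test."""
    artists = np.asarray(artists)
    test_mask = np.isin(artists, list(test_artists))
    # Tracks by held-out artists go to test; everything else to train.
    return np.where(~test_mask)[0], np.where(test_mask)[0]

# Toy collection: the artist performing each track.
tracks = ["a", "a", "b", "c", "b", "c", "a"]
train_idx, test_idx = artist_filtered_split(tracks, {"c"})
# All of artist "c" lands in the test partition, so a model cannot
# inflate its score by matching test tracks to the same artist's
# training tracks.
```

Comparing performance under random splits versus artist-filtered splits is exactly the kind of controlled intervention that exposes how much of a reported accuracy rests on confounded information.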