Adaptive Frequency Neural Networks for Dynamic Pulse and Metre Perception.
Beat induction, the means by which humans listening to music perceive a steady pulse, is achieved via a perceptual and cognitive process. Computationally modelling this phenomenon is an open problem, especially when processing expressive shaping of the music such as tempo change. To meet this challenge we propose Adaptive Frequency Neural Networks (AFNNs), an extension of Gradient Frequency Neural Networks (GFNNs). GFNNs are based on neurodynamic models and have been applied successfully to a range of difficult music perception problems, including those with syncopated and polyrhythmic stimuli. AFNNs extend GFNNs by applying a Hebbian learning rule to the oscillator frequencies. Thus the frequencies in an AFNN adapt to the stimulus through an attraction to local areas of resonance, allowing a great dimensionality reduction in the network. Where previous work with GFNNs has focused on frequency and amplitude responses, we also consider phase information as critical for pulse perception. Evaluating the time-based output, we find significantly improved responses of AFNNs compared to GFNNs to stimuli with both steady and varying pulse frequencies. This leads us to believe that AFNNs could replace the linear filtering methods commonly used in beat tracking and tempo estimation systems, and lead to more accurate methods.
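To make the frequency-adaptation idea above concrete, the sketch below runs a tiny bank of oscillators whose frequencies are attracted toward resonance with an onset signal. The Hopf-style dynamics, the specific adaptation rule and all parameter values are illustrative assumptions, not the implementation described in this abstract:

```python
import numpy as np

def afnn_sketch(stimulus, fs=250.0, init_freqs=None,
                alpha=0.0, coupling=1.0, eps_freq=0.8):
    """Tiny bank of adaptive-frequency oscillators driven by an onset signal.

    Illustrative sketch only: the Hopf-style dynamics, the adaptation rule and
    all parameter values here stand in for the canonical oscillators and the
    Hebbian frequency rule described in the abstract.
    """
    if init_freqs is None:
        init_freqs = np.array([0.5, 1.0, 2.0, 4.0])     # Hz; a sparse frequency "gradient"
    omega = 2.0 * np.pi * init_freqs.astype(float)       # rad/s; these will adapt
    z = np.full(omega.shape, 0.01 + 0.0j)                # complex oscillator states
    dt = 1.0 / fs
    freq_history = []

    for x in np.asarray(stimulus, dtype=float):
        # Amplitude dynamics: critical Hopf oscillator (alpha = 0) plus stimulus drive.
        z = z + dt * (z * (alpha - np.abs(z) ** 2) + coupling * x)
        # Phase advance at each oscillator's current (adaptive) frequency.
        z = z * np.exp(1j * omega * dt)
        # Frequency attraction: nudge oscillators toward resonance with the input.
        omega = omega - dt * eps_freq * x * z.imag / (np.abs(z) + 1e-9)
        freq_history.append(omega / (2.0 * np.pi))        # record in Hz

    return z, np.array(freq_history)

# Usage: an isochronous pulse train at 2 Hz; nearby oscillators should drift toward 2 Hz.
fs = 250.0
t = np.arange(0.0, 30.0, 1.0 / fs)
pulses = (np.mod(t, 0.5) < 1.0 / fs).astype(float)        # one impulse every 0.5 s
final_state, freqs_over_time = afnn_sketch(pulses, fs=fs)
```

Because only oscillators near a resonant frequency need to be present at initialisation, a bank this sparse can stand in for a much denser fixed-frequency gradient, which is the dimensionality reduction the abstract refers to.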
Modelling metrical flux: an adaptive frequency neural network for expressive rhythmic perception and prediction
Beat induction is the perceptual and cognitive process by which humans listen to music and perceive a steady pulse. Computationally modelling beat induction is important for many Music Information Retrieval (MIR) methods and is in general an open problem, especially when processing expressive timing, e.g. tempo changes or rubato.
A neuro-cognitive model has been proposed, the Gradient Frequency Neural Network (GFNN), which can model the perception of pulse and metre. GFNNs have been applied successfully to a range of ‘difficult’ music perception problems such as polyrhythms and syncopation.
This thesis explores the use of GFNNs for expressive rhythm perception and modelling, addressing the current gap in knowledge about how to deal with varying tempo and expressive timing in automated and interactive music systems. The canonical oscillators contained in a GFNN have entrainment properties, allowing phase shifts and resulting in changes to the observed frequencies. This makes them good candidates for solving the expressive timing problem.
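For reference, one commonly cited form of such a canonical oscillator (following Large and colleagues) is sketched below; the exact parameterisation used in the thesis may differ:

```latex
\dot{z} \;=\; z\!\left(\alpha + i\omega
      + (\beta_1 + i\delta_1)\,|z|^2
      + \frac{\varepsilon\,(\beta_2 + i\delta_2)\,|z|^4}{1 - \varepsilon |z|^2}\right)
      + c\,x(t)
```

Here z is the complex oscillator state, omega its natural frequency, alpha the bifurcation parameter and x(t) the stimulus; entrainment appears as z phase-locking to x(t), with shifts in its observed frequency.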
It is found that modelling metrical perception with GFNNs can improve a machine learning music model. However, it is also found that GFNNs perform poorly when dealing with tempo changes in the stimulus.
Therefore, a novel Adaptive Frequency Neural Network (AFNN) is introduced; extending the GFNN with a Hebbian learning rule on oscillator frequencies. Two new adaptive behaviours (attraction and elasticity) increase entrainment in the oscillators, and increase the computational efficiency of the model by allowing for a great reduction in the size of the network.
The AFNN is evaluated over a series of experiments on sets of symbolic and audio rhythms, both from the literature and created specifically for this research. Where previous work with GFNNs has focused on frequency and amplitude responses, this thesis considers phase information as critical for pulse perception. Evaluating the time-based output, it was found that AFNNs behave differently to GFNNs: responses to symbolic stimuli with both steady and varying pulses are significantly improved, and on audio data the AFNN's performance matches the GFNN's, despite its lower density.
The thesis argues that AFNNs could replace the linear filtering methods commonly used in beat tracking and tempo estimation systems, and lead to more accurate methods.
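As an illustration of the phase-based, time-domain evaluation described above, the sketch below reads beat predictions off an oscillator's phase trajectory and scores them against annotated beats. The zero-phase beat convention and the +/-70 ms tolerance window are assumptions made for illustration, not necessarily those used in the thesis:

```python
import numpy as np

def beats_from_phase(z_over_time, fs, beat_phase=0.0):
    """Predict beat times from an oscillator's phase trajectory.

    z_over_time : complex oscillator state at each sample (illustrative input).
    A beat is predicted wherever the unwrapped phase passes a multiple of 2*pi
    relative to `beat_phase`; this convention is an assumption for illustration.
    """
    phase = np.unwrap(np.angle(np.asarray(z_over_time))) - beat_phase
    cycles = np.floor(phase / (2.0 * np.pi))
    crossings = np.flatnonzero(np.diff(cycles) > 0)    # samples starting a new cycle
    return crossings / fs                               # predicted beat times, in seconds

def beat_f_measure(pred_beats, true_beats, tol=0.07):
    """Beat-tracking style F-measure with a +/-70 ms tolerance (assumed window)."""
    pred = np.asarray(pred_beats, dtype=float)
    true = np.asarray(true_beats, dtype=float)
    if len(pred) == 0 or len(true) == 0:
        return 0.0
    matched = np.zeros(len(true), dtype=bool)
    hits = 0
    for p in pred:                                       # greedy one-to-one matching
        d = np.abs(true - p)
        d[matched] = np.inf
        i = int(np.argmin(d))
        if d[i] <= tol:
            matched[i] = True
            hits += 1
    precision, recall = hits / len(pred), hits / len(true)
    return 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
```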
Situational influences on rhythmicity in speech, music, and their interaction.
Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action.
Computational models of auditory perception from feature extraction to stream segregation and behavior
This is the final version, available on open access from Elsevier via the DOI in this record. Data availability: this is a review study, and as such did not generate any new data.
Audition is by nature dynamic, from brainstem processing on sub-millisecond time scales, to segregating and tracking sound sources with changing features, to the pleasure of listening to music and the satisfaction of getting the beat. We review recent advances from computational models of sound localization, of auditory stream segregation and of beat perception/generation. A wealth of behavioral, electrophysiological and imaging studies shed light on these processes, typically with synthesized sounds having regular temporal structure. Computational models integrate knowledge from different experimental fields and at different levels of description. We advocate a neuromechanistic modeling approach that incorporates knowledge of the auditory system from various fields, that utilizes plausible neural mechanisms, and that bridges our understanding across disciplines.
Engineering and Physical Sciences Research Council (EPSRC).
Generating Time: Rhythmic Perception, Prediction and Production with Recurrent Neural Networks
In the quest for a convincing musical agent that performs in real time alongside human performers, the issues surrounding expressively timed rhythm must be addressed. Current beat tracking methods are not sufficient to follow rhythms automatically when dealing with varying tempo and expressive timing. In the generation of rhythm, some existing interactive systems ignore the pulse entirely, or fix a tempo after some time spent listening to input. Since music unfolds in time, we take the view that musical timing needs to be at the core of a music generation system.
Our research explores a connectionist machine learning approach to expressive rhythm generation, based on cognitive and neurological models. Two neural network models are combined within one integrated system. A Gradient Frequency Neural Network (GFNN) models the perception of periodicities by resonating nonlinearly with the musical input, creating a hierarchy of strong and weak oscillations that relate to the metrical structure. A Long Short-term Memory Recurrent Neural Network (LSTM) models longer-term temporal relations based on the GFNN output.
The output of the system is a prediction of when in time the next rhythmic event is likely to occur. These predictions can be used to produce new rhythms, forming a generative model.
We have trained the system on a dataset of expressively performed piano solos and evaluated its ability to accurately predict rhythmic events. Based on the encouraging results, we conclude that the GFNN-LSTM model has great potential to add the ability to follow and generate expressive rhythmic structures to real-time interactive systems.
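A minimal sketch of the GFNN-to-LSTM stage is shown below, assuming per-frame oscillator amplitudes as input features and a binary "onset in this frame" target; the published system's exact feature representation, network sizes and training targets may differ:

```python
import torch
from torch import nn

class RhythmPredictor(nn.Module):
    """Sketch of the GFNN -> LSTM idea: oscillator features in, onset probability out.

    The feature layout (per-frame oscillator amplitudes) and the binary onset
    target are assumptions made for illustration, not the published system's
    exact representation.
    """
    def __init__(self, n_oscillators=16, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_oscillators, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, gfnn_features):
        # gfnn_features: (batch, time, n_oscillators) amplitudes from the GFNN stage.
        out, _ = self.lstm(gfnn_features)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # (batch, time) onset probability

# Usage with random stand-in data (real input would be GFNN amplitude frames).
model = RhythmPredictor()
features = torch.rand(8, 200, 16)              # 8 clips, 200 frames, 16 oscillators
targets = (torch.rand(8, 200) > 0.9).float()   # stand-in onset labels
loss = nn.functional.binary_cross_entropy(model(features), targets)
loss.backward()
```

At generation time, the per-frame probabilities can be thresholded or sampled to decide when the next rhythmic event should fall, which is how the prediction becomes a generative model.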
Rhythmic complexity and predictive coding: A novel approach to modeling rhythm and meter perception in music
Musical rhythm, consisting of apparently abstract intervals of accented temporal events, has a remarkable capacity to move our minds and bodies. How does the cognitive system enable our experiences of rhythmically complex music? In this paper, we describe some common forms of rhythmic complexity in music and propose the theory of predictive coding (PC) as a framework for understanding how rhythm and rhythmic complexity are processed in the brain. We also consider why we feel so compelled by rhythmic tension in music. First, we consider theories of rhythm and meter perception, which provide hierarchical and computational approaches to modeling. Second, we present the theory of PC, which posits a hierarchical organization of brain responses reflecting fundamental, survival-related mechanisms associated with predicting future events. According to this theory, perception and learning are manifested through the brain’s Bayesian minimization of the error between the input to the brain and the brain’s prior expectations. Third, we develop a PC model of musical rhythm, in which rhythm perception is conceptualized as an interaction between what is heard (“rhythm”) and the brain’s anticipatory structuring of music (“meter”). Finally, we review empirical studies of the neural and behavioral effects of syncopation, polyrhythm and groove, and propose how these studies can be seen as special cases of the PC theory. We argue that musical rhythm exploits the brain’s general principles of prediction, and propose that pleasure and desire for sensorimotor synchronization from musical rhythm may be a result of such mechanisms.
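As a toy illustration of the rhythm-as-input, metre-as-prior interaction described above, the sketch below scores a heard rhythm against candidate metrical templates and performs a single precision-weighted Bayesian update; the templates, grid and precision parameter are all assumptions made for illustration, not the paper's model:

```python
import numpy as np

def update_metre_beliefs(rhythm, templates, prior, precision=4.0):
    """One toy predictive-coding step for metre inference (illustrative only).

    rhythm    : binary vector of onsets over one bar (here a 12-position grid).
    templates : dict mapping metre labels to expected onset-probability vectors.
    prior     : dict mapping the same labels to prior probabilities.
    precision : assumed weighting of prediction errors, standing in for
                precision in the predictive-coding account.
    """
    posterior = {}
    for label, expected in templates.items():
        error = np.sum((np.asarray(rhythm) - np.asarray(expected)) ** 2)  # prediction error
        likelihood = np.exp(-precision * error)        # smaller error -> better fit
        posterior[label] = prior[label] * likelihood   # Bayes: prior times likelihood
    total = sum(posterior.values())
    return {k: v / total for k, v in posterior.items()}

# Usage: onsets every four grid steps, so the triple grouping should win.
rhythm = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
templates = {
    "duple":  [1.0, 0.1, 0.1, 0.5, 0.1, 0.1, 1.0, 0.1, 0.1, 0.5, 0.1, 0.1],
    "triple": [1.0, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1],
}
prior = {"duple": 0.5, "triple": 0.5}
print(update_metre_beliefs(rhythm, templates, prior))
```

In this framing, syncopation corresponds to rhythms that leave a persistent prediction error under the best-fitting template, which is one way to read the paper's link between rhythmic tension and prediction.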
Music in the brain
Music is ubiquitous across human cultures — as a source of affective and pleasurable experience, moving us both physically and emotionally — and learning to play music shapes both brain structure and brain function. Music processing in the brain — namely, the perception of melody, harmony and rhythm — has traditionally been studied as an auditory phenomenon using passive listening paradigms. However, when listening to music, we actively generate predictions about what is likely to happen next. This enactive aspect has led to a more comprehensive understanding of music processing involving brain structures implicated in action, emotion and learning. Here we review the cognitive neuroscience literature on music perception. We show that music perception, action, emotion and learning all rest on the human brain’s fundamental capacity for prediction — as formulated by the predictive coding of music model. This Review elucidates how this formulation of music perception and expertise in individuals can be extended to account for the dynamics and underlying brain mechanisms of collective music making. This in turn has important implications for human creativity as evinced by music improvisation. These recent advances shed new light on what makes music meaningful from a neuroscientific perspective.