3 research outputs found

    A music cognition-guided framework for multi-pitch estimation.

    As one of the most important subtasks of automatic music transcription (AMT), multi-pitch estimation (MPE) has been studied extensively during the past decade for predicting the fundamental frequencies in the frames of audio recordings. However, how to use music perception and cognition for MPE has not yet been thoroughly investigated. Motivated by this, this work demonstrates how to effectively detect the fundamental frequency and the harmonic structure of polyphonic music using a cognitive framework. Inspired by cognitive neuroscience, an integration of the constant Q transform and a state-of-the-art matrix factorization method called shift-invariant probabilistic latent component analysis (SI-PLCA) is proposed to resolve the polyphonic short-time magnitude log-spectra for multiple pitch estimation and source-specific feature extraction. The cognition of rhythm, harmonic periodicity and instrument timbre is used to guide the characterization of contiguous notes and of the relationship between the fundamental frequency and its harmonic frequencies, so that pitches can be detected from the outcomes of SI-PLCA. In the experiment, we compare the performance of the proposed MPE system with a number of existing state-of-the-art approaches (seven weak learning methods and four deep learning methods) on three widely used datasets (i.e. MAPS, BACH10 and TRIOS) in terms of F-measure (F1) values. The experimental results show that the proposed MPE method achieves the best overall performance among the compared methods.
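    The abstract reports results as F-measure (F1) values but does not say how they are computed. The sketch below shows one common frame-level formulation, assuming each frame is represented as a set of MIDI pitch numbers. The function name `frame_level_f1` and this representation are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Set, Tuple


def frame_level_f1(
    reference: List[Set[int]], estimate: List[Set[int]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all frames.

    Assumption: each frame is a set of MIDI pitch numbers, and a pitch
    counts as correct only if it appears in both sets for that frame.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same number of frames")

    tp = fp = fn = 0
    for ref, est in zip(reference, estimate):
        tp += len(ref & est)   # pitches detected and present
        fp += len(est - ref)   # pitches detected but absent
        fn += len(ref - est)   # pitches present but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: three frames of ground truth and estimated pitches.
reference = [{60, 64, 67}, {60, 64}, set()]
estimate = [{60, 64}, {60, 65}, {72}]
precision, recall, f1 = frame_level_f1(reference, estimate)
```

    Pooling counts over all frames before computing the ratios means that frames with many notes weigh more than sparse ones. Averaging per-frame scores is another possible choice, and the two can give different numbers.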

    Prediction in polyphony: modelling musical auditory scene analysis

    PhD thesis. How do we know that a melody is a melody? In other words, how does the human brain extract melody from a polyphonic musical context? This thesis begins with a theoretical presentation of musical auditory scene analysis (ASA) in the context of predictive coding and rule-based approaches and takes methodological and analytical steps to evaluate selected components of a proposed integrated framework for musical ASA, unified by prediction. Predictive coding has been proposed as a grand unifying model of perception, action and cognition and is based on the idea that brains process error to refine models of the world. Existing models of ASA tackle distinct subsets of ASA and are currently unable to integrate all the acoustic and extensive contextual information needed to parse auditory scenes. This thesis proposes a framework capable of integrating all relevant information contributing to the understanding of musical auditory scenes, including auditory features, musical features, attention, expectation and listening experience, and examines a subset of ASA issues – timbre perception in relation to musical training, modelling temporal expectancies, the relative salience of musical parameters and melody extraction – using probabilistic approaches. Using behavioural methods, attention is shown to influence streaming perception based on timbre more than instrumental experience. Using probabilistic methods, information content (IC) for temporal aspects of music, as generated by IDyOM (information dynamics of music; Pearce, 2005), is validated and, along with IC for pitch and harmonic aspects of the music, is subsequently linked to perceived complexity but not to salience. Furthermore, based on the hypotheses that a melody is internally coherent and the most complex voice in a piece of polyphonic music, IDyOM has been extended to extract melody from symbolic representations of chorales by J.S. Bach and a selection of string quartets by W.A. Mozart.
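    Information content here is the negative log probability of an event given its context, so less expected events carry more information. The sketch below illustrates the idea with a first-order (bigram) pitch model and add-one smoothing. This is a deliberately simplified stand-in and not IDyOM itself; the function name and the smoothing choice are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Sequence


def bigram_information_content(
    training: Sequence[Sequence[int]],
    melody: Sequence[int],
    alphabet: Sequence[int],
) -> List[float]:
    """Return IC in bits for each note of `melody` after the first.

    Assumption: a bigram model with add-one (Laplace) smoothing over a
    fixed pitch alphabet stands in for a real predictive model.
    """
    # Count how often each pitch follows each preceding pitch.
    counts: DefaultDict[int, Counter] = defaultdict(Counter)
    for seq in training:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    ics = []
    for prev, nxt in zip(melody, melody[1:]):
        total = sum(counts[prev].values())
        # Smoothed conditional probability p(next | previous).
        p = (counts[prev][nxt] + 1) / (total + len(alphabet))
        ics.append(-math.log2(p))  # IC = -log2 p
    return ics


# Toy example with MIDI pitches; the training "corpus" is a single phrase.
training = [[60, 62, 64, 62, 60]]
alphabet = [60, 62, 64]
expected_step = bigram_information_content(training, [60, 62], alphabet)
unexpected_leap = bigram_information_content(training, [60, 64], alphabet)
```

    In this toy corpus the step from 60 to 62 has been seen before and the leap from 60 to 64 has not, so the leap receives the higher information content.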