    Estimation of Multiple Pitches in Stereophonic Mixtures using a Codebook-based Approach

    Estimation of Fundamental Frequencies in Stereophonic Music Mixtures

    Estimation of Source Panning Parameters and Segmentation of Stereophonic Mixtures

    Speech Modeling and Robust Estimation for Diagnosis of Parkinson’s Disease

    Pre-processing of Speech Signals for Robust Parameter Estimation

    Geometric Approaches to the Pitch Estimation of Acoustic Musical Signals

    Multi-Pitch Estimation (MPE) is a challenging problem in the field of Music Information Retrieval (MIR). In recent literature in particular, it has been approached with Machine Learning (ML) methods, which are largely opaque, hard to interpret, and often difficult to reproduce from the information provided in the literature alone. This Thesis presents a model for pitch detection that reduces the problem of MPE to that of distinguishing between false fundamentals and their real counterparts. Initially, the model is explored from a discrete viewpoint, one that is generally understudied in the field, before incorporating the notion of intensity and assigning real values to tones. The Thesis further provides an in-depth characterisation of precisely the ways in which these edge cases (situations in which a false fundamental can arise) occur, looking in particular at the notion of ‘basic’ edge cases, those in which the constituent parts are satisfied precisely once. From there, their occurrence is reduced to eight basic edge types, plus a ninth type, which is proved to be the only irreducible non-basic type. The results of analysing simulated data with the model are then presented, highlighting the prevalence of the various types with respect to the number of simultaneous fundamentals. In addition, some insight into the use of the model on real data is given, alongside an evaluation of a number of simple algorithms utilising the acquired knowledge of edge cases. Finally, this Thesis presents a range of logical future additions and directions for research, including the possibility of adopting a similar approach to other data, not necessarily musical audio.
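    To make the false-fundamental idea concrete, here is a minimal Python sketch of the discrete viewpoint (illustrative only, not taken from the Thesis; the pitches, candidate grid, and harmonic cutoff are assumptions for the example): a candidate frequency can masquerade as a fundamental whenever every one of its harmonics, up to some cutoff, coincides with a harmonic of a genuine fundamental.

```python
def harmonics(f0, f_max):
    """All integer harmonics of f0 up to f_max (idealized, no inharmonicity)."""
    return {k * f0 for k in range(1, int(f_max // f0) + 1)}

def false_fundamentals(true_f0s, candidates, f_max):
    """Candidates that are not true fundamentals, yet whose every harmonic
    up to f_max coincides with a harmonic of some true fundamental."""
    observed = set().union(*(harmonics(f, f_max) for f in true_f0s))
    return [c for c in candidates
            if c not in true_f0s and harmonics(c, f_max) <= observed]

# Toy mixture of 220 Hz and 330 Hz, peaks observed up to 2 kHz: the test
# flags 440, 660, 880 and 990 Hz, i.e. higher harmonics of the true
# pitches masquerading as fundamentals (octave-type ambiguities).
print(false_fundamentals([220, 330], range(100, 1000, 10), f_max=2000))
```

    Under a model of this kind, resolving exactly such ambiguities is the false-versus-real distinction; the real-valued extension described above would additionally weight each tone by its intensity.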

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of SMC2010, the 7th Sound and Music Computing Conference, July 21–24, 2010

    Complex Neural Networks for Audio

    Audio is represented in two mathematically equivalent ways: the real-valued time domain (i.e., the waveform) and the complex-valued frequency domain (i.e., the spectrum). The frequency-domain representation has advantages; for example, the human auditory system is known to process sound in the frequency domain. Furthermore, linear time-invariant systems are convolved with sources in the time domain, whereas in the frequency domain this convolution factorizes into a multiplication. Neural networks have proven useful when applied to audio tasks such as machine listening and audio synthesis, which are related by their dependence on high-quality acoustic models. Such models should ideally encapsulate fine-scale temporal structure, such as that encoded in the phase of frequency-domain audio, yet there are no authoritative deep learning methods for complex-valued audio. This manuscript is dedicated to addressing that shortcoming. Chapter 2 motivates complex networks by their affinity with complex-domain audio, while Chapter 3 contributes methods for building and optimizing complex networks. We show that the naive implementation of Adam optimization is incorrect for complex random variables, and that the selection of input and output representation has a significant impact on the performance of a complex network. Experimental results with novel complex neural architectures are provided in the second half of this manuscript. Chapter 4 introduces a complex model for binaural audio source localization. We show that, like humans, the complex model can generalize to different anatomical filters, which is important in the context of machine listening. The complex model's performance is better than that of the real-valued models, as well as real- and complex-valued baselines. Chapter 5 proposes a two-stage method for speech enhancement. In the first stage, a complex-valued stochastic autoencoder projects complex vectors to a discrete space. In the second stage, long-term temporal dependencies are modeled in the discrete space. The autoencoder raises the performance ceiling for state-of-the-art speech enhancement, but the dynamic enhancement model does not outperform other baselines. We discuss areas for improvement, and note that the complex Adam optimizer improves training convergence over the naive implementation.
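    As a rough sketch of the Adam point (a minimal NumPy illustration under assumed hyper-parameters, not the manuscript's implementation): for a complex parameter, the second-moment estimate should accumulate the squared magnitude g·conj(g), coupling the real and imaginary parts. A naive port that runs Adam independently on the real and imaginary components normalizes each part separately, so the update is not invariant to a phase rotation of the gradient.

```python
import numpy as np

def adam_step_complex(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step for a complex parameter w with complex gradient g.
    The second moment v uses |g|^2 = g * conj(g), so the real and
    imaginary parts share a single scale; a naive component-wise port
    would keep separate second moments for np.real(g) and np.imag(g)."""
    m = b1 * m + (1 - b1) * g                      # first moment (complex)
    v = b2 * v + (1 - b2) * (g * np.conj(g)).real  # second moment (real, non-negative)
    m_hat = m / (1 - b1 ** t)                      # bias corrections
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Usage on a toy convex loss |w - (2 - 1j)|^2: w drifts toward 2 - 1j.
w, m, v = 1.0 + 1.0j, 0.0 + 0.0j, 0.0
for t in range(1, 5001):
    g = w - (2.0 - 1.0j)   # gradient of the toy loss, up to a real scale
    w, m, v = adam_step_complex(w, g, m, v, t)
print(w)
```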