47 research outputs found

High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing

Author: Ehmann Andreas
Publication date: 01/12/2011

Many music signals can largely be considered an additive combination of multiple sources, such as musical instruments or voice. If the musical sources are pitched instruments, the spectra they produce are predominantly harmonic, and are thus well suited to an additive sinusoidal model. However, due to resolution limits inherent in time-frequency analyses, when the harmonics of multiple sources occupy equivalent time-frequency regions, their individual properties are additively combined in the time-frequency representation of the mixed signal. Any such time-frequency point in a mixture where multiple harmonics overlap produces a single observation from which the contributions owed to each of the individual harmonics cannot be trivially deduced. These overlaps are referred to as overlapping partials or harmonic collisions. If one wishes to infer some information about individual sources in music mixtures, the information carried in regions where collided harmonics exist becomes unreliable due to interference from other sources. This interference has ramifications in a variety of music signal processing applications such as multiple fundamental frequency estimation, source separation, and instrumentation identification. This thesis addresses harmonic collisions in music signal processing applications. As a solution to the harmonic collision problem, a class of signal subspace-based high-resolution sinusoidal parameter estimators is explored. Specifically, the direct matrix pencil method, or equivalently, the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) method, is used with the goal of producing estimates of the salient parameters of individual harmonics that occupy equivalent time-frequency regions. This estimation method is adapted here to be applicable to time-varying signals such as musical audio. 
While high-resolution methods have been previously explored in the context of music signal processing, previous work has not addressed whether such methods truly produce high-resolution sinusoidal parameter estimates in real-world music audio signals. Therefore, this thesis answers the question of whether high-resolution sinusoidal parameter estimators are really high-resolution for real music signals. This work directly explores the capabilities of this form of sinusoidal parameter estimation to resolve collided harmonics. The capabilities of this analysis method are also explored in the context of music signal processing applications. Potential benefits of high-resolution sinusoidal analysis are examined in experiments involving multiple fundamental frequency estimation and audio source separation. This work shows that there are indeed benefits to high-resolution sinusoidal analysis in music signal processing applications, especially when compared to methods that produce sinusoidal parameter estimates based on more traditional time-frequency representations. The benefits of this form of sinusoidal analysis are made most evident in multiple fundamental frequency estimation applications, where substantial performance gains are seen. High-resolution analysis in the context of computational auditory scene analysis-based source separation shows similar performance to existing comparable methods.

Illinois Digital Environment for Access to Learning and Scholarship Repository
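
The ESPRIT estimator named in the abstract above can be illustrated with a minimal sketch. This is a generic least-squares ESPRIT for stationary, undamped, noise-free sinusoids, not the thesis's time-varying adaptation; the function name `esprit_frequencies`, its parameters, and the test signal are assumptions made for illustration only.

```python
import numpy as np

def esprit_frequencies(x, n_exponentials, n_rows=None):
    """Estimate normalized frequencies (cycles per sample) with least-squares ESPRIT.

    A real sinusoid counts as two complex exponentials (at +f and -f).
    Illustrative sketch only; assumes stationary, undamped components.
    """
    x = np.asarray(x, dtype=complex)
    n_samples = len(x)
    m = n_rows if n_rows is not None else n_samples // 2
    # Hankel data matrix: entry (i, j) is x[i + j].
    hankel = np.array([x[i:i + n_samples - m + 1] for i in range(m)])
    # Signal subspace: left singular vectors of the largest singular values.
    u, _, _ = np.linalg.svd(hankel, full_matrices=False)
    signal = u[:, :n_exponentials]
    # Rotational invariance: signal[1:] is approximately signal[:-1] @ phi.
    phi = np.linalg.lstsq(signal[:-1], signal[1:], rcond=None)[0]
    # The eigenvalues of phi are the signal poles exp(j * 2 * pi * f).
    poles = np.linalg.eigvals(phi)
    return np.sort(np.angle(poles) / (2 * np.pi))

# Two real partials 0.004 cycles/sample apart, observed for 128 samples.
# The DFT bin spacing of this frame is 1/128 (about 0.0078), so the two
# partials are closer together than one bin.
t = np.arange(128)
x = np.cos(2 * np.pi * 0.100 * t) + 0.8 * np.cos(2 * np.pi * 0.104 * t + 0.3)
estimates = esprit_frequencies(x, n_exponentials=4)
positive = estimates[estimates > 0]
print(positive)
```

The example is noise-free on purpose: it only shows that the subspace method can separate two components that fall within one DFT bin. Whether this holds for real recordings is the question the thesis itself investigates.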

Computational Modelling and Analysis of Vibrato and Portamento in Expressive Music Performance

Author: Yang Luwei
Publication venue: 'Queen Mary University of London'
Publication date: 13/07/2017

PhD, 148pp. Vibrato and portamento are two expressive devices involving continuous pitch modulation and are widely employed in string, voice, and wind instrument performance. Automatic extraction and analysis of such expressive features form some of the most important aspects of music performance research and represent an under-explored area in music information retrieval. This thesis aims to provide computational and scalable solutions for the automatic extraction and analysis of performed vibratos and portamenti. Applications of the technologies include music learning, musicological analysis, music information retrieval (summarisation, similarity assessment), and music expression synthesis. To automatically detect vibratos and estimate their parameters, we propose a novel method based on the Filter Diagonalisation Method (FDM). The FDM remains robust over short time frames, allowing frame sizes to be set at values small enough to accurately identify local vibrato characteristics and pinpoint vibrato boundaries. To determine vibrato presence, we test two alternative decision mechanisms: the Decision Tree and Bayes’ Rule. The FDM systems are compared to state-of-the-art techniques and obtain the best results. The FDM’s vibrato rate accuracies are above 92.5%, and the vibrato extent accuracies are about 85%. We use the Hidden Markov Model (HMM) with Gaussian Mixture Model (GMM) to detect portamento existence. Upon extracting the portamenti, we propose a Logistic Model for describing portamento parameters. The Logistic Model has the lowest root mean squared error and the highest adjusted R-squared value compared to regression models employing Polynomial and Gaussian functions, and the Fourier Series. The vibrato and portamento detection and analysis methods are implemented in AVA, an interactive tool for automated detection, analysis, and visualisation of vibrato and portamento. 
Using the system, we perform cross-cultural analyses of vibrato and portamento differences between erhu and violin performance styles, and between typical male or female roles in Beijing opera singing.

Queen Mary Research Online
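
To make the idea of a logistic portamento model concrete, here is a small sketch of a generic four-parameter logistic pitch curve. The abstract above does not give the thesis's parameterization, so the function `logistic_pitch`, its parameter names, and the example values are all hypothetical.

```python
import math

def logistic_pitch(t, lower, upper, midpoint, growth):
    """Pitch at time t (seconds) for a logistic glide between two notes.

    lower/upper are the asymptotic pitches (here in MIDI note numbers),
    midpoint is the time of the steepest change, and growth sets the slope.
    All four parameters are illustrative assumptions.
    """
    return lower + (upper - lower) / (1.0 + math.exp(-growth * (t - midpoint)))

# Hypothetical upward glide from MIDI 62 to MIDI 64, centred at 0.25 s.
times = [i * 0.05 for i in range(11)]
curve = [logistic_pitch(t, 62.0, 64.0, 0.25, 30.0) for t in times]
for t, p in zip(times, curve):
    print(f"{t:.2f} s  {p:.3f}")
```

In a curve of this shape, the two asymptotes describe the start and end notes, while the midpoint and growth rate summarize when and how quickly the slide happens.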

A very low latency pitch tracker for audio to midi conversion

Author: Derrien Olivier
Publication venue: HAL CCSD
Publication date: 01/09/2014

International audience. An algorithm for estimating the fundamental frequency of a single-pitch audio signal is described, for application to audio-to-MIDI conversion. In order to minimize latency, this method is based on the ESPRIT algorithm, together with a statistical model for the frequencies of the partials. It is tested on real guitar recordings and compared to the YIN estimator. We show that, in this particular context, both methods exhibit similar accuracy, but the periodicity measure used for note segmentation is much more stable with the ESPRIT-based algorithm. This significantly reduces ghost notes. The method is also able to get very close to the theoretical minimum latency, i.e. the fundamental period of the lowest observable pitch. Furthermore, it appears that fast implementations can reach a reasonable complexity and could be compatible with real-time operation, although this is not tested in this study.

HAL AMU

Advances In Internal Model Principle Control Theory

Author: Lu Jin
Publication venue: Scholarship@Western
Publication date: 11/02/2011

In this thesis, two advanced implementations of the internal model principle (IMP) are presented. The first is the identification of exponentially damped sinusoidal (EDS) signals with unknown parameters, which are widely used to model audio signals. This application is developed in discrete time as a signal processing problem. An IMP-based adaptive algorithm is developed for estimating two EDS parameters, the damping factor and the frequency. The stability and convergence of this adaptive algorithm are analyzed based on a discrete-time two-time-scale averaging theory. Simulation results demonstrate the identification performance of the proposed algorithm and verify its stability. The second advanced implementation of the IMP control theory is the rejection of disturbances consisting of both predictable and unpredictable components. An IMP controller is used for rejecting predictable disturbances. However, the phase lag introduced by the IMP controller limits the rejection capability of the wideband disturbance controller, which is used for attenuating unpredictable disturbances such as white noise. A strategy combining open- and closed-loop control is presented. In the closed-loop mode, both controllers are active. Once the tracking error is insignificant, the input to the IMP controller is disconnected while its output control action is maintained. In the open-loop mode, the wideband disturbance controller is made more aggressive for attenuating white noise. Depending on the level of the tracking error, the input to the IMP controller is connected intermittently. Thus the system switches between open- and closed-loop modes. A state feedback controller is designed as the wideband disturbance controller in this application. Two types of predictable disturbances are considered, constant and periodic. For a constant disturbance, an integral controller, the simplest IMP controller, is used. 
For a periodic disturbance with unknown frequencies, adaptive IMP controllers are used to estimate the frequencies before cancelling the disturbances. An extended multiple Lyapunov functions (MLF) theorem is developed for the stability analysis of this intermittent control strategy. Simulation results support the optimal rejection performance of this switched control through comparison with two other traditional controllers.

Scholarship@Western
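
The abstract's remark that an integral controller is the simplest IMP controller for a constant disturbance can be sketched with a toy simulation. The first-order plant, the gains, and the function name `simulate_integral_rejection` are assumptions chosen for illustration; the thesis's state feedback controller and intermittent switching are not modelled here.

```python
def simulate_integral_rejection(disturbance, steps=200, a=0.5, b=1.0, ki=0.2):
    """Simulate x[k+1] = a*x[k] + b*(u[k] + d) under pure integral control.

    The reference is zero, so the tracking error is e[k] = -x[k].
    The plant (a, b) and the gain ki are hypothetical values for which
    the closed loop is stable.
    """
    x, integral, u = 0.0, 0.0, 0.0
    errors = []
    for _ in range(steps):
        error = -x
        integral += error      # the integrator is the internal model of a constant
        u = ki * integral      # control action
        x = a * x + b * (u + disturbance)
        errors.append(error)
    return errors, u

errors, final_u = simulate_integral_rejection(disturbance=1.5)
print("largest early error:", max(abs(e) for e in errors[:10]))
print("final error:", errors[-1])
print("final control action:", final_u)
```

In steady state the integrator output settles at the negative of the disturbance, so the controller reproduces and cancels the constant signal without being told its value, which is the essence of the internal model principle.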

Physically Informed Subtraction of a String's Resonances from Monophonic, Discretely Attacked Tones : a Phase Vocoder Approach

Author: Hodgkinson Matthieu
Publication date: 01/05/2012

A method for the subtraction of a string's oscillations from monophonic, plucked- or hit-string tones is presented. The remainder of the subtraction is the response of the instrument's body to the excitation, and potentially other sources, such as faint vibrations of other strings, background noises or recording artifacts. In some respects, this method is similar to a stochastic-deterministic decomposition based on Sinusoidal Modeling Synthesis [MQ86, IS87]. However, our method targets string partials expressly, according to a physical model of the string's vibrations described in this thesis. Also, the method sits on a Phase Vocoder scheme. This approach has the essential advantage that the subtraction of the partials can take place "instantly", on a frame-by-frame basis, avoiding the necessity of tracking the partials and therefore availing of the possibility of a real-time implementation. The subtraction takes place in the frequency domain, and a method is presented whereby the computational cost of this process can be reduced through the reduction of a partial's frequency-domain data to its main lobe. In each frame of the Phase Vocoder, the string is encoded as a set of partials, completely described by four constants of frequency, phase, magnitude and exponential decay. These parameters are obtained with a novel method, the Complex Exponential Phase Magnitude Evolution (CSPME), which is a generalisation of the CSPE [SG06] to signals with exponential envelopes and which surpasses the finite resolution of the Discrete Fourier Transform. The encoding obtained is an intuitive representation of the string, suitable to musical processing.

MURAL - Maynooth University Research Archive Library

Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework

Author: Lin Shoufeng
Publication venue: Curtin University
Publication date: 01/01/2019

The thesis investigates the challenges of speaker localization in the presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based upon the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies have verified and demonstrated the benefits of the proposed methods.

espace@Curtin

Short-Range Millimeter-Wave Sensing and Imaging: Theory, Experiments and Super-Resolution Algorithms

Author: Mamandipoor Babak
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

Recent advancements in silicon technology offer the possibility of realizing low-cost and highly integrated radar sensor and imaging systems in the mm-wave band (between 30 and 300 GHz) and beyond. Such active short-range mm-wave systems have a wide range of applications including medical imaging, security scanning, autonomous vehicle navigation, and human gesture recognition. Moving to higher frequencies provides us with the spectral and spatial degrees of freedom that we need for high-resolution imaging and sensing applications. Increased bandwidth availability enhances range resolution by increasing the degrees of freedom in the time-frequency domain. Cross-range resolution is enhanced by the increase in the number of spatial degrees of freedom for a constrained form factor. The focus of this thesis is to explore system design and algorithmic development to utilize the available degrees of freedom in mm-wave frequencies in order to realize imaging and sensing capabilities under cost, complexity and form factor constraints. We first consider the fundamental problem of estimating frequencies and gains in a noisy mixture of sinusoids. This problem is ubiquitous in radar sensing applications, including target range and velocity estimation using standard radar waveforms (e.g., chirp or stepped frequency continuous wave), and direction of arrival estimation using an array of antenna elements. We have developed a fast and robust iterative algorithm for super-resolving the frequencies and gains, and have demonstrated near-optimal performance in terms of frequency estimation accuracy by benchmarking against the Cramér-Rao bound in various scenarios. Next, we explore cross-range radar imaging using an array of antenna elements under severe cost, complexity and form factor constraints. We show that we must account for such constraints in a manner that is quite different from that of conventional radar, and introduce new models and algorithms validated by experimental results. 
In order to relax the synchronization requirements across multiple transceiver elements, we have considered the monostatic architecture, in which only the co-located elements are synchronized. We investigate the impact of sparse spatial sampling by reducing the number of array antenna elements, and show that a "sparse monostatic" architecture leads to grating-lobe artifacts, which introduce ambiguity in the detection/estimation of point targets in the scene. At short ranges, however, targets are "low-pass" and contain extended features (consisting of a continuum of points), and are not well-modeled by a small number of point scatterers. We introduce the concept of "spatial aggregation," which provides the flexibility of constructing a dictionary in which each atom corresponds to a collection of point scatterers, and demonstrate its effectiveness in suppressing the grating lobes and preserving the information in the scene. Finally, we take a more fundamental and systematic approach based on singular decomposition of the imaging system, to understand the information capacity and the limits of performance for various geometries. In general, a scene can be described by an infinite number of independent parameters. However, the number of independent parameters that can be measured through an imaging system (also known as the degrees of freedom of the system) is typically finite, and is constrained by the geometry and wavelength. We introduce a measure to predict the number of spatial degrees of freedom of 1D imaging systems for both monostatic and multistatic array architectures. Our analysis reveals that there is no fundamental benefit in multistatic architecture compared to monostatic in terms of achievable degrees of freedom. The real benefit of multistatic architecture, from a practical point of view, is in being able to design sparse transmit and receive antenna arrays that are capable of achieving the available degrees of freedom. 
Moreover, our analytical framework opens up new avenues to investigate image formation techniques that aim to reconstruct the reflectivity function of the scene by solving an inverse scattering problem, and provides crucial insights on the achievable resolution.

eScholarship - University of California

Model-based Analysis and Processing of Speech and Audio Signals

Author: Christensen Mads Græsbøll
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2020

VBN

Human presence detection using millimeter-wave radiometry

Author: Nanzer Jeffrey A. (Jeffrey Allan)
Publication date: 01/05/2008

A novel method of human presence detection using passive millimeter-wave sensors is presented. The method focuses on detecting a standing human from a moving platform in a cluttered outdoor environment using millimeter-wave radiometry, which has not been attempted before. Ka-band radiometers are used in total power mode as well as correlation mode, which ideally responds well to self-luminous objects such as humans. The intrinsic radiative power from a human is derived, as well as the responses of the total power and correlation modes. The application of correlation radiometer theory to the detection of self-luminous objects at close range is presented in the context of human presence detection. Modifications and additions to techniques developed in radio astronomy and remote sensing for close range terrestrial situations are developed and discussed. The correlation radiometer fringe frequency is analyzed in the context of the scanning beam detection system and is estimated using MUSIC and ESPRIT. Detection and classification of humans are accomplished using a Naïve Bayesian classifier. The performance of the classifier is measured using the F1-measure and the receiver operating characteristic. Electrical and Computer Engineering

Texas ScholarWorks

Enhancement of Periodic Signals: with Application to Speech Signals

Author: Jensen Jesper Rindom
Publication date: 01/01/2012

VBN