
    High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing

    Many music signals can largely be considered an additive combination of multiple sources, such as musical instruments or voice. If the musical sources are pitched instruments, the spectra they produce are predominantly harmonic, and are thus well suited to an additive sinusoidal model. However, due to resolution limits inherent in time-frequency analyses, when the harmonics of multiple sources occupy equivalent time-frequency regions, their individual properties are additively combined in the time-frequency representation of the mixed signal. Any such time-frequency point in a mixture where multiple harmonics overlap produces a single observation from which the contributions owed to each of the individual harmonics cannot be trivially deduced. These overlaps are referred to as overlapping partials or harmonic collisions. If one wishes to infer some information about individual sources in music mixtures, the information carried in regions where collided harmonics exist becomes unreliable due to interference from other sources. This interference has ramifications in a variety of music signal processing applications such as multiple fundamental frequency estimation, source separation, and instrumentation identification. This thesis addresses harmonic collisions in music signal processing applications. As a solution to the harmonic collision problem, a class of signal subspace-based high-resolution sinusoidal parameter estimators is explored. Specifically, the direct matrix pencil method, or equivalently, the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) method, is used with the goal of producing estimates of the salient parameters of individual harmonics that occupy equivalent time-frequency regions. This estimation method is adapted here to be applicable to time-varying signals such as musical audio. 
While high-resolution methods have been previously explored in the context of music signal processing, previous work has not addressed whether such methods truly produce high-resolution sinusoidal parameter estimates in real-world music audio signals. This thesis therefore asks whether high-resolution sinusoidal parameter estimators are really high-resolution for real music signals. This work directly explores the capabilities of this form of sinusoidal parameter estimation to resolve collided harmonics. The capabilities of this analysis method are also explored in the context of music signal processing applications. Potential benefits of high-resolution sinusoidal analysis are examined in experiments involving multiple fundamental frequency estimation and audio source separation. This work shows that there are indeed benefits to high-resolution sinusoidal analysis in music signal processing applications, especially when compared to methods that produce sinusoidal parameter estimates based on more traditional time-frequency representations. The benefits of this form of sinusoidal analysis are most evident in multiple fundamental frequency estimation applications, where substantial performance gains are seen. High-resolution analysis in the context of computational auditory scene analysis-based source separation shows performance similar to that of existing comparable methods.
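To illustrate the kind of estimator the abstract refers to, the following is a minimal sketch of textbook ESPRIT for stationary complex exponentials, assuming NumPy is available. It is not the time-varying adaptation developed in the thesis; the function name `esprit_frequencies` and its parameters are hypothetical and chosen only for this example.

```python
import numpy as np

def esprit_frequencies(x, n_sinusoids, window_len=None):
    """Estimate frequencies (cycles/sample) of complex exponentials.

    Textbook ESPRIT sketch; all names here are illustrative assumptions.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    m = window_len or n // 2
    # Hankel data matrix: each column is a length-m window of the signal.
    H = np.array([x[i:i + m] for i in range(n - m + 1)]).T
    # The leading left singular vectors span the signal subspace.
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :n_sinusoids]
    # Rotational invariance: Us[1:] ~= Us[:-1] @ Phi (least-squares solve).
    Phi = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)[0]
    # Eigenvalues of Phi are the signal poles exp(j*2*pi*f).
    poles = np.linalg.eigvals(Phi)
    return np.sort(np.angle(poles) / (2 * np.pi))

# Two components spaced 0.004 cycles/sample apart, well below the
# 1/64 bin spacing of a 64-point DFT (noise-free, for illustration).
n = np.arange(64)
x = np.exp(2j * np.pi * 0.100 * n) + 0.5 * np.exp(2j * np.pi * 0.104 * n)
print(esprit_frequencies(x, 2))
```

A real-valued sinusoid counts as two complex exponentials, so for real audio `n_sinusoids` would be twice the number of partials under this model.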

    A Robust and Computationally Efficient Subspace-based Fundamental Frequency Estimator


    Joint High-Resolution Fundamental Frequency and Order Estimation

    In this paper, we present a novel method for joint estimation of the fundamental frequency and order of a set of harmonically related sinusoids based on the MUltiple SIgnal Classification (MUSIC) estimation criterion. The presented method, termed HMUSIC, is shown to have an efficient implementation using fast Fourier transforms (FFTs). Furthermore, refined estimates can be obtained using a gradient-based method. Illustrative examples of the application of the algorithm to real-life speech and audio signals are given, and the statistical performance of the estimator is evaluated using synthetic signals, demonstrating its good statistical properties.
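As a rough illustration of a MUSIC-style criterion for harmonic signals, the sketch below searches a grid of candidate fundamental frequencies and picks the one whose harmonic steering vectors are most nearly orthogonal to the estimated noise subspace. It assumes NumPy, a known model order, and a plain grid search; the joint order estimation, FFT-based implementation, and gradient refinement described in the abstract are not reproduced. The name `harmonic_music_f0` and its parameters are hypothetical.

```python
import numpy as np

def harmonic_music_f0(x, n_harmonics, f0_grid, m=20):
    """Grid-search f0 (cycles/sample) with a harmonic MUSIC cost.

    Illustrative sketch only; the model order is assumed known.
    """
    x = np.asarray(x, dtype=complex)
    # Sample covariance from overlapping length-m snapshots.
    snaps = np.array([x[i:i + m] for i in range(len(x) - m + 1)]).T
    R = snaps @ snaps.conj().T / snaps.shape[1]
    # eigh sorts eigenvalues in ascending order, so the first
    # m - n_harmonics eigenvectors estimate the noise subspace.
    _, vecs = np.linalg.eigh(R)
    G = vecs[:, : m - n_harmonics]
    t = np.arange(m)[:, None]
    h = np.arange(1, n_harmonics + 1)[None, :]
    costs = []
    for f0 in f0_grid:
        Z = np.exp(2j * np.pi * f0 * t * h)  # harmonic steering matrix
        # Squared Frobenius norm of the projection onto the noise subspace.
        costs.append(np.linalg.norm(Z.conj().T @ G) ** 2)
    return f0_grid[int(np.argmin(costs))]

# Synthetic three-harmonic signal with f0 = 0.05 cycles/sample.
n = np.arange(200)
x = sum(a * np.exp(2j * np.pi * 0.05 * k * n)
        for k, a in [(1, 1.0), (2, 0.6), (3, 0.3)])
print(harmonic_music_f0(x, 3, np.linspace(0.02, 0.1, 801)))
```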

    Time Delay Estimation from Low Rate Samples: A Union of Subspaces Approach

    Time delay estimation arises in many applications in which a multipath medium has to be identified from pulses transmitted through the channel. Various approaches have been proposed in the literature to identify time delays introduced by multipath environments. However, these methods either operate on the analog received signal or require high sampling rates in order to achieve reasonable time resolution. In this paper, our goal is to develop a unified approach to time delay estimation from low-rate samples of the output of a multipath channel. Our methods result in perfect recovery of the multipath delays from samples of the channel output at the lowest possible rate, even in the presence of overlapping transmitted pulses. This rate depends only on the number of multipath components and the transmission rate, but not on the bandwidth of the probing signal. In addition, our development allows for a variety of different sampling methods. By properly manipulating the low-rate samples, we show that the time delays can be recovered using the well-known ESPRIT algorithm. Combining results from sampling theory with those obtained in the context of direction of arrival estimation methods, we develop necessary and sufficient conditions on the transmitted pulse and the sampling functions in order to ensure perfect recovery of the channel parameters at the minimal possible rate. Our results can be viewed in a broader context, as a sampling theorem for analog signals defined over an infinite union of subspaces.

    Model-based Analysis and Processing of Speech and Audio Signals


    Carrier frequency offset recovery for zero-IF OFDM receivers

    As trends in broadband wireless communications applications demand faster development cycles, smaller sizes, lower costs, and ever-increasing data rates, engineers continually seek new ways to harness evolving technology. The zero intermediate frequency receiver architecture has now become popular as it has both economic and size advantages over the traditional superheterodyne architecture. Orthogonal Frequency Division Multiplexing (OFDM) is a popular multi-carrier modulation technique with the ability to provide high data rates over echo-laden channels. It has excellent robustness to impairments caused by multipath, which include frequency-selective fading. Unfortunately, OFDM is very sensitive to the carrier frequency offset (CFO) that is introduced by the downconversion process. The objective of this thesis is to develop and to analyze an algorithm for blind CFO recovery suitable for use with a practical zero-Intermediate Frequency (zero-IF) OFDM telecommunications system. A blind CFO recovery algorithm based upon characteristics of the received signal's power spectrum is proposed. The algorithm's error performance is mathematically analyzed, and the theoretical results are verified with simulations. Simulation shows that the performance of the proposed algorithm agrees with the mathematical analysis. A number of other CFO recovery techniques are compared to the proposed algorithm. The proposed algorithm performs well in comparison and does not suffer from many of the disadvantages of existing blind CFO recovery techniques. Most notably, its performance is not significantly degraded by noisy, frequency-selective channels.

    A Parametric Method for Multi-Pitch Estimation

    This thesis proposes a novel method for multi-pitch estimation. The method operates by posing pitch estimation as a sparse recovery problem which is solved using convex optimization techniques. In that respect, it is an extension of an earlier presented estimation method based on the group-LASSO. However, by introducing an adaptive total variation penalty, the proposed method requires fewer user-supplied parameters, thereby simplifying the estimation procedure. The method is shown to have comparable or superior performance in low-noise environments when compared to three standard multi-pitch estimation methods as well as the predecessor method. Also presented is a scheme for automatic selection of the regularization parameters, thereby making the method more user-friendly. Used together with this scheme, the proposed method is shown to yield accurate, although not statistically efficient, pitch estimates when evaluated on synthetic speech data.
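For readers unfamiliar with the group-LASSO mentioned above, its penalty is commonly handled with block soft-thresholding, the proximal operator of a sum of group-wise Euclidean norms. The sketch below shows only that standard building block, with each group standing in for the harmonic amplitudes of one candidate pitch. It is not the thesis's algorithm and omits the adaptive total variation penalty; `group_soft_threshold` and `lam` are hypothetical names.

```python
import math

def group_soft_threshold(groups, lam):
    """Shrink each group toward zero by lam in Euclidean norm.

    Groups whose norm is at most lam are set entirely to zero, which
    is what makes the group-LASSO select whole groups at once.
    """
    out = []
    for g in groups:
        norm = math.sqrt(sum(v * v for v in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in g])
    return out

# A strong group is shrunk but kept; a weak group is removed entirely.
print(group_soft_threshold([[3.0, 4.0], [0.3, 0.4]], lam=1.0))
```

In a pitch-estimation setting, a surviving group would indicate that the corresponding candidate pitch is active in the frame.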