PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective
In this paper, we address the problem of pitch estimation using self-supervised
learning (SSL). The SSL paradigm we use is equivariance to pitch
transposition, which enables our model to accurately perform pitch estimation
on monophonic audio after being trained only on a small unlabeled dataset. We
use a lightweight (< 30k parameters) Siamese neural network that takes as
inputs two different pitch-shifted versions of the same audio represented by
its Constant-Q Transform. To prevent the model from collapsing in an
encoder-only setting, we propose a novel class-based transposition-equivariant
objective which captures pitch information. Furthermore, we design the
architecture of our network to be transposition-preserving by introducing
learnable Toeplitz matrices.
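Why Toeplitz matrices preserve transpositions can be checked numerically. In a CQT, a pitch transposition moves a frame by a fixed number of bins, and a linear layer whose weight matrix is constant along each diagonal behaves like a convolution, so it commutes with that shift except at the edges. The sketch below is a minimal NumPy illustration under assumed sizes and random weights; `toeplitz_from_kernel` and `shift_up` are hypothetical helpers, not PESTO's actual layer or its class-based objective.

```python
import numpy as np

def toeplitz_from_kernel(kernel: np.ndarray, n: int) -> np.ndarray:
    """Build an n x n matrix with T[i, j] = kernel[i - j + n - 1].

    `kernel` holds one value per diagonal (length 2n - 1), so every diagonal
    of T is constant, which is what makes T a Toeplitz matrix."""
    if kernel.shape != (2 * n - 1,):
        raise ValueError("kernel must have length 2n - 1")
    i, j = np.indices((n, n))
    return kernel[i - j + n - 1]

def shift_up(x: np.ndarray, k: int) -> np.ndarray:
    """Transpose a frame up by k >= 0 bins, zero-filling the vacated low bins."""
    out = np.zeros_like(x)
    out[k:] = x[: len(x) - k]
    return out

rng = np.random.default_rng(0)
n_bins = 48                                   # hypothetical number of CQT bins
T = toeplitz_from_kernel(rng.standard_normal(2 * n_bins - 1), n_bins)

# CQT-like frame with energy well away from the top of the frequency range.
x = np.zeros(n_bins)
x[5:15] = rng.standard_normal(10)

k = 3                                         # transposition of k bins
lhs = T @ shift_up(x, k)                      # transpose, then apply the layer
rhs = shift_up(T @ x, k)                      # apply the layer, then transpose
print(np.allclose(lhs[k:], rhs[k:]))          # → True (bins below k differ: edge effect)
```

In a trainable layer, the 2n - 1 diagonal values would be the learnable parameters, which also keeps the layer far smaller than a dense n x n matrix.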
We evaluate our model for the two tasks of singing voice and musical
instrument pitch estimation and show that our model is able to generalize
across tasks and datasets while being lightweight, hence remaining compatible
with low-resource devices and suitable for real-time applications. In
particular, our results surpass self-supervised baselines and narrow the
performance gap between self-supervised and supervised methods for pitch
estimation.
An Adaptive Penalty Approach to Multi-Pitch Estimation
This work treats multi-pitch estimation, in particular the common misclassification issue wherein the pitch at half of the true fundamental frequency, here referred to as a sub-octave, is chosen instead of the true pitch. Extending current methods that use an extension of the group LASSO for pitch estimation, this work introduces an adaptive total variation penalty, which both enforces group and block sparsity and deals with errors due to sub-octaves. The method is shown to outperform current state-of-the-art sparse methods when the model orders are unknown, while also requiring fewer tuning parameters than these. The method is also shown to outperform several conventional pitch estimation methods, even when these are provided with oracle model orders.
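The intuition for why a total variation term helps against sub-octave errors fits in a few numbers. If the true pitch is f0, a candidate at f0/2 can explain the signal only through its even-numbered harmonics, so its amplitude vector alternates between zero and non-zero. The sketch below compares the group (l2) norm with a plain, non-adaptive total variation of the two amplitude vectors; the amplitudes are made up for illustration and the paper's adaptive weighting is not reproduced.

```python
import numpy as np

def group_norm(a: np.ndarray) -> float:
    """l2 norm of one pitch candidate's harmonic amplitudes (group-LASSO term)."""
    return float(np.linalg.norm(a))

def total_variation(a: np.ndarray) -> float:
    """Sum of absolute differences between consecutive harmonic amplitudes."""
    return float(np.sum(np.abs(np.diff(a))))

# Hypothetical example: the true pitch f0 has 4 harmonics of equal amplitude.
true_pitch = np.array([1.0, 1.0, 1.0, 1.0])
# The sub-octave candidate f0/2 has 8 harmonics; only the even ones
# (2 * f0/2 = f0, 4 * f0/2 = 2 * f0, ...) coincide with the signal.
sub_octave = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

for name, a in [("true pitch", true_pitch), ("sub-octave", sub_octave)]:
    print(f"{name}: group norm = {group_norm(a):.1f}, TV = {total_variation(a):.1f}")
# true pitch: group norm = 2.0, TV = 0.0
# sub-octave: group norm = 2.0, TV = 7.0
```

Under the group norm alone the two candidates cost the same, which is how the sub-octave can be selected; the total variation term separates them.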
Pitch Estimation for Noisy Speech
In this dissertation a biologically plausible system of pitch estimation is proposed. The system is
designed from the bottom up to be robust to challenging noise conditions. This robustness to
the presence of noise in the signal is achieved by developing a new representation of the speech
signal, based on the operation of damped harmonic oscillators, and temporal mode analysis of
their output. The resulting representation is shown to possess qualities that are not degraded
in the presence of noise. A harmonic-grouping-based system is used to estimate the pitch frequency.
A detailed statistical analysis of the system is performed, and its performance is compared with some
of the most established and recent pitch estimation and tracking systems. The detailed analysis
includes results of experiments with a variety of noise types over a large range of signal-to-noise ratios,
under different signal conditions. Situations where the interfering "noise" is speech from another
speaker are also considered. The proposed system is able to estimate the pitch of both the main
speaker and the interfering speaker, thus emulating the phenomena of auditory streaming and the
"cocktail party effect" in terms of pitch perception. The results of the extensive statistical analysis
show that the proposed system exhibits some notable properties in its ability to handle
noise. The results also show that the proposed system's overall performance is much better than
that of any of the other systems tested, especially in the presence of very large amounts of noise. The system
is also shown to successfully simulate several psychoacoustic pitch perception
phenomena. Through a detailed comparative analysis of computational requirements, it is also
demonstrated that the proposed system is comparatively inexpensive in terms of processing and
memory requirements.
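A bank of damped oscillators followed by harmonic grouping can be sketched with discrete second-order resonators, the sampled counterpart of a damped harmonic oscillator. Each candidate f0 is then scored by summing resonator energies at its harmonics. This is a generic resonator-bank and harmonic-sum sketch, not the dissertation's temporal mode analysis. The sampling rate, damping `r`, 10 Hz grid, number of harmonics, test signal and the helper names are all assumptions.

```python
import numpy as np

FS = 8000  # sampling rate in Hz (assumed)

def resonator_bank_energy(x, freqs, r=0.99, fs=FS):
    """Drive a bank of damped second-order resonators with x and return each
    output's energy over the second half of the signal, to reduce the effect
    of the start-up transient."""
    w = 2 * np.pi * np.asarray(freqs, dtype=float) / fs
    a1, a2 = 2 * r * np.cos(w), -r * r
    # |1 - a1 e^{-jw} - a2 e^{-2jw}| gives each resonator unit gain at its centre.
    scale = np.abs(1 - a1 * np.exp(-1j * w) - a2 * np.exp(-2j * w))
    y1 = np.zeros_like(w)
    y2 = np.zeros_like(w)
    energy = np.zeros_like(w)
    start = len(x) // 2
    for n, xn in enumerate(x):
        y = xn + a1 * y1 + a2 * y2      # y[n] = x[n] + 2r cos(w) y[n-1] - r^2 y[n-2]
        y2, y1 = y1, y
        if n >= start:
            energy += (scale * y) ** 2
    return energy

def harmonic_grouping_pitch(x, candidates, n_harmonics=4, step=10):
    """Score each candidate f0 by the summed resonator energy at its first
    n_harmonics multiples. Candidates must be multiples of `step`."""
    osc = np.arange(step, max(candidates) * n_harmonics + step, step)
    energy = resonator_bank_energy(x, osc)
    index = {f: i for i, f in enumerate(osc.tolist())}
    scores = [sum(energy[index[f0 * h]] for h in range(1, n_harmonics + 1))
              for f0 in candidates]
    return candidates[int(np.argmax(scores))]

# Hypothetical test signal: f0 = 200 Hz with 4 equal harmonics in white noise.
rng = np.random.default_rng(0)
t = np.arange(800) / FS
x = sum(np.cos(2 * np.pi * 200 * h * t) for h in range(1, 5))
x = x + 0.5 * rng.standard_normal(len(t))
f0_hat = harmonic_grouping_pitch(x, list(range(100, 401, 10)))
print(f0_hat)  # → 200
```

Normalizing every resonator to unit gain at its own frequency keeps the harmonic sum from being biased toward low-frequency resonators, whose unnormalized peak gain is larger.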
A Parametric Method for Multi-Pitch Estimation
This thesis proposes a novel method for multi-pitch estimation. The method operates by posing pitch estimation as a sparse recovery problem, which is solved using convex optimization techniques. In that respect, it is an extension of a previously presented estimation method based on the group LASSO. However, by introducing an adaptive total variation penalty, the proposed method requires fewer user-supplied parameters, thereby simplifying the estimation procedure. The method is shown to have comparable or superior performance in low-noise environments when compared to three standard multi-pitch estimation methods as well as the predecessor method. Also presented is a scheme for automatic selection of the regularization parameters, which makes the method more user-friendly. Used together with this scheme, the proposed method is shown to yield accurate, although not statistically efficient, pitch estimates when evaluated on synthetic speech data.
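The starting point named here, pitch estimation as group-sparse recovery over a harmonic dictionary, can be sketched with a plain group LASSO solved by proximal gradient descent (ISTA). This minimal sketch omits both the adaptive total variation penalty and the automatic regularization-parameter selection. The sampling rate, the 50 Hz candidate grid, the number of harmonics, `lam` and the helper names are assumptions.

```python
import numpy as np

FS = 8000  # sampling rate in Hz (assumed)

def harmonic_dictionary(candidates, n_harmonics, n_samples, fs=FS):
    """Stack cos/sin columns for the first n_harmonics of every candidate f0.
    Returns the dictionary A and one array of column indices per candidate."""
    t = np.arange(n_samples) / fs
    cols, groups = [], []
    for f0 in candidates:
        first = len(cols)
        for h in range(1, n_harmonics + 1):
            cols.append(np.cos(2 * np.pi * h * f0 * t))
            cols.append(np.sin(2 * np.pi * h * f0 * t))
        groups.append(np.arange(first, len(cols)))
    return np.column_stack(cols), groups

def group_lasso(A, y, groups, lam, n_iter=2000):
    """Minimise 0.5 * ||y - A x||^2 + lam * sum_g ||x_g||_2 with proximal
    gradient descent (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x + step * (A.T @ (y - A @ x))     # gradient step on the data-fit term
        for g in groups:                       # prox: block soft-thresholding per group
            norm = np.linalg.norm(z[g])
            z[g] = 0.0 if norm <= step * lam else (1.0 - step * lam / norm) * z[g]
        x = z
    return x

# Hypothetical test signal: f0 = 200 Hz with 4 equal-amplitude harmonics.
candidates = list(range(100, 401, 50))
t = np.arange(800) / FS
y = sum(np.cos(2 * np.pi * 200 * h * t) for h in range(1, 5))
A, groups = harmonic_dictionary(candidates, n_harmonics=4, n_samples=len(t))
x_hat = group_lasso(A, y, groups, lam=40.0)
group_norms = [float(np.linalg.norm(x_hat[g])) for g in groups]
print(candidates[int(np.argmax(group_norms))])  # → 200
```

Here the 100 Hz (sub-octave) and 400 Hz (octave) groups share columns with the true group, which is exactly the ambiguity the adaptive penalty is meant to resolve; with equal harmonic amplitudes the plain group LASSO already favors the group that explains every partial.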
Maximum Likelihood Pitch Estimation Using Sinusoidal Modeling
The aim of the work presented in this thesis is to automatically extract the fundamental frequency of a periodic signal from noisy observations, a task commonly referred to as pitch estimation. An algorithm for optimal pitch estimation using a maximum likelihood formulation is presented. The speech waveform is modeled using sinusoidal basis functions that are harmonically tied together to explicitly capture the periodic structure of voiced speech. Pitch estimation is cast as a model selection problem, and the Akaike Information Criterion is used to estimate the pitch. The algorithm is compared with several existing pitch detection algorithms (PDAs) on a reference pitch database, and the results indicate that it outperforms most of them. The application of parametric modeling to single-channel speech segregation and the use of mel-frequency cepstral coefficients for sequential grouping are analyzed on the Speech Separation Challenge database.
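The combination described here can be sketched as a grid search: a harmonically tied sinusoidal model fitted by least squares (the maximum likelihood fit under white Gaussian noise with unknown variance), scored with the Akaike Information Criterion. This is a minimal sketch, not the thesis's algorithm. The sampling rate, the 100 to 300 Hz search grid, the harmonic limit, the parameter count k = 2L + 1 and the helper names are assumptions.

```python
import numpy as np

FS = 8000  # sampling rate in Hz (assumed)

def harmonic_rss(x, f0, n_harmonics, fs=FS):
    """Least-squares fit of L harmonically tied sinusoids (the ML fit under
    white Gaussian noise); returns the residual sum of squares."""
    t = np.arange(len(x)) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * h * f0 * t), np.sin(2 * np.pi * h * f0 * t)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    residual = x - A @ coef
    return float(residual @ residual)

def aic_pitch(x, candidates, max_harmonics=5):
    """Return the (f0, L) pair minimising AIC = N ln(RSS / N) + 2k, k = 2L + 1."""
    n = len(x)
    best = None
    for f0 in candidates:
        for L in range(1, max_harmonics + 1):
            aic = n * np.log(harmonic_rss(x, f0, L) / n) + 2 * (2 * L + 1)
            if best is None or aic < best[0]:
                best = (aic, f0, L)
    return best[1], best[2]

# Hypothetical voiced frame: f0 = 150 Hz, three harmonics plus white noise.
rng = np.random.default_rng(1)
t = np.arange(800) / FS
x = (np.cos(2 * np.pi * 150 * t)
     + 0.5 * np.cos(2 * np.pi * 300 * t + 1.0)
     + 0.25 * np.cos(2 * np.pi * 450 * t + 2.0)
     + 0.1 * rng.standard_normal(len(t)))
f0_hat, L_hat = aic_pitch(x, list(range(100, 301, 10)))
print(f0_hat, L_hat)
```

The assumed grid excludes f0/2, which sidesteps the sub-octave ambiguity in this example; with a wider grid, the 2k term is what penalizes a sub-octave model, since it needs twice as many harmonics to cover the same partials.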
A geometric framework for pitch estimation on acoustic musical signals
This paper presents a geometric approach to pitch estimation (PE), an
important problem in Music Information Retrieval (MIR) and a precursor to a
variety of other problems in the field. Although there exist a number of
highly accurate methods, both mono-pitch estimation and multi-pitch estimation
(particularly with unspecified polyphonic timbre) prove computationally and
conceptually challenging. A number of current techniques, whilst highly
effective, are not aimed at eliciting the underlying mathematical
structures that underpin the complex musical patterns exhibited by acoustic
musical signals. Tackling the problem from both a theoretical and an experimental
perspective, we present a novel framework, a basis for further work in the
area, and results that (whilst not state of the art) demonstrate relative
efficacy. The framework presented in this paper opens up a new way
to tackle PE problems and may have uses both in traditional analytical
approaches and in the emerging machine learning (ML) methods that
currently dominate the literature.