A parametric method for pitch estimation of piano tones
The efficiency of most pitch estimation methods declines when the analyzed frame is shortened and/or when a wide fundamental frequency (F0) range is targeted. The technique proposed herein jointly uses a periodicity analysis and a spectral matching process to improve F0 estimation performance in such an adverse context: a 60 ms data frame covering the whole 7 1/4-octave piano tessitura. The enhancements are obtained thanks to a parametric approach which, among other things, models the inharmonicity of piano tones. The performance of the algorithm is assessed, compared to the results obtained from other estimators, and discussed in order to characterize their behavior and typical misestimations. Index Terms: audio processing, pitch estimation.
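The inharmonicity mentioned in the abstract is usually captured by the classical stiff-string model, in which partial k of a tone with fundamental F0 lies at k * F0 * sqrt(1 + b*k^2) rather than at the exact harmonic k * F0. A minimal sketch of that model (the coefficient b and values below are illustrative assumptions, not the paper's exact parametric model):

```python
import numpy as np

def partial_frequencies(f0, n_partials, b):
    """Frequencies of the first partials of a stiff-string tone.

    Stiff-string model: f_k = k * f0 * sqrt(1 + b * k^2), where b is the
    inharmonicity coefficient (b = 0 gives a perfectly harmonic tone).
    """
    k = np.arange(1, n_partials + 1)
    return k * f0 * np.sqrt(1.0 + b * k ** 2)

# A harmonic tone has exact integer multiples of f0; a piano string's
# stiffness stretches the upper partials progressively upward, which is
# why a purely harmonic spectral template mismatches real piano spectra.
harmonic = partial_frequencies(440.0, 5, 0.0)
inharmonic = partial_frequencies(440.0, 5, 1e-4)  # illustrative b value
```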
Recovery and convergence rate of the Frank-Wolfe Algorithm for the m-EXACT-SPARSE Problem
We study the properties of the Frank-Wolfe algorithm for solving the m-EXACT-SPARSE reconstruction problem, where a signal y must be expressed as a sparse linear combination of a predefined set of atoms, called a dictionary. We prove that when the signal is sparse enough with respect to the coherence of the dictionary, the iterative process implemented by the Frank-Wolfe algorithm only recruits atoms from the support of the signal, that is, the smallest set of atoms from the dictionary that allows for a perfect reconstruction of y. We also prove that, under this same condition, there exists an iteration beyond which the algorithm converges exponentially.
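The atom-recruitment behavior the recovery result is about can be sketched with a plain Frank-Wolfe iteration on the l1 ball: the linear-minimization oracle selects the single atom most correlated with the residual. This is a generic sketch with the classical 2/(t+2) step size, not the paper's exact setting:

```python
import numpy as np

def frank_wolfe_l1(y, D, beta, n_iter=50):
    """Frank-Wolfe for min ||y - D x||^2  s.t.  ||x||_1 <= beta.

    Each iteration, the linear-minimization oracle over the l1 ball of
    radius beta picks the atom most correlated with the residual; the
    iterate then moves toward the corresponding vertex.
    """
    x = np.zeros(D.shape[1])
    for t in range(n_iter):
        grad = -2.0 * D.T @ (y - D @ x)     # gradient of the squared loss
        j = np.argmax(np.abs(grad))         # most correlated atom
        if np.abs(grad[j]) < 1e-12:         # already optimal
            break
        s = np.zeros_like(x)
        s[j] = -beta * np.sign(grad[j])     # minimizing vertex of the ball
        gamma = 2.0 / (t + 2.0)             # classical step size
        x = (1.0 - gamma) * x + gamma * s   # convex-combination update
    return x
```

On a trivially sparse instance (identity dictionary), the iterate recruits only the support atom.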
Convex nonnegative matrix factorization with missing data
Convex nonnegative matrix factorization (CNMF) is a variant of nonnegative matrix factorization (NMF) in which the components are convex combinations of atoms of a known dictionary. In this contribution, we propose to extend CNMF to the case where the data matrix and the dictionary have missing entries. After formulating the problem in this missing-data context, we propose a majorization-minimization algorithm to solve the resulting optimization problem. Experimental results with synthetic data and audio spectrograms show an improvement in reconstruction performance with respect to standard NMF. The performance gap is particularly significant when the reconstruction task becomes arduous, e.g. when the ratio of missing data is high, the noise level is high, or the data are complex.
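The missing-data ingredient can be illustrated with masked multiplicative updates for plain NMF (a simplified stand-in: this is the standard majorization-minimization scheme for a Euclidean loss restricted to observed entries, not the paper's convex variant with a missing-entry dictionary):

```python
import numpy as np

def masked_nmf(V, M, rank, n_iter=300, seed=0):
    """NMF with missing entries: minimize ||M * (V - W H)||_F^2.

    M is a binary mask (1 = observed, 0 = missing). The multiplicative
    updates below are the classical weighted-NMF MM updates, which only
    fit the observed entries of V.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + 1e-3
    H = rng.random((rank, N)) + 1e-3
    MV = M * V
    for _ in range(n_iter):
        H *= (W.T @ MV) / (W.T @ (M * (W @ H)) + 1e-12)
        W *= (MV @ H.T) / ((M * (W @ H)) @ H.T + 1e-12)
    return W, H
```

On low-rank data with a few entries masked out, the factorization still fits the observed entries closely.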
An investigation of discrete-state discriminant approaches to single-sensor source separation
This paper investigates a new scheme for single-sensor audio source separation. The framework is introduced in comparison with the existing Gaussian mixture model generative approach and focuses on the mixture states rather than on the source states, resulting in a discrete, joint-state discriminant approach. The study establishes the theoretical performance bounds of the proposed scheme, and an actual source separation system is designed. The performance is computed on a set of musical recordings and a discussion is proposed, including the question of source correlation and the possible drawbacks of the method.
QuicK-means: Acceleration of K-means by learning a fast transform
K-means -- and the celebrated Lloyd algorithm -- is more than the clustering method it was originally designed to be. It has indeed proven pivotal in speeding up many machine learning and data analysis techniques such as indexing, nearest-neighbor search and prediction, data compression, and Radial Basis Function networks; its beneficial use has been shown to carry over to the acceleration of kernel machines (when using the Nyström method). Here, we propose a fast extension of K-means, dubbed QuicK-means, that rests on the idea of expressing the matrix of the K centroids as a product of sparse matrices, a feat made possible by recent results devoted to finding approximations of matrices as products of sparse factors. Using such a decomposition reduces the complexity of the matrix-vector product between the factorized centroid matrix and any vector from O(A B) to O(A log A + B), with A = min(K, D) and B = max(K, D), where D is the dimension of the training data. This drastic computational saving has a direct impact on the assignment of a point to a cluster, meaning that it is tangible not only at prediction time but also at training time, provided the factorization procedure is performed during Lloyd's algorithm. We show precisely that resorting to a factorization step at each iteration does not impair the convergence of the optimization scheme and that, depending on the context, it may entail a reduction of the training time. Finally, we provide discussions and numerical simulations that show the versatility of our computationally efficient QuicK-means algorithm.
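The mechanism behind the saving is simply that applying a chain of sparse factors costs roughly the sum of their nonzeros, instead of the full dense K*D product. A small sketch, with random sparse placeholders standing in for the factors an actual factorization procedure would learn:

```python
import numpy as np
from scipy import sparse

# If the K x D centroid matrix U factorizes as S_1 @ S_2 @ S_3 with
# sparse S_q, then U @ x can be evaluated as three cheap sparse
# matrix-vector products. The factors here are random placeholders,
# not the output of a real sparse-factorization algorithm.
rng = np.random.default_rng(0)
K = D = 64
factors = [sparse.random(K, D, density=0.05, random_state=i, format="csr")
           for i in range(3)]
U = factors[0] @ factors[1] @ factors[2]   # the (implicit) full matrix

def fast_matvec(factors, x):
    """Apply the factorized matrix right-to-left: cost ~ total nnz."""
    for S in reversed(factors):
        x = S @ x
    return x

x = rng.standard_normal(D)
assert np.allclose(U @ x, fast_matvec(factors, x))
```

In the cluster-assignment step of Lloyd's algorithm, every distance computation involves such a centroid-matrix product, which is where the saving shows up at both training and prediction time.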
Optimal spectral transportation with application to music transcription
Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates. In particular, state-of-the-art music transcription systems decompose the spectrogram of the input signal onto a dictionary of representative note spectra. The typical measures of fit used to quantify the adequacy of the decomposition compare the data and template entries frequency-wise. As such, small displacements of energy from one frequency bin to another, as well as variations of timbre, can disproportionately harm the fit. We address these issues by means of optimal transportation and propose a new measure of fit that treats the frequency distributions of energy holistically, as opposed to frequency-wise. Building on the harmonic nature of sound, the new measure is invariant to shifts of energy to harmonically related frequencies, as well as to small and local displacements of energy. Equipped with this new measure of fit, the dictionary of note templates can be considerably simplified to a set of Dirac vectors located at the target fundamental frequencies (musical pitch values). This in turn gives ground to a very fast and simple decomposition algorithm that achieves state-of-the-art performance on real musical data. Many of today's spectral unmixing techniques rely on non-negative matrix decompositions. This concerns for example hyperspectral remote sensing (with applications in Earth observation, astronomy, chemistry, etc.) and audio signal processing. The spectral sample v_n (the spectrum of light observed at a given pixel n, or the audio spectrum in a given time frame n) is decomposed onto a dictionary W of elementary spectral templates, characteristic of pure materials or sound objects, such that v_n ≈ W h_n. The composition of sample n can be inferred from the non-negative expansion coefficients h_n.
This paradigm has led to state-of-the-art results for various tasks (recognition, classification, denoising, separation) in the aforementioned areas, and in particular in music transcription, the central application of this paper. In state-of-the-art music transcription systems, the spectrogram V (with columns v_n) of a musical signal is decomposed onto a dictionary of pure notes (in so-called multi-pitch estimation) or chords. V typically consists of (power-)magnitude values of a regular short-time Fourier transform (Smaragdis and Brown, 2003). It may also consist of an audio-specific spectral transform such as the Mel-frequency transform, as in (Vincent et al., 2010), or the constant-Q transform, as in (Oudre et al., 2011). The success of the transcription system depends of course on the adequacy of the time-frequency transform and the dictionary to represent the data V.
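The core contrast between bin-wise and transport-based fits is easy to see on a toy example. Below, plain 1-D Wasserstein distance (not the paper's harmonic-invariant transport cost) is compared to a Euclidean fit on two spectra whose energy sits in adjacent frequency bins:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two unit-energy "spectra" differing only by a one-bin shift.
bins = np.arange(10).astype(float)
a = np.zeros(10); a[4] = 1.0          # all energy at bin 4
b = np.zeros(10); b[5] = 1.0          # same energy, shifted to bin 5

binwise = np.sum((a - b) ** 2)        # Euclidean fit compares bin by bin
transport = wasserstein_distance(bins, bins, a, b)

# The bin-wise cost treats the shifted spectrum as maximally different
# (cost 2.0), while the transport cost only pays for moving the mass
# one bin (cost 1.0) -- small displacements of energy stay cheap.
```

This robustness to local displacements is what lets the note dictionary collapse to Dirac vectors at the target fundamental frequencies.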
Dynamic Screening: Accelerating First-Order Algorithms for the Lasso and Group-Lasso
Recent computational strategies based on screening tests have been proposed to accelerate algorithms addressing penalized sparse regression problems such as the Lasso. Such approaches build upon the idea that it is worth dedicating some small computational effort to locate inactive atoms and remove them from the dictionary in a preprocessing stage, so that the regression algorithm working with a smaller dictionary will then converge faster to the solution of the initial problem. We believe that there is an even more efficient way to screen the dictionary and obtain a greater acceleration: inside each iteration of the regression algorithm, one may take advantage of the algorithm's computations to obtain a new screening test for free, with increasing screening effects along the iterations. The dictionary is thus dynamically screened instead of being screened statically, once and for all, before the first iteration. We formalize this dynamic screening principle in a general algorithmic scheme and apply it by embedding adapted existing screening tests inside a number of first-order algorithms to solve the Lasso, as well as new screening tests to solve the Group-Lasso. Computational gains are assessed in a large set of experiments on synthetic data as well as real-world sounds and images. They show both the screening efficiency and the gain in terms of running times.
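The principle of screening inside the iterations can be sketched with ISTA and a gap-based safe sphere test (the well-known gap safe sphere is used here as a stand-in for the paper's specific tests; columns of D are assumed unit-norm):

```python
import numpy as np

def ista_dynamic_screening(y, D, lam, n_iter=100):
    """ISTA for the Lasso, 0.5||y - D x||^2 + lam ||x||_1, with a
    dynamic screening test run at every iteration.

    At each iteration, a dual-feasible point is obtained for free from
    the residual; the duality gap gives a safe sphere around it, and
    atoms whose correlation with that sphere provably stays below lam
    are discarded, shrinking the dictionary as the iterations proceed.
    """
    n_atoms = D.shape[1]
    active = np.ones(n_atoms, dtype=bool)
    x = np.zeros(n_atoms)
    L = np.linalg.norm(D, 2) ** 2                  # Lipschitz constant
    for _ in range(n_iter):
        r = y - D[:, active] @ x[active]           # residual
        # dual-feasible point: rescale r/lam into the dual ball
        scale = min(1.0, lam / max(np.abs(D.T @ r).max(), 1e-12))
        theta = scale * r / lam
        # duality gap -> radius of the safe sphere around theta
        primal = 0.5 * (r @ r) + lam * np.abs(x).sum()
        dual = 0.5 * (y @ y) - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
        radius = np.sqrt(max(2.0 * (primal - dual), 0.0)) / lam
        # screening: atoms that cannot be active at the optimum
        keep = np.abs(D.T @ theta) + radius >= 1.0
        x[~keep] = 0.0
        active &= keep
        # ISTA step on the shrunken dictionary
        r = y - D[:, active] @ x[active]
        g = x[active] + (D[:, active].T @ r) / L
        x[active] = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return x, active
```

As the iterate approaches the solution the gap shrinks, the sphere tightens, and more atoms are screened, so each subsequent iteration is cheaper.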
A Dynamic Screening Principle for the Lasso
The Lasso is an optimization problem devoted to finding a sparse representation of some signal with respect to a predefined dictionary. An original and computationally efficient method is proposed here to solve this problem, based on a dynamic screening principle. It makes it possible to accelerate a large class of optimization algorithms by iteratively reducing the size of the dictionary during the optimization process, discarding elements that are provably known not to belong to the solution of the Lasso. This iterative reduction of the dictionary is what we call dynamic screening. As this screening step is inexpensive, the computational cost of an algorithm using our dynamic screening strategy is lower than that of the base algorithm. Numerical experiments on synthetic and real data support the relevance of this approach.