2-D Prony-Huang Transform: A New Tool for 2-D Spectral Analysis
This work proposes an extension of the 1-D Hilbert-Huang transform to the analysis of images. The proposed method consists of (i) adaptively decomposing an image into oscillating parts called intrinsic mode functions (IMFs) using a mode decomposition procedure, and (ii) performing a local spectral analysis of the obtained IMFs to estimate their local amplitudes, frequencies, and orientations. For the decomposition step, we propose two robust 2-D mode decompositions based on non-smooth convex optimization: a "Genuine 2-D" approach, which constrains the local extrema of the IMFs, and a "Pseudo 2-D" approach, which separately constrains the extrema of lines, columns, and diagonals. The spectral analysis step is based on the Prony annihilation property, applied to small square patches of the IMFs. The resulting 2-D Prony-Huang transform is validated on simulated and real data.
Comment: 24 pages, 7 figures
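The Prony annihilation property can be illustrated in its simplest form. For a single real sinusoid, the three-tap filter [1, -2cos ω, 1] annihilates the samples, because x[n] + x[n-2] = 2cos(ω)·x[n-1]. The sketch below is a minimal, hypothetical illustration and not the authors' implementation: it assumes each small IMF patch contains a single plane-wave mode, pools the annihilation equations over all rows and all columns of the patch, and recovers the magnitudes of the two frequency components by least squares. The function names are illustrative, and the paper's estimation of amplitudes and orientations is not reproduced.

```python
import math


def annihilation_cos(sequences):
    """Least-squares estimate of cos(w) for 1-D sequences that share one
    sinusoidal frequency w, using the annihilation relation
    x[n] + x[n-2] = 2*cos(w)*x[n-1]."""
    num = 0.0
    den = 0.0
    for x in sequences:
        for n in range(2, len(x)):
            num += (x[n] + x[n - 2]) * x[n - 1]
            den += x[n - 1] ** 2
    if den == 0.0:
        raise ValueError("patch has no energy; frequency is undefined")
    c = num / (2.0 * den)
    return max(-1.0, min(1.0, c))  # clamp so acos stays defined under noise


def patch_frequencies(patch):
    """Return (|w_m|, |w_n|) for a patch f[m][n] = A*cos(w_m*m + w_n*n + phi).

    Hypothetical helper: each row varies the column index n,
    each column varies the row index m."""
    rows = [list(r) for r in patch]
    cols = [list(c) for c in zip(*patch)]
    w_n = math.acos(annihilation_cos(rows))
    w_m = math.acos(annihilation_cos(cols))
    return w_m, w_n


# Usage: a 7x7 synthetic patch with w_m = 0.3 and w_n = 0.7 rad/sample.
patch = [[math.cos(0.3 * m + 0.7 * n + 0.2) for n in range(7)] for m in range(7)]
w_m, w_n = patch_frequencies(patch)
print(round(w_m, 6), round(w_n, 6))  # 0.3 0.7
```

Because `acos` returns values in [0, π], this sketch recovers only |ω_m| and |ω_n|; resolving their relative sign, and hence the full local orientation, would need additional equations, for example along a diagonal of the patch.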
Non-Negative Tensor Factorization Applied to Music Genre Classification
Music genre classification techniques are typically applied to a data matrix whose columns are the feature vectors extracted from music recordings. In this paper, a feature vector is extracted over a texture window of 1 s, so that any 30-s music recording can be represented as a time sequence of feature vectors, i.e., a feature matrix. Stacking the feature matrices of all recordings in a dataset then yields a tensor, which motivates studying music genre classification with tensors. First, a novel algorithm for non-negative tensor factorization (NTF) is derived that extends non-negative matrix factorization (NMF). Several variants of the NTF algorithm arise from employing different cost functions from the class of Bregman divergences. Second, a novel supervised NTF classifier is proposed, which trains a basis for each class separately and employs basis orthogonalization. A variety of spectral, temporal, perceptual, energy, and pitch descriptors are extracted from the 1000 recordings of the GTZAN dataset, which are distributed across 10 genre classes. The performance of the NTF classifier is compared against that of the multilayer perceptron and support vector machines using stratified 10-fold cross-validation. A genre classification accuracy of 78.9% is reported for the NTF classifier, demonstrating the superiority of this multilinear classifier over several state-of-the-art classifiers based on the data matrix.
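The NTF algorithm is described as an extension of non-negative matrix factorization with cost functions from the Bregman family. As a point of reference only, the following sketch shows the matrix case with the squared Euclidean distance (one member of that family), using the well-known multiplicative updates of Lee and Seung. It is written in plain Python to stay self-contained; the function names, the random initialization, and the `eps` guard are illustrative assumptions, and the code is neither the paper's tensor algorithm nor its supervised classifier.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf_euclidean(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor V ~ W @ H with W, H >= 0 by multiplicative updates that do not
    increase the squared Euclidean error ||V - WH||^2."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise, using the updated H
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5


# Usage: an exactly rank-2 non-negative matrix.
V = matmul([[1.0, 0.5], [0.2, 2.0], [1.5, 1.0]],
           [[1.0, 0.0, 2.0, 1.0], [0.5, 1.0, 0.0, 3.0]])
W0, H0 = nmf_euclidean(V, rank=2, n_iter=0)    # initial factors only
W, H = nmf_euclidean(V, rank=2, n_iter=300)
print(frobenius_error(V, W, H) < frobenius_error(V, W0, H0))  # True
```

The multiplicative form keeps every entry non-negative without explicit projection, since each update multiplies a non-negative factor by a ratio of non-negative quantities. Other Bregman divergences (for example the generalized Kullback-Leibler divergence) lead to different update ratios.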
Using the beat histogram for speech rhythm description and language identification
In this paper we present a novel approach for describing speech rhythm and extracting rhythm-related features for automatic language identification (LID). Previous methods have extracted speech rhythm by computing features based on salient elements of speech such as consonants, vowels, and syllables. We show how an automatic rhythm extraction method borrowed from music information retrieval, the beat histogram, can be adapted to the analysis of speech rhythm by defining the most relevant novelty functions in the speech signal and extracting features that describe their periodicities. We evaluated these features in a rhythm-based LID task on two multilingual speech corpora using support vector machines, with feature selection methods to identify the most informative descriptors. The results suggest that the method successfully describes speech rhythm and provides LID classification accuracy comparable to or better than that of other approaches, without requiring prior segmentation or annotation of the speech signal. Concerning rhythm typology, the rhythm class hypothesis in its original form seems to be only partly confirmed by our results.
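The beat-histogram idea can be sketched in a few steps: compute a frame-level novelty function, then measure how strongly it repeats at each lag. The sketch below is a simplified, hypothetical version: it uses the half-wave rectified difference of frame energies as its only novelty function and treats the normalized autocorrelation of that function, indexed by lag in frames, as the periodicity histogram. The paper defines its own set of speech novelty functions and rhythm features, which are not reproduced here; all names and parameters are illustrative.

```python
def frame_energy(signal, frame_len, hop):
    """Energy of each (possibly overlapping) analysis frame."""
    return [
        sum(s * s for s in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def novelty(energy):
    """Half-wave rectified first difference: large where energy rises."""
    return [max(0.0, energy[t] - energy[t - 1]) for t in range(1, len(energy))]


def beat_histogram(nov, max_lag):
    """Normalized autocorrelation of the mean-removed novelty for lags 1..max_lag."""
    mean = sum(nov) / len(nov)
    x = [v - mean for v in nov]
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return [0.0] * max_lag  # flat novelty: no periodicity to report
    return [
        sum(x[t] * x[t - lag] for t in range(lag, len(x))) / r0
        for lag in range(1, max_lag + 1)
    ]


def rhythm_features(hist):
    """Two example descriptors: dominant lag (in frames) and its peak strength."""
    peak = max(hist)
    return {"dominant_lag": hist.index(peak) + 1, "peak_strength": peak}


# Usage: 50-sample bursts every 400 samples; with hop = 50 the period is 8 frames.
signal = [1.0 if i % 400 < 50 else 0.0 for i in range(8000)]
hist = beat_histogram(novelty(frame_energy(signal, 100, 50)), max_lag=12)
print(rhythm_features(hist)["dominant_lag"])  # 8
```

Half-wave rectification keeps only energy onsets, which are the events whose spacing carries the rhythm; features such as the dominant lag and peak strength can then be fed to a classifier such as an SVM.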