Searching for dominant high-level features for music information retrieval
Music Information Retrieval systems are often based on the analysis of a large number of low-level audio features. When dealing with problems of musical genre description and visualization, however, it would be desirable to work with a very limited number of highly informative and discriminant macro-descriptors. In this paper we focus on a specific class of training-based descriptors, which are obtained as the log-likelihood of a Gaussian Mixture Model trained with short musical excerpts that selectively exhibit a certain semantic homogeneity. As these descriptors are critically dependent on the training sets, we approach the problem of how to automatically generate suitable training sets and optimize the associated macro-features in terms of discriminant power and informative impact. We then show the application of a set of three identified macro-features to genre visualization, tracking and classification.
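A minimal sketch of how a log-likelihood descriptor of this kind could be computed, assuming a diagonal-covariance Gaussian Mixture Model whose parameters are already available. The function name `gmm_avg_loglik`, the hand-picked parameters, and the two-dimensional toy frames are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of `frames` under a diagonal GMM.

    frames:    (n_frames, n_dims) low-level feature vectors
    weights:   (n_components,) mixture weights summing to 1
    means:     (n_components, n_dims) component means
    variances: (n_components, n_dims) diagonal covariances
    """
    frames = np.asarray(frames, dtype=float)
    diff = frames[:, None, :] - means[None, :, :]            # (n, k, d)
    # Log-density of every frame under every Gaussian component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )
    # Numerically stable log-sum-exp over the weighted components.
    a = log_comp + np.log(weights)[None, :]
    a_max = a.max(axis=1, keepdims=True)
    log_lik = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1))
    return float(log_lik.mean())

# Hypothetical parameters; a trained model would supply these.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))

near = np.array([[0.1, -0.2], [2.9, 3.1]])     # frames close to the model
far = np.array([[10.0, -10.0], [12.0, -9.0]])  # frames far from the model
score_near = gmm_avg_loglik(near, weights, means, variances)
score_far = gmm_avg_loglik(far, weights, means, variances)
```

Under these assumptions, an excerpt that resembles the training material receives a higher score than one that does not, which is what lets a single scalar act as a macro-descriptor.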
TensorLy: Tensor Learning in Python
Tensors are higher-order extensions of matrices. While matrix methods form the cornerstone of traditional machine learning and data analysis, tensor methods have been gaining increasing traction. However, software support for tensor operations is not on the same footing. In order to bridge this gap, we have developed TensorLy, a Python library that provides a high-level API for tensor methods and deep tensorized neural networks. TensorLy aims to follow the same standards adopted by the main projects of the Python scientific community, and to seamlessly integrate with them. Its BSD license makes it suitable for both academic and commercial applications. TensorLy's backend system allows users to perform computations with several libraries, such as NumPy or PyTorch, to name but a few. These computations can be scaled across multiple CPU or GPU machines. In addition, using the deep-learning frameworks as backends makes it easy to design and train deep tensorized neural networks. TensorLy is available at https://github.com/tensorly/tensorl
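To make "higher-order extensions of matrices" concrete, the following is a minimal plain-NumPy sketch of mode-n unfolding (matricization) and its inverse. The helpers `unfold` and `fold` are written here for illustration only; they are not claimed to reproduce TensorLy's implementation or signatures.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor`: rows index `mode`, columns index all other modes."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1))

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor whose full shape is `shape`."""
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(np.reshape(matrix, full_shape), 0, mode)

X = np.arange(24).reshape(2, 3, 4)  # a third-order tensor
X1 = unfold(X, 1)                   # mode-1 unfolding, shape (3, 8)
X_back = fold(X1, 1, X.shape)       # recovers the original tensor
```

Unfolding is the step that lets ordinary matrix routines (SVD, least squares) be reused inside tensor decompositions.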
The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use
The GTZAN dataset appears in at least 100 published works, and is the
most-used public dataset for evaluation in machine listening research for music
genre recognition (MGR). Our recent work, however, shows GTZAN has several
faults (repetitions, mislabelings, and distortions), which challenge the
interpretability of any result derived using it. In this article, we disprove
the claims that all MGR systems are affected in the same ways by these faults,
and that the performances of MGR systems in GTZAN are still meaningfully
comparable since they all face the same faults. We identify and analyze the
contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has
been used in MGR research, and find few indications that its faults have been
known and considered. Finally, we rigorously study the effects of its faults on
evaluating five different MGR systems. The lesson is not to banish GTZAN, but
to use it with consideration of its contents.
Comment: 29 pages, 7 figures, 6 tables, 128 references
Low-Rank and Sparse Decomposition for Hyperspectral Image Enhancement and Clustering
In this dissertation, new algorithms are developed to enhance hyperspectral image (HSI) analysis. A tensor data format is applied to the sparse and low-rank decomposition of hyperspectral datasets, which could enhance classification and detection performance, and a multi-view learning technique is applied to hyperspectral image clustering. Furthermore, a kernel version of the multi-view learning technique is proposed, which could improve clustering performance. Most low-rank and sparse decomposition algorithms for HSI analysis are based on a matrix data format. As HSI contains high spectral dimensions, tensor-based extended low-rank and sparse decomposition (TELRSD) is proposed in this dissertation for better performance in HSI classification with the low-rank tensor part and in HSI detection with the sparse tensor part. With this tensor-based method, HSI is processed in a 3D data format, and the information between spectral bands and pixels remains integrated during the decomposition process. The proposed algorithm is compared with other state-of-the-art methods, and the experimental results show that TELRSD has the best performance among the compared algorithms. HSI clustering is an unsupervised task that aims to group pixels into different groups without label information. Low-rank sparse subspace clustering (LRSSC) is one of the most popular algorithms for this clustering task. The spatial-spectral based multi-view low-rank sparse subspace clustering (SSMLC) algorithm is proposed in this dissertation, which extends LRSSC with a multi-view learning technique. In this algorithm, spectral and spatial views are created to generate a multi-view dataset of HSI, where spectral partition, morphological component analysis (MCA) and principal component analysis (PCA) are applied to create other views. Furthermore, a kernel version of SSMLC (k-SSMLC) has also been investigated.
The performance of SSMLC and k-SSMLC is compared with sparse subspace clustering (SSC), low-rank sparse subspace clustering (LRSSC), and spectral-spatial sparse subspace clustering (S4C). The results show that SSMLC could improve the performance of LRSSC, and that k-SSMLC has the best performance. Spectral clustering has been proved to be equivalent to a non-negative matrix factorization (NMF) problem, so NMF can be applied to the clustering problem. In order to include local and nonlinear features of the data source, orthogonal NMF (ONMF), graph-regularized NMF (GNMF) and kernel NMF (k-NMF) have been proposed for better clustering performance. The non-linear orthogonal graph NMF (k-OGNMF) combines kernel, orthogonal and graph constraints in NMF, which pushes the clustering performance further. In the HSI domain, kernel multi-view based orthogonal graph NMF (k-MOGNMF) is applied for subspace clustering, where k-OGNMF is extended with a multi-view algorithm, and it has better performance and computational efficiency.
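As a rough illustration of how NMF can be used for clustering, here is a minimal sketch of plain NMF with the standard multiplicative updates for the Frobenius objective, followed by assigning each sample to its largest coefficient. This is the basic, unconstrained variant only; it does not implement the orthogonal, graph-regularized, kernel, or multi-view extensions named in the abstract. The function name `nmf_cluster` and the toy matrix are assumptions made for the example.

```python
import numpy as np

def nmf_cluster(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor V (features x samples) as W @ H with W, H >= 0, then cluster.

    Each sample is assigned to the component with the largest
    coefficient in its column of H.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    labels = np.argmax(H, axis=0)
    return W, H, labels

# Toy data: samples 0-2 use features 0-1, samples 3-5 use features 2-3.
V = np.array([
    [1.0, 2.0, 3.0, 0.0, 0.0, 0.0],
    [2.0, 4.0, 6.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 6.0, 9.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 3.0],
])
W, H, labels = nmf_cluster(V, rank=2)
```

On real hyperspectral data the columns of `V` would be pixel spectra; the toy matrix above only serves to show the mechanics.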
Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification
Motivated by psychophysiological investigations on the human auditory system, a bio-inspired two-dimensional auditory representation of music signals is exploited that captures the slow temporal modulations. Although each recording is represented by a second-order tensor (i.e., a matrix), a third-order tensor is needed to represent a music corpus. Non-negative multilinear principal component analysis (NMPCA) is proposed for the unsupervised dimensionality reduction of the third-order tensors. The NMPCA maximizes the total tensor scatter while preserving the non-negativity of auditory representations. An algorithm for NMPCA is derived by exploiting the structure of the Grassmann manifold. The NMPCA is compared against three multilinear subspace analysis techniques, namely the non-negative tensor factorization, the high-order singular value decomposition, and the multilinear principal component analysis, as well as their linear counterparts, i.e., the non-negative matrix factorization, the singular value decomposition, and the principal components analysis, in extracting features that are subsequently classified by either support vector machine or nearest neighbor classifiers. Three different sets of experiments conducted on the GTZAN and the ISMIR2004 Genre datasets demonstrate the superiority of NMPCA against the aforementioned subspace analysis techniques in extracting more discriminating features, especially when the training set has small cardinality. The best classification accuracies reported in the paper exceed those obtained by the state-of-the-art music genre classification algorithms applied to both datasets.
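The idea of stacking per-recording matrices into a third-order corpus tensor and reducing its dimensionality can be sketched with a truncated high-order SVD, one of the baseline techniques named in the abstract. This is not NMPCA: it imposes no non-negativity constraint and does not use the Grassmann-manifold algorithm. The function name `hosvd_features`, the tensor sizes, the ranks, and the random data are assumptions made for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor`: rows index `mode`, columns index all other modes."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1))

def hosvd_features(corpus, ranks):
    """Project each recording of `corpus` onto leading mode-wise subspaces.

    corpus: (I1, I2, n_recordings), one I1 x I2 matrix per recording
    ranks:  (r1, r2), number of basis vectors kept in modes 0 and 1
    Returns an (n_recordings, r1 * r2) feature matrix and the two bases.
    """
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        U, _, _ = np.linalg.svd(unfold(corpus, mode), full_matrices=False)
        factors.append(U[:, :r])
    U1, U2 = factors
    # For each recording X_n, compute U1^T X_n U2 (an r1 x r2 matrix).
    core = np.einsum("ia,jb,ijn->abn", U1, U2, corpus)
    return core.reshape(-1, corpus.shape[2]).T, factors

rng = np.random.default_rng(0)
corpus = rng.random((8, 6, 10))  # 10 hypothetical recordings, each 8 x 6
features, (U1, U2) = hosvd_features(corpus, ranks=(3, 2))
```

The rows of `features` could then be passed to a support vector machine or nearest neighbor classifier, as the abstract describes for the extracted features.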