2,300 research outputs found
Recommended from our members
Improving music genre classification using automatically induced harmony rules
We present a new genre classification framework using both low-level signal-based features and high-level harmony features. A state-of-the-art statistical genre classifier based on timbral features is extended using a first-order random forest containing, for each genre, rules derived from harmony or chord sequences. This random forest has been automatically induced, using the first-order logic induction algorithm TILDE, from a dataset covering classical, jazz and pop genre classes, in which the degree and chord category are identified for each chord. The audio descriptor-based genre classifier contains 206 features, covering spectral, temporal, energy, and pitch characteristics of the audio signal. The fusion of the harmony-based classifier with the extracted feature vectors is tested on three-genre subsets of the GTZAN and ISMIR04 datasets, which contain 300 and 448 recordings, respectively. Machine learning classifiers were tested using 5 × 5-fold cross-validation and feature selection. Results indicate that the proposed harmony-based rules combined with the timbral descriptor-based genre classification system lead to improved genre classification rates.
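As a rough illustration of the evaluation protocol mentioned in the abstract, the sketch below generates index splits for 5 × 5-fold cross-validation, read here as five repetitions of 5-fold cross-validation. That reading, the shuffling scheme, the seed, and the function name `repeated_kfold` are assumptions for illustration only; the abstract does not say whether the folds were stratified or how they were drawn.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: plain shuffled folds, no stratification by genre.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # fresh shuffle for every repetition
        # Deal the shuffled indices into n_splits folds of near-equal size.
        folds = [idx[i::n_splits] for i in range(n_splits)]
        for f, test_idx in enumerate(folds):
            train_idx = [j for g, fold in enumerate(folds) if g != f for j in fold]
            yield rep, f, train_idx, test_idx


# 300 matches the size of the three-genre GTZAN subset quoted in the abstract.
splits = list(repeated_kfold(300))
print(len(splits), "train/test splits")
```

Each of the 25 splits holds out one fifth of the recordings for testing; a classifier's reported rate would then be averaged over all splits.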
IRIM at TRECVID2009: High Level Feature Extraction
The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes our participation in the TRECVID 2009 High Level Features detection task. We evaluated a large number of different descriptors (on TRECVID 2008 data) and tried different fusion strategies, in particular hierarchical fusion and genetic fusion. The best IRIM run has a Mean Inferred Average Precision of 0.1220, which is significantly above the TRECVID 2009 HLF detection task median performance. We found that fusion of the classification scores from different classifier types improves the performance and that, even with quite low individual performance, audio descriptors can help.
Ordinal HyperPlane Loss
This research presents the development of a new framework for analyzing ordered class data, commonly called "ordinal class" data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Ratings scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize the ordering information of classes. Alternatively, the researcher may choose to treat the ordered classes as though they are continuous values. This strategy imposes a strong assumption that the real "distance" between two adjacent classes is equal to the distance between two other adjacent classes (e.g., a rating of "0" versus "1," on an 11-point scale is the same distance as a "9" versus a "10"). For Deep Neural Networks (DNNs), the problem of predicting k ordinal classes is typically addressed by performing k-1 binary classifications. These models may be estimated within a single DNN and require an evaluation strategy to determine the class prediction. Another common option is to treat ordinal classes as continuous values for regression and then adjust the cutoff points that represent class boundaries that differentiate one class from another. This research reviews a novel loss function called Ordinal Hyperplane Loss (OHPL) that is particularly designed for data with ordinal classes. OHPLnet has been demonstrated to be a significant advancement in predicting ordinal classes for industry standard structured datasets. The loss function also enables deep learning techniques to be applied to the ordinal classification problem of unstructured data.
By minimizing OHPL, a deep neural network learns to map data to an optimal space in which the distances between points and their class centroids are minimized while a nontrivial ordering relationship among classes is maintained. The research reported in this document advances OHPL from a minimally viable loss function to a more complete deep learning methodology. New analysis strategies were developed and tested that improve model performance as well as algorithm consistency in developing classification models. In the applications chapters, a new algorithm variant is introduced that enables OHPLall to be used when large data records cause a severe limitation on batch size when developing a related Deep Neural Network.
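The abstract notes that k ordinal classes are typically handled with k-1 binary classifications plus an evaluation strategy that turns them into a class prediction. The sketch below shows one common form of that baseline, not OHPL itself: binary target j asks whether the class exceeds j, and the prediction counts the outputs above a threshold. The function names, the 0-based class coding, and the 0.5 threshold are assumptions for illustration.

```python
def encode_ordinal(label, k):
    """Map a class in {0, ..., k-1} to k-1 binary targets.

    Target j is 1 when label > j, so class 2 of 4 becomes [1, 1, 0].
    """
    return [1 if label > j else 0 for j in range(k - 1)]


def decode_ordinal(probs, threshold=0.5):
    """Turn k-1 'label > j' probabilities into a class index.

    One simple evaluation strategy: count the outputs above the threshold.
    """
    return sum(1 for p in probs if p > threshold)


targets = encode_ordinal(2, k=4)
predicted = decode_ordinal([0.9, 0.8, 0.2])
print(targets, predicted)
```

Counting thresholded outputs stays well defined even when the k-1 outputs are not monotonically decreasing, which is one reason it is a popular decoding choice for this kind of decomposition.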
COST292 experimental framework for TRECVID 2008
In this paper, we give an overview of the four tasks submitted to TRECVID 2008 by COST292. The high-level feature extraction framework comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a multi-modal classifier based on SVMs and several descriptors. The third system uses three image classifiers based on ant colony optimisation, particle swarm optimisation and a multi-objective learning algorithm. The fourth system uses a Gaussian model for singing detection and a person detection algorithm. The search task is based on an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. The rushes task submission is based on a spectral clustering approach for removing similar scenes based on eigenvalues of the frame similarity matrix and a redundancy removal strategy which depends on semantic feature extraction such as camera motion and faces. Finally, the submission to the copy detection task is conducted by two different systems. The first system consists of a video module and an audio module. The second system is based on mid-level features that are related to the temporal structure of videos.
- …