
    Improving music genre classification using automatically induced harmony rules

    We present a new genre classification framework using both low-level signal-based features and high-level harmony features. A state-of-the-art statistical genre classifier based on timbral features is extended using a first-order random forest containing, for each genre, rules derived from harmony or chord sequences. This random forest has been automatically induced, using the first-order logic induction algorithm TILDE, from a dataset in which the degree and chord category are identified for each chord, covering classical, jazz and pop genre classes. The audio descriptor-based genre classifier contains 206 features, covering spectral, temporal, energy, and pitch characteristics of the audio signal. The fusion of the harmony-based classifier with the extracted feature vectors is tested on three-genre subsets of the GTZAN and ISMIR04 datasets, which contain 300 and 448 recordings, respectively. Machine learning classifiers were tested using 5 × 5-fold cross-validation and feature selection. Results indicate that the proposed harmony-based rules combined with the timbral descriptor-based genre classification system lead to improved genre classification rates.

    Robustness of Random Forest-based gene selection methods

    Gene selection is an important part of microarray data analysis because it provides information that can lead to a better mechanistic understanding of an investigated phenomenon. At the same time, gene selection is very difficult because of the noisy nature of microarray data. As a consequence, gene selection is often performed with machine learning methods. The Random Forest method is particularly well suited for this purpose. In this work, four state-of-the-art Random Forest-based feature selection methods were compared in a gene selection context. The analysis focused on the stability of selection because, although it is necessary for determining the significance of results, it is often ignored in similar studies. The comparison of post-selection accuracy in the validation of Random Forest classifiers revealed that all investigated methods were equivalent in this context. However, the methods differed substantially with respect to the number of selected genes and the stability of selection. Of the analysed methods, the Boruta algorithm predicted the most genes as potentially important. The post-selection classifier error rate, which is a frequently used measure, was found to be a potentially deceptive measure of gene selection quality. When the number of consistently selected genes was considered, the Boruta algorithm was clearly the best. Although it was also the most computationally intensive method, the Boruta algorithm's computational demands could be reduced to levels comparable to those of other algorithms by replacing the Random Forest importance with a comparable measure from Random Ferns (a similar but simplified classifier). Despite their design assumptions, the minimal optimal selection methods were found to select a high fraction of false positives.

    RFDCR: Automated brain lesion segmentation using cascaded random forests with dense conditional random fields

    Segmentation of brain lesions from magnetic resonance images (MRI) is an important step for disease diagnosis, surgical planning, radiotherapy and chemotherapy. However, due to noise, motion, and partial volume effects, automated segmentation of lesions from MRI is still a challenging task. In this paper, we propose a two-stage supervised learning framework for automatic brain lesion segmentation. Specifically, in the first stage, intensity-based statistical features, template-based asymmetric features, and GMM-based tissue probability maps are used to train the initial random forest classifier. Next, the dense conditional random field optimizes the probability maps from the initial random forest classifier and derives the whole tumor regions, referred to as the region of interest (ROI). In the second stage, the optimized probability maps are further integrated with the intensity-based statistical features and template-based asymmetric features to train a subsequent random forest, focusing on classifying voxels within the ROI. The output probability maps are also optimized by the dense conditional random fields, and further used to iteratively train a cascade of random forests. Through hierarchical learning of the cascaded random forests and dense conditional random fields, the multimodal local and global appearance information is integrated with the contextual information, and the output probability maps are improved layer by layer to finally obtain optimal segmentation results. We evaluated the proposed method on the publicly available brain tumor datasets BRATS 2015 and BRATS 2018, as well as the ischemic stroke dataset ISLES 2015. The results show that our framework achieves competitive performance compared to the state-of-the-art brain lesion segmentation methods. In addition, contralateral difference and skewness were identified as the important features in the brain tumor and ischemic stroke segmentation tasks, which conforms to the knowledge and experience of medical experts, further reflecting the reliability and interpretability of our framework.

    Enhancing random forests performance in microarray data classification

    Random forests are receiving increasing attention for classification of microarray datasets. We evaluate the effects of a feature selection process on the performance of a random forest classifier as well as on the choice of two critical parameters, i.e. the forest size and the number of features chosen at each split in growing trees. Results of our experiments suggest that parameter values lower than popular defaults can lead to effective and more parsimonious classification models. Growing few trees on small subsets of selected features, while randomly choosing a single variable at each split, results in classification performance that compares well with state-of-the-art studies.