
    KFHE-HOMER: A multi-label ensemble classification algorithm exploiting sensor fusion properties of the Kalman filter

    Multi-label classification allows a datapoint to be labelled with more than one class at the same time. In spite of their success in multi-class classification problems, ensemble methods based on approaches other than bagging have not been widely explored for multi-label classification problems. The Kalman Filter-based Heuristic Ensemble (KFHE) is a recent ensemble method that exploits the sensor fusion properties of the Kalman filter to combine several classifier models, and it has been shown to be very effective. This article proposes KFHE-HOMER, an extension of the KFHE ensemble approach to the multi-label domain. KFHE-HOMER sequentially trains multiple HOMER multi-label classifiers and aggregates their outputs using the sensor fusion properties of the Kalman filter. Experiments described in this article show that KFHE-HOMER performs consistently better than existing multi-label methods, including existing ensemble-based approaches. Comment: The paper is under consideration at Pattern Recognition Letters, Elsevier.
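    The sensor-fusion step at the heart of a KFHE-style ensemble can be sketched as a scalar Kalman measurement update applied to the score matrices produced by successive classifiers. The sketch below is illustrative only, not the authors' implementation; the variance values stand in for per-classifier error estimates, and the function name is invented for this example.

```python
import numpy as np

def kalman_fuse(predictions, variances):
    """Fuse per-instance label scores from several classifiers with
    scalar Kalman-filter measurement updates.

    predictions: list of (n_samples, n_labels) arrays of scores in [0, 1]
    variances:   one measurement-noise value per classifier, e.g. an
                 estimate of its error on a validation set
    """
    state = predictions[0].astype(float)   # initial state estimate
    p = float(variances[0])                # initial state variance
    for z, r in zip(predictions[1:], variances[1:]):
        k = p / (p + r)                    # Kalman gain
        state = state + k * (z - state)    # measurement update
        p = (1.0 - k) * p                  # variance update
    return state

# toy example: fuse three classifiers' scores for 2 samples, 3 labels
preds = [np.array([[0.7, 0.2, 0.6], [0.1, 0.9, 0.4]]),
         np.array([[0.6, 0.3, 0.7], [0.2, 0.8, 0.5]]),
         np.array([[0.8, 0.1, 0.5], [0.1, 0.7, 0.6]])]
print(kalman_fuse(preds, variances=[0.3, 0.2, 0.25]).round(2))
```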

    A triple-random ensemble classification method for mining multi-label data

    This paper presents a triple-random ensemble learning method for handling multi-label classification problems. The proposed method integrates and develops the concepts of the random subspace, bagging, and random k-label sets ensemble learning methods to classify multi-label data. It applies the random subspace method to the feature space, the label space, and the instance space. The devised subset selection procedure is executed iteratively, and each multi-label classifier is trained on the randomly selected subsets. At the end of the iterations, optimal parameters are selected and the ensemble of multi-label classifiers is constructed. The proposed method is implemented and its performance compared against that of popular multi-label classification methods. The experimental results reveal that the proposed method outperforms the examined counterparts on most occasions when tested on six small-to-large multi-label datasets from different domains, demonstrating that the developed method is applicable to a wide range of multi-label classification problems.
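    A rough sketch of the triple-random idea, sampling random subsets of features, labels, and instances for each base learner, might look like the following. The base learner (one-vs-rest logistic regression), the subset fractions, and the voting scheme are assumptions for illustration rather than the paper's actual configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

def fit_triple_random(X, Y, n_estimators=10,
                      feat_frac=0.5, label_frac=0.5, inst_frac=0.7):
    """Each ensemble member sees a random subset of features, labels,
    and training instances (a simplified sketch of the idea)."""
    n, d = X.shape
    q = Y.shape[1]
    members = []
    for _ in range(n_estimators):
        feats = rng.choice(d, max(2, int(feat_frac * d)), replace=False)
        labels = rng.choice(q, max(2, int(label_frac * q)), replace=False)
        rows = rng.choice(n, max(2, int(inst_frac * n)), replace=False)
        clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        clf.fit(X[np.ix_(rows, feats)], Y[np.ix_(rows, labels)])
        members.append((clf, feats, labels))
    return members

def predict_triple_random(members, X, q, threshold=0.5):
    """Average each label's votes over the members that saw it."""
    votes, counts = np.zeros((X.shape[0], q)), np.zeros(q)
    for clf, feats, labels in members:
        votes[:, labels] += clf.predict(X[:, feats])
        counts[labels] += 1
    return votes / np.maximum(counts, 1) >= threshold
```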

    An Ensemble Multilabel Classification for Disease Risk Prediction

    It is important to identify and prevent disease risk as early as possible through regular physical examinations. We formulate disease risk prediction as a multilabel classification problem. A novel Ensemble Label Power-set Pruned datasets Joint Decomposition (ELPPJD) method is proposed in this work. First, we transform the multilabel classification into a multiclass classification. Then, we propose the pruned datasets and joint decomposition methods to deal with the imbalanced learning problem. Two strategies, size balanced (SB) and label similarity (LS), are designed to decompose the training dataset. In the experiments, the dataset comes from real physical examination records. We contrast the performance of the ELPPJD method under the two decomposition strategies, and compare ELPPJD against the classic multilabel classification methods RAkEL and HOMER. The experimental results show that the ELPPJD method with the label similarity strategy has outstanding performance.
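    The label power-set transformation that turns the multilabel problem into a multiclass one, together with a crude flagging of rare label sets for pruning, can be sketched as follows. This is a simplified illustration; the joint decomposition and the SB/LS strategies of ELPPJD are not reproduced here, and the `min_count` threshold is an assumed parameter.

```python
from collections import Counter
import numpy as np

def label_powerset(Y, min_count=5):
    """Turn each row's label set into a single multiclass target and flag
    label sets seen fewer than `min_count` times as pruning candidates."""
    keys = [tuple(row) for row in np.asarray(Y, dtype=int)]
    counts = Counter(keys)
    class_of = {k: i for i, k in enumerate(sorted(counts))}
    y_multi = np.array([class_of[k] for k in keys])
    rare = {class_of[k] for k, c in counts.items() if c < min_count}
    return y_multi, class_of, rare

Y = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 1]])
y_multi, class_of, rare = label_powerset(Y, min_count=2)
print(y_multi, rare)   # multiclass targets and the ids of rare label sets
```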

    Multi-label learning by extended multi-tier stacked ensemble method with label correlated feature subset augmentation

    Classification is one of the basic and most important operations in data science and machine learning applications. Multi-label classification is an extension of the multi-class problem in which a set of class labels is associated with a particular instance at a time, whereas in a multi-class problem a single class label is associated with an instance. Although many stacked ensemble methods have been proposed, the complexity of multi-label problems leaves considerable scope for improving prediction accuracy. In this paper, we propose the novel extended multi-tier stacked ensemble (EMSTE) method, which selects label-correlated feature subsets and augments the intermediate dataset with them, improving prediction accuracy in the generalization phase of stacking. The performance of the proposed method has been compared with that of existing methods, and the results show that it outperforms them.
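    A minimal sketch of stacking with feature-subset augmentation is shown below: tier-1 out-of-fold predictions are concatenated with a chosen subset of the original features to build the intermediate dataset for the tier-2 learner. The learners, the cross-validation setup, and the assumption that `feat_subset` comes from a prior label-correlation analysis are illustrative, not the EMSTE specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.multiclass import OneVsRestClassifier

def fit_stacked(X, Y, feat_subset):
    """Tier-1 out-of-fold predictions, augmented with a chosen feature
    subset, form the intermediate dataset for the tier-2 learner."""
    base = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    Z = cross_val_predict(base, X, Y, cv=3)        # out-of-fold predictions
    meta_X = np.hstack([Z, X[:, feat_subset]])     # feature subset augmentation
    meta = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(meta_X, Y)
    return base.fit(X, Y), meta

def predict_stacked(base, meta, X, feat_subset):
    Z = base.predict(X)
    return meta.predict(np.hstack([Z, X[:, feat_subset]]))
```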

    Large scale biomedical texts classification: a kNN and an ESA-based approaches

    With the large and increasing volume of textual data, automated methods for identifying significant topics with which to classify textual documents have received growing interest. While many efforts have been made in this direction, the task remains a real challenge. Moreover, the issue is even more complex because full texts are not always freely available, so annotating these documents using only partial information is promising but remains a very ambitious goal. Methods: We propose two classification methods: a k-nearest neighbours (kNN)-based approach and an explicit semantic analysis (ESA)-based approach. Although the kNN-based approach is widely used in text classification, it needs to be improved to perform well in this specific classification problem, which deals with partial information. Compared to existing kNN-based methods, our method uses classical machine learning (ML) algorithms for ranking the labels. Additional features are also investigated in order to improve the classifiers' performance, and several learning algorithms are combined with various techniques for selecting the number of relevant topics. On the other hand, ESA seems promising for this classification task, as it has yielded interesting results in related problems such as semantic relatedness computation between texts and text classification. Unlike existing works, which use ESA to enrich the bag-of-words approach with additional knowledge-based features, our ESA-based method builds a standalone classifier. Furthermore, we investigate whether the results of this method could be useful as a complementary feature of our kNN-based approach. Results: Experimental evaluations performed on large standard annotated datasets, provided by the BioASQ organizers, show that the kNN-based method with the Random Forest learning algorithm achieves good performance compared with current state-of-the-art methods, reaching a competitive f-measure of 0.55%, while the ESA-based approach surprisingly yielded more modest results. Conclusions: We have proposed simple classification methods suitable for annotating textual documents using only partial information. They are therefore adequate for large-scale multi-label classification, particularly in the biomedical domain. Our work thus contributes to the extraction of relevant information from unstructured documents in order to facilitate their automated processing, and could be used for various purposes, including document indexing and information retrieval. Comment: Journal of Biomedical Semantics, BioMed Central, 201
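    The kNN side of such a pipeline can be sketched as ranking candidate labels by similarity-weighted votes from a document's nearest neighbours, using only titles or other partial text. This is a simplified stand-in for the paper's method; the TF-IDF representation, the cosine metric, and the fixed `top_n` cut-off are assumptions, whereas the paper learns the label ranking and the number of relevant topics with ML algorithms.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

def knn_label_ranking(train_texts, train_label_sets, test_texts, k=10, top_n=5):
    """Rank candidate labels for each test document from the labels of its
    k nearest training neighbours, weighted by cosine similarity."""
    vec = TfidfVectorizer(stop_words="english")
    X_train = vec.fit_transform(train_texts)
    X_test = vec.transform(test_texts)
    nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(X_train)
    dist, idx = nn.kneighbors(X_test)
    ranked_labels = []
    for neighbours, d in zip(idx, dist):
        scores = {}
        for j, w in zip(neighbours, 1.0 - d):      # similarity weights
            for label in train_label_sets[j]:
                scores[label] = scores.get(label, 0.0) + w
        ranked = sorted(scores, key=scores.get, reverse=True)
        ranked_labels.append(ranked[:top_n])
    return ranked_labels
```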

    Gravitation Theory Based Model for Multi-Label Classification

    The past decade has witnessed the growing popularity of multi-label classification algorithms in fields such as text categorization, music information retrieval, and the classification of videos and medical proteins. In the meantime, methods based on the principle of universal gravitation have been used extensively in machine learning classification owing to their simplicity and high performance. In light of the above, this paper proposes a novel multi-label classification algorithm called the interaction and data gravitation-based model for multi-label classification (ITDGM). The algorithm replaces the interaction between two objects with the attraction between two particles. The author carries out a series of experiments on five multi-label datasets. The experimental results show that the ITDGM performs better than several well-known multi-label classification algorithms. The effect of the proposed model is assessed by the example-based F1-measure and the label-based micro F1-measure.
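    The data-gravitation principle can be illustrated by scoring each label with the total "pull" its training points exert on a query, with attraction proportional to mass over squared distance. The sketch below is a generic illustration of that idea, not the ITDGM algorithm itself; the uniform masses and the scoring rule are assumptions.

```python
import numpy as np

def gravitation_scores(X_train, Y_train, x, masses=None, eps=1e-6):
    """Score each label for query x by the total 'gravitational pull'
    of that label's training points: sum of m / (distance^2 + eps)."""
    if masses is None:
        masses = np.ones(len(X_train))
    d2 = np.sum((X_train - x) ** 2, axis=1) + eps
    pull = masses / d2                         # per-point attraction
    # aggregate attraction per label (Y_train is a binary indicator matrix)
    return (Y_train * pull[:, None]).sum(axis=0)

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
Y_train = np.array([[1, 0], [1, 1], [0, 1]])
print(gravitation_scores(X_train, Y_train, np.array([0.9, 1.1])))
# higher score = stronger pull toward that label
```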

    Why do Sequence Signatures Predict Enzyme Mechanism? Homology versus Chemistry

    We identify, firstly, InterPro sequence signatures representing evolutionary relatedness and, secondly, signatures identifying specific chemical machinery. Thus, we predict the chemical mechanisms of enzyme-catalysed reactions from "catalytic" and "non-catalytic" subsets of InterPro signatures. We first scanned our 249 sequences with InterProScan and then used the MACiE database to identify those amino acid residues which are important for catalysis. The sequences were mutated in silico to replace these catalytic residues with glycine and then scanned again with InterProScan. Those signature matches from the original scan which disappeared on mutation were called "catalytic". Mechanism was predicted using all signatures, only the 78 "catalytic" signatures, or only the 519 "non-catalytic" signatures. The non-catalytic signatures gave results indistinguishable from those for the whole feature set, with a precision of 0.991 and a sensitivity of 0.970. The catalytic signatures alone gave less impressive predictive performance, with precision and sensitivity of 0.791 and 0.735, respectively. These results show that our successful prediction of enzyme mechanism is mostly by homology rather than by identifying catalytic machinery.
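    The "catalytic" versus "non-catalytic" split described above reduces to comparing the signature sets found before and after the in-silico mutation. The sketch below shows only that set logic plus a presence/absence feature matrix; the data structures are assumed, InterProScan and MACiE are external tools not reproduced here, and the commented Random Forest lines merely indicate how one subset would feed mechanism prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def split_signatures(wildtype_hits, mutant_hits):
    """Signatures present in the wild-type scan but lost after the catalytic
    residues are mutated to glycine are labelled 'catalytic'; the remainder
    are 'non-catalytic'. Both arguments map a sequence id to the set of
    InterPro signature ids matched for that sequence."""
    catalytic, non_catalytic = set(), set()
    for seq, sigs in wildtype_hits.items():
        lost = sigs - mutant_hits.get(seq, set())
        catalytic |= lost
        non_catalytic |= sigs - lost
    return catalytic, non_catalytic

def signature_matrix(hits, signature_list):
    """Binary presence/absence features over a chosen signature subset."""
    return np.array([[1 if s in hits[seq] else 0 for s in signature_list]
                     for seq in sorted(hits)])

# Mechanism prediction from one subset (data loading not shown):
# X = signature_matrix(wildtype_hits, sorted(non_catalytic))
# clf = RandomForestClassifier(n_estimators=200).fit(X, mechanisms)
```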