
    ML-k’sNN: Label Dependent k Values for Multi-Label k-Nearest Neighbor Rule

    Multi-label classification as a data mining task has recently attracted increasing interest from researchers. Many current data mining applications address problems in which instances belong to more than one category, and these problems require the development of new, efficient methods. The multi-label k-nearest neighbor rule, ML-kNN, is among the best-performing methods for multi-label problems. Current methods use a single k value for all labels, as in the single-label setting. However, the distributions of the labels are frequently very different, and in such scenarios a single k value for all labels might be suboptimal. In this paper, we propose a novel approach in which each label is predicted with a different value of k. Obtaining the best k for each label is stated as an optimization problem, and three different algorithms are proposed for this task, depending on which multi-label metric is the target of the optimization process. On a large set of 40 real-world multi-label problems, our approach improves the results of two different ML-kNN implementations.
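    The abstract's core idea, a separate k per label chosen by optimizing some criterion, can be sketched without the paper's exact algorithms. The following stand-alone Python sketch is not the authors' code; the function names, the small candidate-k set, and the leave-one-out accuracy criterion are illustrative assumptions. It picks, for each label, the k that maximizes leave-one-out accuracy, preferring smaller k on ties:

    ```python
    from collections import Counter

    def euclid(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    def knn_label_vote(X, Y, i, k, label):
        """Leave-one-out kNN majority vote for one label of instance i."""
        dists = sorted((euclid(X[i], X[j]), j) for j in range(len(X)) if j != i)
        votes = [Y[j][label] for _, j in dists[:k]]
        return Counter(votes).most_common(1)[0][0]

    def best_k_per_label(X, Y, candidate_ks=(1, 3, 5)):
        """For each label, pick the candidate k with the highest
        leave-one-out accuracy (ties broken toward smaller k)."""
        n_labels = len(Y[0])
        best = []
        for label in range(n_labels):
            acc = {k: sum(knn_label_vote(X, Y, i, k, label) == Y[i][label]
                          for i in range(len(X)))
                   for k in candidate_ks}
            best.append(max(candidate_ks, key=lambda k: (acc[k], -k)))
        return best
    ```

    A real implementation would optimize one of the multi-label metrics mentioned in the abstract rather than plain per-label accuracy.
    
    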

    Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules

    Exploiting dependencies between labels is considered crucial for multi-label classification. Rules can expose label dependencies such as implications, subsumptions or exclusions in a human-comprehensible and interpretable manner. However, the induction of rules with multiple labels in the head is particularly challenging, as the number of label combinations that must be taken into account for each rule grows exponentially with the number of available labels. To overcome this limitation, algorithms for exhaustive rule mining typically exploit properties such as anti-monotonicity or decomposability in order to prune the search space. In the present paper, we examine whether commonly used multi-label evaluation measures satisfy these properties and are therefore suited to pruning the search space for multi-label heads.
    Comment: Preprint version. To appear in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2018. See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3074 for further information. arXiv admin note: text overlap with arXiv:1812.0005
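    The pruning idea can be made concrete with rule-head support, a measure that is classically anti-monotone: adding a label to a head can only shrink the set of examples whose label set satisfies the whole head. The toy miner below is illustrative only; the paper's actual question is which evaluation measures share this property, which this sketch simply assumes for support. It enumerates heads level-wise and never expands a superset of an infrequent head:

    ```python
    def head_support(Y, head):
        """Number of examples whose label set contains every label in `head`."""
        return sum(all(y[l] for l in head) for y in Y)

    def mine_heads(Y, n_labels, min_support):
        """Level-wise enumeration of label heads, pruning all supersets of an
        infrequent head -- valid because support is anti-monotone in the head."""
        frequent, frontier = [], [()]
        while frontier:
            nxt = []
            for head in frontier:
                start = head[-1] + 1 if head else 0
                for l in range(start, n_labels):
                    cand = head + (l,)
                    if head_support(Y, cand) >= min_support:
                        frequent.append(cand)
                        nxt.append(cand)   # infrequent heads are never extended
            frontier = nxt
        return frequent
    ```
    
    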

    Scikit-Multiflow: A Multi-output Streaming Framework

    Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived as a platform to encourage the democratization of stream learning research, it provides multiple state-of-the-art methods for stream learning, stream generators and evaluators. scikit-multiflow builds upon popular open-source frameworks including scikit-learn, MOA and MEKA. Development follows the FOSS principles, and quality is enforced by complying with PEP 8 guidelines and using continuous integration and automatic testing. The source code is publicly available at https://github.com/scikit-multiflow/scikit-multiflow.
    Comment: 5 pages, Open Source Software
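    Stream evaluators of the kind the abstract mentions typically implement the prequential (interleaved test-then-train) protocol: each arriving instance is first used to test the model, then to update it. The dependency-free loop below illustrates that protocol only; it is not the scikit-multiflow API, and the MajorityClass learner is a deliberately trivial stand-in for a real incremental model:

    ```python
    from collections import Counter

    class MajorityClass:
        """Trivial incremental learner: predicts the most frequent class so far."""
        def __init__(self):
            self.counts = Counter()
        def predict(self, x):
            return self.counts.most_common(1)[0][0] if self.counts else None
        def partial_fit(self, x, y):
            self.counts[y] += 1

    def prequential(stream, learner):
        """Prequential evaluation: test on each instance, then train on it."""
        correct = seen = 0
        for x, y in stream:
            correct += learner.predict(x) == y   # test first...
            seen += 1
            learner.partial_fit(x, y)            # ...then train
        return correct / seen
    ```
    
    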

    Multi-Label Classification Using Higher-Order Label Clusters

    Multi-label classification (MLC) is one of the major classification approaches in the context of data mining, where each instance in the dataset is annotated with a set of labels. The multiple labels associated with one instance often demand higher computational power than conventional single-label classification tasks. Multi-label classification is often simplified by decomposing the task into single-label classification, which ignores correlations among labels. Incorporating label correlations into the classification task can be hard, since correlations may be missing, or may exist among a pair or a large subset of labels. In this study, a novel MLC approach called Multi-Label Classification with Label Clusters (MLC–LC) is introduced, which incorporates label correlations into a multi-label learning task using label clusters. MLC–LC uses the well-known Cover-coefficient based Clustering Methodology (C3M) to partition the set of labels into clusters and then employs either the binary relevance or the label powerset method to learn a classifier for each label cluster independently. A test instance is given to each of the classifiers, and the label predictions are unioned to obtain a multi-label assignment. The C3M method is especially suited for constructing label clusters, since the number of clusters appropriate for a label set as well as the initial cluster seeds are automatically computed from the data set. The predictive performance of MLC–LC is compared with many mature and well-known multi-label classification techniques on a wide variety of data sets. In all experimental settings, MLC–LC outperformed the other algorithms.
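    The prediction step described above, one classifier per label cluster with the predictions unioned, can be sketched as follows. This is a hedged illustration: the C3M clustering itself is not reproduced (the clusters are assumed to be given), and a 1-nearest-neighbour base learner stands in for whatever base classifier is actually paired with the label powerset method:

    ```python
    def euclid(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    class ClusterLP:
        """Label-powerset classifier for one label cluster, with 1-NN as the
        (stand-in) base learner: predict the nearest training instance's
        labels, restricted to this cluster."""
        def __init__(self, cluster):
            self.cluster = cluster
        def fit(self, X, Y):
            self.X, self.Y = X, Y
        def predict(self, x):
            j = min(range(len(self.X)), key=lambda i: euclid(x, self.X[i]))
            return {l for l in self.cluster if self.Y[j][l]}

    def mlc_lc_predict(x, classifiers):
        """Union the per-cluster predictions into one multi-label assignment."""
        pred = set()
        for c in classifiers:
            pred |= c.predict(x)
        return pred
    ```
    
    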

    Learning Multi-label Alternating Decision Trees from Texts and Data

    Multi-label decision procedures are the target of the supervised learning algorithm we propose in this paper. Multi-label decision procedures map examples to a finite set of labels. Our learning algorithm extends Schapire and Singer's AdaBoost.MH and produces sets of rules that can be viewed as trees, like the Alternating Decision Trees invented by Freund and Mason. Experiments show that we gain both performance and readability by using boosting techniques as well as tree representations of large sets of rules. Moreover, a key feature of our algorithm is its ability to handle heterogeneous input data: discrete and continuous values and text data.
    Keywords: boosting, alternating decision trees, text mining, multi-label problem
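    AdaBoost.MH, which this algorithm extends, reduces a multi-label problem to a weighted binary one over (instance, label) pairs and reweights those pairs after each round. The fragment below sketches only that reduction and weight update; the alternating-decision-tree weak learner itself is not shown, and `h` and `alpha` are placeholders for a real round's hypothesis and confidence:

    ```python
    import math

    def mh_reduction(X, Y, n_labels):
        """AdaBoost.MH reduction: one binary example per (instance, label) pair,
        signed +1 if the label is relevant to the instance and -1 otherwise."""
        return [((x, l), 1 if y[l] else -1)
                for x, y in zip(X, Y) for l in range(n_labels)]

    def update_weights(pairs, weights, h, alpha):
        """One boosting round: w <- w * exp(-alpha * sign * h(x, l)), then
        renormalize, so mispredicted (instance, label) pairs gain weight."""
        new = [w * math.exp(-alpha * s * h(x, l))
               for ((x, l), s), w in zip(pairs, weights)]
        z = sum(new)
        return [w / z for w in new]
    ```
    
    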

    Multi-Label Classification Using Noise Reduction Technique

    In the domain of data mining and machine learning, multi-label classification is a widely studied research problem. The goal of multi-label classification is to predict the presence or absence of the labels, drawn from different classes, that are associated with each instance of a particular application. In this paper, the IML-Forest method is presented with the goal of improving the performance of multi-label classification on different types of datasets. IML-Forest is based on the existing ML-Forest technique. The construction of a set of hierarchical trees and a label transfer mechanism that identifies multiple relevant labels in a hierarchical way are proposed to address label dependencies in multi-label classification. Relevant labels at higher levels of the trees capture the more discriminable label concepts, and these are then shifted to lower-level nodes. The relevant labels are further aggregated from the hierarchy in order to compute the label dependency and make the classification prediction. A limitation of ML-Forest is that noise is not yet addressed: collected multi-label datasets may be noisy and imbalanced, which can degrade learning performance and accuracy. A noise reduction method on the multi-label dataset is therefore proposed to address noisy and imbalanced data. In this paper, text noise related to low-level data errors is handled.
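    The abstract does not spell out the noise reduction step; one common family it could resemble is edited-nearest-neighbour label filtering. The sketch below is an assumption-laden illustration, not the IML-Forest method: it drops a label from an instance when too few of its k nearest neighbours also carry that label.

    ```python
    def euclid(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    def filter_label_noise(X, Y, k=3, min_agree=1):
        """Edited-NN style filter: keep label l on instance i only if at least
        `min_agree` of its k nearest neighbours also carry l."""
        cleaned = []
        for i in range(len(X)):
            neigh = sorted(range(len(X)), key=lambda j: euclid(X[i], X[j]))
            neigh = [j for j in neigh if j != i][:k]
            row = tuple(int(y and sum(Y[j][l] for j in neigh) >= min_agree)
                        for l, y in enumerate(Y[i]))
            cleaned.append(row)
        return cleaned
    ```
    
    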