
    Efficient multi-label classification for evolving data streams

    Many real-world problems involve data that can be considered multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as learners must be able to adapt to change using limited time and memory. This paper proposes a new experimental framework for studying multi-label evolving stream classification, together with new efficient methods that combine best practices from streaming scenarios with best practices from multi-label classification. We present a Multi-label Hoeffding Tree with multi-label classifiers at the leaves as a base classifier. We obtain fast and accurate methods that are well suited to this challenging multi-label streaming classification task. Using the new experimental framework, we test our methodology through an evaluation study on synthetic and real-world datasets. In comparison to well-known batch multi-label methods, we obtain encouraging results.
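    The test-then-train (prequential) protocol that streaming evaluations like this rely on can be sketched as follows. This is a minimal illustration, not the paper's Multi-label Hoeffding Tree: the base learner here is a simple per-label perceptron (a binary-relevance baseline), and all class and function names are illustrative.

```python
import numpy as np

class OnlineBinaryRelevance:
    """One online perceptron per label -- a simple streaming
    multi-label baseline, not the paper's tree-based method."""

    def __init__(self, n_features, n_labels, lr=0.1):
        self.w = np.zeros((n_labels, n_features))
        self.b = np.zeros(n_labels)
        self.lr = lr

    def predict(self, x):
        # Independent binary decision per label.
        return (self.w @ x + self.b > 0).astype(int)

    def learn_one(self, x, y):
        # Perceptron update per label; err entries are in {-1, 0, 1}.
        err = y - self.predict(x)
        self.w += self.lr * np.outer(err, x)
        self.b += self.lr * err

def prequential_hamming_loss(model, stream):
    """Test-then-train: predict on each instance first, then update,
    so every example is used for both evaluation and training."""
    losses = []
    for x, y in stream:
        y_hat = model.predict(x)
        losses.append(np.mean(y_hat != y))  # per-instance Hamming loss
        model.learn_one(x, y)
    return float(np.mean(losses))
```

    Because prediction always precedes the update, the reported loss reflects performance on unseen instances while still using each instance only once, which matches the limited-time, limited-memory constraint the abstract describes.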

    Large-Scale Multi-Label Learning with Incomplete Label Assignments

    Multi-label learning deals with classification problems where each instance can be assigned multiple labels simultaneously. Conventional multi-label learning approaches mainly focus on exploiting label correlations. It is usually assumed, explicitly or implicitly, that the label sets of training instances are complete, without any missing labels. However, in many real-world multi-label datasets, the label assignments for training instances can be incomplete: some ground-truth labels may be missed by the labeler. This problem is especially typical when the number of instances is very large and the labeling cost is very high, which makes it almost impossible to obtain a fully labeled training set. In this paper, we study the problem of large-scale multi-label learning with incomplete label assignments. We propose an approach, called MPU, based upon positive-and-unlabeled stochastic gradient descent and stacked models. Unlike prior work, our method can effectively and efficiently handle missing labels and label correlations simultaneously, and it is very scalable, with time complexity linear in the size of the data. Extensive experiments on two real-world multi-label datasets show that our MPU model consistently outperforms other commonly used baselines.
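    The positive-and-unlabeled idea behind this setting can be illustrated with a per-label logistic SGD in which observed zeros are treated as unlabeled rather than as reliable negatives. This is only a sketch of the core idea: MPU additionally uses stacked models to capture label correlations, which is omitted here, and the `pos_weight` reweighting is my own simplification, not the paper's estimator.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def pu_sgd_per_label(X, Y_obs, pos_weight=5.0, lr=0.05, epochs=20, seed=0):
    """Per-label logistic regression via SGD where observed positives are
    upweighted, reflecting that a 0 in Y_obs may be a missing positive.
    One pass over the data costs O(n * d * L): linear in the data size."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    _, n_labels = Y_obs.shape
    W = np.zeros((n_labels, d))
    for _ in range(epochs):
        for i in rng.permutation(n):
            p = sigmoid(W @ X[i])            # predicted probability per label
            g = p - Y_obs[i]                 # logistic-loss gradient factor
            # Trust observed 1s more than observed 0s (possibly unlabeled).
            wgt = np.where(Y_obs[i] == 1, pos_weight, 1.0)
            W -= lr * np.outer(wgt * g, X[i])
    return W
```

    With some true positives hidden as zeros, an unweighted fit would shift each decision boundary toward predicting fewer positives; upweighting the remaining observed positives counteracts that bias.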

    Multi-Label Classification Using Higher-Order Label Clusters

    Multi-label classification (MLC) is one of the major classification approaches in data mining, where each instance in the dataset is annotated with a set of labels. The presence of multiple labels per instance often demands more computational power than conventional single-label classification tasks. Multi-label classification is often simplified by decomposing the task into single-label classification, which ignores correlations among labels. Incorporating label correlations into the classification task can be hard, since correlations may be missing, or may exist among a pair or a large subset of labels. In this study, a novel MLC approach called Multi-Label Classification with Label Clusters (MLC–LC) is introduced, which incorporates label correlations into a multi-label learning task using label clusters. MLC–LC uses the well-known Cover-coefficient-based Clustering Methodology (C3M) to partition the set of labels into clusters, and then employs either the binary relevance or the label powerset method to learn a classifier for each label cluster independently. A test instance is given to each of the classifiers, and the label predictions are unioned to obtain a multi-label assignment. The C3M method is especially well suited to constructing label clusters, since both the number of clusters appropriate for a label set and the initial cluster seeds are computed automatically from the dataset. The predictive performance of MLC–LC is compared with many mature and well-known multi-label classification techniques on a wide variety of datasets. In all experimental settings, MLC–LC outperformed the other algorithms.
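    The cluster-then-union scheme described above can be sketched end to end. Two hedges: the clustering below is a simple greedy co-occurrence heuristic standing in for C3M (which also derives the cluster count automatically, unlike this sketch), and the per-cluster base learner is a nearest-centroid label-powerset classifier rather than anything the paper specifies.

```python
import numpy as np

def cooccurrence_clusters(Y, n_clusters):
    """Greedy label clustering by co-occurrence counts -- a stand-in
    for C3M, which MLC-LC actually uses."""
    C = Y.T @ Y                                 # label co-occurrence matrix
    np.fill_diagonal(C, 0)
    order = np.argsort(-C.sum(axis=1))          # most-connected labels first
    clusters = [[l] for l in order[:n_clusters]]
    for l in order[n_clusters:]:
        # Attach each remaining label to the cluster it co-occurs with most.
        best = max(range(n_clusters), key=lambda k: C[l, clusters[k]].sum())
        clusters[best].append(l)
    return clusters

def fit_predict_mlc_lc(X_train, Y_train, X_test, n_clusters=2):
    """Label powerset within each cluster (nearest-centroid base learner);
    the per-cluster predictions are unioned into one label assignment."""
    Y_pred = np.zeros((len(X_test), Y_train.shape[1]), dtype=int)
    for cl in cooccurrence_clusters(Y_train, n_clusters):
        # Each distinct label combination inside the cluster is one class.
        combos, inv = np.unique(Y_train[:, cl], axis=0, return_inverse=True)
        centroids = np.array([X_train[inv == k].mean(axis=0)
                              for k in range(len(combos))])
        for i, x in enumerate(X_test):
            k = np.argmin(((centroids - x) ** 2).sum(axis=1))
            Y_pred[i, cl] = combos[k]           # union across clusters
    return Y_pred
```

    Clustering keeps the powerset step tractable: the number of label combinations a classifier must distinguish grows with the cluster size, not with the full label set, while correlations within a cluster are still modeled jointly.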

    Multi-label Node Classification On Graph-Structured Data

    Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have largely been demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node may have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator that produces datasets with tunable properties. While high label similarity (high homophily) is usually credited for the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily as defined for the multi-class scenario. As our second contribution, besides defining homophily for the multi-label scenario, we develop a new approach that dynamically fuses feature and label correlation information to learn label-informed representations. Finally, we perform a large-scale comparative study with 10 methods and 9 datasets, which also showcases the effectiveness of our approach. We release our benchmark at \url{https://anonymous.4open.science/r/LFLF-5D8C/}.
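    Why multi-class homophily does not transfer directly can be seen by trying to score edges when endpoints have label *sets*: equality is too strict, so some set-similarity measure is needed. The sketch below uses mean Jaccard similarity over edges as one natural instantiation; the paper introduces its own multi-label homophily definition, which this simple score should not be taken to reproduce.

```python
import numpy as np

def multilabel_edge_homophily(edges, Y):
    """Mean Jaccard similarity of endpoint label sets over all edges.

    In the multi-class special case (exactly one label per node) the
    Jaccard score is 1 for matching labels and 0 otherwise, so this
    reduces to the standard edge-homophily ratio.
    """
    scores = []
    for u, v in edges:
        a, b = Y[u].astype(bool), Y[v].astype(bool)
        union = np.logical_or(a, b).sum()
        inter = np.logical_and(a, b).sum()
        # Two empty label sets are treated as perfectly similar.
        scores.append(inter / union if union else 1.0)
    return float(np.mean(scores))

# Toy graph: node 0 and 1 share one of two labels, node 2 shares none.
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
edges = [(0, 1), (1, 2)]
score = multilabel_edge_homophily(edges, Y)   # (0.5 + 0.0) / 2 = 0.25
```

    A graded score like this captures the partial overlaps that make the binary homophilic/heterophilic dichotomy ill-suited to multi-label graphs.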